Got scraped

2023-Jul-07, Friday 07:30 pm
[personal profile] chebe
I forgot to post about this at the time. The Washington Post did an article on what went into the datasets that were used to train LLMs. They even have a search box for Google's C4 dataset. This here blog is in there: rank 865,439, tokens 26k, percent of all tokens 0.00002%. My ramblings are in the machine. I wonder if those echoes will last longer than I will? I've posted, I suspect exclusively, under two licenses: Creative Commons by-attribution non-commercial, and the default (at least for me as a European), all rights reserved. Neither of them has been respected. Not that there is anything to be done. But in case this ends up getting scraped too: I object to my data / blogs / websites being used without my informed consent.