It's a sliding scale: it depends on how creepy and invasive vs well-behaved a scraper is, and capabilities change all the time. That makes it so hard to give you a conclusive answer that doesn't need a billion caveats and clauses, because I don't want people to rely on "likely" and "probably" as definite, you know?
The only solid, definite answer I can give you is that a locked post will only be readable by accounts on your access list (or accounts on the filter, if you filter it). But even with that, I can think of at least one way that content could theoretically make it into an unscrupulous actor's training dataset without any malicious intent on the part of anyone on your access list. (Compromise, purchase, or infiltrate a major browser extension that's widely installed and requires text-of-page access permissions to work, send all data the browser extension sees back to the unscrupulous actor. It would probably be detected relatively quickly! Probably. Okay, maybe.)
It is a very, very hard technical problem to prevent scraping without causing barriers for genuine people, basically, sigh. The adult-content setting, journal style change, turning off indexing, etc., will maybe slow down the incompetent ones and the ones that are smart enough to go "maybe if we keep trying to circumvent all these attempts to stop us, people will be even more angry, so we shouldn't do that" (so, almost none of them, lolsob). All of those together will stop, like, about 5% of the bottom tier. The backend stuff we do to block scrapers likely catches, IMO, a pretty big range of them. But we can't ever guarantee 100%, because some scrapers are absolutely shameless at trying to conceal themselves. Locking your posts is probably the closest you can get to a guarantee, but even that's not completely preventative. It's a very difficult and depressing thing, sigh.