AI and Dreamwidth
We've seen some questions lately about AI and how it relates to Dreamwidth, especially around scraping and training. Rather than answer piecemeal, I wanted to talk through how denise and I are thinking about this and try to be explicit about some things.
Dreamwidth is a user-supported service. We don't build the service around monetizing user data, and that informs how we approach AI just like it informs everything else we do.
Your content and AI training
Dreamwidth does not and will not sell, license, or otherwise provide user content for AI training. We have not and will not enter into data-access agreements for AI training purposes.
We will continue taking reasonable technical steps to discourage large-scale automated scraping, including known AI crawlers, where it is practical to do so. No public website can prevent scraping with absolute certainty, but we will keep doing what we reasonably can on our side.
AI features on Dreamwidth
Dreamwidth will not introduce AI features that use or process user content without a public discussion with the community first (and we have no current intention of introducing any).
We phrase it this way only because we can't predict the future; who knows what will be possible and available in five or ten years. Right now, there's nothing we can see ourselves wanting to add.
If that ever changed, the conversation would happen openly before any decisions were made.
Site admin uses of AI
Keeping Dreamwidth usable means dealing with things like spam and abuse, and that sometimes requires automated admin tools to be more efficient or effective.
We are not currently using AI-driven systems for moderation or similar decisions.
If we ever decide that an AI-based tool would help address a site admin problem like spam, we will explain what we are doing and how it works (and ask for feedback!) before putting it into use. Any such tools would exist only to make it easier and more efficient for us to do the work of running the site.
AI and code contributions
Dreamwidth is an open-source project, and contributors use a variety of tools and workflows.
Contributors may choose whether or not to use AI-assisted tools when writing or reviewing code. Dreamwidth will not require contributors to use AI tools, and we will not reject contributions solely because AI-assisted tools were used.
For developers: if you use any AI-assisted development tools to generate a pull request or code contribution, we expect you to thoroughly and carefully review the output of those tools before including it in a pull request. We would also ask the community not to submit pull requests from automated agents with no human intervention in the submission process.
I think it's important that we be able to review, understand, and maintain any contribution effectively, and that means keeping humans in the loop and making sure we're writing code for humans to work with, even if AI was involved.
Important note: this applies to code only. We expect any submitted images or artwork (such as for styles, mood themes, or anything else) to be the work of a human artist.
And to be very explicit, any AI-assisted development does not involve access to Dreamwidth posts or personal content.
In short summary
- Dreamwidth does not and will not provide user content for AI training
- Dreamwidth has not entered and will not enter data-sharing agreements for AI training, and we will do what we can to prevent or discourage automated scraping by AI companies
- Dreamwidth will not introduce AI features without a public discussion first
- Any site admin use of AI tools will be explained openly and part of a public conversation
- Contributors can choose their own development tools for code, but we do not accept images or artwork generated by AI
Oh, and we'll probably mention this (or a subset of this that isn't code related) in an upcoming dw_news post, but will defer to denise on that!

no subject
Yes, I should have a reason for a news post sometime in the next few weeks, and I can add a link to this there! Probably.
no subject
Regarding "doing all you can to prevent/discourage automated scraping": various advice has been going around over the past several years about ways individual users can prevent this - locking their journal, turning off indexing, editing their journal style, turning on adult content warnings, etc. Can you give any information on whether any of those are worth doing above and beyond what you as a site are doing to prevent AI scraping? I'd love to feel more like I knew which of the advice going around was in any way accurate, and some of those steps have major downsides too.
(On many websites, people are increasingly recommending "lock to logged-in users only" to prevent scraping - does DW have any plans to introduce that level of lock? Would it actually help with keeping out AI scrapers here?)
no subject
I do like the idea of supporting a logged-in only view mode, but we have not currently built that. I'll see if we can do that / talk about it with the team!
In general, if the content is publicly visible, then it will be possible for AI crawlers to get it if they ignore the measures we put in place. For example, we can try to block on the user agent and such -- but if the content is public, it's all advisory.
I imagine that Anthropic respects these things... but there are other, less scrupulous entities.
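To illustrate what "advisory" means here: the classic mechanism is a robots.txt file that asks crawlers, by user-agent string, not to fetch pages. This is a hedged sketch, not Dreamwidth's actual configuration; the user-agent names shown are the publicly documented crawler identifiers for OpenAI, Anthropic, and Common Crawl.

```
# Illustrative robots.txt fragment (not Dreamwidth's real file).
# Compliance is entirely voluntary: well-behaved crawlers honor
# these rules, and a scraper that wants the content can ignore them.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```

Server-side user-agent blocking is slightly stronger, since it refuses the request rather than asking politely, but a scraper can defeat it just by sending a browser-like user-agent string -- which is why none of this is a guarantee for public content.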
no subject
Yeah, and a lot of the less ethical scrapers just make user accounts, so now you have to try to detect that, sigh. The escalation spiral is real :/
no subject
It's a sliding scale: it depends on how creepy and invasive versus well-behaved a scraper is, and capabilities change all the time, so it's hard to give you a conclusive answer that doesn't need a billion caveats and clauses. I don't want people to rely on "likely" and "probably" as definite, you know?
The only solid, definite answer I can give you is that a locked post will only be readable by accounts on your access list (or accounts on the filter, if you filter it). But even with that, I can think of at least one way that content could theoretically make it into an unscrupulous actor's training dataset without any malicious intent on the part of anyone on your access list. (Compromise, purchase, or infiltrate a major browser extension that's widely installed and requires text-of-page access permissions to work, send all data the browser extension sees back to the unscrupulous actor. It would probably be detected relatively quickly! Probably. Okay, maybe.)
It is a very, very hard technical problem to prevent scraping without putting up barriers for genuine people, basically, sigh. The adult-content warnings, journal style changes, turning off indexing, etc., will maybe slow down the incompetent scrapers and the ones smart enough to think "if we keep trying to circumvent all these attempts to stop us, people will be even angrier, so we shouldn't" (so, almost none of them, lolsob): all of those combined will stop, like, about 5% of the bottom tier. The backend work we do to block scrapers likely catches, IMO, a pretty big range of them. But we can't ever guarantee 100%, because some scrapers are absolutely shameless about trying to conceal themselves. Locking your posts is probably the closest you can get to a guarantee, but even that isn't completely preventative. It's a very difficult and depressing thing, sigh.