AI and Dreamwidth
We've seen some questions lately about AI and how it relates to Dreamwidth, especially around scraping and training. Rather than answer piecemeal, I wanted to talk through how denise and I are thinking about this and try to be explicit about some things.
Dreamwidth is a user-supported service. We don't build the service around monetizing user data, and that informs how we approach AI just like it informs everything else we do.
Your content and AI training
Dreamwidth does not and will not sell, license, or otherwise provide user content for AI training. We have not and will not enter into data-access agreements for AI training purposes.
We will continue taking reasonable technical steps to discourage large-scale automated scraping, including known AI crawlers, where it is practical to do so. No public website can prevent scraping with absolute certainty, but we will keep doing what we reasonably can on our side.
AI features on Dreamwidth
Dreamwidth will not introduce AI features that use or process user content without a public discussion with the community first (and we have no current intention of introducing any).
We phrase it this way only because we can't predict the future; who knows what will be possible and available in five or ten years. Right now, there's nothing we can see ourselves wanting to add.
If that ever changed, the conversation would happen openly before any decisions were made.
Site admin uses of AI
Keeping Dreamwidth usable means dealing with things like spam and abuse, and that sometimes requires automated admin tools to be more efficient or effective.
We are not currently using AI-driven systems for moderation or similar decisions.
If we ever decide that an AI-based tool would help address a site admin problem like spam, we will explain what we are doing and how it works (and ask for feedback!) before putting it into use. Any such tools would exist only to make it easier and more efficient for us to do the work of running the site.
AI and code contributions
Dreamwidth is an open-source project, and contributors use a variety of tools and workflows.
Contributors may choose whether or not to use AI-assisted tools when writing or reviewing code. Dreamwidth will not require contributors to use AI tools, and we will not reject contributions solely because AI-assisted tools were used.
For developers: if you use any AI-assisted development tools to generate a pull request or code contribution, we expect you to thoroughly and carefully review the output of those tools before including it in a pull request. We would also ask the community not to submit pull requests from automated agents with no human intervention in the submission process.
I think it's important that we be able to review, understand, and maintain any contribution effectively, and that means keeping humans in the loop and making sure we're writing code for humans to work with, even if AI was involved.
Important note: this applies to code only. We expect any submitted images or artwork (such as for styles, mood themes, or anything else) to be the work of a human artist.
And to be very explicit, any AI-assisted development does not involve access to Dreamwidth posts or personal content.
In short summary
- Dreamwidth does not and will not provide user content for AI training
- Dreamwidth has not entered and will not enter data-sharing agreements for AI training, and we will do what we can to prevent or discourage automated scraping by AI companies
- Dreamwidth will not introduce AI features without a public discussion first
- Any site admin use of AI tools will be explained openly and part of a public conversation
- Contributors can choose their own development tools for code, but we do not accept images or artwork generated by AI
Oh, and we'll probably mention this (or a subset of this that isn't code related) in an upcoming dw_news post, but will defer to denise on that!

no subject
Yes, I should have a reason for a news post sometime in the next few weeks, and I can add a link to this there! Probably.
no subject
Regarding "doing all you can to prevent/discourage automated scraping": various advice has been going around over the past several years about ways individual users can prevent this - locking their journal, turning off indexing, editing their journal style, turning on adult content warnings, etc. Can you give any information on whether any of those are worth doing above and beyond what you as a site are doing to prevent AI scraping? I'd love to feel more like I knew which of the advice going around was in any way accurate, and some of those steps have major downsides too.
(On many websites, people are increasingly recommending "lock to logged-in users only" to prevent scraping - does DW have any plans to introduce that level of lock? Would it actually help with keeping out AI scrapers here?)
no subject
I do like the idea of supporting a logged-in only view mode, but we have not currently built that. I'll see if we can do that / talk about it with the team!
In general, if the content is publicly visible, then it will be possible for AI crawlers to get it if they ignore the measures we put in place. For example, we can try to block on the user agent and such -- but if the content is public, it's all advisory.
I imagine that Anthropic respects these things... but there are other, less scrupulous entities.
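To illustrate what "advisory" means here: the classic mechanism is a robots.txt file that asks crawlers, by user-agent string, not to fetch pages. This is a hedged sketch, not Dreamwidth's actual configuration; the user-agent names shown are the publicly documented crawler identifiers for OpenAI, Anthropic, and Common Crawl.

```
# Illustrative robots.txt fragment (not Dreamwidth's real file).
# Compliance is entirely voluntary: well-behaved crawlers honor
# these rules, and a scraper that wants the content can ignore them.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```

Server-side user-agent blocking is slightly stronger, since it refuses the request rather than asking politely, but a scraper can defeat it just by sending a browser-like user-agent string -- which is why none of this is a guarantee for public content.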
no subject
Yeah, and a lot of the less ethical scrapers just make user accounts, so now you have to try to detect that, sigh. The escalation spiral is real :/
no subject
It's a sliding scale: it depends on how creepy and invasive versus well-behaved a scraper is, and capabilities change all the time, so it's hard to give you a conclusive answer that doesn't need a billion caveats and clauses. I don't want people to rely on "likely" and "probably" as definite, you know?
The only solid, definite answer I can give you is that a locked post will only be readable by accounts on your access list (or accounts on the filter, if you filter it). But even with that, I can think of at least one way that content could theoretically make it into an unscrupulous actor's training dataset without any malicious intent on the part of anyone on your access list. (Compromise, purchase, or infiltrate a major browser extension that's widely installed and requires text-of-page access permissions to work, send all data the browser extension sees back to the unscrupulous actor. It would probably be detected relatively quickly! Probably. Okay, maybe.)
It is a very, very hard technical problem to prevent scraping without putting up barriers for genuine people, basically, sigh. The adult-content warnings, journal style changes, turning off indexing, etc., will maybe slow down the incompetent scrapers and the ones smart enough to think "if we keep trying to circumvent all these attempts to stop us, people will be even angrier, so we shouldn't" (so, almost none of them, lolsob): all of those combined will stop, like, about 5% of the bottom tier. The backend work we do to block scrapers likely catches, IMO, a pretty big range of them. But we can't ever guarantee 100%, because some scrapers are absolutely shameless about trying to conceal themselves. Locking your posts is probably the closest you can get to a guarantee, but even that isn't completely preventative. It's a very difficult and depressing thing, sigh.