TL;DR: We are making some changes to how we do authentication (how you log in) that will unfortunately break a number of older clients that you might be using to talk to Dreamwidth. This is very unfortunate, but we think that the tradeoffs in improved security are very much worth it. This post talks about where we were, what we're changing, and the path forward for clients/APIs.
Historical Context
Hi all -- the times, they are a changin'.
Dreamwidth is, of course, based on LiveJournal, which was started in 1999. The codebase is old enough to buy a beer in the United States, and has been able to in much of the rest of the world for years. I'm sure there are people reading this post who were born after Brad started slinging lines of bad Perl[1] back in the halcyon days[2] of his University of Washington dorm room.
Anyway. We start our story back in ancient times.
Things looked very different back then. Server side CPU was constrained, hardware encryption was not a thing yet, browsers were all over the map, TLS was the new kid on the block, there was no Cloudflare/AWS/GCP (cloud computing wasn't a thing), etc.
Back at the beginning, the way you accessed LiveJournal was over a good old-fashioned unencrypted HTTP connection, and everything that went on between you and your long deceased[3] LJ was completely visible to anybody who had the means or desire to sniff your traffic. This was obviously not great.
One of the biggest concerns we had back in those days wasn't that people would read your journal, but that they would be able to log in as you. I'm not actually sure who built the original version, but very early on LiveJournal added client side password hashing using MD5.
The advantage of this system was that your password never left your browser. When you tried to log in, the server would give you some personalized bit of information (a challenge) and then your browser would, using JavaScript, construct a response based on cool mathematics (the one-way hash MD5).
This change let us guarantee that your password was never sent in the clear. This approach also allowed us to protect against so-called man-in-the-middle attacks and replay attacks by carefully crafting the challenge. The details of that are outside the scope of this blog post, but it was the right way to do things back in the early 2000s.
Anyway, that was good enough for the web browser, but LiveJournal also had a custom set of API interfaces (what we called "protocols") that enabled people to write non-browser clients. Originally we built something called the flat protocol and later came the XML-RPC protocol.
We used the same two authentication schemes here as we did in the web browser: plain (the password is sent as-is) and challenge-response.
These two schemes have the same pros and cons as the web browser authentication schemes. Sending a password in the clear over an unencrypted HTTP connection is all sorts of bad, so we steered people towards using the challenge-response authentication scheme. At this point, I think most clients do this and don't even provide options for using the plain version.
A Digression, or, Why Not Challenge-Response?
Now I want to take a small digression into the main downside of the challenge-response authentication scheme:
It requires us to store your password.
Since most of us (including me!) aren't cryptographers, let me elaborate[4] on this. Challenge-response authentication basically works by doing some fancy math with black boxes. The main thing to understand here is that we will use a "black box" (the technical term is a cryptographic hash function), and this box takes an input number and returns an output number.
So for example, if we gave the box a 7 it might tell us 12. And if we gave it an 8 it might respond with a 4. You could go on forever and the responses it would give you are never predictable and never duplicated. These two properties are very important.
To put this another way, if I told you that the number I got from the black box was 351, it's important that you not be able to guess what input resulted in that 351. While this is obviously easy to guess in my example with small numbers, in the real world these black boxes (hashing functions) will give you a very large, random-looking number. In fact, for one particular function (bcrypt, whose 184-bit digest allows 2^184 possible outputs), you'll get a number in the range:
2,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 (2.45e+55)
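The size of that range can be checked with a couple of lines of Python. This is only the arithmetic for a 184-bit output space; it does not run bcrypt itself.

```python
# Number of distinct values a 184-bit digest can take.
digest_bits = 184
possible_outputs = 2 ** digest_bits

print(f"{possible_outputs:.2e}")   # prints 2.45e+55
print(len(str(possible_outputs)))  # prints 56 (decimal digits)
```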
Okay so that's big. I'm going to use some simplified numbers here to make the example palatable but let's pretend that your password is the number 10. Logging in would look like this:
Server: Hey, your challenge is the number 5.
challenge = 5 (random number created by the server)
Client: Ok. Let me put my password (10) in the black box... it says 13. I'm going to now add that to the challenge (13 + 5 = 18) and put that in the black box... my response is now 25.
response = black_box( challenge + black_box( password ) )
Server: The challenge was 5, let me put their password in the black box... 5 (challenge) + 13 = 18, now to black box that... 25! Yup, let them in.
test response = black_box( challenge + black_box( password from database ) )
see if response from client is the same as test response
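The exchange above can be sketched in Python. This is a minimal illustration, not the actual LiveJournal or Dreamwidth code: the function names are made up for the example, the challenge format is an assumption, and MD5 appears only because it is the hash named earlier in this post (it is not a good choice today).

```python
import hashlib
import secrets


def black_box(data: str) -> str:
    """One-way hash (MD5 here, purely for illustration)."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def make_challenge() -> str:
    """Server side: a fresh random challenge for every login attempt."""
    return secrets.token_hex(16)


def make_response(challenge: str, password: str) -> str:
    """Client side: response = black_box(challenge + black_box(password))."""
    return black_box(challenge + black_box(password))


def check_response(challenge: str, password_from_database: str, response: str) -> bool:
    """Server side: redo the same math with the password it has on file."""
    test_response = make_response(challenge, password_from_database)
    return secrets.compare_digest(test_response, response)


challenge = make_challenge()
response = make_response(challenge, "correct horse battery staple")
print(check_response(challenge, "correct horse battery staple", response))  # True
```

Note that `check_response` needs `password_from_database` (or at least its unsalted hash) as an input, which is exactly the downside this section is about.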
The meat of this algorithm is that the client is proving that they know some secret (your password) by doing some math on it and showing us the result. We do the same math on our side and we can check that the results match. The security is added by us giving them some controlled random inputs (the challenge), which forces the client to do "fresh math" (they can't reuse a previous answer).
This is a super simplification of the process but hopefully it describes it well enough. The main point I'm trying to make is that this entire algorithm requires that we know your password. Which sucks. I don't want to know your password. The reasons for this are many -- but one of the big ones is that if someone were to steal our database, I don't want them to suddenly have millions of email addresses and passwords that they can use to log in to your bank or something more important than your journal!
Digression: We do not believe that our database has ever been leaked or accessed by anyone other than the 3 staff members who maintain Dreamwidth's infrastructure, and we have no evidence that it has. We are making these changes not because of some extrinsic motivation but because we believe that they're the right thing to do.
Anyway, let's move on from authentication for now. There is also the fact that we were using unencrypted HTTP. This was fine for your journal (well, "fine"), but was less good for things like entering your credit card and things. At some point in the past, LiveJournal did actually build in support for using HTTPS and encrypting your connection, but it was only used for pages like the shop and other areas that were considered sensitive.
LJ did move logins to use SSL at some point too, but of course the unencrypted version was still available if for some reason you couldn't use the encrypted version. This remained the case for many years and the propensity to use unencrypted connections was substantially unchanged for 15+ years, but we'll get into that soon.
Encrypting the Tubes
Let's fast forward to the late teens (the twenty-teens)...
CPUs are much, much faster, and more importantly, services like Cloudflare have built a business on encrypting the world's traffic. Literally everybody wants you to use encryption and they make it very easy (and, important to Dreamwidth, very cheap!) to do so.
Over the past few years, we here at Dreamwidth embarked on a project to make HTTPS the default and, more than that, to require it. It's not even possible anymore to browse Dreamwidth unencrypted -- we detect and redirect anybody who tries. This is the way it should be, security should be the default! Since it is technically and economically feasible, there's zero reason for us not to do this for all of our users, wherever you are and whether or not you pay.
So we did that. That's great: it definitely improves the security of the system and reduces the risk of our users getting their data exposed. But fundamentally we still have the same problem we had above re: knowing your password. Passwords are still stored in the database and, if the worst were to happen, we don't feel comfortable having that data around when we really don't need it.
(But as above, we have no evidence that our database has ever been compromised. We're making these changes proactively, so that if it ever is, passwords are not easily compromised.)
How Not to Store Passwords
Before we get into what we're doing, let's talk about how you can provide a secure service without storing somebody's password. Again, I'm going to go back to that example I was using above, except this time we're going to leverage those black boxes again but slightly differently.
Assuming that we're using modern, strong connection encryption for everything, we no longer need our authentication layer to protect against man-in-the-middle and replay attacks. This is a pretty great trade since it means we can simplify the approach taken in our authentication system quite substantially... and we get to go back to sending passwords plain[5]!
But more importantly than how we send the password, the fact that we no longer need to provide protection against those two attacks means that we can store the end result of the black boxes in the database instead of your password. Let me break that down... let's talk about creating an account and setting your password for the first time.
Client: Ok, I'd like to make a new account, my password is bob.
Server: Great! Welcome to Dreamwidth, let me just do some math here...
salt = make a very big random number
database_password = black_box( password + salt )
The server then stores that database password! It never writes down the user's actual password. (Remember that these black boxes are one-way, which means that you can't[6] figure out the user's password from the database password.)
Next time, when the user goes to log in, this is what it looks like:
Client: Ok, let me log in, my password is bob.
Server: Great, welcome back! Let me see...
test_password = black_box( client password + salt from database)
compare test_password with database_password
Notice that we basically re-did the math we did in the create account step, but we used the salt[7] (stored only on the server) and the password (that the client gave us). Then we checked the result of our math against the result we stored in the database.
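Both steps can be sketched together in Python. This is a minimal sketch under stated assumptions, not Dreamwidth's implementation: the post names bcrypt as the black box, but bcrypt is not in Python's standard library, so `hashlib.pbkdf2_hmac` stands in for it here. The function names and the iteration count are illustrative only.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def black_box(password: str, salt: bytes) -> bytes:
    """Slow one-way hash of password + salt (PBKDF2 standing in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_account(password: str) -> tuple[bytes, bytes]:
    """Return (salt, database_password); the password itself is never stored."""
    salt = os.urandom(16)  # a very big random number, one per password
    return salt, black_box(password, salt)


def check_login(client_password: str, salt: bytes, database_password: bytes) -> bool:
    """Redo the math with the stored salt and compare against the stored result."""
    test_password = black_box(client_password, salt)
    return hmac.compare_digest(test_password, database_password)


salt, database_password = create_account("bob")
print(check_login("bob", salt, database_password))    # True
print(check_login("alice", salt, database_password))  # False
```

The comparison uses `hmac.compare_digest` rather than `==` so that the time taken does not depend on how many leading bytes match.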
There's a lot of technical depth to this area, but the TL;DR is that by using an appropriate black box here, we guarantee that even if our database gets stolen, it will be very, very hard for the attackers to figure out anybody's password. It's still doable -- but the attacker has to guess every possible password, for every single user, which is... very hard.
But the astute reader will point out that even this scheme is only as resilient as your password, so pick strong passwords! If your password is actually bob, an attacker will be likely to find it pretty quickly.
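To make the point concrete, here is a small sketch of what an attacker with a stolen salt and database password would do: hash guesses one at a time and compare. Everything here is hypothetical (the word list, the function names, and PBKDF2 standing in for bcrypt with a deliberately low iteration count so the example runs quickly).

```python
import hashlib
import os
from typing import Iterable, Optional

ITERATIONS = 10_000  # kept low only so this demo finishes quickly


def black_box(password: str, salt: bytes) -> bytes:
    """Stand-in for the real password hash (PBKDF2 instead of bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def guess_password(salt: bytes, database_password: bytes,
                   candidates: Iterable[str]) -> Optional[str]:
    """Try each candidate in turn; return the first one whose hash matches."""
    for candidate in candidates:
        if black_box(candidate, salt) == database_password:
            return candidate
    return None


common_passwords = ["password", "123456", "qwerty", "bob", "letmein"]

salt = os.urandom(16)
weak = black_box("bob", salt)
strong = black_box("kX9#vTq2!mZr8&LpWd", salt)

print(guess_password(salt, weak, common_passwords))    # bob
print(guess_password(salt, strong, common_passwords))  # None
```

Because every account has its own salt, the attacker has to repeat this loop per user, and a slow black box makes each guess expensive. A password that appears on a common-passwords list still falls on the first pass.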
But, What About Challenge-Response?
Now we get to the hard part.
Since the server no longer knows your password, we can no longer implement challenge-response. We have no way of verifying that the response you're sending us is actually correct, so this improvement in password storage will break existing clients.
This sucks, but there are paths forward.
If you are a client maintainer: you can immediately switch to using plaintext authentication over HTTPS. This is secure and, by sending the password, we can validate you just the same as we validate any browser login. This is the short term path and you should do this.
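For client maintainers, a minimal sketch of a plain login request might look like the following. The endpoint URL and the form field names (`mode`, `user`, `auth_method`, `password`) are assumptions modelled on the classic LiveJournal flat protocol and should be checked against the current protocol documentation. The helper name is hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

# Assumed endpoint; verify against the protocol documentation before use.
FLAT_ENDPOINT = "https://www.dreamwidth.org/interface/flat"


def build_login_request(user: str, password: str,
                        endpoint: str = FLAT_ENDPOINT) -> Request:
    """Build (but do not send) a plain-auth login request over HTTPS."""
    # The password is only protected by the connection, so insist on HTTPS.
    if urlsplit(endpoint).scheme != "https":
        raise ValueError("refusing to send a password over a non-HTTPS connection")
    body = urlencode({
        "mode": "login",         # assumed field name
        "user": user,            # assumed field name
        "auth_method": "clear",  # assumed: plain auth instead of challenge-response
        "password": password,    # assumed field name
    }).encode("ascii")
    return Request(endpoint, data=body, method="POST")


request = build_login_request("example_user", "a long and unique password")
print(request.get_method(), request.full_url)
```

Refusing non-HTTPS endpoints in the client is a deliberate choice: once challenge-response is gone, the encrypted connection is the only thing keeping the password private in transit.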
However, the better path is to start using our new API. This post won't go into too many details, but we've been working on building out an OpenAPIv3 compatible API. Check out the API spec! We don't have all of the methods you will need to build a full client yet, but we're working on it.
What's Next
This week, we are rolling out strong password storage (see pull request #2621) which will have the impact of breaking clients that are using challenge-response authentication. If you have the ability, we recommend you switch over to using plain authentication -- using HTTPS, of course!
Coming soon, we are also building the ability to optionally use two-factor authentication (2FA) (see pull request #2624) to further protect your account, if you want (it will be opt-in). There are a lot more edge cases here, though, and we need to consider the various authentication flows, so this is still "coming soon."
Finally, we are going to continue to invest in our modern API and work to deprecate and remove all of the older ones ... better security, functionality, and a simpler codebase! Many wins.
We care a lot about your security and privacy, and while some of these changes will definitely have a negative impact on some functionality, we make them with the goal of honoring our pledge to respect your data. And, honestly, this was too long coming.
Thanks for reading, and as always, please let us know if you have any questions/comments.
From vault 37.566329, -122.323479, this is mark, signing off.
[1] Up to the reader whether this is 'bad as in cool' or 'bad as in not good'; it really depends on your point of view, and the author is not trying to imply anything here.
[2] I have no idea whether or not Brad actually looks back fondly on those days.
[3] Because honestly, anything I wrote in my LJ in the early 2000s better well be gone and buried by now and I pray never sees the light of day again.
[4] I am suuuuuuuper simplifying here, and obviously eliding the properties of a one-way hash. I understand that addition is trivially reversible here, but hey now it's an example.
[5] Well, it's not actually plaintext. Since we're forcing everybody to use HTTPS, all of the data (including your password) is strongly encrypted and is already safe against all sorts of attacks.
[184] That's an exponent, not a superscript. But hey, made you look!
[6] I'm using words like can't, but when it comes to cryptography the actual words are things like "infeasible." I just wanted to keep it simple, but I understand that there's a big difference here. It's still possible to figure out a user's password; it just takes millions of years and millions of today's servers...
[7] This post is already way too long, so I chose not to go into details about what a salt is and why it matters. I would have left it out of the examples but then someone would have commented that we sucked at security for not using salts. Yes, we're using salts. Random, one per bcrypted thingy.