API updates - looking for testers
Apr. 26th, 2020 11:25 pm
Instructions:
1) Go to http://www.momiji.hack.dreamwidth.net/create and create an account
2) Go to http://www.momiji.hack.dreamwidth.net/api/
3) Paste the bolded token into the box in the authentication section
4) Click on paths to expand them and use the fields to try the requests out.
5) Report errors or unexpected behavior here!
Note 1: Tropospherical layout CSS is not loading properly for reasons unrelated to the API code that I haven't sorted out yet. Other themes work, and site functionality is fine, just kind of ugly
Note 2: The API demo stuff doesn't match the rest of the styling, and will probably only work on relatively recent browsers (the last year or so). This isn't the final form, I'm just using a third-party package so I can get all the backend stuff working before spending time on making the page pretty and matching it to our site styles
Note 3: I'm going to try to get the content-importer going for people who want to import some test data, but the workers are fussy to run in development
Stuff what needs fixing
- Posting entries currently missing mood and preformatted attributes
Hi all,
I've got a few updates from the recent post on security changes. First, thanks everybody for the thoughts! Loved it. :)
So the main changes:
After consultation with some actual experts in the security space, I've added a pepper phase to the authentication storage. The TL;DR here is that we will be symmetrically encrypting the hashes before we store them in the database.
The reason to do this is that it means a database breach (if one were to occur) would not provide the attacker with any useful data. In order to even get the hashes to start attacking them, the attacker would have to mount a second successful attack in order to exfiltrate the encryption key (which is not stored in the database at all).
While in practice, the bcrypt hashes are probably all anybody really needs, it's very low cost for us to add this additional measure to the system. For the technically curious, we are using AES-256 encryption on our web servers and we have the ability to rotate keys over time, should we choose to do so.
Secondly, we have implemented pinterface and momijizukamorki's idea to use API keys as 'app passwords' to enable clients to continue to authenticate against Dreamwidth. This means Semagic and other clients can continue to work, but you will need to reconfigure them. See below!
Those are the main changes. Otherwise, tomorrow's code push will deploy the underpinnings of our new authentication storage and bring us into the modern age. At least when it comes to authentication storage. :)
Supporting Semagic (AFTER password changes are deployed)
Ok, so to use Semagic (and other clients), you will need to:
- Navigate to the Mobile Post Settings page.
- Click the Generate New API Key button in the Manage API Keys section.
- Copy the API key that was generated.
- Change your Semagic password to the API key you copied.
- That's it, have fun!
If you have any trouble with this, please let us know.
TL;DR: We are making some changes to how we do authentication (how you log in) that will unfortunately break a number of older clients that you might be using to talk to Dreamwidth. This is very unfortunate, but we think that the tradeoffs in improved security are very much worth it. This post talks about where we were, what we're changing, and the path forward for clients/APIs.
Historical Context
Hi all -- the times, they are a changin'.
Dreamwidth is, of course, based on LiveJournal, which was started in 1999. The codebase is old enough to buy a beer in the United States, and has been able to in much of the rest of the world for years. I'm sure there are people reading this post who were born after Brad started slinging lines of bad Perl1 back in the halcyon days2 of his University of Washington dorm room.
Anyway. We start our story back in ancient times.
Things looked very different back then. Server side CPU was constrained, hardware encryption was not a thing yet, browsers were all over the map, TLS was the new kid on the block, there was no Cloudflare/AWS/GCP (cloud computing wasn't a thing), etc.
Back at the beginning, the way you accessed LiveJournal was over a good old fashioned unencrypted HTTP connection and everything that went on between you and your long deceased3 LJ was completely visible to anybody who had the means or desire to sniff your traffic. This was obviously not great.
One of the biggest concerns we had back in those days wasn't that people would read your journal, but that they would be able to log in as you. I'm not actually sure who built the original version, but very early on LiveJournal added client-side password hashing using MD5.
Now, the advantages of this system were that basically, your password never left your browser. When you tried to log in, the server would give you some personalized bit of information (a challenge) and then your browser would, using JavaScript, construct a response that was crafted based on cool mathematics (the one-way hash MD5).
This change let us guarantee that your password was never sent in the clear. This approach also allowed us to protect against so-called man-in-the-middle attacks and replay attacks by carefully crafting the challenge. The details of that are outside the scope of this blog post, but it was the right way to do things back in the early 2000s.
Anyway, that was good enough for the web browser, but LiveJournal also had a custom set of API interfaces (what we called "protocols") that enabled people to write non-browser clients. Originally we built something called the flat protocol and later came the XML-RPC protocol.
We used the same authentication plans here as we did in the web browser:
- Plain (cleartext) authentication, where you send us your password.
- Challenge-response authentication, where you do some client-side MD5 hashing.
These two systems have the same pros and cons as the web browser authentication schemes. Sending a password in the clear over an unencrypted HTTP connection is all sorts of bad, so we steered people towards using the challenge-response authentication scheme. At this point, I think most clients do this and don't even provide options for using the plain version.
A Digression, or, Why Not Challenge-Response?
Now I want to take a small digression into the main downside of the challenge-response authentication scheme:
It requires us to store your password.
Since most of us (including me!) aren't cryptographers, let me elaborate4 on this. Challenge-response authentication basically works by doing some fancy math with black boxes. The main thing to understand here is that we will use a "black box" (technical term is: a cryptographic hash function) and this box takes an input number and returns an output number.
So for example, if we gave the box a 7 it might tell us 12. And if we gave it an 8 it might respond with a 4. You could go on forever and the responses it would give you are never predictable and never duplicated. These two properties are very important.
To put this another way, if I told you that the number I got from the black box was 351, it's important that you not be able to guess what the input was that resulted in that 351. This is obviously easy to guess in my example with small numbers, but in the real world these black boxes (hashing functions) will give you a very large random number. In fact for this one particular function (bcrypt, with a digest size of 2^184), you'll get a number in the range:
2,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 (2.45e+55)
Okay so that's big. I'm going to use some simplified numbers here to make the example palatable but let's pretend that your password is the number 10. Logging in would look like this:
Server: Hey, your challenge is the number 5.
challenge = 5 (random number created by the server)
Client: Ok. Let me put my password (10) in the black box... it says 13. I'm going to now add that to the challenge (13 + 5 = 18) and put that in the black box... my response is now 25.
response = black_box( challenge + black_box( password ) )
Server: The challenge was 5, let me put their password in the black box... 5 (challenge) + 13 = 18, now to black box that... 25! Yup, let them in.
test response = black_box( challenge + black_box( password from database ) )
see if response from client is the same as test response
The meat of this algorithm is that the client is proving that they know some secret (your password) by doing some math on it and showing us the result. We do the same math on our side and we can check that the results match. The security is added by us giving them some controlled random inputs (the challenge), which forces the client to do "fresh math" (they can't reuse a previous answer).
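For readers who prefer code to dialogue, here is a minimal Python sketch of the exchange above. It uses MD5 as the black box, since that is the hash the post names, but the function names, the hex encoding, and the string concatenation are assumptions made for illustration, not the actual LiveJournal or Dreamwidth implementation.

```python
import hashlib
import secrets


def md5_hex(text: str) -> str:
    """The 'black box': a one-way hash, returned as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_challenge() -> str:
    """Server side: a fresh random challenge for every login attempt."""
    return secrets.token_hex(16)


def client_response(challenge: str, password: str) -> str:
    """Client side: response = black_box(challenge + black_box(password))."""
    return md5_hex(challenge + md5_hex(password))


def server_verify(challenge: str, response: str, stored_password: str) -> bool:
    """Server side: redo the same math with the password from the database."""
    expected = md5_hex(challenge + md5_hex(stored_password))
    # Constant-time comparison, so timing does not leak how much matched.
    return secrets.compare_digest(expected, response)
```

Note that server_verify needs stored_password as an input. The server has to keep something equivalent to the password around in order to redo the math, which is the downside this section is about.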
This is a super simplification of the process but hopefully it describes it well enough. The main point I'm trying to make is that this entire algorithm requires that we know your password. Which sucks. I don't want to know your password. The reasons for this are many -- but one of the big ones is that if someone were to steal our database, I don't want them to suddenly have millions of email addresses and passwords that they can use to log in to your bank or something more important than your journal!
Digression: We do not believe, and have no evidence of, our database ever being leaked or accessed other than by the 3 staff members who maintain Dreamwidth's infrastructure. We are making these changes not because of some extrinsic motivation but because we believe that they're the right thing to do.
Anyway, let's move on from authentication for now. There is also the fact that we were using unencrypted HTTP. This was fine for your journal (well, "fine"), but was less good for things like entering your credit card and things. At some point in the past, LiveJournal did actually build in support for using HTTPS and encrypting your connection, but it was only used for pages like the shop and other areas that were considered sensitive.
LJ did move logins to use SSL at some point too, but of course the unencrypted version was still available if for some reason you couldn't use the encrypted version. This remained the case for many years and the propensity to use unencrypted connections was substantially unchanged for 15+ years, but we'll get into that soon.
Encrypting the Tubes
Let's fast forward to the late teens (the twenty-teens)...
CPUs are much, much faster, and more importantly, services like Cloudflare have built a business on encrypting the world's traffic. Literally everybody wants you to use encryption and they make it very easy (and, important to Dreamwidth, very cheap!) to do so.
Over the past few years, we here at Dreamwidth embarked on a project to make HTTPS the default and, more than that, to require it. It's not even possible anymore to browse Dreamwidth unencrypted -- we detect and redirect anybody who tries. This is the way it should be, security should be the default! Since it is technically and economically feasible, there's zero reason for us not to do this for all of our users, wherever you are and whether or not you pay.
So we did that. That's great: it definitely improves the security of the system and reduces the risk of our users getting their data exposed. But fundamentally we still have the same problem we had above re: knowing your password. Passwords are still stored in the database and, if the worst were to happen, we don't feel comfortable having that data around when we really don't need it.
(But as above, we have no evidence of any compromise of our database having ever happened. We're making these changes proactively so that, if it ever does happen, passwords are not easily compromised.)
How Not to Store Passwords
Before we get into what we're doing, let's talk about how you can provide a secure service without storing somebody's password. Again, I'm going to go back to that example I was using above, except this time we're going to leverage those black boxes again but slightly differently.
Assuming that we're using modern, strong connection encryption for everything, we no longer need our authentication layer to protect against man-in-the-middle and replay attacks. This is a pretty great trade since it means we can simplify the approach taken in our authentication system quite substantially... and we get to go back to sending passwords plain5!
But more importantly than how we send the password, the fact that we no longer need to provide protection against those two attacks means that we can store the end result of the black boxes in the database instead of your password. Let me break that down... let's talk about creating an account and setting your password for the first time.
Client: Ok, I'd like to make a new account, my password is bob.
Server: Great! Welcome to Dreamwidth, let me just do some math here...
salt = make a very big random number
database_password = black_box( password + salt )
The server then stores that database password! It never writes down the user's actual password. (Remember that these black boxes are one-way, which means that you can't6 figure out the user's password from the database password.)
Next time, when the user goes to log in, this is what it looks like:
Client: Ok, let me log in, my password is bob.
Server: Great, welcome back! Let me see...
test_password = black_box( client password + salt from database)
compare test_password with database_password
Notice that we basically re-did the math we did in the create account step, but we used the salt7 (stored only on the server) and the password (that the client gave us). Then we checked the result of our math with the result we stored in the database.
There's a lot of technical depth to this area, but the TL;DR is that by using an appropriate black box here, we guarantee that even if our database gets stolen, it will be very, very hard for the attackers to figure out anybody's password. It's still doable -- but the attacker has to guess every possible password, for every single user, which is... very hard.
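The create-account and log-in steps above can be sketched in a few lines of Python. This is only an illustration: the post says Dreamwidth uses bcrypt, which is not in the Python standard library, so PBKDF2-HMAC-SHA256 stands in as the black box here. The function names and the iteration count are assumptions chosen for the example.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # illustrative work factor only, not a recommendation


def black_box(password: str, salt: bytes) -> bytes:
    """Slow one-way hash of password + salt (PBKDF2 standing in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_account(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, database_password); the password itself is never stored."""
    salt = os.urandom(16)  # "a very big random number"
    return salt, black_box(password, salt)


def check_login(password: str, salt: bytes, database_password: bytes) -> bool:
    """Redo the math with the stored salt and compare against the stored result."""
    test_password = black_box(password, salt)
    return hmac.compare_digest(test_password, database_password)
```

Because every account gets its own salt, two users who both pick bob end up with different database passwords, so an attacker has to guess per user rather than once for everybody.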
But the astute reader will point out that even this scheme is only as resilient as your password, so pick strong passwords! If your password is actually bob an attacker will be likely to find it pretty quickly.
But, What About Challenge-Response?
Now we get to the hard part.
Since the server no longer knows your password, we can no longer implement challenge-response. We have no way of verifying that the response you're sending us is actually correct, so this improvement in password storage will break existing clients.
This sucks, but there are paths forward.
If you are a client maintainer: you can immediately switch to using plaintext authentication over HTTPS. This is secure and, by sending the password, we can validate you just the same as we validate any browser login. This is the short term path and you should do this.
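As a rough sketch of what the short-term path could look like for an XML-RPC client, the snippet below builds a login request that sends the password itself instead of a challenge response. The endpoint URL, the method name LJ.XMLRPC.login, and the field names (username, auth_method, password, ver) follow the legacy LiveJournal-style protocol as I understand it and are assumptions here, so check them against the protocol documentation before relying on them.

```python
import xmlrpc.client

# Assumed endpoint; the important part is that it is https://, not http://.
ENDPOINT = "https://www.dreamwidth.org/interface/xmlrpc"


def build_login_request(username: str, password: str) -> str:
    """Build the XML-RPC request body for a plain (clear) login."""
    params = {
        "username": username,
        "auth_method": "clear",  # assumed field: plain auth, no challenge
        "password": password,
        "ver": 1,
    }
    return xmlrpc.client.dumps((params,), methodname="LJ.XMLRPC.login")
```

To actually send the request, a client could use xmlrpc.client.ServerProxy(ENDPOINT) and call the method on it. The request travels inside the HTTPS connection, so the password is encrypted in transit even though the protocol calls this mode "clear".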
However, the better path is to start using our new API. This post won't go into too many details, but we've been working on building out an OpenAPIv3 compatible API. Check out the API spec! We don't have all of the methods you will need to build a full client yet, but we're working on it.
What's Next
This week, we are rolling out strong password storage (see pull request #2621) which will have the impact of breaking clients that are using challenge-response authentication. If you have the ability, we recommend you switch over to using plain authentication -- using HTTPS, of course!
Coming soon, we are also building the ability to optionally use two-factor authentication (2FA) (see pull request #2624) to further increase the ability to protect your account, if you want (it will be opt-in). There are a lot more edge cases here, though, and we need to consider the various authentication flows, so this is still "coming soon."
Finally, we are going to continue to invest in our modern API and work to deprecate and remove all of the older ones ... better security, functionality, and a simpler codebase! Many wins.
We care a lot about your security and privacy, and while some of these changes will definitely have a negative impact on some functionality, we make them with the goal of honoring our pledge to respect your data. And, honestly, this was too long coming.
Thanks for reading, and as always, please let us know if you have any questions/comments.
From vault 37.566329, -122.323479, this is mark, signing off.
1 Up to the reader whether this is 'bad as in cool' or 'bad as in not good,' it really depends on your point of view, and the author is not trying to imply anything here.
2 I have no idea on whether or not Brad actually looks back fondly on those days.
3 Because honestly, anything I wrote in my LJ in the early 2000s better well be gone and buried by now and I pray never sees the light of day again.
4 I am suuuuuuuper simplifying here, and obviously eliding over the properties of a one-way hash. I understand that addition is trivially reversible here, but hey now it's an example.
5 Well, it's not actually plaintext. Since we're forcing everybody to be using HTTPS all of the data (including your password) is strongly encrypted and is already safe against all sorts of attacks.
184 That's an exponent, not a superscript. But hey, made you look!
6 I'm using words like can't, but when it comes to cryptography the actual words are things like "infeasible." I just wanted to keep it simple, but I understand that there's a big difference here. It's still possible to figure out a user's password, it just takes millions of years and millions of today's servers...
7 This post is already way too long, so I chose not to go into details about what a salt is and why it matters. I would have left it out of the examples but then someone would have commented that we sucked at security for not using salts. Yes, we're using salts. Random, one per bcrypted thingy.
Code tour: 2020 Q1 edition!
Apr. 20th, 2020 11:03 pm
Continuing on from the last code tour!
Hi all, some changes coming your way -- figured I'd make the noise!
Cool beans, that's what I've got for now! Bye bye!
Question thread #86
Apr. 5th, 2020 05:41 pm
The rules:
- You may ask any dev-related question you have in a comment. (It doesn't even need to be about Dreamwidth, although if it involves a language/library/framework/database Dreamwidth doesn't use, you will probably get answers pointing that out and suggesting a better place to ask.)
- You may also answer any question, using the guidelines given in To Answer, Or Not To Answer and in this comment thread.
But Seriously, Why Aren't You Using Dreamwidth?
There are a lot of good reasons to use Dreamwidth as our primary social media: It respects our privacy; it doesn't treat us as products to be sold to advertisers; it shows us everything, in exactly the order we want it; it allows short or long-form posting. Yet most of us aren't using it. Is there anything that can be done to change that? Are we all just doomed to use social media platforms that have no respect for us?
If you want to attend that session, or participate as a speaker, now's a good time to create a WisCon website account and sign up for the panel. (As of right now the con is planned to happen in-person this year but they'll update the community again on March 20th.)
(A previous WisCon had a DW power tips session - notes here.)
Question thread #85
Mar. 4th, 2020 04:00 pm
The rules:
- You may ask any dev-related question you have in a comment. (It doesn't even need to be about Dreamwidth, although if it involves a language/library/framework/database Dreamwidth doesn't use, you will probably get answers pointing that out and suggesting a better place to ask.)
- You may also answer any question, using the guidelines given in To Answer, Or Not To Answer and in this comment thread.
Errors on a first step
Feb. 18th, 2020 09:52 pm
I have hardly worked with the API before, so I took an example from here
http://wiki.dwscoalition.org/wiki/index.php/XML-RPC_Protocol_Method:_postevent
as it is, and got an error
XML or text declaration not at start of entity at line 1, column 1, byte 1 at /home/dw/current/extlib/lib/perl5/x86_64-linux-gnu-thread-multi/XML/Parser.pm line 187.
Could anyone please tell me which client I need to work with, and in which field?
Thank you all in advance
UPD
Fixed by replacing the escape characters 0D 0A (CR LF) with nothing:
$ReturnTextPostTestMessage01.Replace("`r`n","")
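The same failure is easy to reproduce outside PowerShell. The Python sketch below is an illustration rather than the poster's code: it shows that an XML parser rejects a request body when anything, even a CR LF, comes before the XML declaration, and that stripping the leading whitespace is enough to make it parse. The sample body is a made-up, minimal method call.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal request body used only for this demonstration.
XML_BODY = (
    "<?xml version='1.0'?>"
    "<methodCall><methodName>LJ.XMLRPC.postevent</methodName></methodCall>"
)


def parses(text: str) -> bool:
    """Return True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A stray CR LF in front of the declaration, as in the post above.
broken = "\r\n" + XML_BODY
# Removing the leading whitespace puts the declaration back at byte 1.
fixed = broken.lstrip()
```

Here parses(broken) is False and parses(fixed) is True. Stripping only the leading whitespace is a narrower fix than removing every CR LF in the body, which could also change the text of the entry being posted.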
Question thread #84
Feb. 4th, 2020 05:49 pm
The rules:
- You may ask any dev-related question you have in a comment. (It doesn't even need to be about Dreamwidth, although if it involves a language/library/framework/database Dreamwidth doesn't use, you will probably get answers pointing that out and suggesting a better place to ask.)
- You may also answer any question, using the guidelines given in To Answer, Or Not To Answer and in this comment thread.
Python: PyCon North America, mid-April in Pittsburgh, Pennsylvania (I will be at this) goes April 15-23:
- Tutorials (for which you pay extra) and some specialized Python summits: April 15 & 16.
- Main talks: Friday, April 17-Sunday, April 19.
- Open source sprints (you don't have to pay a PyCon registration fee to attend): Monday, April 20, 2020 – Thursday, April 23, 2020. Example sprints in 2019.
PyCon offers financial assistance to people who would like to attend (deadline for requesting financial assistance is 31 January). I have successfully requested financial aid in the past. It's a good process.
Registration starts at the student/academic rate: $125 per person.
Conference hotel rates: $164-$180/night, plus taxes/fees (about 14% taxes/fees, so, $187-$205/night). There is a hotel room sharing page on the PyCon site.
PyCon loves to cross-pollinate with other free and open source movements, and I know there are many Python developers in Dreamwidth tech. If Dreamwidth folks want to use the April 20-23 in-person sprints to work on Dreamwidth-related Python tools together, that would be cool!
OpenHumans and other grant possibilities
Jan. 9th, 2020 04:29 pm
One of the options for a project type is to "provide valuable new/novel data sources". I wonder whether they'd be interested in enabling import from Dreamwidth? This could be something an individual works on using the existing API/exporter tool and doesn't have to involve DW Studios. "This is open to both US-based and international individuals. No institutional or organizational affiliation is required."
I know Mad Price Ball, the co-founder and Executive Director, and IMO they are cool and trustworthy.
(I also wrote up a list of other grants available for open source work in case any of you are interested, for DW or other projects you work on!)
Question thread #83
Jan. 4th, 2020 02:36 am
The rules:
- You may ask any dev-related question you have in a comment. (It doesn't even need to be about Dreamwidth, although if it involves a language/library/framework/database Dreamwidth doesn't use, you will probably get answers pointing that out and suggesting a better place to ask.)
- You may also answer any question, using the guidelines given in To Answer, Or Not To Answer and in this comment thread.
BML files are not handled by Perl
Dec. 22nd, 2019 12:14 am
I'm trying to install the Livejournal engine from scratch.
If I point my web browser at my server, it shows '403 Forbidden' error.
If I go to localhost/index.bml, it outputs source of the page without handling it with Perl.
Can you help me please?
My httpd.conf:
NameVirtualHost *:80
PerlSetEnv LJHOME /home/neva_blyad/lovecrypt/cvs/lovecrypt/
PerlPassEnv LJHOME
StartServers 1
PerlRequire /home/neva_blyad/lovecrypt/cvs/lovecrypt/cgi-bin/modperl.pl
<VirtualHost *:80>
ServerName www.lovecrypt7k5p7uh.onion
DocumentRoot /home/neva_blyad/lovecrypt/cvs/lovecrypt/htdocs/
ErrorLog /1.err
CustomLog /1.custom common
</VirtualHost>
<Directory /home/neva_blyad/lovecrypt/cvs/lovecrypt/>
Option FollowSymLinks
AllowOverride FileInfo
Order deny,allow
Allow from all
</Directory>
UPDATE:
The problem is solved. I've just copied content of Apache->httpd_conf() functions from mod_perl.pl to my httpd.conf.
What I should say is that original Livejournal code from 2003 works now in 2019!
Link:
https://web.archive.org/web/20070214155614/http://www.livejournal.org:80/download/code/livejournal-2003082500.tar.gz
How it looks:
https://web.archive.org/web/20030830211018/http://www.livejournal.com/
Code tour for 2019-09-30 to 2019-12-12
Dec. 14th, 2019 01:49 pm
But anyway! Here's the stuff that's been going on in the Eldritch Depths of the codebase that should hopefully make things even better than they already are around here!
I'm not sure when it's going to get pushed live, but this's what's on its way!
( strap in! )
35 total issues resolved
Contributors:
Dreamwidth Canary Server!
Dec. 7th, 2019 10:54 pm
Hi all!
As part of all the work I've been doing on the backend, I've also made it possible for us to "preview" the latest code before we roll it out to everybody. In parlance, this is called a "canary server" and it can be updated to the latest code and enable us to test (most) things before they get in front of all of the users.
You're all welcome to start hitting canary right now, if you want to live on the bleeding edge. It's a little tricky to set up, but if you're familiar with manipulating cookies, you'll be fine. If not -- maybe sit tight and wait until we write up a button to toggle it on/off easily!
But, if you do like cookies, all you need to do is add a new cookie like this:
- Name: ljcanary
- Value: 1
- Domain: .dreamwidth.org
- Expiration: Session or whatever you want
If you present that cookie in your request, we will route you to the canary servers which are running the latest code. Usually a few minutes after something lands on master, it'll be available on canary... now that's living on the edge. :-)
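If you would rather script it than edit cookies in a browser, the cookie can be attached to a request by hand. The sketch below uses only the Python standard library. The helper name canary_request is made up for this example, and the only detail taken from the post is the cookie itself (ljcanary with value 1).

```python
import urllib.request


def canary_request(url: str) -> urllib.request.Request:
    """Build a request that presents the ljcanary cookie described above."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", "ljcanary=1")
    return request


# Building the request does not touch the network; passing it to
# urllib.request.urlopen() is what would actually send it.
req = canary_request("https://www.dreamwidth.org/")
```

For a logged-in session the ljcanary value would need to be sent alongside your normal session cookies rather than instead of them.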
For now though, if you want to help test the changes to Markdown or recent formatting changes made by the inestimable roadrunnertwice, we'd appreciate you popping over to canary and helping out!
Question thread #82
Dec. 2nd, 2019 09:07 pm
The rules:
- You may ask any dev-related question you have in a comment. (It doesn't even need to be about Dreamwidth, although if it involves a language/library/framework/database Dreamwidth doesn't use, you will probably get answers pointing that out and suggesting a better place to ask.)
- You may also answer any question, using the guidelines given in To Answer, Or Not To Answer and in this comment thread.
As far as we both know (please correct us if we're wrong!), you can only filter your posts by a particular tag. For example, to see all of my posts tagged with "meta", I'd visit https://alexwlchan.dreamwidth.org/tag/meta.
What if you want to search by more than one tag? For example:
- Which posts have I tagged with reviews and quotes? (An AND query)
- Which posts have I tagged with at least one of person:alex or person:lexie? (An OR query)
I already had code that uses the XML-RPC API to get all my posts (to get a backup of my Dreamwidth entries). I added some extra filtering, and now it can search posts using the queries of the form above.
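The filtering step itself is small once the posts have been downloaded. The sketch below is not the script from this post (that is behind the cut); it is a minimal illustration of AND and OR tag queries, and it assumes each post has already been turned into a dictionary with a "subject" string and a "tags" list.

```python
def matches_all(post_tags, wanted):
    """AND query: the post carries every one of the wanted tags."""
    return set(wanted) <= set(post_tags)


def matches_any(post_tags, wanted):
    """OR query: the post carries at least one of the wanted tags."""
    return bool(set(wanted) & set(post_tags))


def search(posts, wanted, mode="and"):
    """Return the posts whose tags satisfy the query."""
    check = matches_all if mode == "and" else matches_any
    return [post for post in posts if check(post["tags"], wanted)]


# Hypothetical sample data in the assumed shape.
posts = [
    {"subject": "A book I liked", "tags": ["reviews", "quotes"]},
    {"subject": "Dinner with Lexie", "tags": ["person:lexie"]},
    {"subject": "Site notes", "tags": ["meta"]},
]

both = search(posts, ["reviews", "quotes"], mode="and")
either = search(posts, ["person:alex", "person:lexie"], mode="or")
```

With the sample data, both contains only the first post and either contains only the second.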
Usage
- You need Python installed (downloads page). Python 2.7 or 3.x is fine; if you already have Python installed on your computer, that should be fine.
- Copy the code below into a file, for example search_dreamwidth_posts_by_tag.py.
- Run the script with Python, for example by typing python search_dreamwidth_posts_by_tag.py in a terminal.
Reusing the code
If you know a bit of Python, you should be able to pull out bits of this code and reuse it elsewhere -- the XML-RPC API client, downloading all your posts, checking a user's password. You could modify it to find posts by different criteria: posts within a particular date range, or posted at the weekend, or that don't contain the letter e.
MIT license.
The code
( Code behind a cut tag )
A while back, I submitted a patch that sometimes shrinks images within post and comment bodies, to make sure they fit inside their container and fit on a single screen.
It seems to mostly be working, but there's also times when you don't WANT it to work; for example, it's perfectly fine for a long vertical comic strip to flow off the page.
So I'm messing with a way to let people click-to-zoom on individual images!
»» Here's the demo. ««
That demo has several different cases I thought about — for example, I'm not enabling click-to-zoom for images that are links, because links already do something when you click on them (and most of the time it's "go to the full size image" anyway). I'm also not zooming poll bar graph images, because those are special and weird.
Anyway, the trick with this is to make it do what people want most of the time, but hopefully never do anything really nonsensical. So I need help thinking of things that people do with images in their posts/comments that might cause nonsense, and then I can figure out which nonsense to mitigate and which nonsense to just accept. The only ones I came up with so far are:
- If an image didn't need to shrink, it gets a "click to zoom" cursor on hover but it doesn't actually do anything.
- If someone used the width="x"/height="x" attributes to make an image bigger than its natural size, the zoom cursor hints will be reversed (i.e. "zoom in" will actually shrink it).
What else have you got?
Question thread #81
Oct. 27th, 2019 02:19 pm
The rules:
- You may ask any dev-related question you have in a comment. (It doesn't even need to be about Dreamwidth, although if it involves a language/library/framework/database Dreamwidth doesn't use, you will probably get answers pointing that out and suggesting a better place to ask.)
- You may also answer any question, using the guidelines given in To Answer, Or Not To Answer and in this comment thread.