dw_dev | Interest data API change

Next week's code push will see the removal of the /misc/interestdata script and the addition of its replacement, a journal-side /data/interests API that outputs in JSON format. (Note: this is not part of the XML-RPC API.)

Currently, interest data can be obtained using a URL like http://www.dreamwidth.org/misc/interestdata?user=sophie , which uses a custom format that has to be parsed specially. For example, here are my first five interests in this custom format:

# Note: Polite data miners cache on their end.  Impolite ones get banned.
# <intid> <intcount> <interest ...>
10847 2 #!/usr/bin/perl
867 294 80's music
741 284 80s music
296200 5 a11y
273 118 acceptance
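
As a rough illustration of what "parsed specially" means in practice, here is a minimal Python sketch of a parser for the old format. The function name parse_interestdata is hypothetical, and the sketch assumes every data line has exactly the `<intid> <intcount> <interest ...>` shape shown above, with header lines starting with `#`.

```python
def parse_interestdata(text):
    """Parse /misc/interestdata output into {intid: (count, interest)}."""
    interests = {}
    for line in text.splitlines():
        line = line.strip()
        # Header lines start with "#"; data lines start with a numeric ID.
        if not line or line.startswith("#"):
            continue
        # Split on the first two runs of whitespace only, because interest
        # names may contain spaces (or a "#", as in "#!/usr/bin/perl").
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # malformed line; skipped rather than guessed at
        intid, count, interest = parts
        interests[int(intid)] = (int(count), interest)
    return interests


# The five sample lines from this post, including the two header lines.
SAMPLE = """\
# Note: Polite data miners cache on their end.  Impolite ones get banned.
# <intid> <intcount> <interest ...>
10847 2 #!/usr/bin/perl
867 294 80's music
741 284 80s music
296200 5 a11y
273 118 acceptance
"""

if __name__ == "__main__":
    for intid, (count, interest) in parse_interestdata(SAMPLE).items():
        print(intid, count, interest)
```

Note the maximum-split argument: without it, an interest such as "80's music" would be cut at its internal space.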

After the code push, this interest data will be obtainable from http://sophie.dreamwidth.org/data/interests instead (obviously, the username will be different depending on who you want the interest data for), and will be in JSON format. The five interests above will be represented by the equivalent of the following JSON (note that the interests will be in no particular order, which I am simulating by putting the interests in a random order):

{
    "interests": {
        "296200": {
            "count": 5,
            "interest": "a11y"
        },
        "867": {
            "count": 294,
            "interest": "80's music"
        },
        "273": {
            "count": 118,
            "interest": "acceptance"
        },
        "10847": {
            "count": 2,
            "interest": "#!/usr/bin/perl"
        },
        "741": {
            "count": 284,
            "interest": "80s music"
        }
    },
    "name": "sophie",
    "account_type": "P",
    "account_id": "324"
}

The actual JSON will not be beautified, and you can see an example of actual parseable output on my Dreamhack at http://sophie.sophie.hack.dreamwidth.net/data/interests . (Note that my Dreamhack is very empty and all of the interest counts are therefore 1. If you'd like an account on my Dreamhack to test this new API before it goes live, let me know and I'll give you an invite code. If you already have a Dreamhack, you can also pull the newest code and test it yourself.)

Notice that the new JSON also includes basic user metadata - the username, account type and account ID. This should save people the bother of having to obtain this information from elsewhere.
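
For anyone wanting to try the new endpoint, reading it should need nothing beyond a standard JSON library. The following is a minimal Python sketch, not an official client: the function names are hypothetical, it assumes the response is UTF-8 encoded, and it assumes the username can be used directly as the subdomain, as in the sophie.dreamwidth.org example above. The offline sample reuses two of the interests shown earlier so the sorting can be tried without a network call.

```python
import json
from urllib.request import urlopen


def fetch_interests(username):
    """Fetch and decode /data/interests for one journal (makes a network call)."""
    # Assumption: the username maps directly onto the subdomain.
    url = "http://%s.dreamwidth.org/data/interests" % username
    with urlopen(url) as response:
        # Assumption: the body is UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))


def interests_by_count(payload):
    """Return (interest, count, intid) tuples, most popular first."""
    rows = [
        (entry["interest"], entry["count"], int(intid))
        for intid, entry in payload["interests"].items()
    ]
    # The interests come in no particular order, so sort explicitly.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Offline sample: two interests from the post, un-beautified like real output.
SAMPLE = (
    '{"interests":{"296200":{"count":5,"interest":"a11y"},'
    '"867":{"count":294,"interest":"80\'s music"}},'
    '"name":"sophie","account_type":"P","account_id":"324"}'
)

if __name__ == "__main__":
    payload = json.loads(SAMPLE)
    print(payload["name"], payload["account_type"], payload["account_id"])
    for row in interests_by_count(payload):
        print(row)
```

Interest IDs arrive as JSON object keys, so they are strings until converted, and in the example above the account ID is a string as well.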

The former /misc/interestdata script had an additional mode of operation where you could give it a single interest name and it would return the count of that interest. For now, this functionality will no longer exist after the next code push. (It was only of very limited use and as far as I know was never widely used.) If this is an issue for you, let me know; I'm thinking of ways to expand on that functionality, and while I can make no promises, I have Ideas.

If you have any questions or comments about this, please feel free to ask in the comments!


So would this enable some kind of, like, data mining to build a database and then querying it? Like, to see who has interest A and B, or how many people there are, or what the most popular interests are.

It's always been possible to see the most popular interests and the total number of users via the site statistics text file, although you should ignore the interests with a count of 16777215 or 16777214, which are the result of counting bugs. (The same information is also available via the Web interface, although Dreamwidth's bot policy states that bots shouldn't screen-scrape HTML output.)

Data mining: Yes, this allows a form of data mining, but nothing you couldn't have gotten before if you were willing to spend a bit of time. The list of interests for a given user has always been accessible in a machine-friendly manner, and given data on enough users it would be possible to build a good picture of which users have a given interest.
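
To make that concrete, here is a rough Python sketch of how per-user interest data could be turned into an interest-to-users index, which would also answer the "who has interest A and B" question. The function names are hypothetical, the payloads are assumed to be already-decoded /data/interests responses, and the second user in the sample data is invented purely for illustration.

```python
from collections import defaultdict


def build_interest_index(payloads):
    """Map each interest name to the set of usernames that list it."""
    index = defaultdict(set)
    for payload in payloads:
        for entry in payload["interests"].values():
            index[entry["interest"]].add(payload["name"])
    return index


def users_with_all(index, *interest_names):
    """Return the users in the index who list every given interest."""
    sets = [index.get(name, set()) for name in interest_names]
    return set.intersection(*sets) if sets else set()


# Sample data: "sophie" uses interests from the post; "exampleuser" is a
# made-up account, not a real journal.
PAYLOADS = [
    {
        "name": "sophie",
        "interests": {
            "296200": {"count": 5, "interest": "a11y"},
            "273": {"count": 118, "interest": "acceptance"},
        },
    },
    {
        "name": "exampleuser",
        "interests": {
            "296200": {"count": 5, "interest": "a11y"},
        },
    },
]

if __name__ == "__main__":
    index = build_interest_index(PAYLOADS)
    print(sorted(users_with_all(index, "a11y")))
    print(sorted(users_with_all(index, "a11y", "acceptance")))
```

An index like this only ever covers the users whose data has been fetched, so it is a sample rather than a site-wide view.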

It's not yet possible to see which users have a given interest in a machine-friendly manner, but this information is available via the Web interface (example), though again, the bot policy states that this shouldn't be screen-scraped. The Web interface is limited to the 500 most recently-active users.

The changes described in this post would not give machine-friendly access to this data, although the Ideas I mention in the post do revolve around making such access possible, again limited to the 500 most recently-active users.

None of this (including the Ideas) would allow people to specifically enumerate every user and interest on the entire site; you would not be able to get this information using the user ID or interest ID, only by names. It is true that a fairly large database could be obtained, however.

What are your views?

I'm not sure. >.> Most of my app ideas involve using the proposed Dreamwidth API to actually get your full reading page and make posts and stuff, although it might also be interesting to do some kind of "match my interests list" thing where you try to find people who like a lot of the same things as you.

Probably my next biggest concern besides the API is Dreamwidth's use of SSL, or rather its lack thereof. >.>

Well, the functionality of finding people who like a lot of the same things as you is already available via the Web interface - here's my list, for example. It's not available via JSON yet, and that wasn't part of what I was planning to do, though given the way it's coded it might not be too difficult.

That page, btw, is accessed using the Interests page, which is the first option in the Explore menu.

Edited 2014-07-07 12:38 (UTC)
