I'm trying to crawl and parse comments on a community for a fandom event (http://hs-worldcup.dreamwidth.org , if you're curious). I've run into a bunch of issues and lack of API documentation, and talked to DW Support a couple of times, and feel like I am further away from successfully doing anything than when I started. Before I say anything else, here is What I Am Really Trying To Do:
- take an individual community post (example: http://hs-worldcup.dreamwidth.
- download all of the comments with some sort of threading information --- the data I need in particular is comment subject, comment author, comment content, whether or not it's a reply and if so to what
- parse out that data and do transformations to it and add it to a database (which is not super relevant to this question I don't think but I can go into more detail if necessary)
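For the threading step in the list above, here is a minimal sketch of how flat comment records could be linked into threads once they have been downloaded. It assumes each comment has already been parsed into an id and an optional parent id; the `Comment` class and its field names are hypothetical and do not come from any Dreamwidth API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Comment:
    # Hypothetical record shape; the field names are illustrative only.
    comment_id: int
    parent_id: Optional[int]  # None for a top-level comment
    author: str
    subject: str
    body: str
    replies: list["Comment"] = field(default_factory=list)


def build_threads(comments: list[Comment]) -> list[Comment]:
    """Link each comment to its parent and return the top-level comments."""
    by_id = {c.comment_id: c for c in comments}
    roots = []
    for c in comments:
        parent = by_id.get(c.parent_id) if c.parent_id is not None else None
        if parent is None:
            # Top-level comment, or its parent was not downloaded.
            roots.append(c)
        else:
            parent.replies.append(c)
    return roots


def walk(roots, depth=0):
    """Yield (depth, comment) pairs in thread order."""
    for c in roots:
        yield depth, c
        yield from walk(c.replies, depth + 1)
```

The `(depth, comment)` pairs from `walk` map directly onto database rows that carry a "reply to" column.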
If that is what I should do, how do I get around the adult content warning? Is there a flag I can pass with the URL or something? Do I need to do something more complicated than just using curl to grab the pages? Is there something I can pass to say "just give me one piece of HTML with all 5000 comments on it, it will probably be easier for both of us"?
Thank you for any suggestions or advice you might have.
I've been told there have been "some internal conversations about deprecating the XML-RPC API -- keeping it for backwards compatibility, but moving to a much more modern second-gen API", but that nobody has had both the time and the inclination to work on designing such a thing.
Well, this is me, volunteering. To that end, I'm looking for input on what exactly such a new API needs to provide, and whether there's a preferred underlying technology to build on (exempli gratia, stick with XML-RPC? Change to SOAP? Use JSON? RESTful or not? et cetera). What I'm getting at here is that I'm entirely happy to take point, as it were, and to make decisions (especially where there's little or no consensus and someone has to make the call), draw up specs, write docs, and so forth, but the result is highly unlikely to be a really useful API unless I get input from more sources than my own experience and looks at the code.
At this stage, therefore, I want everything you, the reader, have to say on the subject. Use cases especially.
Dreamwidth apparently no longer uses authenticated RSS for its reading page. So if I want to create an app (for, say, Windows Phone) that lets you keep up with your Dreamwidth reading, I need to create an XMLRPC client.
Here's the part where I try to figure out how to do so.
This DW Github link is for an "XMLRPC transport that supports DW::Request".
And this DW Github link seems to have all the functions for displaying Dreamwidth content, including the inbox and (on line 284) your reading page.
So if I want to write an app that shows your reading page, I figure out how to use XMLRPC + DW::Request to send a getreadpage request? And then it returns your "entries" so I figure out how to parse that on the receiving end? And then I basically do the same thing for stuff like your inbox and posting, I guess ... right?
Can anyone clarify this process for me?
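A rough sketch of the request/response round trip described above, using Python's standard `xmlrpc.client` only to build and parse the XML, without touching the network. The method name `LJ.XMLRPC.getreadpage` is an assumption that follows the `LJ.XMLRPC.` prefix used by other methods; the `username` parameter and the `subject` field in the sample reply are hypothetical, and the real argument list should be checked against Protocol.pm.

```python
import xmlrpc.client

# Assumed method name, following the LJ.XMLRPC.<method> pattern.
METHOD = "LJ.XMLRPC.getreadpage"


def build_request(username: str, auth_fields: dict) -> str:
    """Return the XML-RPC request body: one struct parameter."""
    params = {"username": username, **auth_fields}
    return xmlrpc.client.dumps((params,), methodname=METHOD)


def parse_response(xml_body: str) -> dict:
    """Decode an XML-RPC response body into a plain dict."""
    values, _method = xmlrpc.client.loads(xml_body)
    return values[0]
```

The same pair of helpers would apply to other methods by changing the method name; only the struct contents differ.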
- addcomment now has a journal parameter, a feature adopted from LiveJournal code
- getreadpage is a new function that's been added, and is equivalent to what getfriendspage does on LiveJournal.
- getfriendspage is deprecated, and will return an error message telling you to use
If you're writing a client and there's some data you're interested in that's not currently available through the protocol, let us know!
For example, 'getfriendspage' returns no error, but also no information. Since DW uses a different concept around reading and subscribing, the idea that the mode doesn't work isn't too surprising.
Looking at Protocol.pm is confusing in that it appears to support getfriendspage (at least, there's code to implement it). I don't see that there is any equivalent interface for retrieving reading page events.
Since the functions of Protocol.pm call internal routines that may or may not do anything, is there a better way of discovering what things are actually supported and what they return?
(I've seen http://wiki.dwscoalition.org/notes/XML-
I'm the primary developer for ElJay, an Android client for sites based on the LiveJournal codebase. Right now, the client has a pretty decent feature set for livejournal.com, but varies in ability for sites based on that code.
Dreamwidth, obviously, deviates from the LJ code a great deal. I am currently working on getting DW circles working as first class filter providers in the application, and would like to eventually support some of the other less documented features like viewing the friends list and supporting the inbox. Generic posting for DW works just fine.
Generally, on LJ, I look through ljprotocol.pl to decipher how to get these features working, and that works well on LJ and those who have updated to the latest LJ code. However, this file disappeared at some point in the Dreamwidth Mercurial repository, and now I'm lost. It seems there are a lot of DW fans using Android, and I'm getting badgered. :)
So, to make this short: It looks like ljprotocol.pl became LJ/Protocol.pm, but some of the calls don't seem to actually map to what I see in that file. Are there certain options shut off for DW in production? Is there another file I should be looking at that corresponds to /interface/xmlrpc?
I appreciate any help, and forgive me if this is the wrong community to post this.
Two things that you'll need to do to run this on your dev server:
1) make sure that memcache is running. Otherwise the duplicate-request-detection code doesn't kick in, and authentication will fail.
2) make sure that the files are deleted from your live code area in $LJHOME. These modules are the old version; the new versions of the modules are already installed if you're using a 'hack, or would have been installed if you'd followed the directions in Dreamwidth Scratch Installation.
As a reminder, here's how to delete the files using cvsreport:
for i in `bin/cvsreport.pl -n -1`; do rm $i; done
(restart your webserver)
Is it better to consider the service document on a per-journal, or on a per-account basis? The service document is a URL you enter for your client to discover which URLs to use to post to a journal or a community.
I have seen two separate sets of suggestions, one that says to put the service document in:
which lists your journal as well as any communities you have posting access to.
The other train of thought says to put the service document in:
And list only the collections (entries and eventually media such as images) that you can manipulate with that particular journal.
In both cases, entry posting will happen under journal space, such as http://username.dreamwidth.org/
I'm going back and forth on this one. I currently have the latter implemented, but am beginning to talk myself into the former. Before I tweak my code, though, I'm interested in hearing any more informed opinions on which option is the more standard.
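To make the two options concrete, here is a small sketch that generates an Atom Publishing Protocol service document with one workspace per journal. Passing several journals gives the per-account shape, and passing one gives the per-journal shape. The namespaces are the standard APP and Atom ones; the collection URLs are hypothetical placeholders, not the paths the real interface uses.

```python
import xml.etree.ElementTree as ET

APP_NS = "http://www.w3.org/2007/app"
ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("app", APP_NS)
ET.register_namespace("atom", ATOM_NS)


def service_document(workspaces: dict[str, str]) -> str:
    """Build a service document.

    workspaces maps a journal title to the (hypothetical) URL of its
    entry collection.
    """
    service = ET.Element(f"{{{APP_NS}}}service")
    for title, href in workspaces.items():
        ws = ET.SubElement(service, f"{{{APP_NS}}}workspace")
        ET.SubElement(ws, f"{{{ATOM_NS}}}title").text = title
        coll = ET.SubElement(ws, f"{{{APP_NS}}}collection", href=href)
        ET.SubElement(coll, f"{{{ATOM_NS}}}title").text = f"{title} entries"
        accept = ET.SubElement(coll, f"{{{APP_NS}}}accept")
        accept.text = "application/atom+xml;type=entry"
    return ET.tostring(service, encoding="unicode")
```

Either way the client ends up with the same collection URLs; the difference is only in how many workspaces a single document advertises.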
On that note, the atom interface code is up on my public dev server, which is open for testing for anyone interested in trying it out with any clients you use. And if someone can point me to your favored client or service which uses APP, I'd like to try testing that on my own.
I'm trying to create a client that can add a comment to a community post using the XMLRPC addcomment method.
There is discussion about the method here: http://community.livejournal.com/
When I use it on LiveJournal, it works perfectly with free accounts, both in personal journals and communities. On Dreamwidth, it only works with a paid account and only when posting to one's own journal. If I try to send a comment to a community, I get this error:
No such entry. at /home/dw/current/cgi-bin/ljprotocol.pl line 223
Does anyone know how to resolve this or if Dreamwidth even allows the method to be used on communities?
**edited to add a PHP example** ( code here )
I've got most of what I want to achieve working, including, finally, Icons (the keyword is case sensitive, obvious when you think about it). However, I haven't managed to get post security working--this isn't an issue for me, and therefore lower priority, but I'd imagine some would like it, and would like to include it. I've read through the Wiki entries on both XML-RPC Protocol and XML-RPC Protocol Method: postevent, and worked through what I think the code should be, but it isn't working. My current code is here:
Happy to hear any concerns! In particular, any huge objections to breaking it for clients used to the old version of the atom interface? And thoughts on using the http://yourusername.dreamwidth.org/
These changes are not yet live on the site, but should be in a future code push. Here's the rundown:
checkfriends no longer works. Instead, use
- required arguments to checkforupdates: only authentication information
- optional argument lastupdate (in "0000-00-00 00:00:00" format). This is the last update time you have, from previous calls to checkforupdates
- optional argument filter. This is the name of a content filter whose members you want to filter to.
- return value new: 1 or 0. 1 only if you pass in a lastupdate and there are new items since then. 0 in all other cases.
- return value interval: number of seconds before you can next check for updates. If you check before time expires, you'll get a cached value
- return value lastupdate: time someone last updated, in "0000-00-00 00:00:00" format
It's almost exactly the same as checkfriends; the only differences from the frontend are the name change, and replacing the mask argument with a filter argument because the trustmask is no longer relevant since we split up access and subscription.
ETA: Added preliminary documentation on the wiki.
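A minimal client-side sketch of one checkforupdates poll, based only on the arguments and return values listed above. The `call` parameter is a hypothetical stand-in for whatever function actually sends the XML-RPC request, and the contents of `auth` are assumed to be whatever authentication fields the client already uses.

```python
from typing import Callable, Optional


def poll_once(
    call: Callable[[str, dict], dict],
    auth: dict,
    lastupdate: Optional[str] = None,
    content_filter: Optional[str] = None,
):
    """Run one checkforupdates request.

    Returns (has_new, lastupdate, interval): whether there are new
    items, the timestamp to send next time, and how many seconds to
    wait before polling again.
    """
    args = dict(auth)  # copy so the caller's dict is not modified
    if lastupdate is not None:
        args["lastupdate"] = lastupdate
    if content_filter is not None:
        args["filter"] = content_filter
    reply = call("checkforupdates", args)
    has_new = int(reply["new"]) == 1
    return has_new, reply["lastupdate"], int(reply["interval"])
```

A polling loop would sleep for the returned interval before calling `poll_once` again, since checking earlier only returns a cached value.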
Not many people commented on the bug, though, so catness didn't get a ton of feedback on the proposed modifications, and none at all as far as I can tell from other people who actually use the API to develop client applications. I recently began moving over the old LJ Server Manual's XML-RPC chapter into the DW wiki, so I read through the patch in an effort to put together some pages for the new getcircle, editcircle and gettrustgroups methods. When I looked through the code, though, I found myself thinking of a few things that should maybe be changed before the new methods get pushed live. Once an API starts getting used by client applications, it becomes much harder to modify, and I thought it was worth having another look before the new methods got put into use.
I first detailed my thoughts in a comment on the bug, but denise suggested I open a discussion here, so I'm going to repost it all and hopefully provoke a few more people into looking over the patch and talking about whether anything would benefit from some tweaks.
It is worth noting here, as I did in my bug comment, that I am almost totally new to DW development, so maybe there's stuff going on that I am not fully grasping yet -- if that proves to be the case, please do not hesitate to set me straight!
( My long rambly thoughts on bug 2451! )
So, draft 1 of http://www.chiark.greenend.org.uk/~
It runs through DW posts, finding LJ entries that have already been cross-posted or imported to DW, and for some subset of them based on things like their visibility, date, length, whether they contain polls, and so on, edits them so that their text is commented out (possibly with a snippet of the original post), adding a note saying that the post has been moved to such-and-such a DW post which is thataway. It optionally disables or locks the comments on the LJ posts.
I've tested it on my test account, and it seems to do what it's supposed to over a small number of entries, but more testing is needed; bug reports would be most welcome; also at this stage it'd be useful to know if it works just fine.
It still needs a way of telling which xposting account is which if someone has more than one of them.
I don't know what'd be most use for the community - I can dump it in the wiki or version-control as-is with a do-as-you-will-with-this license, maybe stick a CGI front end on it if anyone wants to host it and there aren't major problems with having a website asking for people's passwords. I don't have much desire to support this long-term once bugs have been ironed out, so am loath to distribute/host it myself.
The getevents call returns the eventtime parameter, which is the poster-specified datetime, not the log time. The syncitems call returns a list of all the items that have been created or updated for a user, along with the server time -- but once an item has been updated, it only returns the update time, and the create time cannot be obtained.
pauamma thought there would be value in returning the log time and the update time separately, and I agree, but suggested we get some opinions on that change here. So: thoughts?
1. Is there any way at all to get the system-recorded UTC post time of an entry? The eventtime value is the user-specified post date and time, not the date and time actually recorded by the system when the entry was created. I feel like this should be an easy value to get hold of, but I just can't see where.
2. For entries that were imported to DW from another journaling site, the poster value returned by getevents is that of an OpenID account, and looks like ext_110540 or similar. Is there any easy way of mapping it to the offsite user's name & journal system, like the way it gets displayed within Dreamwidth? The only way I can think of doing it is looking up each one's profile page like so: http://ext-110540.dreamwidth.org/
And then scraping the string I want out of the page's html. So, possible, but not really ideal.
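For the lookup step, the only mapping visible in the example above is that the underscore in the poster value becomes a hyphen in the hostname. A tiny sketch of that conversion follows; the function name is hypothetical, and the profile path after the host is left out because it is not shown here.

```python
def journal_base_url(username: str, domain: str = "dreamwidth.org") -> str:
    """Turn a poster value such as ext_110540 into its journal base URL."""
    # Underscores are not valid in hostnames, so they appear as hyphens.
    host = username.replace("_", "-")
    return f"http://{host}.{domain}/"
```

Calling `journal_base_url("ext_110540")` gives the same host as the example URL above.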
Thanks for any answers you can provide!
I'm trying to add xmlrpc posting to Xpostulate, not only to work better with LJ, DW and (insert favorite LJ clone), but also to get wordpress crossposting worked in (since that also uses xmlrpc).
I seem to have sorted out my xml, and am certain I am sending a valid postevent, with one exception,
I've been missing something, which I believe (thanks to catness) to be LJ.XMLRPC.getchallenge data
so, I am now requesting a challenge from the server, then using tdom (tcl parser) to parse the response, but I think I'm feeding tdom the wrong data, because it keeps throwing a syntax error, as if I am not feeding it xml
I thought I was feeding it the server response, which, to my knowledge, should be an xml response, giving me some c0:balbalbalba value (which needs parsed out) to use in the postevent to send thereafter
here's my code: http://pastebin.com/ed4cTjaG
the result I get is simply error "syntax error" at line 1 character 0
"o <--Error-- k"
this leads me to believe that neither http::meta nor http::status is the variable I need to feed to the parser to get the challenge variable I need
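The error text suggests the parser was handed the string "ok", which is a status value rather than the XML body. In Tcl's http package the response body is what http::data returns. As a language-neutral illustration of the whole flow, here is a Python sketch that pulls the challenge out of a getchallenge response body and computes the reply using the LiveJournal-style scheme, the MD5 of the challenge followed by the MD5 of the password, both hex-encoded. The field names `challenge`, `auth_method`, `auth_challenge` and `auth_response` follow the LiveJournal protocol documentation as I understand it and should be treated as assumptions to verify against the server.

```python
import hashlib
import xmlrpc.client


def extract_challenge(xml_body: str) -> str:
    """Parse the XML-RPC response body and return the challenge string."""
    values, _method = xmlrpc.client.loads(xml_body)
    return values[0]["challenge"]


def challenge_auth_fields(challenge: str, password: str) -> dict:
    """Compute the auth fields to send with the next call, e.g. postevent."""
    password_md5 = hashlib.md5(password.encode("utf-8")).hexdigest()
    response = hashlib.md5(
        (challenge + password_md5).encode("utf-8")
    ).hexdigest()
    return {
        "auth_method": "challenge",
        "auth_challenge": challenge,
        "auth_response": response,
    }
```

The important point for the Tcl version is the same: the value fed to the XML parser has to be the raw response body, and the challenge is then read out of the parsed struct.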
I had asked a question about the missing XML-RPC custom groups support a while ago. Just now I've installed a local DW clone for testing, and hacked ljprotocol.pl to support a new method called "gettrustgroups" - it looks exactly like "getfriendgroups" but returns trust groups. My LJ-client (qtxpost - it's fully functional now, at least for posting/editing) calls this method in addition to "login", if the server is described in config as a "dreamwidth" code branch, to replace the empty groups list returned on login.
So far it seems to be working :) I wonder if you would accept a patch for it, or you've been planning to do it in a different way and it's anyway too cheeky for a newcomer to poke around important code? Also, is it possible to join DW development team, even if I can't guarantee how much time I would be able to dedicate to it? (I work a full-time job as a programmer/sysadmin.)