janinedog: (Default)
Janine ([personal profile] janinedog) wrote in [site community profile] dw_dev, 2009-07-25 11:07 am

Determining active users

So I decided to give bug 211 a shot (allowing people to give a paid account to a random active free user). I figured it'd be good to get going on this, what with search being a paid-only feature. :)

I posted a comment in the bug about this, but I figured I'd post here too. The spec says the following for determining who is an active user:

* Privilege people who have been long-term active on Dreamwidth for the random choice selection, not simply people who've made one or two posts recently. (Active = any one or more of: posts regularly, comments regularly, posts to comms regularly, etc.) The idea is to identify the most-active free-user contributors.


I noted in the bug that we could get an approximate answer for who posts entries in their own journal, or who comments, most often by calculating the average number of entries or comments the user has made per day: ( number of entries posted / number of days the user has had their account ), or ( number of comments posted / number of days the user has had their account ). We could also determine who has posted in their journal most recently, or at least in the past X days.

However, I don't think we have any data for who posts to communities regularly, nor for who has most recently commented (or who has commented in the past X days). Pretty much all of this could be retrieved through (probably complicated) queries, but obviously we don't want to do anything so complicated that it overwhelms the databases. Maybe with enough caching it'd be okay; I'm not sure. I suppose the people who are most active will probably continue to be the most active for a while, so cached results shouldn't go stale quickly.
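As a rough illustration of the per-day averages described above, here is a minimal Python sketch. The function name `per_day_rate` and the sample numbers are made up for this example; none of this is taken from the Dreamwidth codebase.

```python
from datetime import date


def per_day_rate(total_items: int, created: date, today: date) -> float:
    """Average number of entries (or comments) per day over the account's lifetime."""
    # Treat an account created today as one day old to avoid dividing by zero.
    days = max((today - created).days, 1)
    return total_items / days


# Hypothetical numbers for illustration only: 90 entries over a 90-day-old account.
rate = per_day_rate(total_items=90, created=date(2009, 4, 26), today=date(2009, 7, 25))
```

The same function works for comments by passing the comment count instead of the entry count.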

For reference, the random user feature that we have (http://www.dreamwidth.org/random) does check to make sure it returns only someone who has posted in the last 7 days. But I think that's all it really does in terms of determining activity.

Anyway, does anyone have any ideas for how to go about this in a sane manner? This is a "should have" instead of a "must have", so I imagine it doesn't have to be complete/perfect, but it'd be good to get something that's at least somewhat accurate.
elf: Computer chip with location dot (You Are Here)

[personal profile] elf 2009-07-25 06:45 pm (UTC)
Potential bug:

Shouldn't random user show someone who's made a public post in the last 7 days? (I clicked and got a journal that looked empty. And I remember this happening before.)

Other notes:
I don't like calculating activity based on total entries/# days; it penalizes long-term users who've had a quiet period or took a six-month sabbatical. (The idea of favoring a user who started last week and posted 20 memes over someone who's been around for months or years, and posted every three days except for summer when they don't post at all, sounds bad.) If it's calculated over time, I'd rather it didn't check back farther than six months or so.

Or perhaps that it always measured activity over six months--which skews away from new accounts, which haven't had activity that long.

Also--how would it deal with imported posts? Does it have a way to exclude those from the activity levels?
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)

[staff profile] mark 2009-07-25 06:50 pm (UTC)
(Random note on random: it's not implemented exceedingly well, in the sense that, if someone posts and then deletes their post or if they post publicly and then 'oops that should be private' their post, the random system doesn't remove them from the list.)

We can tell which posts are imported, yes, we can exclude those fairly easily from any metrics that we use.

Agreed with not going back more than a few months.
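To make the "don't look back more than about six months, and skip imported posts" idea concrete, here is a small Python sketch. The `Post` class and its `imported` flag are hypothetical stand-ins; the thread only says that imported posts can be identified, not how they are stored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    posted_at: datetime
    imported: bool  # hypothetical flag; the real storage format is not given in the thread


def recent_post_count(posts, now, window_days=180):
    """Count non-imported posts made within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    return sum(
        1 for p in posts
        if not p.imported and cutoff <= p.posted_at <= now
    )
```

The 180-day default is just one reading of "six months or so"; it would presumably be tuned.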
elf: Rainbow sparkly fairy (Default)

[personal profile] elf 2009-07-25 07:43 pm (UTC)
I don't expect the "random user" or "random post" features to be perfect, or, umm, particularly anything; the whole "HERE IS SOME RANDOM CONTENT" feature is a novelty of no practical use whatsoever, and I can't imagine getting upset because it provided unwanted content (blank journal, ugly/offensive stuff, deleted journal, whatever).

I like the random user/random post feature; I've had good conversations because of it, and occasionally made a friend that way. But it's not like anyone (sane) could say, "Hey! This isn't working, and my DW experience is ruined because of that!" It's too weird to matter whether it's non-glitchy--but I didn't know if it grabbed users who've posted locked things; it hadn't occurred to me that it might grab users who posted things that were later locked or deleted.

Short version: Thanks for explanation! My understanding of DW has been enhanced! I continue to be nondismayed at however the randomizer works, or does not work!

I wish I had nice suggestions for how to measure "all [username] recent activity;" it hadn't occurred to me that journal posts, comm posts & comments were stored in such different ways that there's no simple way to count & compare them.
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)

[staff profile] mark 2009-07-25 06:52 pm (UTC)
I would suggest that you add more logging/information that's useful to answering these questions. I.e., figure out exactly how we mark users as active, and then add more to that. "Every time someone comments, posts to a community, or posts to their journal, we record it."

Then you can design that table to be easily queried. Of course, this extends the project, and runs the risk of MBDing it. If you'd rather keep it simple, then just go by people who are posting in their own journal: much easier.
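One way to picture the "record every action in a table that is easy to query" suggestion is the following self-contained Python sketch using the standard-library `sqlite3` module. The table name `activity_log`, its columns, and the helper functions are all assumptions made for illustration; they are not the actual Dreamwidth schema, and the real site would use its own database layer.

```python
import sqlite3


def open_log():
    """Create an in-memory database with a hypothetical activity log table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE activity_log (
            userid      INTEGER NOT NULL,
            kind        TEXT    NOT NULL
                        CHECK (kind IN ('entry', 'comm_entry', 'comment')),
            happened_at INTEGER NOT NULL  -- Unix timestamp
        )
        """
    )
    # Index chosen so "everything since time T" does not need a full scan.
    conn.execute(
        "CREATE INDEX idx_activity_time ON activity_log (happened_at, userid)"
    )
    return conn


def record(conn, userid, kind, happened_at):
    """Log one action: a journal entry, a community entry, or a comment."""
    conn.execute(
        "INSERT INTO activity_log (userid, kind, happened_at) VALUES (?, ?, ?)",
        (userid, kind, happened_at),
    )


def most_active(conn, since, limit=10):
    """Return (userid, action_count) pairs for actions at or after `since`."""
    rows = conn.execute(
        """
        SELECT userid, COUNT(*) AS n
        FROM activity_log
        WHERE happened_at >= ?
        GROUP BY userid
        ORDER BY n DESC, userid ASC
        LIMIT ?
        """,
        (since, limit),
    )
    return rows.fetchall()
```

With a single table like this, journal posts, community posts, and comments can be counted together in one query instead of being pulled from three differently shaped sources.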
foxfirefey: A fox colored like flame over an ornately framed globe (Default)

[personal profile] foxfirefey 2009-07-25 07:42 pm (UTC)
One variable to consider is how many accounts are subscribed to a user--probably not OpenID ones, but DW ones.

It might also be good to be able to randomly give to active communities as well as active users--and number of members/subscribers as well as post and received comment activity might be a good metric there.
highlander_ii: 4 Musketeer swords saluting together from the Disney version of the movie ([3 Musk] All For One)

[personal profile] highlander_ii 2009-07-26 12:37 am (UTC)
I'm also wondering how this would relate to the 'account linking' that's in the pipeline for later. (The parent/child account set-up.) Will that make a difference in how you set this up now?
highlander_ii: Chris Pine kneeling on the floor holding a camera to his face (Default)

[personal profile] highlander_ii 2009-07-26 01:17 am (UTC)
Not really. I was wondering how an account would be considered 'active' - would the 'children' account for the activity level of the 'parent' journal or would they all be considered independent of one another?
alierak: (Default)

[personal profile] alierak 2009-07-27 02:46 am (UTC)
I'm not sure if it's sane, exactly, but this post got the ol' gears spinning, and I started to think in terms of an activity score that would go up with posting, commenting, etc., and down with days since last contact. It would need a table of its own, just to associate userids with scores and last update time of the score, updated whenever the user "does something".
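A score like that could be sketched roughly as follows in Python. This is only one possible interpretation: the exponential half-life decay, the 30-day half-life, and the point values per action are all assumptions chosen for the example, not anything specified in this thread.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 30.0  # assumption: the score halves after 30 idle days
POINTS = {"entry": 3.0, "comm_entry": 3.0, "comment": 1.0}  # assumed weights


@dataclass
class ActivityScore:
    """One row of the hypothetical per-user score table."""
    score: float = 0.0
    updated_day: float = 0.0  # day number of the last update

    def current(self, today: float) -> float:
        """Score as of `today`, decayed by the time since the last update."""
        elapsed = max(today - self.updated_day, 0.0)
        return self.score * 0.5 ** (elapsed / HALF_LIFE_DAYS)

    def record(self, kind: str, today: float) -> None:
        """Apply the decay up to `today`, then add points for the new action."""
        self.score = self.current(today) + POINTS[kind]
        self.updated_day = max(today, self.updated_day)


# The "table of its own": userid -> score row, updated whenever the user does something.
scores = {}
scores.setdefault(42, ActivityScore()).record("entry", today=0.0)
```

Because the decay is applied lazily on each update or read, nothing has to sweep the whole table every day; only the rows of users who do something get written.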

Then I thought, hmm, I wonder how Sourceforge does project activity rankings... They're comparing several 7-day stats for each project to the highest 7-day activity of all time, over all projects. Definitely some food for thought there.