Determining active users
So I decided to give bug 211 a shot (allowing people to give a paid account to a random active free user). I figured it'd be good to get going on this, what with search being a paid-only feature. :)
I posted a comment in the bug about this, but I figured I'd post here too. The spec says the following for determining who is an active user:
* Privilege people who have been long-term active on Dreamwidth for the random choice selection, not simply people who've made one or two posts recently. (Active = any one or more of: posts regularly, comments regularly, posts to comms regularly, etc.) The idea is to identify the most-active free-user contributors.
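One possible reading of "privilege" here is a weighted draw rather than a hard cutoff: everyone with some activity is eligible, but more active users are more likely to be picked. A minimal sketch, assuming we already have some per-user activity score (the `scores` mapping and the `pick_recipient` name are made up for illustration, not anything in the codebase):

```python
import random

def pick_recipient(scores, rng=random):
    """Pick one username, with probability proportional to its score.

    scores maps username -> non-negative activity score (hypothetical).
    Users with a score of 0 are never chosen.
    """
    eligible = {user: s for user, s in scores.items() if s > 0}
    if not eligible:
        return None  # nobody qualifies
    users = list(eligible)
    weights = [eligible[u] for u in users]
    # random.choices draws with replacement; k=1 gives a single winner
    return rng.choices(users, weights=weights, k=1)[0]
```

How the score itself gets computed is the open question in the rest of this post.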
I noted in the bug that I could see getting some sort of approximate answer for who posts entries in their own journal or who posts comments most often by doing some calculations to get the average number of entries or comments the user has made per day; i.e. ( number of entries posted / number of days the user has had their account ), or ( number of comments posted / number of days the user has had their account ). And then you could also determine who has posted in their journal most recently as well, or at least in the past X number of days. However, I don't think we have any data for who posts to communities regularly, nor do we have data for who has most recently commented (or who has commented in the past X number of days). Pretty much all of this could be retrieved through (probably complicated) queries, but obviously we don't want to do anything so complicated that it overwhelms the databases. Though maybe with enough caching it'd be okay, I'm not sure. I suppose the people who are most active probably will continue to be the people who are most active for a while.
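To make the arithmetic concrete, here is a rough sketch of the two per-day averages and the "posted in the past X days" check. The function names and the sample dates are assumptions for illustration; the real numbers would come from whatever counters the database already keeps.

```python
from datetime import date

def per_day_rate(count, created, today):
    """Average items (entries or comments) per day over the account's lifetime."""
    days = max((today - created).days, 1)  # treat a brand-new account as 1 day old
    return count / days

def posted_recently(last_post, today, max_days):
    """True if the most recent journal post is within max_days of today."""
    return last_post is not None and (today - last_post).days <= max_days

today = date(2009, 6, 1)
entry_rate = per_day_rate(90, date(2009, 3, 3), today)    # 90 entries in 90 days
is_recent = posted_recently(date(2009, 5, 28), today, 7)  # last post 4 days ago
```

Note that this divides by the whole lifetime of the account, so a long quiet stretch drags the average down.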
For reference, the random user feature that we have (http://www.dreamwidth.org/random) does check to make sure it returns only someone who has posted in the last 7 days. But I think that's all it really does in terms of determining activity.
Anyway, does anyone have any ideas for how to go about this in a sane manner? This is a "should have" instead of a "must have", so I imagine it doesn't have to be complete/perfect, but it'd be good to get something that's at least somewhat accurate.
no subject
Shouldn't random user show someone who's made a public post in the last 7 days? (I clicked and got a journal that looked empty. And I remember this happening before.)
Other notes:
I don't like calculating activity based on total entries/# days; it penalizes long-term users who've had a quiet period or took a six-month sabbatical. (The idea of favoring a user who started last week and posted 20 memes over someone who's been around for months or years, and posted every three days except for summer when they don't post at all, sounds bad.) If it's calculated over time, I'd rather it didn't check back farther than six months or so.
Or perhaps that it always measured activity over six months--which skews away from new accounts, which haven't had activity that long.
Also--how would it deal with imported posts? Does it have a way to exclude those from the activity levels?
no subject
We can tell which posts are imported, yes, we can exclude those fairly easily from any metrics that we use.
Agreed with not going back more than a few months.
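Putting the two points from this thread together (cap the lookback, skip imported posts), a sketch might look like the following. The 180-day window, the `(post_date, is_imported)` tuple shape, and the function names are all assumptions for illustration, not how entries are actually stored.

```python
from datetime import date, timedelta

WINDOW_DAYS = 180  # "six months or so"; the exact cutoff is an assumption

def windowed_activity(posts, today, window_days=WINDOW_DAYS):
    """Count non-imported posts made within the last window_days.

    posts is a list of (post_date, is_imported) tuples (hypothetical shape).
    """
    cutoff = today - timedelta(days=window_days)
    return sum(1 for posted, imported in posts
               if not imported and posted >= cutoff)

def windowed_rate(posts, today, window_days=WINDOW_DAYS):
    """Posts per day, always divided by the full window length."""
    return windowed_activity(posts, today, window_days) / window_days
```

Because the rate always divides by the full window, an account that posted 20 times last week scores lower than one that posted every three days for the whole six months, which is the skew away from new accounts mentioned above.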
no subject
I like the random user/random post feature; I've had good conversations because of it, and occasionally made a friend that way. But it's not like anyone (sane) could say, "Hey! This isn't working, and my DW experience is ruined because of that!" It's too weird to matter if it's non-glitchy--but I didn't know if it grabbed users who've posted locked things; it hadn't occurred to me that it might grab users who posted things that were later locked or deleted.
Short version: Thanks for explanation! My understanding of DW has been enhanced! I continue to be nondismayed at however the randomizer works, or does not work!
I wish I had nice suggestions for how to measure "all [username] recent activity;" it hadn't occurred to me that journal posts, comm posts & comments were stored in such different ways that there's no simple way to count & compare them.
no subject
Then you can design that table to be easily queried. Of course, this extends the project, and runs the risk of MBDing it. If you'd rather keep it simple, then just go by people who are posting in their own journal: much easier.
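As a very rough illustration of what an easily queried table could look like: the layout below is purely hypothetical (the table name, columns, and index are invented for this sketch, and it uses an in-memory SQLite database rather than the real schema). The idea is one row per activity event, indexed so that "who did the most in the last N days" is a single grouped query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activity_log (
        userid  INTEGER NOT NULL,
        kind    TEXT    NOT NULL,   -- 'entry', 'comment', or 'comm_post'
        logday  TEXT    NOT NULL    -- ISO date, e.g. '2009-06-01'
    )
""")
# Index on the date first so the range filter can use it
conn.execute("CREATE INDEX idx_activity_day_user ON activity_log (logday, userid)")

def record(userid, kind, logday):
    """Log one activity event for a user."""
    conn.execute("INSERT INTO activity_log VALUES (?, ?, ?)",
                 (userid, kind, logday))

def most_active(since_day, limit=10):
    """Return (userid, event_count) rows for activity on or after since_day."""
    rows = conn.execute(
        """SELECT userid, COUNT(*) AS n FROM activity_log
           WHERE logday >= ?
           GROUP BY userid ORDER BY n DESC, userid ASC LIMIT ?""",
        (since_day, limit),
    )
    return rows.fetchall()
```

ISO-formatted dates compare correctly as plain strings, which is why `logday` can stay a text column in this sketch.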
no subject
It might also be good to be able to randomly give to active communities as well as active users--and number of members/subscribers as well as post and received comment activity might be a good metric there.
no subject
Then I thought, hmm, I wonder how Sourceforge does project activity rankings... They're comparing several 7-day stats for each project to the highest 7-day activity of all time, over all projects. Definitely some food for thought there.
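Translated to our case, my reading of that idea (not Sourceforge's actual formula, which I haven't seen) would be: take each of a user's 7-day counts, divide it by the highest 7-day count anyone has ever had for that stat, and average the ratios. The stat names and function name below are made up for the sketch.

```python
def normalized_score(week_stats, all_time_max):
    """Average of each 7-day stat divided by the highest 7-day value on record.

    week_stats:   stat name -> this user's count over the last 7 days
    all_time_max: stat name -> highest 7-day count ever seen for any user
    """
    ratios = []
    for name, value in week_stats.items():
        peak = all_time_max.get(name, 0)
        # A stat with no recorded peak contributes 0 rather than dividing by zero
        ratios.append(value / peak if peak > 0 else 0.0)
    return sum(ratios) / len(ratios) if ratios else 0.0

score = normalized_score({"entries": 5, "comments": 50},
                         {"entries": 10, "comments": 200})
```

Each ratio lands between 0 and 1, so entries and comments can be combined even though the raw counts are on very different scales.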