Entry tags:
where to put functions
So, I am currently in the first stages of working on http://bugs.dwscoalition.org/show_bug.cgi?id=1574 ( "Last active entries" module) and it's actually going quite nicely :)
However, I am a bit stumped on where to put a function I need. Namely, I need a function that, depending on what journal the module is in and who is viewing it, returns the last x active entries in that journal (maybe not the full Entry objects, though, still working on that).
My first thought was cgi-bin/LJ/User.pm, as it would be good to have $u->get_recent_active($remote) or something like that. But now I am leaning more towards putting it in cgi-bin/DW/Logic/LogItems.pm, which has the functions $u->watch_items and $u->recent_items. However, my idea only semi-fits the module description.
Or maybe it should be in cgi-bin/LJ/S2.pm?
Yes, I am still confused by where to put stuff. Any ideas?
no subject
And how are you implementing this? Where are you pulling the data from?
no subject
Great, thanks!
And how are you implementing this? Where are you pulling the data from?
Latest comments from talk2 and security from log2. I hope that's okay, since they are both on the user's cluster?
no subject
mysql> explain select journalid, nodeid, datepost
from talk2 where journalid = 35 order by datepost desc limit 5\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: talk2
type: ref
possible_keys: PRIMARY,journalid
key: PRIMARY
key_len: 4
ref: const
rows: 36862
Extra: Using where; Using filesort
1 row in set (0.00 sec)
If it does, you can't use it. See that "using filesort" bit? Yeah, that means it's doing an awkwardly large scan of data: for this query, every single comment on a journal has to be loaded from disk and then sorted.
As a general rule, you must have an index on the data you're searching and ordering on. In this case, there's no index that covers datepost so there's no way to do a "sorted by time" search of the comments using talk2.
I can't think of any other "easy" way of doing it, either. Sorting by the logprop "commentalter" won't work for similar reasons (you'd have to load every logprop for every entry and sort).
The only/best way to do it is probably going to involve very painful ALTER TABLE statements on log2 to add a commenttime column and then an INDEX (journalid, commenttime), and then doing your sorts based on that... but that's pretty painful. (But probably the best way...)
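Sketched in SQL, that suggestion might look like the following. This is hypothetical: the column type and index name are guesses, and nothing in the thread says this was ever actually applied.

```sql
-- Hypothetical sketch of the suggested schema change: track the time of
-- the latest comment on each entry directly in log2, so "most recently
-- commented on" becomes an index-backed range scan instead of a filesort.
ALTER TABLE log2 ADD COLUMN commenttime INT UNSIGNED;  -- type is a guess
ALTER TABLE log2 ADD INDEX journalid_commenttime (journalid, commenttime);

-- The sort could then use the index:
--   SELECT jitemid FROM log2
--   WHERE journalid = ? ORDER BY commenttime DESC LIMIT 5;
```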
no subject
my $lrow = $logrow{"$u->{userid} $r->{nodeid}"} ||= LJ::get_log2_row($u, $r->{'nodeid'});
But, basically, any suggestions welcomed. I have an early version of an SQL query so I can see what I actually need, how to sort it, etc., but I am far from set on actually *using* it.
no subject
SELECT * FROM
(SELECT talk2.*, logtext2.subject FROM talk2 INNER JOIN log2 INNER JOIN logtext2
ON logtext2.journalid=log2.journalid and talk2.journalid=logtext2.journalid and talk2.nodeid=log2.jitemid and log2.jitemid=logtext2.jitemid
WHERE logtext2.journalid=$userid AND talk2.state<>"D" AND talk2.state<>"S" ORDER BY talk2.datepost DESC) innertable
GROUP BY innertable.nodeid ORDER BY innertable.datepost DESC LIMIT 10;
[Yes, three-table join *winces* - I would love to have the subject from logtext2 without getting the whole Entry object first - but that can definitely be substituted!]
I tried just substituting datepost with jtalkid, and for my test case (http://activeentries.yvi.hack.dreamwidth.net/) the results it shows are fine and in the correct order. I am not sure about the nested-table thing, though, which I found is necessary since I only want one comment for each post, and only the most recent one, and GROUP BY and ORDER BY don't play nicely together :( Is that a problem?
And reading back over it, it's probably really horrible, and if I shouldn't use any joins I'll be more than happy to do without. This was more me testing how the databases work together and what is stored where. recent_comments.bml does have some stuff that might work otherwise.
I'll shut up now.
no subject
Something to keep in mind, though, is that we don't really use joins (or subselects):
[dw @ mark-dev - ~/current] -> grep -r "SELECT " cgi-bin | wc -l
558
[dw @ mark-dev - ~/current] -> grep -r " JOIN " cgi-bin | wc -l
3
(That's skewed a lot since it only counts explicit JOINs and not implicit inner joins. But you get the idea...)
There are a lot of reasons not to do joins like that. Database performance can be pretty terrible (in MySQL) when you start doing tough queries like that, it's much harder to memcache the data, and the SQL gets hard to write (and to figure out and modify later). (Granted, a lot of this is because MyISAM uses table-level locking, so table joins are a quick path to pain. But even InnoDB, in my experience, is subject to deadlock conditions that are no fun when you start getting fancy with the SQL.)
What you should do in a situation like this is break the process down into small steps, and then see how you can do it simply and easily. For example, we already have things that load lots of comments and entries quickly and efficiently on the database, relying on memcache, etc, so the only part you have to write is the code that figures out which comments to load.
This can be a tricky problem, though. There is no efficient way, right now, to find the "10 most recently commented on posts" since we don't have that index. What you can find, however, is the N most recent comments, and see what posts they're on. If you select enough comments from the database you can be reasonably certain you're going to find a few posts.
So, for example, I would implement this function like this:
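The code block originally posted here did not survive. Based on the description in the next paragraph and the fragments quoted in the replies further down ($u->memc_get / $u->memc_set of 'recently_commented_posts', and a $make_entry_objs helper), it was probably something along these lines — a hedged reconstruction, not the original:

```perl
# Hedged reconstruction of the missing example. The memcache calls and
# the $make_entry_objs helper are quoted verbatim in the replies below;
# the rest (the function name, the SQL, the loop) is inferred from the
# surrounding description and may differ from what was actually posted.
sub get_recently_commented_entries {
    my ( $u ) = @_;

    # turn a list of jitemids into LJ::Entry objects
    my $make_entry_objs = sub {
        my $ids = shift;
        return map { LJ::Entry->new( $u, jitemid => $_ ) } @$ids;
    };

    # try memcache first
    my $recent = $u->memc_get( 'recently_commented_posts' );
    return $make_entry_objs->( $recent ) if $recent;

    # last 50 comments, newest first; newer comments always have higher
    # jtalkids, so this avoids the filesort a datepost sort would cause
    my $dbcr = LJ::get_cluster_reader( $u );
    my $rows = $dbcr->selectcol_arrayref(
        q{SELECT nodeid FROM talk2 WHERE journalid = ?
          ORDER BY jtalkid DESC LIMIT 50},
        undef, $u->userid );

    # first five unique entries = five most recently commented on,
    # because the comments were already sorted newest-first
    my ( %seen, @postids );
    foreach my $nodeid ( @$rows ) {
        next if $seen{$nodeid}++;
        push @postids, $nodeid;
        last if @postids >= 5;
    }

    $u->memc_set( 'recently_commented_posts', \@postids );
    return $make_entry_objs->( \@postids );
}
```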
I haven't tested the above, but that should work. It digs up the last 50 comments, finds what entries they were on, picks the first 5 unique entries (first five unique = five newest guaranteed because we ordered by jtalkid), and then returns LJ::Entry objects which you can use to do whatever.
This is easy on the database (one query, no joins, it's memcached), and it's fairly minimal as far as the code goes. Easy to debug and work on.
The drawback to this method is that it's "handwavey". If you have a really popular journal, you might have 50 comments on your most recent post. Selecting only the most recent 50 will end up with the "recently commented on posts" box having only a single entry in it. It's pretty easy to deal with that, though. If you really wanted, you could make that 'SELECT' into a 'SELECT DISTINCT' and change the LIMIT to 5. That guarantees you get five unique entries (if they exist), but it is theoretically unlimited in how many rows it sorts through in the database.
In our case, we know it won't be TOO terrible because we limit the number of comments on a single entry. So, at most, SELECT DISTINCT ... LIMIT 5 will go through 25,000 rows. Which is a lot, but is still much less than 'SELECT ... ORDER BY datepost' on some accounts.
(You also have to make sure that whenever someone comments on an entry, you delete the 'recently_commented_posts' memcache key for the journal they're commenting on. Otherwise, the person viewing the site won't see 'live' data on that box...)
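A sketch of that invalidation, assuming a memc_delete counterpart to the memc_get/memc_set calls quoted elsewhere in this thread; the exact hook point in the comment-posting path is a guess:

```perl
# Somewhere in the comment-posting path, after the new comment row has
# been written. memc_delete is assumed to be the delete counterpart of
# the memc_get/memc_set helpers used above; the hook location is a guess.
my $journalu = LJ::load_userid( $comment_journalid );
$journalu->memc_delete( 'recently_commented_posts' );
```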
...
I've now rambled on for half an hour. Thoughts, questions, comments, critiques?
no subject
3
Heh, yeah, I tried that, too. Found a few more, though, since there are some lowercase joins ;)
when you start doing tough queries like that, it's much hard to memcache the data, and writing (and figuring out/modifying later) the SQL is so hard.
Memcache, one of the last big mysteries left on Earth ;)
If you select enough comments from the database you can be reasonably certain you're going to find a few posts.
Sounds good.
# try memcache first
my $recent = $u->memc_get( 'recently_commented_posts' );
return $make_entry_objs->( $recent ) if $recent;
...
$u->memc_set( 'recently_commented_posts', \@postids );
Ah, now I finally get how *that* works!
In our case, we know it won't be TOO terrible because we limit the number of comments on a single entry. So, at most, SELECT DISTINCT ... LIMIT 5 will go through 25,000 rows. Which is a lot, but is still much less than 'SELECT ... ORDER BY datepost' on some accounts.
*nods* So you think that would be fine? Limit 5 is good (I was planning on either 5 or 10). I'll take another look when it's not 7:30 and I am actually logged into my Dreamhack, but this sounds very reasonable. Thanks for the explanation!
no subject
SELECT DISTINCT * FROM talk2 WHERE journalid = ? AND state <> 'D' and state <> 'S' ORDER BY jtalkid DESC LIMIT 5;
does the trick, thank you. But all five will only show up in the list later if the entries are public or the viewer has access to all of them, since I will obviously discard entries that can't be viewed in a later step. So the list might come up very short for partially locked journals if we start from 5. Anyway, going through all of them is obviously database-intensive, so we don't want to do that.
But, as a compromise, would it be possible to either a) pull out 10 instead, or b) pull out 5 if the user is owner/maintainer and 10 if the user isn't, and then narrow it down to no more than 5 later? I hope you get what I mean here. I'd just hate to see that module regularly come up with only one entry, because I do think it's shiny :) [And if I had a biggish community it would totally be a reason to buy a paid community for it]
We can also do it with 5 first, though, and maybe talk about this a while after implementation again?
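Option (b) above could be sketched roughly like this. The 5/10 split comes from the comment; the get_candidate_entries helper is hypothetical, and the use of equals and visible_to assumes those methods behave as their names suggest:

```perl
# Over-fetch when the viewer isn't the journal owner, then filter down
# to at most 5 entries the viewer can actually see. Sketch only:
# get_candidate_entries is a hypothetical helper, and visible_to is
# assumed to perform the security check for the given viewer.
my $fetch = $remote && $remote->equals( $u ) ? 5 : 10;
my @candidates = get_candidate_entries( $u, $fetch );
my @visible = grep { $_->visible_to( $remote ) } @candidates;
@visible = @visible[ 0 .. 4 ] if @visible > 5;
```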
(You also have to make sure that whenever someone comments on an entry, you delete the 'recently_commented_posts' memcache key for the journal they're commenting on. Otherwise, the person viewing the site won't see 'live' data on that box...)
Okay, that needs to be done manually in talkpost then? memcache is a big mystery for me and I can't test it...
Yvi, lost in code and thought.
no subject
AND state <> 'D' and state <> 'S'
Prefer AND state NOT IN ('D', 'S')
Why can't you test memcache?
no subject
Prefer AND state NOT IN ('D', 'S')
Oh, new syntax, shiny :) Will use.
Why can't you test memcache?
I thought it wasn't set up on the Dreamhacks?
*searches on the Wiki* Oh, I can set it up. Cool. Will contact