dw_dev | where to put functions

Entry tags:

coding

where to put functions

So, I am currently in the first stages of working on http://bugs.dwscoalition.org/show_bug.cgi?id=1574 ( "Last active entries" module) and it's actually going quite nicely :)

However, I am a bit stumped on where to put a function I need. Namely, I need a function that, depending on what journal the module is in and who is viewing it, returns the last x active entries in that journal (maybe not the full Entry objects, though, still working on that).

My first thought was cgi-bin/LJ/User.pm , as it would be good to have $u->get_recent_active($remote) or something like that. But now I am leaning more towards putting it in cgi-bin/DW/Logic/LogItems.pm , which has the functions $u->watch_items and $u->recent_items . However, my idea only semi-fits the module description.

Or maybe it should be in cgi-bin/LJ/S2.pm?

Yes, I am still confused by where to put stuff. Any ideas?

Flat | Top-Level Comments Only

Wow, that's some serious hardcore SQL! Nice!

Something to keep in mind, though, is that we don't really use joins (or subselects):

[dw @ mark-dev - ~/current] -> grep -r "SELECT " cgi-bin | wc -l 558 [dw @ mark-dev - ~/current] -> grep -r " JOIN " cgi-bin | wc -l 3

(That's skewed a lot since it only counts explicit JOINs and not implicit inner joins. But you get the idea...)

There's a lot of reasons not to do joins like that. Database performance can be pretty terrible (in MySQL) when you start doing tough queries like that, it's much hard to memcache the data, and writing (and figuring out/modifying later) the SQL is so hard. (Granted, a lot of this is because MyISAM uses table level locking, so table joins is a quick path to pain. But even InnoDB, in my experience, is subject to deadlock conditions that are no fun when you start getting fancy with the SQL.)

What you should do in a situation like this is break the process down into small steps, and then see how you can do it simply and easily. For example, we already have things that load lots of comments and entries quickly and efficiently on the database, relying on memcache, etc, so the only part you have to write is the code that figures out which comments to load.

This can be a tricky problem, though. There is no efficient way, right now, to find the "10 most recently commented on posts" since we don't have that index. What you can find, however, is the N most recent comments, and see what posts they're on. If you select enough comments from the database you can be reasonably certain you're going to find a few posts.

So, for example, I would implement this function like this:

# called $u->get_rec..blahblah, returns array of LJ::Entry objects, so you can do
# things like 'foreach my $entry ( $u->get_rec... ) { ... }'
#
sub get_recently_commented_on_entries {
    my $u = shift or die 'no user';

    # take arrayref input, return LJ::Entry objects
    my $make_entry_objs = sub {
        return map { LJ::Entry->new( $u, jitemid => $_ ) @{$_[0]};
    };

    # try memcache first
    my $recent = $u->memc_get( 'recently_commented_posts' );
    return $make_entry_objs->( $recent ) if $recent;

    # not in memcache, step load some post ids from comments
    my @ids = $u->selectcol_array(
        q{SELECT nodeid FROM talk2
          WHERE journalid = ? AND state <> 'D'
          ORDER BY jtalkid DESC LIMIT 50},
        undef, $u->id
    );

    # now we have nodeids in @ids, you can find the first five unique values
    # through whatever method seems best to you, here assuming @postids is the
    # output of "find five uniques"
    FIXME: code this

    # now memcache and return the results.  never memcache LJ::Entry objects,
    # just the ids...
    $u->memc_set( 'recently_commented_posts', \@postids );
    return $make_entry_objs->( \@postids );
}

I haven't tested the above, but that should work. It digs up the last 50 comments, finds what entries they were on, picks the first 5 unique entries (first five unique = five newest guaranteed because we ordered by jtalkid), and then returns LJ::Entry objects which you can use to do whatever.

This is easy on the database (one query, no joins, it's memcached), and it's fairly minimal as far as the code goes. Easy to debug and work on.

The drawback to this method is that it's "handwavey". If you have a really popular journal, you might have 50 comments on your most recent post. Selecting only the most recent 50 will end up with the "recently commented on posts" box having only a single entry in it. It's pretty easy to deal with that, though. If you really wanted, you could make that 'SELECT' into a 'SELECT DISTINCT' and change the LIMIT to 5. That guarantees you get five unique entries (if they exist), but it is theoretically unlimited in how many rows it sorts through in the database.

In our case, we know it won't be TOO terrible because we limit the number of comments on a single entry. So, at most, SELECT DISTINCT ... LIMIT 5 will go through 25,000 rows. Which is a lot, but is still much less than 'SELECT ... ORDER BY datepost' on some accounts.

(You also have to make sure that whenever someone comments on an entry, you delete the 'recently_commented_posts' memcache key for the journal they're commenting on. Otherwise, the person viewing the site won't see 'live' data on that box...)

...

I've now rambled on for half an hour. Thoughts, questions, comments, critiques?

[dw @ mark-dev - ~/current] -> grep -r " JOIN " cgi-bin | wc -l
3

Heh, yeah, I tried that, too. Found a few more, though, since there are some lowercase joins ;)

when you start doing tough queries like that, it's much hard to memcache the data, and writing (and figuring out/modifying later) the SQL is so hard.

Memcache, one of the last big misteries left on Earth ;)

If you select enough comments from the database you can be reasonably certain you're going to find a few posts.

Sounds good.

# try memcache first
my $recent = $u->memc_get( 'recently_commented_posts' );
return $make_entry_objs->( $recent ) if $recent;

...

$u->memc_set( 'recently_commented_posts', \@postids );

Ah, now I finally get how *that* works!

In our case, we know it won't be TOO terrible because we limit the number of comments on a single entry. So, at most, SELECT DISTINCT ... LIMIT 5 will go through 25,000 rows. Which is a lot, but is still much less than 'SELECT ... ORDER BY datepost' on some accounts.

*nods* So you think that would be fine? Limit 5 is good (I was planning on either 5 or 10). I'll take another look when it's not 7:30 and I am actually logged into my Dreamhack, but this sounds very reasonable. Thanks for the explanation!

Yeah, SELECT DISTINCT ... LIMIT 5 would be fine. If it turns out to be a heavy load on the database in the future, we can always tune it!

Hmm. Your ideas are intriguing to me and I wish to subscribe to your newsletter...

Okay, I have further questions (and I might comment here occasionally, if that is fine for you?):

SELECT DISTINCT * FROM talk2 WHERE journalid = ? AND state <> 'D' and state <> 'S' ORDER BY jtalkid DESC LIMIT 5;

does the trick, thank you. But only if all entries are public or the user has access to all 5 entries will all of those later show up in the list, as I will obviously discard ones that can't be viewed at a later step. So the list might come up very short for partially locked journals if we start from 5. Anyway, going through all is obviously data-base intensive, so we don't want to do that.

But, as a compromise, would it be possible to either a) pull out ten instead, or b) pull out 5 if the user is owner/maintainer and 10 if the user isn't and then narrow it down to no more than 5 later? I hope you get what I mean here. I'd just hate to see that module only come up with one entry regularly because I do think it's shiny :) [And if I had a biggish community it would totally be a reason to buy a paid community for it]

We can also do it with 5 first, though, and maybe talk about this a while after implementation again?

(You also have to make sure that whenever someone comments on an entry, you delete the 'recently_commented_posts' memcache key for the journal they're commenting on. Otherwise, the person viewing the site won't see 'live' data on that box...)

Okay, that needs to be done manually in talkpost then? memcache is a big mystery for me and I can't test it...

Yvi, lost in code and thought.

Sure, 10 is fine.

AND state <> 'D' and state <> 'S'

Prefer AND state NOT IN ('D', 'S')

Why can't you test memcache?

10 it is then :)

Prefer AND state NOT IN ('D', 'S')

Oh, new syntax, shiny :) Will use.

Why can't you test memcache?

I thought it wasn't set up on the Dreamhacks?

*searches on the Wiki* Oh, I can set it up. Cool. Will contact

sophie about it then.

Yep, and memcache is pretty easy. I'll try to write up a post for

dw_dev_training about how it works, how to use it, when to use it, etc. :)

Flat | Top-Level Comments Only

where to put functions

no subject

no subject

no subject

no subject

no subject

no subject

no subject

no subject