Stats: Active accounts by payment level
One of the stats we need to provide is active accounts by account level (free, paid, premium, seed). Currently, the active accounts come from a per-cluster table with the user id as primary key and the time of last activity, in which users are added/updated whenever they do something that makes them active.
DW::StatData::ActiveAccounts handles collecting those stats, basically running something like the following query to retrieve the number of accounts active in the last 30 days:
SELECT COUNT(*) FROM clustertrack2 WHERE timeactive > UNIX_TIMESTAMP()-30*86400
That table doesn't include the account level, because (so far) it hasn't needed to. This means that to break down active accounts by account level, I'd need to retrieve the complete list of active accounts and check the account level for each, which is much more costly than the current SQL.
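To make the cost difference concrete, here is a rough sketch using Python's sqlite3 as a stand-in for the real per-cluster database. The `userid` column name, the `level_of` lookup and the sample data are assumptions for illustration only, and the current time is passed in as a parameter because SQLite has no UNIX_TIMESTAMP().

```python
import sqlite3

DAY = 86400

def count_active(db, now):
    """One aggregate query, like the current stats collection."""
    cutoff = now - 30 * DAY
    row = db.execute(
        "SELECT COUNT(*) FROM clustertrack2 WHERE timeactive > ?", (cutoff,)
    ).fetchone()
    return row[0]

def active_by_level_slow(db, now, level_of):
    """Breakdown without a level column: one extra lookup per active account."""
    cutoff = now - 30 * DAY
    counts = {}
    for (userid,) in db.execute(
        "SELECT userid FROM clustertrack2 WHERE timeactive > ?", (cutoff,)
    ):
        level = level_of(userid)  # hypothetical per-account level lookup
        counts[level] = counts.get(level, 0) + 1
    return counts

db = sqlite3.connect(":memory:")
# Assumed minimal schema: user id as primary key plus time of last activity.
db.execute(
    "CREATE TABLE clustertrack2 (userid INTEGER PRIMARY KEY, timeactive INTEGER)"
)
now = 100 * DAY
db.executemany(
    "INSERT INTO clustertrack2 VALUES (?, ?)",
    [(1, now - 1 * DAY), (2, now - 5 * DAY), (3, now - 40 * DAY)],
)
levels = {1: "paid", 2: "free", 3: "free"}  # made-up account levels

total = count_active(db, now)
breakdown = active_by_level_slow(db, now, levels.get)
```

The first function touches the database once; the second needs one level lookup per active account, which is where the cost comes from.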
To avoid that, the obvious solution is to add the account level to that table. However, this could cause discrepancies if the account level changes after the latest activity. For instance, if a paid account expires after its last activity, it would still be counted as "active paid" in the breakdown of active accounts, but as free in the accounts-by-level stats, possibly leading to more accounts counted as "active paid" than as "paid". The only way to fix that is to have the account level transitions (paid account expiring, payment received, etc.) "correct" the table, but that gets rather ugly as the number of stats that need a breakdown by account level goes up (and may not be possible for some stats at all, depending on how they're collected). Also, it can lead to biased data. For instance, if a free account becomes paid for the first time after its last activity but before the stats are collected, it will be counted as paid even though all its activity was as a free account.
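The "more active paid than paid" discrepancy can be reproduced with a tiny sketch. It assumes an `accountlevel` column on the activity table and a separate, hypothetical `accounts` table holding each account's current level; neither name comes from the real schema.

```python
import sqlite3

DAY = 86400
now = 100 * DAY

db = sqlite3.connect(":memory:")
# Activity table with an assumed accountlevel column, written only on activity.
db.execute(
    "CREATE TABLE clustertrack2 ("
    "userid INTEGER PRIMARY KEY, timeactive INTEGER, accountlevel TEXT)"
)
# Hypothetical table holding the current level of each account.
db.execute("CREATE TABLE accounts (userid INTEGER PRIMARY KEY, accountlevel TEXT)")

# Account 1 is paid and was last active five days ago.
db.execute("INSERT INTO accounts VALUES (1, 'paid')")
db.execute("INSERT INTO clustertrack2 VALUES (1, ?, 'paid')", (now - 5 * DAY,))

# The paid time then expires, with no further activity from the user.
db.execute("UPDATE accounts SET accountlevel = 'free' WHERE userid = 1")

active_paid = db.execute(
    "SELECT COUNT(*) FROM clustertrack2 "
    "WHERE timeactive > ? AND accountlevel = 'paid'",
    (now - 30 * DAY,),
).fetchone()[0]
paid = db.execute(
    "SELECT COUNT(*) FROM accounts WHERE accountlevel = 'paid'"
).fetchone()[0]
```

With this data the activity table still reports one active paid account while the accounts table reports no paid accounts at all.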
Adding the account level to the primary key (and thus keeping a separate activity record for each account level a user id has been active under) isn't feasible, because that field would be null for existing activity records, which makes it unsuitable for a primary key. In addition, that wouldn't really help, because we'd still need to choose between activity records for the same account, lest we count it twice (or more). If having (according to stats) more active paid accounts than paid accounts (or than active accounts) makes stats look unreliable, think how much worse it would be to have more active paid accounts than total accounts. (Can you hear the "DW is inflating its stats" cries from here? :-) )
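The double-counting problem with a composite key looks roughly like this. Again a sketch only: the `(userid, accountlevel)` key and the column names are assumptions for illustration.

```python
import sqlite3

DAY = 86400
now = 100 * DAY
cutoff = now - 30 * DAY

db = sqlite3.connect(":memory:")
# Assumed variant of the activity table keyed on (userid, accountlevel).
db.execute(
    "CREATE TABLE clustertrack2 ("
    "userid INTEGER, accountlevel TEXT, timeactive INTEGER, "
    "PRIMARY KEY (userid, accountlevel))"
)

# The same account was active as free, then upgraded and was active as paid.
db.execute("INSERT INTO clustertrack2 VALUES (1, 'free', ?)", (now - 20 * DAY,))
db.execute("INSERT INTO clustertrack2 VALUES (1, 'paid', ?)", (now - 5 * DAY,))

# Counting rows counts the account once per level it was active under.
row_count = db.execute(
    "SELECT COUNT(*) FROM clustertrack2 WHERE timeactive > ?", (cutoff,)
).fetchone()[0]
# Counting distinct user ids gives the real number of active accounts.
account_count = db.execute(
    "SELECT COUNT(DISTINCT userid) FROM clustertrack2 WHERE timeactive > ?",
    (cutoff,),
).fetchone()[0]
```

A per-level breakdown built from those rows would sum to two for a single account, so some rule for picking one record per account would still be needed.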
Anyway: the best compromise (IMO) seems to be to capture the account level in the activity table at the time that table is updated, and to leave it unchanged afterward. Opinions? Questions? Blindingly obvious solutions that I missed?
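For discussion, a minimal sketch of that compromise, assuming an `accountlevel` column and a hypothetical `record_activity` helper that is handed the account's level at the moment of activity. SQLite's INSERT OR REPLACE stands in for whatever the real update statement is.

```python
import sqlite3

DAY = 86400

def record_activity(db, userid, when, current_level):
    """Store the last activity time together with the level held at that time."""
    db.execute(
        "INSERT OR REPLACE INTO clustertrack2 (userid, timeactive, accountlevel) "
        "VALUES (?, ?, ?)",
        (userid, when, current_level),
    )

def active_by_level(db, now):
    """Breakdown of accounts active in the last 30 days, in a single query."""
    rows = db.execute(
        "SELECT accountlevel, COUNT(*) FROM clustertrack2 "
        "WHERE timeactive > ? GROUP BY accountlevel",
        (now - 30 * DAY,),
    )
    return dict(rows.fetchall())

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE clustertrack2 ("
    "userid INTEGER PRIMARY KEY, timeactive INTEGER, accountlevel TEXT)"
)
now = 100 * DAY
record_activity(db, 1, now - 10 * DAY, "free")
record_activity(db, 2, now - 5 * DAY, "paid")
record_activity(db, 3, now - 50 * DAY, "free")  # outside the 30-day window

# Account 1 upgrades to paid here, but nothing touches the activity table.
breakdown_before = active_by_level(db, now)

# Only when account 1 is active again does its recorded level change.
record_activity(db, 1, now - 1 * DAY, "paid")
breakdown_after = active_by_level(db, now)
```

Level transitions never write to the activity table, so each account appears exactly once and is counted under the level it had when it was last active.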