pauamma: Cartooney crab wearing hot pink and acid green facemask holding drink with straw (Default)
Res facta quae tamen fingi potuit ([personal profile] pauamma) wrote in [site community profile] dw_dev2009-12-25 08:41 pm
Entry tags:

More brainstorming on stats: payment stats

So I'm still chipping away at bug 124, and I've reached:
* Number of payments in last 1d/2d/5d/7d/1m/3m/1y
-- broken down by which payment level/payment item chosen
-- xxx and divided into new payments vs. renewals
-- xxx and expressed as a dollar amount taken in during that time
Now the first thing to figure out is where and how payments are stored in the database, and whether (and how) I can use the stored data for my purposes.

Since I need to split data by what was ordered, I need to use shopping carts (table shop_carts, module DW::Shop::Cart), since that's where that information is. However, shopping carts don't tell me when payment was received, only when the cart was opened/checked out/whatever. For payment by check or money order, the difference could be substantial (several days to over 1 week), esp. if you want to wait for checks to clear until you record the payment. Since this is longer than the periods over which we want stats, it won't work, so I either need to base the stats on the order date instead of the payment date, or to use something else in addition to cart data.

It's tempting to use pp_trans (see DW::Shop::Engine::PayPal), but that's only for PayPal payments (as far as I can tell), so it won't work for that purpose, since there are no equivalents for other payment methods (credit card, check, money order). Plus, I just care when the payment was received, not how, so I don't really want to get into that, with the likely maintenance nightmare that would come when another payment method is added, or worse, when payment for a given cart may be split between methods (hello, gift certs).

So it looks like I need to add some data to shopping carts. All I need at that stage is a single timestamp for the last change of state, which could be indexed, which is nice (I can retrieve carts for which payment was received (state is PAID or PROCESSED - see $STATE_* in DW::Shop) in the last (mumble) days. For existing carts that won't have that timestamp, I can either set it from the creation timestamp or use other means, such as scavenging through pp_trans. I don't really need to keep a complete history of state changes, and I can't think of a use case for one (but then, I'm not an accountant, not a lawyer, and not a developer of financial/accounts receivable/ecommerce software).

Anyone wants to laugh at/tear into the above, or suggest alternatives or stuff I overlooked?
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)

[staff profile] mark 2009-12-26 01:22 am (UTC)(link)
Hmmm, I haven't put a LOT of thought into yet, but wanted to toss this out here:

Have you considered just making the payment system have some hooks when it changes state? Every state change makes a call to the 'state' method on the DW::Shop::Cart object, just put in a hook 'shop_cart_state_changed' and then you can process it in the stats system fairly easily.

I think for the purposes of this statistic, the only thing you care about is transitions into the PAID state. All other transitions don't matter for this particular statistic, since if we refund something we want to have that count separately. Reversals are rare, so for now at least, let's only care about when something becomes PAID.

If you throw the hook in, you can probably implement this stat in a few minutes. It would be ideal if there's some way to go back and find all PAID carts and backdate the stats, but that may not be very possible. (I'm happy to write that script too, actually, since it's just a one-off.)
thorfinn: <user name="seedy_girl"> and <user name="thorfinn"> (Default)

[personal profile] thorfinn 2009-12-29 11:50 pm (UTC)(link)
I haven't looked at the code, so some of this might not be relevant at all to the particular circumstances. Sorry if it isn't!

1. Keep every state transition (keep every datum that might conceivably be useful). Disk is cheap. Too much data can always be sliced before analysis.

2. I have no idea where the data is stored, but don't do statistical analysis by making SQL queries, assuming it's stored in a database. Dump the data you need into csv/tsv as a point-in-time snapshot, then process it as a bulk lot.

3. Relating to 2, if you need to dump multiple tables, then you're going to need some way to snapshot the database for processing, assuming there is relational data you need to match up.
thorfinn: <user name="seedy_girl"> and <user name="thorfinn"> (Default)

[personal profile] thorfinn 2009-12-29 11:55 pm (UTC)(link)
Realised I didn't explain the reason for 2.

Databases are really poor at statistics collation - they're not designed for it, and performance will be awful and likely to impact other things the database is doing. May not matter on a small dataset, but the dataset is only likely to grow.
thorfinn: <user name="seedy_girl"> and <user name="thorfinn"> (Default)

[personal profile] thorfinn 2010-01-05 02:03 am (UTC)(link)
Ah, good, re: 2 and 3. :-) It's the most common bad thing I see developers do when they try to aggregate statistics (do it in the database instead of outside), so I thought I'd mention it.

As for 1, yeah, that's pretty much why I recommend the approach of "log as much as possible". Disk is cheap, and you can always slice data for analysis if you have too much, but you can't make up for data you don't have.