Cocoa (momijizukamori) wrote in dw_dev, 2013-06-16 09:25 pm

Data and query structure for faceted styles search

I have been researching this, and while I've found a lot of articles on how to set up faceted searching using existing search engines (mostly Solr, though a few with Sphinx), and a bunch of articles on frontend design for faceted search, I can't find much on optimizing the data structure or the queries. And I know basically nothing about code optimization, so I'm tossing this out here!


Data Structure - we already sort of have a faceted data structure for styles in the form of categories. There is a great big SQL table (don't ask me which one, I don't remember), but what it boils down to is something like this (assuming more categories than we actually have live):

Style ID - Category ID
Style 1 - Color: Red
Style 1 - Accessibility: Light on Dark
Style 1 - Base: Practicality
Style 2 - Color: Blue
Style 2 - Base: Modish
Style 2 - Accessibility: Muted
Style 2 - Accessibility: Dark on Light
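
For illustration only, here is a minimal Python sketch of that one-row-per-tag shape and a lookup built on top of it. The (style_id, facet, value) tuple layout and the names ROWS and build_index are made up for this example; they are not the real table or any existing Dreamwidth code.

```python
from collections import defaultdict

# Hypothetical rows mirroring the example above: (style_id, facet, value).
ROWS = [
    (1, "Color", "Red"),
    (1, "Accessibility", "Light on Dark"),
    (1, "Base", "Practicality"),
    (2, "Color", "Blue"),
    (2, "Base", "Modish"),
    (2, "Accessibility", "Muted"),
    (2, "Accessibility", "Dark on Light"),
]


def build_index(rows):
    """Map each (facet, value) pair to the set of style IDs carrying it."""
    index = defaultdict(set)
    for style_id, facet, value in rows:
        index[(facet, value)].add(style_id)
    return dict(index)


index = build_index(ROWS)
print(index[("Accessibility", "Muted")])
```

A lookup for one tag then only touches the entry for that tag, and a style with two Accessibility tags simply shows up under both keys.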

A lot of the tutorial/design stuff seems to expect a structure where facets are split into columns, i.e.:

Style ID - Color - Base Layout - Accessibility
Style 1 - Red - Practicality - Light on Dark
Style 2 - Blue - Modish - Muted

...which then runs into the obvious problem that not all tags within a facet are mutually exclusive (Style 2 has two Accessibility tags in the first layout, and only one of them fits in the column here). The second structure seems like it might have a slight speed advantage in querying (because queries over a single facet don't need to look at all the data), but I'm not sure how to handle multiple tags on an item within one facet (aka polyhierarchy, according to the internet), and I'm not sure the increase in speed would merit rewriting the entire category backend instead of just writing a new interface for it.
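
One possible way around the multiple-tags problem, sketched in Python under the assumption that the existing one-row-per-tag data stays as it is: build the one-entry-per-style view on the fly and let each facet hold a set of values instead of a single value. The names ROWS and pivot are hypothetical, and this says nothing about whether it would actually be faster.

```python
from collections import defaultdict

# Hypothetical rows in the existing one-row-per-tag shape.
ROWS = [
    (1, "Color", "Red"),
    (1, "Accessibility", "Light on Dark"),
    (1, "Base", "Practicality"),
    (2, "Color", "Blue"),
    (2, "Base", "Modish"),
    (2, "Accessibility", "Muted"),
    (2, "Accessibility", "Dark on Light"),
]


def pivot(rows):
    """Group rows into {style_id: {facet: set of values}}."""
    wide = defaultdict(lambda: defaultdict(set))
    for style_id, facet, value in rows:
        wide[style_id][facet].add(value)
    # Convert the nested defaultdicts to plain dicts for predictable lookups.
    return {style_id: dict(facets) for style_id, facets in wide.items()}


wide = pivot(ROWS)
print(sorted(wide[2]["Accessibility"]))
```

Style 2 keeps both of its Accessibility tags this way, so the split-facet view does not have to force one value per facet.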



Query Structure - This is the point where I am running on naive newbie programmer logic! I can think of a simple solution, but I have no idea if the simple solution is the best solution. Basically, facets are a fancy interface for constructing long AND/OR queries. So: write one function for searching categories by AND, write another for searching by OR, and call them in sequence as necessary. I have no idea if this makes sense in the real world.
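
As a rough sketch of that idea in Python, assuming the tags have already been loaded into a lookup from (facet, value) to a set of style IDs: one function for OR, one for AND, and one that chains them. All names here (INDEX, match_any, match_all, facet_search) are hypothetical, and combining OR within a facet with AND across facets is just one common convention, not necessarily what the interface should do.

```python
# Hypothetical lookup: (facet, value) -> set of style IDs with that tag.
INDEX = {
    ("Color", "Red"): {1},
    ("Color", "Blue"): {2},
    ("Base", "Practicality"): {1},
    ("Base", "Modish"): {2},
    ("Accessibility", "Light on Dark"): {1},
    ("Accessibility", "Muted"): {2},
    ("Accessibility", "Dark on Light"): {2},
}


def match_any(index, facet, values):
    """OR: styles that have at least one of the values in this facet."""
    result = set()
    for value in values:
        result |= index.get((facet, value), set())
    return result


def match_all(index, facet, values):
    """AND: styles that have every one of the values in this facet."""
    sets = [index.get((facet, value), set()) for value in values]
    return set.intersection(*sets) if sets else set()


def facet_search(index, selections):
    """OR within each facet, then AND across the selected facets.

    An empty selection returns an empty set in this sketch.
    """
    result = None
    for facet, values in selections.items():
        matches = match_any(index, facet, values)
        result = matches if result is None else result & matches
    return result if result is not None else set()


print(facet_search(INDEX, {"Color": ["Red", "Blue"], "Accessibility": ["Muted"]}))
```

Here the search asks for (Red OR Blue) AND Muted, so only Style 2 comes back. Whether set operations in application code or one long SQL query is the better fit would depend on the real table and how big it gets.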


Tragically, the already existing options are 'use Solr', 'use something written on a Java server backend', or 'use a client-side JavaScript library' (which won't work with JS off, and will probably not work on a 1300+ item collection, but we didn't stress-test it to see).

randomling 2013-06-17 10:19 am (UTC)

Of course you will! <3 waves pompoms