Cocoa (
momijizukamori) wrote in
dw_dev2013-06-16 09:25 pm
Entry tags:
Data and query structure for faceted styles search
I have been researching this, and while I've found a lot of articles on how to set up faceted searching using existing search engines (mostly Solr, though a few with Sphinx), and a bunch of articles on frontend design for faceted search.... there is not a lot on optimizing data structure or queries that I can find. And I know basically nothing about code optimization so - tossing this out here!
Data Structure - we already sort of have a faceted data structure for styles in the forms of categories. There is a great big SQL table (don't ask me which one, I don't remember) but what it boils down to is something like this (assuming more categories than we actually have live):
Style ID - Category ID
Style 1 - Color: Red
Style 1 - Accessiblity: light on dark
Style 1 - Base: Practicality
Style 2 - Color: Blue
Style 2 - Base: Modish
Style 2 - Accessibility: Muted
Style 2 - Accessibility: Dark on Light
A lot of the tutorial/design stuff seems to expect a structure where facets are split, ie:
Style ID - Color - Base Layout - Accessibility
Style 1 - Red - Practicality - light on dark
Style 2 - Blue - Modish - Muted
...which then brings about the obvious problem that not all tags within a facet will be mutually exclusive. The second one seems like it might have a slight speed advantage in querying (because queries over a single facet don't need to look at all the data), but I'm not sure how to handle multiple tags on an item within one facet (aka polyheirarchy according to the internet), and I'm not sure if the increase in speed would merit having to rewrite the entire category backend, instead of just writing a new interface for it.
Query Structure - This is the point where I am running on naive newbie programmer logic! Thus, I can think of a simple solution, but I have no idea if the simple solution is the best solution. Basically, facets are a fancy interface for constructing long AND/OR queries. So - write function for searching categories by AND, write one for searching by OR, call in sequence as necessary. I have no idea if this makes sense in the real world.
Tragically the already existing options are 'use Solr', 'use something written on a Java server backend', or 'use a client-side Javascript library (which won't work with JS off and will probably not work on a 1300+ collection, but we didn't stress-test it to see)'
Data Structure - we already sort of have a faceted data structure for styles in the forms of categories. There is a great big SQL table (don't ask me which one, I don't remember) but what it boils down to is something like this (assuming more categories than we actually have live):
Style ID - Category ID
Style 1 - Color: Red
Style 1 - Accessiblity: light on dark
Style 1 - Base: Practicality
Style 2 - Color: Blue
Style 2 - Base: Modish
Style 2 - Accessibility: Muted
Style 2 - Accessibility: Dark on Light
A lot of the tutorial/design stuff seems to expect a structure where facets are split, ie:
Style ID - Color - Base Layout - Accessibility
Style 1 - Red - Practicality - light on dark
Style 2 - Blue - Modish - Muted
...which then brings about the obvious problem that not all tags within a facet will be mutually exclusive. The second one seems like it might have a slight speed advantage in querying (because queries over a single facet don't need to look at all the data), but I'm not sure how to handle multiple tags on an item within one facet (aka polyheirarchy according to the internet), and I'm not sure if the increase in speed would merit having to rewrite the entire category backend, instead of just writing a new interface for it.
Query Structure - This is the point where I am running on naive newbie programmer logic! Thus, I can think of a simple solution, but I have no idea if the simple solution is the best solution. Basically, facets are a fancy interface for constructing long AND/OR queries. So - write function for searching categories by AND, write one for searching by OR, call in sequence as necessary. I have no idea if this makes sense in the real world.
Tragically the already existing options are 'use Solr', 'use something written on a Java server backend', or 'use a client-side Javascript library (which won't work with JS off and will probably not work on a 1300+ collection, but we didn't stress-test it to see)'

no subject
pft, well, I knew it was going to be complicated going in. So you've actually been helpful in giving me the language I need to research and figure out exactly what dimensions of 'complicated' I'm looking at.
I do sort of get the idea of running against an index rather than the actual database, though I will confess I am not fully clear on what the distinction is yet.
no subject
(Everything I'm about to say is about solr, but may well also apply to Sphinx, I dunno.) Solr is also easier for webby people to use rather than needing DBAs to build clever SQL queries. It has REST-like APIs, so you can build the search directly in the URL, eg http://dreamwidth.org/stylesearch?q=fluffy&fq=accessibility:ON&fq=color:blue&fq=inheritance:zesty
That being said we probably don't need it for performance reasons; our set of styles is small as database datasets go.
no subject
I feel Educated! Though thankfully I am fairly comfortable with direct SQL queries as a result of working what was a essentially database maintenence at a previous job.
I poked mark on IRC earlier because he is the Master of Scaling and may have a better sense of how hardcore we need to be about efficiency - that is a department I know very little about (other than efficiency = good, but that sometimes it means trade-offs)