Res facta quae tamen fingi potuit (pauamma) wrote in dw_dev, 2011-09-30 06:36 pm

Performance of the copying part of the sphinx indexer

Currently, the sphinx indexer works by looping over users and copying new and edited entries into the sphinx database. These copies are scheduled periodically using bin/schedule-copier-jobs, and also per user whenever entries are created or edited. In the latter case, though, the copier still searches the whole log2 table for that user's entries instead of copying only the affected entry. Does anyone know how that cost compares to the indexing itself? If it's significant, http://dw-dev.dreamwidth.org/97273.html could also be used to trigger per-entry copying. Does anyone have performance or resource-use figures, or opinions?
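To make the difference concrete, here is a rough Python model. It is not the actual sphinx-copier code and does not use the real log2 schema: `log2` is assumed to be a dict mapping a journal id to `{jitemid: last edit time}`, and the sphinx copy a dict keyed by `(journalid, jitemid)`. `copy_user` and `copy_entry` are made-up names; each returns how many rows it compared and how many it copied.

```python
# Rough model of per-user vs per-entry copying. All names and the data
# layout are hypothetical; this is not Dreamwidth's sphinx-copier code.

# Hypothetical stand-in for log2: journalid -> {jitemid: last edit time}
Log = dict[int, dict[int, int]]
# Hypothetical stand-in for the sphinx copy: (journalid, jitemid) -> copied edit time
Copy = dict[tuple[int, int], int]


def _copy_if_newer(copy: Copy, journalid: int, jitemid: int, edittime: int) -> bool:
    """Copy one entry if its search copy is missing or older; report whether it copied."""
    key = (journalid, jitemid)
    if copy.get(key, -1) < edittime:
        copy[key] = edittime
        return True
    return False


def copy_user(log2: Log, copy: Copy, journalid: int) -> tuple[int, int]:
    """Per-user copy: compare every entry the user has. Returns (compared, copied)."""
    compared = copied = 0
    for jitemid, edittime in log2.get(journalid, {}).items():
        compared += 1
        if _copy_if_newer(copy, journalid, jitemid, edittime):
            copied += 1
    return compared, copied


def copy_entry(log2: Log, copy: Copy, journalid: int, jitemid: int) -> tuple[int, int]:
    """Per-entry copy: look only at the entry that was created or edited."""
    edittime = log2.get(journalid, {}).get(jitemid)
    if edittime is None:
        return 0, 0
    copied = 1 if _copy_if_newer(copy, journalid, jitemid, edittime) else 0
    return 1, copied


# Usage: a 10,000-entry journal, fully copied once, then one entry edited.
log2: Log = {1: {i: 100 for i in range(10_000)}}
copy: Copy = {}
copy_user(log2, copy, 1)
log2[1][42] = 200
print(copy_user(log2, copy, 1))       # (10000, 1)
log2[1][43] = 300
print(copy_entry(log2, copy, 1, 43))  # (1, 1)
```

In this model the per-user copy compares every row to pick up a single edit, while the per-entry copy compares one. It only counts comparisons, so it says nothing about how the real SELECT cost stacks up against the indexing cost.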

mark 2011-09-30 10:35 pm (UTC)
sphinx-copier only compares times, which is just a SELECT. Yeah, it would be hard on the big accounts, so that's not ideal. I don't know how "significant" it is, but if you're feeling up to fixing it, go for it.

It was implemented this way because I didn't feel like depending on TheSchwartz/Gearman to make 100% sure an entry got into the search index. As implemented, the system self-repairs every time someone posts, so in the long run it's likely to be nearly 100% accurate.
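The self-repair property can be illustrated with a similar toy model, assuming (hypothetically) that a copier job can occasionally be lost. Because every delivered job re-syncs the user's whole journal, the next post picks up anything a lost job missed. `sync_user`, `post` and `job_delivered` are invented for this sketch and are not part of Dreamwidth.

```python
# Toy model of the self-repair property. All names are hypothetical; a
# "lost job" stands in for any reason a per-user copy might not run.

Log = dict[int, dict[int, int]]    # journalid -> {jitemid: last edit time}
Copy = dict[tuple[int, int], int]  # (journalid, jitemid) -> copied edit time


def sync_user(log2: Log, copy: Copy, journalid: int) -> list[int]:
    """Copy every entry of the user whose copy is missing or stale.
    Returns the jitemids that were (re)copied, in ascending order."""
    recopied = []
    for jitemid, edittime in sorted(log2.get(journalid, {}).items()):
        key = (journalid, jitemid)
        if copy.get(key, -1) < edittime:
            copy[key] = edittime
            recopied.append(jitemid)
    return recopied


def post(log2: Log, copy: Copy, journalid: int, jitemid: int, now: int,
         job_delivered: bool = True) -> list[int]:
    """Record a new or edited entry, then run a full per-user sync
    if the (hypothetical) copier job actually gets delivered."""
    log2.setdefault(journalid, {})[jitemid] = now
    return sync_user(log2, copy, journalid) if job_delivered else []


log2: Log = {}
copy: Copy = {}
print(post(log2, copy, 1, 1, now=10))                       # [1]
print(post(log2, copy, 1, 2, now=20, job_delivered=False))  # [] (job lost)
print(post(log2, copy, 1, 3, now=30))                       # [2, 3] (entry 2 repaired)
```

In this model, a purely per-entry copier would give up that property unless something else still did a full per-user comparison, for example the periodic bin/schedule-copier-jobs runs.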