Entry tags:
performance tutorials?
I'm really interested in learning more about what kinds of performance considerations we need to keep in mind while coding, and how to design a well-behaved high-availability web app. I know the very basics, like using memcache whenever possible and using direct assignments instead of shifts, but not much else!
So, is anyone interested in writing tutorials on things to keep in mind when coding for performance? As we grow, this is going to matter more and more, and I think it's a great opportunity for us all to share knowledge and learn together. No matter how much you know, even if you think it's not much, it's probably more than someone else (i.e., me) knows...
no subject
I'm not sure how DW handles this, but at $werk we use a caching layer called Redis that has a lot more features and robustness than Memcache or Membase. We've written a queueing mechanism on top of it that lets workers queue jobs in the caching layer. It handles parallel consumers: any item taken out of the queue becomes invisible for a time limit, and if it isn't deleted within that window, it can be claimed again. Several workers can then grab entries in parallel without stepping on each other.
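The visibility-timeout pattern described above can be sketched without Redis at all. Here's a minimal in-memory version in Python; the class and method names are my own invention for illustration, not DW's or the commenter's actual code:

```python
import time

class VisibilityQueue:
    """Sketch of a queue with a visibility timeout: a claimed job is
    hidden for `timeout` seconds, and if the worker doesn't delete it
    within that window, it becomes claimable again."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout
        self.jobs = {}        # job_id -> payload
        self.claimed = {}     # job_id -> deadline (seconds)
        self._next_id = 0

    def put(self, payload):
        self._next_id += 1
        self.jobs[self._next_id] = payload
        return self._next_id

    def claim(self, now=None):
        now = time.monotonic() if now is None else now
        for job_id, payload in self.jobs.items():
            deadline = self.claimed.get(job_id)
            if deadline is None or deadline <= now:   # unclaimed, or timed out
                self.claimed[job_id] = now + self.timeout
                return job_id, payload
        return None                                   # nothing claimable

    def delete(self, job_id):
        """A worker calls this after finishing the job."""
        self.jobs.pop(job_id, None)
        self.claimed.pop(job_id, None)
```

In a real Redis deployment you'd back this with Redis data structures rather than Python dicts, but the claim/timeout/delete lifecycle is the same idea.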
For developing for scale, some issues that should definitely be discussed are:
- race conditions and db row locking
- managing bulk operations as a transaction: instead of committing each record individually, commit when the job is complete, or, if you have a huge number of records, commit every so many records. This allows rollback if something goes awry and is especially useful when updating several tables at once.
- splitting things up into discoverers and workers
- leveraging the caching layer for things like keeping track of the number of active workers and queuing
- logging (Log::Log4perl is a great module for this)
- error handling (eval blocks, try/catch, autodie)
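The batched-commit idea from the list above can be sketched with Python's sqlite3 module; the table name, columns, and batch size here are arbitrary assumptions for illustration:

```python
import sqlite3

def bulk_update(conn, records, batch_size=1000):
    """Insert records in batches, committing every `batch_size` rows
    instead of once per row; roll back the in-flight batch on error."""
    cur = conn.cursor()
    try:
        for i, (name, score) in enumerate(records, start=1):
            cur.execute(
                "INSERT INTO scores (name, score) VALUES (?, ?)",
                (name, score),
            )
            if i % batch_size == 0:
                conn.commit()   # checkpoint every batch_size rows
        conn.commit()           # commit the final partial batch
    except Exception:
        conn.rollback()         # undo only the uncommitted batch
        raise
```

The same shape works with DBI in Perl (`AutoCommit => 0`, then `commit`/`rollback`): the point is that you lose at most one batch on failure, not the whole job, and you avoid the per-row commit overhead.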
no subject
Benchmark in production, then work out what you need to optimise.
I would be extremely leery of recommending "coding for performance": it's far too easy to make sub-optimal decisions that don't reflect the reality of actual usage.
It's especially easy to make sub-optimal code "for performance" that is really hard to maintain...
And arguably, maintenance cost always trumps computation cost. Human time is a scarce resource that doesn't really increase, whereas computer time follows Moore's law.
That said... one should put *some* thought into whether you think the code in question is likely to be a performance hotspot or not, and the scale of the data/throughput/whatever is likely to actually have performance effects.
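In the measure-first spirit of this comment, a profiler is the usual way to find out where time actually goes before optimising anything. A minimal sketch with Python's cProfile (the toy functions are my own, chosen because repeated string concatenation is a classic accidental hotspot):

```python
import cProfile
import io
import pstats

def slow_concat(n):
    # deliberately quadratic-ish: repeated string concatenation
    s = ""
    for i in range(n):
        s += str(i)
    return s

def fast_concat(n):
    # linear: build the string in one pass
    return "".join(str(i) for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_concat(10_000)
fast_concat(10_000)
profiler.disable()

# Print the top entries by cumulative time; the hotspot shows up here,
# instead of being guessed at in advance.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Perl has the equivalent in Devel::NYTProf; the principle is the same either way: profile real workloads, then spend effort only on the code the profile says is hot.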