state of production: software!
For the curious... there's actually a lot more that goes into running a production setup than just running the Dreamwidth code. In particular, here's a list of the software we're using to manage the infrastructure:
* Puppet, a configuration management system. This software is responsible for installing packages, updating configuration files, and basically keeping all of the production machines in sync. Most of the work here has been done by xenacryst.
* Cacti, a performance/graphing system. Cacti is great: you configure your servers and tell it to start graphing. It's actually fairly intense to set up (it took me a dozen hours or more to get it working for our setup), but once you get it going it's amazing. We have graphs of bandwidth (internal and external), CPU/disk/memory usage, and even non-system things such as Perlbal requests per second, how many items are in each of the memcached instances, and the replication lag in MySQL.
* Nagios, the gold standard in monitoring and alerting. This is the software you will hear me cursing at 3AM because it has found a failure in some part of the infrastructure and started paging me. Oh yes, there will be cursing. Generally, Nagios is a tool that does one thing really well: keep an eye on things, make sure they're up and running, and tell someone if they're not.
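To make the Nagios bullet a little more concrete, here is a rough sketch of what a check can look like. Nagios plugins report state through their exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) plus one line of output, and anything after a `|` is performance data. This is not taken from dw-ops: the function names, the thresholds, and the idea of passing the MySQL replication lag in as an argument are all assumptions made for illustration.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_replication_lag(lag_seconds, warn=30, crit=120):
    """Map a replication lag reading to a (status, message) pair.

    lag_seconds is None when the lag could not be measured at all.
    The default thresholds are made up, not values from dw-ops.
    """
    if lag_seconds is None:
        return UNKNOWN, "UNKNOWN - replication lag could not be read"
    if lag_seconds >= crit:
        status = CRITICAL
    elif lag_seconds >= warn:
        status = WARNING
    else:
        status = OK
    # Everything after the "|" is performance data: label=value;warn;crit
    perfdata = f"lag={lag_seconds}s;{warn};{crit}"
    return status, f"{LABELS[status]} - replication lag {lag_seconds}s | {perfdata}"


def main(argv):
    """Hypothetical entry point: the measured lag is the first argument."""
    try:
        lag = int(argv[1])
    except (IndexError, ValueError):
        lag = None  # missing or non-numeric input is reported as UNKNOWN
    status, message = check_replication_lag(lag)
    print(message)
    return status
```

A real plugin would finish with `sys.exit(main(sys.argv))` so that Nagios sees the status as the process exit code, and it would read the lag from the database itself rather than from the command line.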
These tools are fairly standard in the industry. I've used all of them at previous jobs and have gotten fairly familiar with their ins and outs. As always, the configuration we're using is available in our source repository:
http://hg.dwscoalition.org/dw-ops
If you are particularly interested in this end of the system, in the esoteric details that go into running a production cluster, let me know. I'm looking for a few people who like this sort of thing and who want to help make sure that our servers are the best they can be. :)
no subject
I've been a Linux sysadmin for the last 6 or so years; currently I write about being a Linux sysadmin instead (see links at http://the.earth.li/~juliet/ though one of my regular outlets is print-only/mostly). I ran Nagios & Puppet on my systems at my old job; haven't met Cacti before but I liked the graphs you linked me to yesterday :)
I'd be interested in helping out a bit with the SA stuff; although I'm kind of enjoying coding, as I haven't done that much (other than the practical SA sort of coding) for a while.
no subject
Yay, Puppet!