Mark Smith ([staff profile] mark) wrote in [site community profile] dw_dev, 2009-04-06 12:20 am

state of production: software!

For the curious: running a production setup involves a lot more than just the Dreamwidth code. In particular, here's a list of the software we're using to manage the infrastructure:

* Puppet, a configuration management system. This software is responsible for installing packages, updating configuration files, and basically keeping all of the production machines in sync. Most of the work here has been done by [personal profile] xenacryst.

* Cacti, a performance/graphing system. Cacti is great: you configure your servers and tell it to start graphing. It's actually fairly intense to set up (it took me a dozen hours or more to get it working for our setup), but once you get it going it's amazing. We have graphs of bandwidth (internal and external), CPU/disk/memory usage, and even non-system things such as Perlbal requests per second, how many items are in each of the memcached instances, and the replication lag in MySQL.

* Nagios, the gold standard in monitoring and alerting. This is the software you will hear me cursing at 3AM because it has found a failure in some part of the infrastructure and started paging me. Oh yes, there will be cursing. Generally, Nagios is a tool that does one thing really well: keep an eye on things, make sure they're up and running, and tell someone if they're not.
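For anyone who hasn't met Nagios: its checks are usually small plugin programs that print a one-line status message and report their result through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). Here is a minimal sketch of such a plugin in Python. The check itself (`check_replication_lag`) and its 30s/120s thresholds are hypothetical illustrations, not something taken from our dw-ops configuration; it only classifies a lag value passed on the command line rather than querying MySQL.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_replication_lag(lag_seconds, warn=30, crit=120):
    """Hypothetical check: classify a replication lag (in seconds).

    Returns (exit_code, message) in the shape Nagios expects from a plugin.
    The warn/crit thresholds are illustrative assumptions.
    """
    if lag_seconds is None:
        return UNKNOWN, "UNKNOWN - could not determine replication lag"
    if lag_seconds > crit:
        code = CRITICAL
    elif lag_seconds > warn:
        code = WARNING
    else:
        code = OK
    # Text after "|" is optional performance data (label=value;warn;crit).
    message = (f"{STATUS_NAMES[code]} - replication lag {lag_seconds}s"
               f" | lag={lag_seconds}s;{warn};{crit}")
    return code, message


def main(argv):
    # Hypothetical usage: check_lag.py <lag_seconds>
    try:
        lag = float(argv[1])
    except (IndexError, ValueError):
        lag = None  # missing or unparseable input maps to UNKNOWN
    code, message = check_replication_lag(lag)
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run as `python3 check_lag.py 45`, this prints a WARNING line and exits with status 1. The Nagios side (the command and service definitions that schedule the check and decide whom to page) is not shown here.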

These tools are fairly standard in the industry. I've used all of them at previous jobs and know their ins and outs pretty well. As always, the configuration we're using is available in our source repository:

http://hg.dwscoalition.org/dw-ops

If you are particularly interested in this end of the system, in the esoteric details that go into running a production cluster, let me know. I'm looking for a few people who like this sort of thing and who want to help make sure that our servers are the best they can be. :)

[personal profile] juliet 2009-04-06 02:39 pm (UTC)
/me sticks a hand up.

I've been a Linux sysadmin for the last 6 or so years; currently I write about being a Linux sysadmin instead (see links at http://the.earth.li/~juliet/, though one of my regular outlets is mostly print-only). I ran Nagios & Puppet on my systems at my old job; I haven't met Cacti before, but I liked the graphs you linked me to yesterday :)

I'd be interested in helping out a bit with the SA stuff, although I'm also kind of enjoying coding, as I haven't done much of it (other than the practical SA sort of coding) for a while.

[personal profile] exor674 2009-04-07 12:27 pm (UTC)
/me goes "Me Me Me Me Me Me!!!!!"

Yay, Puppet!

[personal profile] eagle 2009-04-11 03:16 am (UTC)
It's great to see that you guys are using Puppet. I'm not at all sure yet how much time I'm going to have to help out, but we use Puppet very extensively and have funded some of Luke's development work on it. If you have any weird Puppet problems, I'll try to help out as I have time.

[personal profile] jamie 2009-04-11 08:19 am (UTC)
I'm mostly a Windows sysadmin, but I have done my share of hacking around in Nagios and have done 24/7 system support. If you need more ops/infrastructure people, I'm happy to pitch in.