Flutterby™! : A guide to systems for syadmins? 2010-02-03 23:32:56.30178+00

A guide to systems for syadmins?

2010-02-03 23:32:56.30178+00 by Dan Lyke 12 comments

I've got a problem. I'm on the City of Petaluma Technology & Telecommunications Advisory Committee. One of the things that I'm interested in is promoting more open data. The folks in the city's IT department are strong believers too, every time I go poking through DataSF or PortlandMaps I get to thinking "heck, Petaluma has more of their data online".

One of the problems, however, is that a lot of the data is hosted in ways that aren't necessarily easy to mine. Petaluma buys a lot of drop-in software, some of it is hosted elsewhere. Often the problem with getting data out of the city is that the processes aren't computerized, or where they are they're not published in an easy to mine format.

I'm starting to write some Perl to convert what is available online to a form that I hope mine some of that data. I've written some bots to datamine Accela Citizen Access, draw building permits on a map. I've extracted voting records from Granicus hosted meeting minutes, so that I can look for voting patterns (oddly, I haven't gotten much excitement for that data set...). The latest hack, and one that might actually have some interest, takes meeting announcements and dumps them into Twitter.

Tim at the city is, rightfully, reluctant to take these hacks under his wing. He's got limited budget and limited staff and doesn't want to take on more responsibility without an economic justification or a clear directive from the town council. He's also worried about my projects pushing support load over on his department; the city's gotten phone calls requesting support and clarification and such that stemmed from a third party pothole tracking database. And they're a Microsoft shop, which means that they're not always able to be out there on the bleeding technology edge.

The upshot is: I'm happy to host and run these on my own machines. I want to make these processes as reliable as possible so that I'm an ally to Tim, Trae et al, rather than yet another yahoo spawning support calls for systems that aren't theirs. If I can build reliable systems, I can build trust and work towards getting some publicity on the city web page. However, I'm not a sysadmin. I'm a coder with a different sort of release process who sometimes dabbles in continuous deployment and sysadminnery as a hobby.

Any of you sysadmins out there have suggestions on some reading for building and running robust systems? How should I set up test procedures so that I catch when apt-get upgrade causes Moose to decide that name is now a reserved member name and some obscure Perl script that I never thought to check the return from because it's been working flawlessly and has no real failure mode decides to spew trash? What's the best practice for a process that deals with an email that may only be sent once a month, making the test cycle rather long? How do I best track all the cron jobs and interconnected email addresses and web spiders so that when I'm moving some system to another machine I get everything?

Does this book exist? Or is that list about the best run-down on where to start building such systems you know of?

This topic is something I've been thinking about a lot recently, trying to bootstrap a Linux environment at $NEW_WORK. I think you need to decompose the problem into (at least) two parts.

Part the first: being able to reproducibly build identical systems, whether bare metal or virtual instances. For bare metal stuff, I'm looking at Cobbler, which is really just a convenient wrapper around Kickstart, PXE, DHCP, et al., that lets you do network installs. For virtual machines, it'll probably be the same thing with a bit of up-front shell scripting to actually create the disk image and "boot" the "machine".

From past experience, you want to keep the Kickstart layer extremely thin -- just enough to get the machine up and booted to the point where it can kick off your configuration management layer. I'm a fan of Puppet, although I also hear people say nice things about Chef. If you get this set up to the point where it runs automatically at the end of your install procedure, you can just put all your configuration management stuff into a revision control system and manage stuff thru there. Bit of trouble to set up; really nice once it's done.

Part the second: managing your software deployments. It sounds like you're using Perl, so the first step is building/packaging your own dedicated Perl tree. I usually do /opt/perl/, but YMMV. I would NOT use the system Perl; there's too much chance of a routine software upgrade causing weird difficult-to-track-down issues. I wish I had a good pointer to making a custom Perl package, but I don't, and I haven't written up my notes yet (I'll try to, when I get to this point). The easiest thing is probably to get the Perl package for the distro you're using, rip it open, and start munging.

The second half is managing CPAN deps. There are a couple of options in this area, and I haven't played with either of them enough to have a strong opinion. Option #1 is to use local::lib and then just include the modules you're using in your source tree/revision control system. Option #2 is to use the standard Makefile.PL way of listing the stuff you depend on, then deploy via the standard make/make install cycle, but pointed at a DPAN instead of the standard CPAN. Then you manage what versions are in the DPAN, and presumably have that in a separate repo in your RCS. brian d foy just put a presentation about the DPAN idea.

I'd be very interested in hearing about whatever you come up with, so please post something about the solution you come up with...

Dan, I think you're on the right track. I've been pushing for this kind of thing (mirrored, virtualized dev & test environments, standardized/packaged deployments, etc.) here at work for the last 5 years or so. We've come close a couple of times, but unfortunately, the only developers we have left are the ones who were never really on board with it.

The biggest suggestion I'd make it to make sure you're communicating closely with the techs/admins that are going to use/maintain the system, but it sounds like you're already doing that. Awhile back I wrote several apps that logged errors to the EventLog (for those not familiar; the standard logging system on Windows) only to find out that the server admins almost never looked there. (WTF?) Instead, they eventually asked me to create a database to write log info to.

#Comment Re: made: 2010-02-05 11:22:25.467706+00 by: meuon [edit history]

scp476, you are at the hard core level we had back in the early ISP days. I especially remember:"Never upgrade anything unless it's a security exploit that affects your setup".. and good advice about the e-mails and logs, I just do the email config so fast I don't even thnk about it.

My approach is to install as base a system as possible (using Linux, basically the stuff you find under /bin, /sbin, /usr/bin and /usr/sbin) and for anything else I'm running, installing from source if possible, and only those services actually needed.

Why source? Because 1) if there's a security hole in some package, I don't want to wait for the distribution to make a patch (DeadHat I'm looking at you! Also, Gentoo! You moved the damned repositories on me! No cookie for you!) and most packages today are just a "configure;make;make install" away.

Second, sanitize *every* input. Make sure that whatever comes in is never executed (Windows, I'm looking at you!).

Never upgrade anything unless it's a security exploit that affects your setup, or there's a very compelling reason to upgrade (functionality you can't live without). Upgrading for upgrading's sake is foolish and prone to breakage.

Syslog is your friend. Make sure everything logs through syslog (possible exception: Apache non-error logs) and generate a system that scans through syslog output (I just finished writing a syslog that's scriptable via Lua, if you want a copy, just ask; it's very flexible and I have a few systems using it that send emails when certain conditions are met).

Make sure that root's email goes to a valid account and that it's regularly checked. I have all our servers send root email to one account (mine, but behind an alias) and I use procmail to sort (and delete) the email as it comes in. Two approaches---delete stuff that's useless, or search for the bad stuff and delete everything else (I do the former just in case).

Actually, now that I think about it, check out Infrastructures.Org—they have some good advice there.

what you just described is enough work that two or three servers worth is calculated to be an FTE worth at > $50k per year. I tend to code simply enough that normal updates are not an issue. I see being reliant on non-standard libraries as a weakness (and yet I use libserial for Perl.. sigh..)

but I think it comes down to the philosophies of server admining and data architecture. What works for one, rarely works for another.

I'm more worried about the sysadmin-ish stuff than the coding and data interchange stuff. In some cases, I'm looking at piping PDFs through ImageMagick to gocr, so I'll find a way to get the data out if I can get at it somehow. The question isn't how to stuff a computer on a serial printer to grab the output, it's about ways to make sure that that connection works for longer than the day I muck with it.

Seems like I probably need to set up git on my Perl library directories so that I can back out library updates. Anyone already written the glue scripts to tie that into CPAN? how about the PHP and Python sort of libraries?

I also need to run a virtual machine that mirrors what's on my physical machine so that I can do deployment testing. I'll have to see how that can be done, and what I can do to mirror changes: when I install packages on my staging virtual image, can I then just dump the dpkg output and pipe it into apt-get on the deployment machine, or do I have to do some smarter delta and order tracking? Has someone written that?

Again, it's not the physical extraction of the data, or even writing robust code to parse and reformat that data. It's adminning my systems so that I can write code, test it, and run it through a fairly automated staging, testing and deployment system without breaking things, and so that when something does break I can confidently call up Tim and say "you broke my system" (presumably by changing URLs or data formats or whatever), and have a reasonable scheme for disaster recovery (understanding that I don't need multiple hot spare servers, just the ability to rebuild the server easily in a day or two).

Nobody's gonna die if this system fails, but if I can run a system that's reliable and relatively hassle free for me then I'm going to make life easier for all involved.

#Comment Re: made: 2010-02-04 10:25:42.718675+00 by: meuon [edit history]

I woke up thinking about what you were asking about: Sysadmin-ing. It's something I still fake being, and I'm pretty good at it. My short version: Virtualization is good for taking snapshots and backups, but can slow down a database server on minimal hardware. Answer: Quad Cores and Ram. I prefer running on bare metal, but I have to run some real-time serial interfaces on USB (hardware encryption modules) that don't like virtualization, neither does Asterisk (although people do it). My thoughts:

Being a researcher who specifically looks at things like data.gov, I can tell you that you are just scratching the surface. Nat Torkington over at O'Reilly just blogged about similar issues. Giving out data isn't easy because:

If you want a jump start, give Andrew or Sean over at GeoCommmons/FortiusOne a tap. They have been developing a really nice (buzzword warning) cloud-based portal for local communities to get the data out. Their GeoCommons data visualization platform is quickly becoming the gold standard in getting open data out in a usable fashion. Of course, Andrew has a PhD in rocket science and Sean was smart enough to hire him ;)

This is also the purview of LinkedData - the latest incarnation of TBL's brain child, the Semantic Web. The basic idea is to convert raw data into triples and publish it via URIs - leaving the whole "ontology" bit out. I can see how, one day, even Microsoft will support providing an RDF store in SQL Server.

Dan, I've got a lot of thoughts on this but I just spent three hours pulling cables out of a hellhole of a datacenter I inherited with this new job.

If I don't get back here within 48 hours or so, bug me via email or twitter or whatever.

Extremely short version: virtualization, kickstart (or something like cobbler) and puppet, plus a moderate amount of tuits putting it all together.

#Comment Re: made: 2010-02-04 01:19:55.714892+00 by: meuon [edit history]

Innovative: includes a 1 minute lecture on why the format of their excel spreadsheet is nearly impossible to parse, how to reformat it it logically.. and 10 lines of perl can then parse the CSV export and import the data.

Dept-B exports the processed data as an XLS and e-mails it to Dept-A whom rekeys it into Dept-A's systems.

But the real answer was finally getting permission to allow Dept-A just enough access to Dept-B's system to enter it in and get the results themselves.

Dan, remember capturing the data from the serial printers in hospitals? That was magic that: A. I now know what it is really worth. B. Could probably make a business out just of such services.

Chortle. The city folks already admit that they're hampered by the fact that they're an all Microsoft shop, and thus they can't run a lot of the innovative stuff.

Which is all the more reason for me to do something with good testing systems and best practices, so that if we ever migrate processes to their servers there's a framework in place for them to understand it.

#Comment Re: made: 2010-02-03 23:44:30.69046+00 by: Shawn [edit history]

How about SharePoin<snrk!>...<giggle>...<snort>...<gasp>...whew!... I couldn't say it and keep a straight face ;-)

Seriously, though, I'll try to pass the word. I think my brain lives in the same general neighborhood as yours in regards to things like this. I might suggest looking at some Unit Test frameworks, though.