Flutterby™! : Chaos Monkey


Chaos Monkey

2011-05-13 00:07:00.25659+00 by Dan Lyke 3 comments

Columbine tossed this in the tumblr feed. I think I saw it, but I'm not sure it registered as solidly as it should have [Edit: we mentioned it in the recent Cloud Computing entry]: Coding Horror: Working with the Chaos Monkey:

Which, let's face it, seems like insane advice at first glance. I'm not sure many companies even understand why this would be a good idea, much less have the guts to attempt it. Raise your hand if where you work, someone deployed a daemon or service that randomly kills servers and processes in your server farm.

Now raise your other hand if that person is still employed by your company.

I think this gets wonderfully to the ideas that I was unable to articulate in the "Software Gardener" thread. Plan for failure. Heck, deliberately introduce things that make your system unstable so that you have to correct the system around those flaws. Modern complex systems cannot be modeled beforehand. They're more complex than a 767, and their working life before the requirements completely change, or before someone decides they need another wing or two, is measured in months. We can't treat these things like bridges; we have to treat them like living, breathing monsters that can turn on us.
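To make "deliberately introduce things that make your system unstable" concrete, here is a toy sketch of the idea in Python. It is not Netflix's tool or anything from the Coding Horror piece: the `Fleet`, `chaos_monkey`, `supervisor`, and `run_drill` names, the instance names, and the kill probability are all made up for illustration, and the "servers" are just flags in a dictionary.

```python
import random


class Fleet:
    """Toy stand-in for a server farm: instance name -> alive flag."""

    def __init__(self, names):
        self.alive = {name: True for name in names}

    def healthy(self):
        return sorted(name for name, up in self.alive.items() if up)


def chaos_monkey(fleet, rng, kill_probability=0.5):
    """Maybe kill one randomly chosen healthy instance.

    Returns the victim's name, or None if nothing was killed this round.
    """
    candidates = fleet.healthy()
    if not candidates or rng.random() >= kill_probability:
        return None
    victim = rng.choice(candidates)
    fleet.alive[victim] = False
    return victim


def supervisor(fleet):
    """The part the monkey forces you to build: find dead instances, restart them."""
    revived = [name for name, up in fleet.alive.items() if not up]
    for name in revived:
        fleet.alive[name] = True
    return revived


def run_drill(rounds=20, seed=0):
    """Let the monkey loose for a few rounds and check the fleet survives."""
    rng = random.Random(seed)  # seeded so a failing drill can be replayed
    fleet = Fleet(["web-1", "web-2", "db-1"])  # hypothetical instance names
    kills = 0
    for _ in range(rounds):
        if chaos_monkey(fleet, rng) is not None:
            kills += 1
        # The system has to keep working with one instance down.
        assert len(fleet.healthy()) >= 2
        supervisor(fleet)
    return kills, fleet.healthy()
```

Calling `run_drill()` returns how many kills happened and which instances are healthy at the end. The interesting part is not the monkey, it's that you can't run it at all until something like `supervisor` exists.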

[ related topics: Interactive Drama Weblogs Aviation Software Engineering Work, productivity and environment Archival Woodworking ]

comments in descending chronological order (reverse):

#Comment Re: made: 2011-05-13 15:49:51.192834+00 by: Dan Lyke

And on the other side of that particular system: if I remember right, we were initially going to have a database system that was far less transaction locked than the one we ended up with, and the testing playback system incorporated lock management in a way that supported debugging those edge cases (if the testers ever ran across them). In the end we decided that the additional speed we might have gotten from a finer-grained locking system wasn't worth dealing with all of the complexity.

Especially since VMEM had no more bugs...

#Comment Re: made: 2011-05-13 14:39:54.042622+00 by: ebradway

The Chaos Monkey reminds me of Mega Murphy. If you are going to use a memory manager that requires locks, then move everything that's not locked and overwrite every byte not allocated as often as possible. Actually, Mega Murphy was more methodical and less random than the Chaos Monkey.
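For anyone who never met that kind of tool, here is a rough Python sketch of the technique as described above: a handle-based memory manager whose stress routine relocates every unlocked block and scribbles over every unallocated byte. This is a guess at the general shape, not the actual Mega Murphy code; the `MurphyHeap` class, its method names, and the `0xDB` fill pattern are all invented for illustration.

```python
class MurphyHeap:
    """Toy handle-based heap: callers hold handles, not raw offsets."""

    FILL = 0xDB  # arbitrary scribble pattern (an assumption)

    def __init__(self, size):
        self.mem = bytearray([self.FILL]) * size
        self.blocks = {}  # handle -> [offset, length, locked]
        self.next_handle = 1

    def _free_at(self, offset, length):
        return all(offset + length <= off or off + ln <= offset
                   for off, ln, _ in self.blocks.values())

    def _find_gap(self, length, avoid=None):
        for offset in range(len(self.mem) - length + 1):
            if offset != avoid and self._free_at(offset, length):
                return offset
        if avoid is not None:
            return avoid  # nowhere else to go, so the block stays put
        raise MemoryError("toy heap is full")

    def alloc(self, data):
        offset = self._find_gap(len(data))
        self.mem[offset:offset + len(data)] = data
        handle = self.next_handle
        self.next_handle += 1
        self.blocks[handle] = [offset, len(data), False]
        return handle

    def lock(self, handle):
        self.blocks[handle][2] = True

    def unlock(self, handle):
        self.blocks[handle][2] = False

    def address(self, handle):
        """Raw offset; only safe to keep around while the block is locked."""
        return self.blocks[handle][0]

    def read(self, handle):
        off, ln, _ = self.blocks[handle]
        return bytes(self.mem[off:off + ln])

    def murphy(self):
        """Move every unlocked block, then overwrite every unallocated byte."""
        for handle in list(self.blocks):
            off, ln, locked = self.blocks[handle]
            if locked:
                continue
            data = bytes(self.mem[off:off + ln])
            del self.blocks[handle]  # its old bytes now count as free space
            new_off = self._find_gap(ln, avoid=off)
            self.mem[new_off:new_off + ln] = data
            self.blocks[handle] = [new_off, ln, False]
        used = set()
        for off, ln, _ in self.blocks.values():
            used.update(range(off, off + ln))
        for i in range(len(self.mem)):
            if i not in used:
                self.mem[i] = self.FILL
```

Call `murphy()` as often as possible (say, after every allocation) and any code that held on to a raw `address()` without calling `lock()` first reads garbage right away, instead of once a month in the field.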

I think the lessons that process engineering brought to software gardening were important. Having a repeatable process is a must (and is reflected in various pop methodologies, like SCRUM). Of course, all programmers resist measurement almost as much as school teachers dislike standardized tests and performance benchmarks.

#Comment Re: made: 2011-05-13 10:05:28.796705+00 by: meuon

This is why I have a set of arcane troubleshooting skills. While not always intentionally, I have been the chaos monkey, or fed it. And, being a hypocrite, I often violate the lessons I learned. At least I know I'm in danger of feeding the chaos monkey.