Flutterby™! : Cloud computing


Cloud computing

2011-04-23 00:13:28.858522+00 by Dan Lyke 6 comments

QOTD: MeFi user eriko on the entry about the Amazon EC2 failure:

Can we all stop fucking saying "cloud" now?

Remember: Clouds are made of vapour.

[ related topics: Quotes Books ]

comments in descending chronological order (reverse):

#Comment Re: made: 2011-04-26 05:13:20.525893+00 by: ebradway

The Chaos Monkey is a set of scripts that run through Netflix’s AWS process and randomly shuts them down to ensure that the rest of the system is able to keep running. Think of it as a system where the parts are greater than the whole.

Sounds like a virtual memory regression tester we once loved and hated. After living in Tennessee for almost two decades, I've found the best way to avoid falling on my ass trying to walk on the icy Colorado sidewalks is to purposefully try to slide on the ice. Failing to intentionally induce a fail is not a fail when you really just want to stay upright. Worst case, you slide on the ice. Wheee!
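For anyone who wants the Chaos Monkey idea in concrete terms, here is a toy sketch in Python. It is not Netflix's code and it does not touch AWS: `Instance`, `Service`, and `chaos_monkey` are made-up names for a simulation, assuming a service that simply tries its instances in order until one answers.

```python
import random


class Instance:
    """A pretend server that can be switched off."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Service:
    """A pretend front end that skips over dead instances."""

    def __init__(self, instances):
        self.instances = list(instances)

    def handle(self, request):
        # Try each instance in turn, moving on when one fails.
        for inst in self.instances:
            try:
                return inst.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy instances left")


def chaos_monkey(service, rng):
    """Shut down one randomly chosen live instance.

    Returns the victim, or None if nothing is left alive.
    """
    live = [i for i in service.instances if i.alive]
    if not live:
        return None
    victim = rng.choice(live)
    victim.alive = False
    return victim


rng = random.Random(0)  # seeded so the run is repeatable
service = Service(Instance(f"node-{n}") for n in range(3))
killed = chaos_monkey(service, rng)
reply = service.handle("GET /")  # still answered by a surviving instance
```

The point of the exercise is the last line: if the request fails after a single induced shutdown, you found the weak spot on your own schedule rather than during a real outage.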

#Comment Re: made: 2011-04-26 00:56:12.458869+00 by: Dan Lyke

Webmonkey: Lessons From a Cloud Failure: It’s Not Amazon, It’s You.

The EveryBlock Blog: A note about the site being down:

While the acute problem originated with AWS, EveryBlock is not without blame for this downtime. Frankly, we screwed up. AWS explicitly advises that developers should design a site’s architecture so that it is resilient to occasional failures and outages such as what occurred yesterday, and we did not follow that advice.

Both Via RC3

#Comment Re: made: 2011-04-24 15:16:37.650599+00 by: Dan Lyke

Okay, the other real issue is that we're moving towards a centralized net. This is both a sociological change as well as a physical one, but generally the consolidation towards single points of failure, either technical or political, should be a source of concern.

#Comment Re: made: 2011-04-23 15:04:58.860332+00 by: Dan Lyke

Yeah, the real issue is that a bunch of sites naively deployed on "the cloud" as though it would take care of their scaling and reliability issues. "The cloud" doesn't stop you from having to think about those things.

Kinda like that situation years ago when people (in Manhattan?) bought a bunch of circuits to different places, from different carriers, and found out the hard way that they were all virtual circuits running through the same piece of fiber...

#Comment Re: made: 2011-04-23 06:34:04.937613+00 by: ebradway

Ahh.... Here's my answer. Evidently they are still having trouble with EC2 and RDS in Virginia... But zero failures on the same systems elsewhere. That is, if you used the cloud as a way to inexpensively create a redundant, multi-homed system, as Netflix has done, your system was safe. Your applications automatically switched over to the Northern California EC2 and RDS systems.

So the cloud worked exactly as it was supposed to in the event of a catastrophic failure. It's just some customers of Amazon's didn't have their stuff together.
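The switch-over described above boils down to "use the first region that passes a health check." A minimal Python sketch of that logic follows, with everything simulated: `pick_region` and the `health` table are hypothetical, and the region labels are only stand-ins for Virginia (`us-east-1`) and Northern California (`us-west-1`). A real setup would probe actual endpoints and also has to keep the data replicated, which this ignores.

```python
def pick_region(preferred, is_healthy):
    """Return the first region, in preference order, that passes the health check."""
    for region in preferred:
        if is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")


# Simulated health status: Virginia failing, Northern California fine.
health = {"us-east-1": False, "us-west-1": True}

active = pick_region(
    ["us-east-1", "us-west-1"],
    lambda region: health.get(region, False),  # unknown regions count as down
)
print(active)  # prints "us-west-1"
```

A site that only ever listed one region in `preferred` gets the `RuntimeError` branch, which is more or less what happened to the sites that went dark.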

#Comment Re: made: 2011-04-23 06:22:48.378069+00 by: ebradway

Oh my god! You mean some sites on the internet had down time? That never happened before all this cloud BS.

Amazon EC2 is a pretty big basket with lots of eggs in it. So failures are pretty high profile. But the fact that so many high-demand sites can run out of one basket is pretty freakin' cool. And exactly how long was it down?