Server down? Rub some dirt on it and get back out there!

When it comes to running some of the largest and most complex data centers, time is money…and not necessarily downtime, but the time it takes an IT worker to replace a piece of hardware inside a server or storage system.

This stuff is pretty cool…they’re called “time and motion studies,” and Facebook’s Open Compute Project (OCP) has done some serious analysis to cut the time it takes to repair or replace components within a server. I don’t know about you, but whenever I have upgraded the hard drive in my PC, it wasn’t anywhere close to 1 minute! It’s pretty easy, but not that easy. OCP has this down to a science.

Image by Open Compute Project

There are obvious steps companies take to maximize uptime: redundancy, replication, failover, call it what you want. Maximizing data center real estate is another thing to consider. Whenever a server’s processor, memory chip, motherboard, fan, power supply or hard drive fails, the rack space that server occupies is wasted until it is fixed. So getting a server back up and running in the shortest possible time is not just about uptime; it’s about money. Face it: in a cloud data center, a server that isn’t serving or storage that isn’t storing isn’t doing what it was purchased to do…and that is make money.

Remember those annoying minor injuries that kept us out of the game when we were kids? I remember our coaches wanting to get us back out there as quickly as possible with the all-too-familiar remedy, “Rub some dirt on it and get back in there!” Well, it sounds like the cloud guys are saying the same thing to their servers.