Wanted: Hard drive boys for our new ginormous data center
In November, Google wrote on its official blog about an experiment in which it sorted 1 PB (1,000 TB) of data with MapReduce. The details of the sort itself were impressive, but one thing that stuck in our minds was the following (emphasis added by us):
An interesting question came up while running experiments at such a scale: Where do you put 1PB of sorted data? We were writing it to 48,000 hard drives (we did not use the full capacity of these disks, though), and every time we ran our sort, at least one of our disks managed to break (this is not surprising at all given the duration of the test, the number of disks involved, and the expected lifetime of hard disks).
Each of Google's sorting runs lasted six hours, so four of them fit into a day. With at least one disk breaking per run, that works out to at least 4 hard drive failures a day for every 48,000 hard drives a data center is using.
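For anyone who wants to play with the numbers, here is a minimal Python sketch of that back-of-the-envelope math. It rests on our own assumptions, not on anything Google published: the runs happen back to back around the clock, exactly one disk breaks per run (the quote only says "at least one", so this is a lower bound), and the function names are ours, made up for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def failures_per_day(run_hours, failures_per_run=1):
    """Lower-bound disk failures per day, assuming back-to-back runs."""
    runs_per_day = HOURS_PER_DAY / run_hours
    return runs_per_day * failures_per_run


def annualized_failure_rate(drives, run_hours, failures_per_run=1):
    """Fraction of the drive pool that would fail in a year at that pace."""
    per_year = failures_per_day(run_hours, failures_per_run) * DAYS_PER_YEAR
    return per_year / drives


if __name__ == "__main__":
    # Figures from the quote: 48,000 drives, six-hour runs, one failure each.
    print(failures_per_day(6))                          # 4.0
    print(f"{annualized_failure_rate(48_000, 6):.2%}")  # 3.04%
```

Under those assumptions, 4 failures a day adds up to 1,460 a year, or roughly 3% of the 48,000 drives. Change the arguments to see how the picture shifts for a different pool size or run length.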