The Failure Rates Of Enterprise Storage Hardware

Enterprise Storage Forum recently ran an article asking how many archival copies are required to achieve acceptable reliability. It gathers the failure rates of common storage media into an interesting table showing how long, on average, it takes to encounter a hard error depending on how many devices are in use. As the number of devices increases, the number of hours before a hard error is encountered becomes surprisingly small (except in the case of enterprise tape).

Hours to encounter a hard error at sustained data rate:

| Media type | 1 Device | 10 Devices | 50 Devices | 100 Devices | 200 Devices |
|---|---|---|---|---|---|
| Consumer SATA | 50.9 | 5.1 | 1.0 | 0.5 | 0.3 |
| Enterprise SATA | 301.0 | 30.1 | 6.0 | 3.0 | 1.5 |
| Enterprise SAS/FC 3.5 inch | 2,759.5 | 275.9 | 55.2 | 27.6 | 13.8 |
| Enterprise SAS/FC 2.5 inch | 1,965.2 | 196.5 | 39.3 | 19.7 | 9.8 |
| LTO-5 | 23,652.6 | 2,365.3 | 473.1 | 236.5 | 118.3 |
| Some Enterprise SAS SSDs | 7,884.2 | 788.4 | 157.7 | 78.8 | 39.4 |
| Enterprise Tape | 1,379,737.1 | 137,973.7 | 27,594.7 | 13,797.4 | 6,898.7 |
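As a rough illustration of where figures like these come from, here is a minimal sketch of the arithmetic: divide the volume of data that can be read before an unrecoverable error is expected by the sustained data rate, and split the result across however many devices are being used. The bit-error intervals and data rates in the sketch are ballpark assumptions on our part, not the article's exact inputs.

```python
# Rough sketch of the underlying arithmetic. The bit-error intervals and
# sustained data rates below are ballpark assumptions, not the article's
# exact inputs; the point is how quickly the expected time to a hard
# error shrinks as the device count grows.

MEDIA = {
    # media name: (bits read per hard error, sustained data rate in MB/s)
    "Consumer SATA":   (1e14, 68),
    "Enterprise SATA": (1e15, 115),
    "Enterprise tape": (1e19, 250),
}

def hours_to_hard_error(bits_per_error, rate_mb_s, devices=1):
    bytes_per_error = bits_per_error / 8            # bits -> bytes
    seconds = bytes_per_error / (rate_mb_s * 1e6)   # time to stream that much data
    return seconds / 3600 / devices                 # more devices -> errors arrive sooner

for name, (bits, rate) in MEDIA.items():
    for n in (1, 10, 100):
        print(f"{name:16s} {n:3d} device(s): {hours_to_hard_error(bits, rate, n):>14,.1f} h")
```

Running it reproduces the same pattern as the table: a tenfold increase in device count cuts the expected time to a hard error by a factor of ten.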

The article goes on to state that, while interesting, these numbers really don't answer the question of how many copies of archived data are required. All the table really establishes is that one copy is insufficient.

MagnaStor’s Answer: Three Copies

It turns out that the factors they list as driving the decision of copy count are entirely mitigated by a three-pronged archival strategy with MagnaStor. We propose the following:

A. The original archive, stored on a MagnaStor volume onsite.
B. A live local MagnaStor replication slave of A, also onsite.
C. An S3 cloud replication of A, stored offsite.

How MagnaStor Mitigates Data Impact Factors

Under this scenario, let's look at the factors that could impact data and how this strategy addresses each of them:

Silent Data Corruption

Natural bit-rot on all storage devices occurs over time and is unavoidable.

MagnaStor mitigates this issue with its active monitoring feature. Even the tiniest corruption is detected by MagnaStor's automated file system checking, and MagnaStor heals itself without user intervention. Metadata corruption on any MagnaStor volume is automatically repairable even without an external replication source. User data corruption is automatically repairable whenever a live MagnaStor replication peer or cloud source is available.
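To make the detect-and-heal idea concrete, here is a conceptual sketch of checksum-based scrubbing. It is not MagnaStor's actual implementation; the function names and data layout are invented for the example. Each block's checksum is compared against the value recorded when it was written, and a mismatch is repaired from the first replica whose copy still verifies.

```python
# Conceptual sketch of checksum-based scrubbing and self-healing; not
# MagnaStor's actual implementation. A block whose checksum no longer
# matches the value recorded at write time is silently corrupt, and is
# repaired from the first replica whose copy still verifies.
import hashlib

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def scrub(blocks, recorded, replicas):
    """blocks: local data blocks; recorded: checksums captured at write
    time; replicas: other copies of the same blocks (e.g. B or C)."""
    for i, block in enumerate(blocks):
        if checksum(block) == recorded[i]:
            continue                          # block is intact
        for replica in replicas:              # silent corruption detected
            if checksum(replica[i]) == recorded[i]:
                blocks[i] = replica[i]        # heal from a clean copy
                break
    return blocks
```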

In the above scenario, user data corruption on A will be automatically healed from B or C, and user data corruption on B will be automatically healed from A. S3's failure rate is low enough that corruption on C can be disregarded, but if it ever occurred, A could simply be pushed to the cloud again.

Under our MagnaStor scenario, silent data corruption can be safely disregarded.

A Bad Lot of Media

This factor applies to optical media and tape, not hard drives. Since our scenario uses two hard drives and the cloud, this factor can be safely disregarded.

Natural Disaster

The cloud node (C) in our scenario is the natural disaster mitigator. The only way we could suffer data loss in this case is if the disaster occurred before cloud replication completed, or if the disaster affected the physical cloud facility as well. S3 buckets can be located at facilities all around the world, so if a major disaster is a real concern, more than one S3 cloud replication target can be defined.
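As a minimal sketch of what geographically separate cloud copies can look like (this stands in for MagnaStor's own replication mechanism; the bucket names, regions, and file path are hypothetical placeholders), the same archive object is pushed to S3 buckets in two different regions:

```python
# Minimal sketch: push the same archive object to S3 buckets in two
# different regions so no single facility holds the only cloud copy.
# Bucket names, regions, and the file path are hypothetical placeholders.
import boto3

TARGETS = [
    ("us-east-1",    "example-archive-us"),
    ("eu-central-1", "example-archive-eu"),
]

def replicate(local_path, key):
    for region, bucket in TARGETS:
        s3 = boto3.client("s3", region_name=region)
        s3.upload_file(local_path, bucket, key)   # one copy per region

replicate("/archive/2024-q1.tar", "2024-q1.tar")
```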

Short of a worldwide catastrophe, the natural disaster factor can be safely disregarded.

Network Failure — Preventing Replication

A network failure in our scenario would prevent access to C, but you would still have the live local slave B and the original archive A onsite; if a need for the data arose or corruption was discovered, it could be handled locally.

A combination of a network outage, a failure on A, and a failure on B might result in data loss, but only if the failures on A and B were both catastrophic or affected the same user data on both disks. We think it's safe to assume the chance of this happening is negligible, making the network failure factor one that can be safely disregarded.
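A quick back-of-the-envelope multiplication shows why. The per-event probabilities below are purely hypothetical placeholders, and the three failures are assumed to be independent:

```python
# Back-of-the-envelope only: the probabilities below are hypothetical
# placeholders, and the three failures are assumed to be independent.
p_network_outage = 1e-2   # prolonged network failure in a given window
p_A_catastrophic = 1e-3   # unrecoverable failure of the original archive A
p_B_catastrophic = 1e-3   # unrecoverable failure of the local slave B

p_data_loss = p_network_outage * p_A_catastrophic * p_B_catastrophic
print(f"P(all three at once) ~ {p_data_loss:.0e}")   # prints 1e-08
```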

Human Error

MagnaStor's protection works in the operating system's kernel, and unless specifically allowed by policy, all data on a MagnaStor volume is retained forever. Even if the user "deletes" a file, that file's history and all the user data it ever contained still reside on disk. This protects data on MagnaStor volumes from human error. With our MagnaStor scenario, the human error factor can be safely disregarded.

Intentional Data Damage

As with the human error factor, because MagnaStor's protection operates in the kernel, malicious changes made to files are merely added to that file's history. The data that was there before the malicious change is still retained and retrievable. Again, with our MagnaStor scenario, the intentional data damage factor can be safely disregarded.
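The mechanism can be pictured as an append-only version history. The sketch below is a generic illustration, not MagnaStor's on-disk format: writes, deletes, and malicious overwrites all just add new entries, so every earlier state of a file remains retrievable.

```python
# Generic sketch of append-only versioning, not MagnaStor's on-disk
# format: writes, deletes, and malicious overwrites only append new
# entries, so every earlier state of a file remains retrievable.
from datetime import datetime, timezone

history = {}   # path -> list of (timestamp, action, data) entries

def record(path, action, data=None):
    history.setdefault(path, []).append((datetime.now(timezone.utc), action, data))

record("report.doc", "write", b"original contents")
record("report.doc", "write", b"maliciously altered contents")
record("report.doc", "delete")                    # "delete" is just another entry

print(history["report.doc"][0][2])                # b'original contents' is still there
```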

Conclusion

Outside of a global disaster or an astronomically unlikely coincidence, a simple "three-pronged" strategy implemented with MagnaStor provides a fast, easy, and worry-free approach to archiving that clearly answers the question of how many copies of an archive are required for acceptable reliability.