On Fri, 20 Nov 2009, Brendan Minish wrote:
> Yes I have had raid 5 systems fail during rebuilds on 2 occasions (1 was
> using a hardware controller that 'helpfully' hid the smart status from
> the OS) the other was software based and I had a second drive fail
> during the rebuild.
> Since I am not responsible for very many raid arrays this seems like a
> high average (or perhaps I am just careless..) but for me it's Software
> Raid 1 and Raid 10 from here on in.
> I have always backed up RAID disks and will continue to do so.
> Of course there are no guarantees that raid1 won't fail during a rebuild
> either but at least you have better odds and you are not stressing the
> drive nearly as much with a raid1 rebuild, the performance won't be as
> shit during the rebuild either
I guess we really need to be clear about what we are trying to achieve.
RAID doesn't actually give many benefits - it offers some degree of
protection against drive failure, sure, but not against filesystem
corruption, power outage, etc. It doesn't protect against accidental
deletion by a user (snapshots on ZFS do). RAID isn't backup, let alone
OFFSITE backup. RAID gets us back up and running quickly after a drive
dies - it does not ensure you won't lose valuable data - you need a larger
and more encompassing backup strategy to achieve that.
If you are just worried about data loss, keep copies of your data on 3
drives, one of which is offsite. Storage Mojo is a great site for
discussing all storage related stuff. In particular, he would argue
against home RAID, suggesting that backups are more important. I don't
disagree with this, but I like to do both.
On the question of HW vs SW raid,
http://jeremy.zawodny.com/blog/archives/008696.html is a good article on
SW vs HW raid. Personally, I have my Linux md SW RAID / OpenSolaris ZFS
setups connected to a UPS, and cronjobs to scrub the Linux ones regularly.
I like that I can plug these drives into a new system and recover the data -
without relying on proprietary replacement HW RAID cards (which might even
prove difficult to source).
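For the curious, here's a sketch of the kind of cron-driven scrub I mean. This is a config-style fragment, not something to paste blindly: the array name md0 and pool name tank are placeholders for whatever your system actually has.

```shell
# Kick off a full verify pass on a Linux md array (device name assumed:
# md0). Writing "check" to sync_action makes md read every member and
# compare parity/mirrors without rewriting anything.
echo check > /sys/block/md0/md/sync_action

# Watch progress, then check the mismatch counter afterwards:
cat /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt

# Rough equivalent for a ZFS pool (pool name "tank" is a placeholder);
# ZFS verifies its own checksums block by block:
zpool scrub tank
zpool status tank

# Example /etc/crontab entry: scrub the md array at 3am on the 1st of
# each month:
# 0 3 1 * *  root  echo check > /sys/block/md0/md/sync_action
```

Scrubbing regularly matters precisely because of the URE problem below - you want latent bad sectors found while the array is still redundant, not during a rebuild.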
I'm the paranoid sort with my data, and I also use ECC RAM in machines
that provide my storage :-)
As regards RAID-5 rebuild failure, the problem lies with the size of the
drives: the probability of errors is such that you're highly likely to
get an unrecoverable bit error on a large rebuild - something you can't
recover from with RAID5. For consumer drives, you're looking at an
unrecoverable read error (URE) rate of one per 10^14 bits read; for
enterprise drives it is one per 10^15 (SATA) or one per 10^16 (SCSI/SAS).
Where this kicks you is with RAID5 rebuilds if one drive dies. I haven't
done the maths recently on this, but IIRC there is a significant chance
of a corrupted sector on a 1TB array rebuild if a drive dies. Worst of
all, I don't know what will happen if you hit this corrupted sector -
will the rebuild continue around it, or will it abort?
(See Death of RAID5 in 2009 - http://blogs.zdnet.com/storage/?p=162)
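To put a rough number on that, here's the back-of-the-envelope maths, assuming UREs are independent and occur at exactly the quoted spec rates (the 3 x 1TB array is just an illustrative example, not anyone's actual setup):

```python
import math

def p_ure(bytes_read, bits_per_error):
    """Probability of at least one unrecoverable read error while
    reading bytes_read bytes, assuming independent errors at a rate
    of one per bits_per_error bits (the drive-spec URE figure)."""
    bits = bytes_read * 8
    # 1 - (1 - 1/rate)^bits, computed stably for tiny per-bit rates
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

# Rebuilding a 3 x 1 TB RAID5 after one drive dies means reading both
# surviving 1 TB drives in full, i.e. 2 TB:
consumer = p_ure(2 * 10**12, 10**14)    # consumer drives, 1 per 10^14
enterprise = p_ure(2 * 10**12, 10**15)  # enterprise SATA, 1 per 10^15
print(f"consumer:   {consumer:.1%}")    # prints roughly 14.8%
print(f"enterprise: {enterprise:.1%}")  # prints roughly 1.6%
```

So even on a modest array with consumer drives there's roughly a one-in-seven chance of tripping a URE mid-rebuild, and it scales up quickly with array size - which is the whole "Death of RAID5" argument.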
The Storage Mojo articles on ZFS
(http://storagemojo.com/zfs-threat-or-menace-pt-i/ and
http://storagemojo.com/zfs-threat-or-menace-pt-ii/) convinced me to use it
for the stuff I really care about (and back up), and I just leave media
I'm not too fussed over on my Infrant now. With OpenSolaris ZFS I'm not
too bothered about UREs in my current setup, since there is end-to-end
checksumming, but with the Infrant (and every other consumer system)
there isn't.
Real-world hard drive failure rates have been studied too - damn entropy
gets you ultimately with digital media.