Thanks for the replies, but I think you're missing my point.
Areal density on modern drives is so high that all modern drives ship
with a reserve of spare sectors. During manufacture, bad sectors are
found and recorded in a bad sector map. The drive firmware remaps these
to good sectors from the reserve, and the drive is then fit for sale.
This helps keep drive costs down.
During normal operation of the drive, the firmware on the drive will
replace sectors which the drive thinks may be going south with sectors
from this reserve.
Only when the reserve is used up will the operating system notice bad sectors
and start allocating around them... (at which point the drive is totally
It actually is the drive firmware that does this, not the OS driver. It's
well below the level of fdisk or ext2; I'm talking about the code running
on the drive electronics itself.
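The remapping described above can be pictured with a toy model. This is a minimal Python sketch under loud assumptions: the class name ToyDrive, the layout of the spare pool and the one-spare-per-failure policy are all hypothetical and only illustrate the idea. Real firmware is vendor specific and nothing here reflects any particular drive.

```python
class ToyDrive:
    """Hypothetical toy model of firmware sector remapping (not any vendor's firmware)."""

    def __init__(self, num_sectors, spare_sectors, factory_bad=()):
        self.num_sectors = num_sectors
        # Spare physical sectors sit just past the user-visible (logical) range.
        self.spares = list(range(num_sectors, num_sectors + spare_sectors))
        # Remap table: logical sector number -> spare physical sector.
        self.remap = {}
        # Factory defects are mapped out before the drive ships.
        for lba in factory_bad:
            if not self.reallocate(lba):
                raise ValueError("not enough spares for factory defects")

    def reallocate(self, lba):
        """Move a bad or weakening sector onto a spare.

        Returns False once the reserve is used up; from then on the
        bad sector is no longer hidden and the OS will see errors.
        """
        if not 0 <= lba < self.num_sectors:
            raise ValueError(f"sector {lba} out of range")
        if not self.spares:
            return False
        self.remap[lba] = self.spares.pop(0)
        return True

    def physical(self, lba):
        """Translate a logical sector to the physical sector actually used."""
        return self.remap.get(lba, lba)


# Usage: 1000 user sectors, 2 spares, one defect found at the factory.
drive = ToyDrive(1000, 2, factory_bad=[17])
print(drive.physical(17))     # 1000: served from the first spare
print(drive.reallocate(500))  # True: uses the last spare
print(drive.reallocate(600))  # False: reserve exhausted
```

The point of the model is the last line: once the spares run out, a failing sector can no longer be hidden from the host.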
Although I appreciate the code is vendor specific, I was wondering if
there were any techniques to encourage the drive to perform this low-level
Regarding your point that "it doesn't matter if the error is detected during
a workout or a normal read. If the data is gone, it's gone":
Again, modern drives experience errors on practically every sector read.
It's just a fact of the density. Error-correcting codes are used to
recover from these. I believe modern drives use CRCs, Viterbi and
Reed-Solomon block codes to detect and correct these minor errors, which
occur as a normal part of using the drive.
As a consequence, the drive can usually tell that a sector is starting to
deteriorate long before the data is "gone", from its increasing bit error
counts.
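To illustrate the detection half of this, here is a minimal Python sketch using a CRC-32 from the standard zlib module. It is a toy under stated assumptions: a plain CRC can only detect errors, not correct them, and the codes a drive actually uses internally are vendor specific and not visible to the host. The function names store, verify and flip_bit are hypothetical.

```python
import os
import zlib

SECTOR_SIZE = 512  # classic sector size, assumed for illustration


def store(data):
    """Return the sector together with a CRC-32, a toy stand-in for on-disk ECC."""
    return data, zlib.crc32(data)


def verify(data, stored_crc):
    """True if the sector still matches the checksum recorded when it was written."""
    return zlib.crc32(data) == stored_crc


def flip_bit(data, bit):
    """Simulate a single-bit read error at the given bit position."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


sector, crc = store(os.urandom(SECTOR_SIZE))
print(verify(sector, crc))                  # True: clean read
print(verify(flip_bit(sector, 1234), crc))  # False: CRC-32 catches every single-bit error
```

A drive that also corrects such errors can count how often it has to, which is the kind of rising error count described above.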
RE increasing the chances of total failure of the drive, I've seen that
drives that are used fairly regularly (i.e. kept spun up, at a semi-stable
temperature) typically fail from bad sectors over time, whereas drives that
are not, or that are power-cycled regularly, typically suffer mechanical
failure. That has been my experience,
On Sat, 15 Sep 2007, Gary Pigott wrote:
> I don't see the point.... You're attempting to trigger the drive's data
> correction and sector marking functionality during your sweeps, correct? It
> doesn't matter if the error is detected during a workout or a normal read. If
> the data is gone, it's gone, and you'll have to rely on your week-old backups.
> Bit error frequency is a function of the number of times the bit is accessed.
> Your workout is only *increasing* the frequency of such errors, and putting
> even more wear on the moving parts, *increasing* the possibility of total
> failure of the drive.
>
> The low-level I/O you're looking to do has nothing to do with firmware. It's
> standard ATA/SCSI stuff. Take a peek into the source for fdisk and the ext2
> module for some code samples. Low-level drive testing functionality is vendor
> specific. Manufacturer tools are pretty much your only way of getting
> visibility on what's happening on the platters. Take a look at
> http://linuxmafia.com/faq/Hardware/hdutils.html for some links.
>
> From: "Ivan Griffin" <ivan at skynet.ie>
> Sent: Saturday, September 15, 2007 3:20 PM
> To: "iLug Users Group" <ilug at linux.ie>
> Subject: Re: [ILUG] hard drive workouts - any ideas?
>
>> Yes, I agree with the points on RAID not being a backup, and the risk of
>> accidental deletion etc. That's why I have a separate external disk the
>> same size as my RAID array that I back up to.
>>
>> This disk is kept offline mainly to reduce wear & tear, but is regularly
>> powered up to do the backup and also to prevent stiction.
>>
>> I've been around the block long enough to appreciate that... but that's not
>> my concern here.
>>
>> What I am interested in is exploiting any techniques in drive firmware to
>> encourage the drives to swap in good spare sectors for bad ones. Giving the
>> drives as much opportunity to do this as possible, by regularly reading every
>> sector or possibly even bit-inverting every sector and then back again, seems
>> to me to be a reasonable strategy.
>>
>> However, I don't have any experience directly writing drive firmware, so I
>> was hoping someone on the list might.
>>
>> On Sat, 15 Sep 2007, Michael Watterson wrote:
>>
>>> Yes, RAID does not equal backups. It's a tool to reduce downtime and, in
>>> the case of RAID5, increase capacity and performance.
>>>
>>> In the 8 years I was designing and responsible for IT systems in a wide
>>> range of businesses there were:
>>> 2 or 3 disk failures on RAID, no loss of data.
>>> 2 destroyed RAID systems (one server knocked over by cleaning staff,
>>> and the other jolted while being moved by a so-called technician).
>>> 1 RAID system with serious downtime for a rebuild, as the salesperson
>>> thought you could physically unplug a drive from a hotswap bay (without
>>> putting it offline first) to demo that it could lose a drive. Back in 1996
>>> not many hotswap systems expected anyone to try that without informing the
>>> management system first. Unplugging a drive is NOT the same as a failure!
>>>
>>> Restricted physical access & mounting is important to stop these things,
>>> but people won't listen.
>>>
>>> Countless incidents of people deleting databases, spreadsheets,
>>> entire sets of accounts. Where we had got the company to pay for backup
>>> systems and training to use them, this was not a problem. Where the
>>> customer wanted to cut corners it was bad :(
>>>
>>> Zero loss of data due to viruses, Trojans, etc. But that's another story.
>>>
>>> Gary Pigott wrote:
>>>> Hi Ivan,
>>>>
>>>> To be honest, RAID is overrated when it comes to backup. RAID is for HA
>>>> where you need to be able to tolerate a failure and stay running, but it
>>>> isn't backup. Hard drives do just go bang and RAID will save your data,
>>>> but you're stuffed if you/others delete, or the OS corrupts, a file that
>>>> you created/modified since you last did your manual backup to an external
>>>> drive. The RAID controller will ensure the data gets deleted/trashed on
>>>> the second disk at the same time. I see this happen a lot more than
>>>> dramatic drive failures.
>>>>
>>>> Rather than have a pair of drives in a RAID set, I prefer to set them up
>>>> as individual drives with the second one *only* mounted during a backup
>>>> or restore. Having a permanently connected "backup" drive means you can
>>>> do more frequent, more automated, less intrusive backups. Write a script
>>>> that mounts hdb, does an rdiff-backup and then umounts it again.
>>>> rdiff-backup will do an incremental backup and preserve the older versions
>>>> rather than overwriting them. Stick it in the crontab to run as often as
>>>> you like and you're nicely covered. If you want to be even more secure, use
>>>> rdiff-backup to push the data to a remote site (like Skynet) too.
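The mount, rdiff-backup, umount script suggested above could look something like this Python sketch. The device /dev/hdb1, mount point /mnt/backup, source /home and the --run switch are hypothetical placeholders; it assumes rdiff-backup is installed and uses its basic "rdiff-backup SOURCE DEST" form, and it must run as a user allowed to mount the drive. Without --run it only prints the commands it would execute.

```python
import subprocess
import sys

# Hypothetical placeholders: adjust for your own system.
DEVICE = "/dev/hdb1"
MOUNT_POINT = "/mnt/backup"
SOURCE = "/home"
DEST_SUBDIR = "home"


def backup_commands(device, mount_point, source, dest_subdir):
    """Return the mount, rdiff-backup and umount command lines, in order."""
    mount = ["mount", device, mount_point]
    backup = ["rdiff-backup", source, f"{mount_point}/{dest_subdir}"]
    umount = ["umount", mount_point]
    return mount, backup, umount


def run_backup(dry_run=True):
    mount, backup, umount = backup_commands(DEVICE, MOUNT_POINT, SOURCE, DEST_SUBDIR)
    if dry_run:
        for cmd in (mount, backup, umount):
            print(" ".join(cmd))
        return
    subprocess.run(mount, check=True)
    try:
        subprocess.run(backup, check=True)
    finally:
        # Unmount even if the backup failed, so the drive is only mounted during the run.
        subprocess.run(umount, check=True)


if __name__ == "__main__":
    run_backup(dry_run="--run" not in sys.argv)
```

A crontab line such as "0 3 * * * /usr/local/sbin/hdb-backup.py --run" (hypothetical path) would run it every night at 03:00. The try/finally is the main design choice: the backup drive is left unmounted even when rdiff-backup fails.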
>>>>
>>>> What you call "regular testing", I call "wearing out". I'd just leave the
>>>> disks alone and be sure my data is protected *when* a disk dies, rather
>>>> than putting additional wear and tear on them to make them fail earlier,
>>>> in the hope that SMART (or more manual methods) will detect the failure
>>>> in time. Only 30% of drive failures are detectable by SMART, if you
>>>> believe Wikipedia (see
>>>> http://en.wikipedia.org/wiki/S.M.A.R.T.#Background). If you're running
>>>> hardware RAID, Linux will only see one "disk" as it'll all be hidden by
>>>> the RAID controller, so there's a limit to the efficacy of any disk
>>>> diagnostics you can script within the OS. A good RAID controller will
>>>> have an interactive diagnostic function in firmware that you can run
>>>> during boot.
>>>> From: "Ivan Griffin" <ivan at skynet.ie>
>>>> Sent: Friday, September 14, 2007 10:49 PM
>>>> To: <ilug at linux.ie>
>>>> Subject: [ILUG] hard drive workouts - any ideas?
>>>>
>>>>> Hi All,
>>>>>
>>>>> I've recently become very paranoid about my data, after having lost a
>>>>> drive to catastrophic failure.
>>>>>
>>>>> My important docs are now in RAID, and backed up weekly to a drive I
>>>>> keep mostly offline and offsite. My home NAS box is Sparc (LEON) based,
>>>>> and runs Linux.
>>>>>
>>>>> I run smartmontools on the box, although I'm not expecting much from
>>>>>
>>>>> I am interested in strategies suitable for running from a cronjob to
>>>>> give the drive firmware a good workout, and a chance to map out any bad
>>>>> blocks that show up.
>>>>>
>>>>> Is there any merit in cron'ing something like
>>>>> dd if=/dev/hdX of=/dev/null bs=XXXX ?
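Plain dd gives up at the first read error unless it is run with conv=noerror. As a hedged alternative, this Python sketch reads a device end to end, retries a failed chunk one sector at a time and records the byte offsets it could not read. The name read_sweep, the 1 MiB chunk and the 512-byte sector are assumptions. Reads go through the page cache, and whether a read prompts the firmware to reallocate anything is vendor specific, as discussed in this thread.

```python
import os
import sys

CHUNK = 1024 * 1024  # bytes per read, roughly dd's bs=1M (assumed value)
SECTOR = 512         # assumed logical sector size


def read_sweep(path, chunk=CHUNK, sector=SECTOR):
    """Read all of path. Return (bytes_read, byte offsets that could not be read)."""
    bad = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # device or file size in bytes
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            try:
                data = os.pread(fd, n, offset)
            except OSError:
                # The chunk failed: retry it one sector at a time to narrow down the bad spots.
                for s in range(offset, offset + n, sector):
                    try:
                        total += len(os.pread(fd, min(sector, offset + n - s), s))
                    except OSError:
                        bad.append(s)
                offset += n
                continue
            if not data:
                break  # unexpected end of data
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, bad


if __name__ == "__main__" and len(sys.argv) == 2:
    total, bad = read_sweep(sys.argv[1])
    print(f"read {total} bytes, {len(bad)} unreadable sectors")
    for off in bad:
        print(f"  unreadable: byte offset {off} (sector {off // SECTOR})")
```

Run from cron as, for example, "python3 read_sweep.py /dev/hdX" (hypothetical file name); it only reads, so it is safe on a mounted disk.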
>>>>>
>>>>> What about going a step further, and running a tailored initrd to read
>>>>> each sector, xor with a bit pattern, write, compare, xor out the
>>>>> pattern, write, compare ...
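The xor, write, compare cycle described above can be sketched in Python as follows. This is destructive and only a sketch under stated assumptions: use it only on an unmounted device with a verified backup, because an interruption between the two writes leaves a sector inverted. The names xor_cycle and pattern_test, the all-ones pattern (which inverts every bit) and the --i-have-a-backup switch are hypothetical. The read-back will normally be served from the kernel page cache, so a passing compare does not prove the platter holds the data; a real tool would use O_DIRECT with aligned buffers, which this sketch omits.

```python
import os
import sys

SECTOR = 512
PATTERN = b"\xff" * SECTOR  # XOR with 0xFF inverts every bit


def xor(data, pattern):
    """XOR data with pattern; zip() stops at the shorter input, so a short final sector works."""
    return bytes(a ^ b for a, b in zip(data, pattern))


def xor_cycle(fd, offset, pattern=PATTERN):
    """Invert one sector, verify, restore it, verify again. True if both checks pass."""
    # A read error here raises OSError before anything is written to the sector.
    original = os.pread(fd, len(pattern), offset)
    inverted = xor(original, pattern)
    os.pwrite(fd, inverted, offset)
    os.fsync(fd)
    ok = os.pread(fd, len(inverted), offset) == inverted
    # Always write the original back, even if the first check failed.
    os.pwrite(fd, original, offset)
    os.fsync(fd)
    return ok and os.pread(fd, len(original), offset) == original


def pattern_test(path, pattern=PATTERN):
    """Run xor_cycle over every sector of path; return the offsets that failed."""
    failed = []
    fd = os.open(path, os.O_RDWR)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        for offset in range(0, size, len(pattern)):
            if not xor_cycle(fd, offset, pattern):
                failed.append(offset)
    finally:
        os.close(fd)
    return failed


if __name__ == "__main__" and len(sys.argv) == 3 and sys.argv[2] == "--i-have-a-backup":
    bad = pattern_test(sys.argv[1])
    print(f"{len(bad)} sectors failed verification")
```

Calling fsync after every write makes this very slow on a whole disk; batching would be faster but widens the window in which an interruption leaves data inverted.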
>>>>>
>>>>> I've searched for literature on this type of thing, but not found
>>>>> anything of note, other than some marketing blurb on GRC's SpinRite.
>>>>>
>>>>> Do people have experience of this? Anyone work directly on drive
>>>>> firmware? What works best for the drive?
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Irish Linux Users' Group mailing list
>>>>> About this list : http://mail.linux.ie/mailman/listinfo/ilug
>>>>> Who we are : http://www.linux.ie/
>>>>> Where we are : http://www.linux.ie/map/