I don't see the point. You're attempting to trigger the drive's data
correction and bad-sector marking functionality during your sweeps, correct?
It doesn't matter whether the error is detected during a workout or a normal
read. If data is gone, it's gone, and you'll have to rely on your week-old
backups. Bit error frequency is a function of the number of times the bit is
accessed. Your workout is only *increasing* the frequency of such errors
and putting even more wear on the moving parts, *increasing* the possibility
of total failure of the drive.

The low-level I/O you're looking to do has nothing to do with firmware. It's
standard ATA/SCSI stuff. Take a peek at the source for fdisk and the ext2
module for some code samples. Low-level drive testing functionality is
vendor-specific. Manufacturer tools are pretty much your only way of getting
visibility into what's happening on the platters. Take a look at
http://linuxmafia.com/faq/Hardware/hdutils.html for some links.
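
To illustrate that a read-only pass needs nothing beyond ordinary read calls,
here is a minimal sketch in Python. It is not taken from fdisk or ext2; the
names sweep and CHUNK are made up for the example, the chunk size is arbitrary,
and /dev/hdX is a placeholder. It assumes a Unix-like system where the device
size can be found by seeking to the end.

```python
import os

CHUNK = 1024 * 1024  # read size in bytes; an arbitrary choice for this sketch


def sweep(path, chunk=CHUNK):
    """Read `path` from start to end.

    Returns (bytes_read, offsets), where offsets lists the start of every
    chunk for which the kernel reported an I/O error.
    """
    bad = []
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Unreadable range: record where it starts and skip past it.
                bad.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of device
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return total, bad


# Hypothetical usage (needs read permission on the device):
#   total, bad = sweep("/dev/hdX")
```

Note that reads may be served from the kernel's page cache rather than the
platters, so a pass like this only tells you what the kernel reported, not
what the drive did internally.
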
Gary
--------------------------------------------------
From: "Ivan Griffin" <ivan at skynet.ie>
Sent: Saturday, September 15, 2007 3:20 PM
To: "iLug Users Group" <ilug at linux.ie>
Subject: Re: [ILUG] hard drive workouts - any ideas?
> Yes, I agree with the points on RAID not being a backup, and the risk of
> accidental deletion etc. That's why I have a separate external disk the
> same size as my RAID array that I back up to.
>
> This disk is kept offline mainly to reduce wear & tear, but regularly
> powered up to do the backup and also to prevent stiction.
>
> I've been around the block long enough to appreciate that... but that's not
> my concern here.
>
> What I am interested in is exploiting any techniques in drive firmware to
> encourage them to remap good spare sectors for bad. Giving the drives as
> much opportunity to do this as possible by regularly reading every sector,
> or possibly even bit-inverting every sector and then back again, seems to
> me to be a reasonable strategy.
>
> However, I don't have any experience directly writing drive firmware, so I
> was hoping someone on the list might.
>
> Cheers,
> Ivan.
>
> On Sat, 15 Sep 2007, Michael Watterson wrote:
>
>> Yes, RAID does not equal backups. It's a tool to reduce downtime and, in
>> the case of RAID5, increase capacity and performance.
>>
>> In the 8 years I was designing and responsible for IT systems in a wide
>> range of businesses there were:
>> 2 or 3 disk failures on RAID, no loss of data.
>> 2 destroyed RAID systems (one server knocked over by cleaning staff,
>> and the other jolted while being moved by a so-called technician).
>> 1 RAID system with serious downtime for rebuild as the Sales person
>> thought you could physically unplug a drive from hotswap (without taking
>> it offline first) to demo that it could lose a drive. Back in 1996 not
>> many hotswap systems expected anyone to try that without informing the
>> management system first. Unplugging a drive is NOT the same as a failure!
>>
>> Restricted physical access & mounting is important to stop these things
>> but people won't listen.
>>
>> Uncountable numbers of events of people deleting databases, spreadsheets,
>> entire sets of accounts. Where we had got the company to pay for backup
>> systems and training to use them, this was not a problem. Where the
>> customer wanted to cut corners it was bad :(
>>
>> Zero loss of data due to Viruses, Trojans, etc. But that's another story.
>>
>> Gary Pigott wrote:
>>> Hi Ivan,
>>>
>>> to be honest, RAID is overrated when it comes to backup. RAID is for HA
>>> where you need to be able to tolerate a failure and stay running, but it
>>> isn't backup. Hard drives do just go bang and RAID will save your data,
>>> but you're stuffed if you/others delete, or the OS corrupts, a file that
>>> you created/modified since you last did your manual backup to an
>>> external drive. The RAID controller will ensure data will get
>>> deleted/trashed on the second disk at the same time. I see this happen a
>>> lot more than dramatic drive failures.
>>>
>>> Rather than have a pair of drives in a RAID set, I prefer to set them up
>>> as individual drives with the second one *only* mounted during a backup
>>> or restore. Having a permanently connected "backup" drive means you can
>>> do more frequent, more automated, less intrusive backups. Write a script
>>> that mounts hdb, does an rdiff-backup and then umounts it again.
>>> rdiff-backup will do an incremental backup and preserve the older
>>> versions rather than overwriting them. Stick it in the crontab to run as
>>> often as you like and you're nicely covered. If you want to be even more
>>> secure, use rdiff-backup to push the data to a remote site (like Skynet)
>>> too.
>>>
>>> What you call "regular testing", I call "wearing out". I'd just leave
>>> the disks alone and be sure my data is protected *when* a disk dies,
>>> rather than putting additional wear and tear on them to make them fail
>>> earlier, with the hope that SMART (or more manual methods) will detect
>>> the failure in time. Only 30% of drive failures are detectable by SMART
>>> if you believe Wikipedia (see
>>> http://en.wikipedia.org/wiki/S.M.A.R.T.#Background). If you're running
>>> hardware RAID, Linux will only see one "disk" as it'll all be hidden by
>>> the RAID controller, so there's a limit to the efficacy of any disk
>>> diagnostics you can script within the OS. A good RAID controller will
>>> have an interactive diagnostic function in firmware that you can run
>>> during boot.
>>>
>>> Gary
>>>
>>> --------------------------------------------------
>>> From: "Ivan Griffin" <ivan at skynet.ie>
>>> Sent: Friday, September 14, 2007 10:49 PM
>>> To: <ilug at linux.ie>
>>> Subject: [ILUG] hard drive workouts - any ideas?
>>>
>>>> Hi All,
>>>>
>>>> I've recently become very paranoid about my data, after having lost a
>>>> drive to catastrophic failure.
>>>>
>>>> My important docs are now in RAID, and backed up weekly to a drive I
>>>> keep mostly offline and offsite. My home NAS box is SPARC (LEON)
>>>> based, and runs Linux.
>>>>
>>>> I run smartmontools on the box, although I'm not expecting much from
>>>> S.M.A.R.T.
>>>>
>>>> I am interested in strategies suitable for running from a cronjob to
>>>> give the drive firmware a good workout, and a chance to map out any bad
>>>> blocks that show up.
>>>>
>>>> Is there any merit in cron'ing something like
>>>> dd if=/dev/hdX of=/dev/null bs=XXXX ?
>>>>
>>>> What about going a step further, and running a tailored initrd to read
>>>> each sector, xor with a bit pattern, write, compare, xor out the
>>>> pattern, write, compare ...
>>>>
>>>> I've searched for literature on this type of thing, but not found
>>>> anything of note, other than some marketing blurb on GRC's SpinRite.
>>>>
>>>> Do people have experience of this? Anyone work directly on drive
>>>> firmware? What works best for the drive?
>>>>
>>>> Best Regards,
>>>> Ivan
>>>> --
>>>> Irish Linux Users' Group mailing list
>>>> About this list : http://mail.linux.ie/mailman/listinfo/ilug
>>>> Who we are : http://www.linux.ie/
>>>> Where we are : http://www.linux.ie/map/
>>
>> --
>> Mike