Quoting John Molohan <john.molohan at gcd.ie>:
> 1. Take the box down.
> 2. In the scsi host util run verify media on all disks to identify &
> mark any bad sectors and make them unavailable.
Unlikely that any exist, but a good idea nonetheless.
> 3. Reboot & remount /var ro
> 4. Rsync a new backup.
> 5. Run smartctl see if it identifies any issues.
> 6. Format /var?
> 7. Recreate /var from backup.
>> Any suggestions/additions, other approaches?
Downgrade the firmware and upgrade the driver first, as suggested in my
other mail.
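
As a rough sketch, steps 3 to 5 of the plan above might look like the
following. The backup destination and the disk device are placeholders
(assumptions, not taken from the thread), and disks behind a hardware RAID
controller often need a controller-specific `-d` option for smartctl, which
is left out here. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical values; adjust for the real machine.
BACKUP_DEST="${BACKUP_DEST:-backuphost:/backups/var/}"
DISK="${DISK:-/dev/sda}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 3: remount /var read-only so nothing changes during the copy.
step_remount_ro() {
    run mount -o remount,ro /var
}

# Step 4: take a fresh backup, preserving hard links and numeric IDs.
step_backup() {
    run rsync -aH --numeric-ids /var/ "$BACKUP_DEST"
}

# Step 5: ask the disk for its overall health verdict and its error log.
step_smart() {
    run smartctl -H "$DISK"
    run smartctl -l error "$DISK"
}

step_remount_ro
step_backup
step_smart
```

Run it once as-is to review the printed commands, then again as root with
DRY_RUN=0 once the destination and device are correct.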
> Some questions:
> 1. Do you think we could continue to trust these disks or should we just
> forget it and replace them?
Nuke the array since you have working backups. Make sure it does a full scrub
prior to formatting and re-installing the OS.
> 2. Does anyone have any hints from the admittedly little information as to
> whether this might be just filesystem corruption or dead disks?
I think it's either LUN (RAID) corruption or just FS corruption. It's
unlikely to be a hardware issue, to be honest.
> 3. There were a lot of servers in the server room which all experienced
> this slow cooking but none have shown any obvious problems so far.
> Should we be doing something as a precaution for them?
Check H/W logs for thermal events. Unless they are crap and just crash
instead of logging ;-0
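
One way to check the hardware logs, assuming the servers have an IPMI BMC
and ipmitool is installed (neither is stated in the thread), is to filter
the System Event Log for temperature entries. The function names below are
made up for illustration.

```shell
#!/bin/sh
# Keep only lines that mention temperature or thermal events.
filter_thermal() {
    grep -i -E 'temp|thermal'
}

# Dump the BMC's System Event Log and filter it (needs root and ipmitool).
thermal_events() {
    ipmitool sel elist | filter_thermal
}
```

Calling `thermal_events` on each box lists whatever temperature-related
entries the BMC recorded. An empty result means only that nothing was
logged, not that the machine stayed cool.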
> 4. Is it safe to assume that this failure is probably a direct result
> of the heat? Other info.
Unknown. Let me know if you need more help.
regards
Conor.