LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Production email server, corrupt ext3 fs - need advice

John Molohan john.molohan at gcd.ie
Fri May 19 16:25:12 IST 2006


A little background first.

Tue 16th ~22:00
    Blackout in server room
    Email server not UPSed (don't ask)
    ~22:30 Power restored, server comes back up.

Wed 17th ~10:30
    Problem identified for 1st time
    1st technician to server room finds no A/C running
    Server room is approx 40 degrees Celsius.
    ~11:30 A/C restored room temp normal.
    All systems checked/monitored, no apparent issues.

Thur 18th 13:15
    All disk access stops on email server
    Console errors on email server e.g.
    ext-fs error (device in transaction 24734: journal has aborted)
    System unresponsive, rebooted with an Alt+SysRq+R after all else fails.
    Partitions fscked on reboot, orphaned inodes cleared, no more issues.

Fri 19th 10:30
    Same as Thursday, all disk access stops, console errors etc.
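
To catch a repeat of the console errors above earlier, on this box or on
the others that sat in the hot room, a log scan along these lines might
help. This is only a rough sketch: the first pattern comes from the console
message quoted above, the second is a generic assumption about how
block-layer errors are worded, and the log path in the usage comment is
hypothetical.

```python
import re
import sys

# Assumed patterns: the first is taken from the console error quoted above,
# the second is a generic catch-all and may need adjusting per kernel.
PATTERNS = [
    re.compile(r"journal has aborted", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
]


def suspicious_lines(lines):
    """Yield (line_number, text) for every line matching any pattern."""
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Usage (hypothetical path): python scan_log.py /var/log/messages
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as handle:
            for number, text in suspicious_lines(handle):
                print(f"{number}: {text}")
```

Run from cron against the system log, it would at least flag the first
journal abort instead of leaving it to be noticed when disk access stops.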


Orphaned inodes are found only on the /var partition and on every occasion
appear around the same locations (see http://www.it.gcd.ie/inodes.txt).
Given that we don't have much experience with corrupt filesystems or with
RAID, other than when it's working, we're looking for a bit of advice. We
do have rsynced backups of the mail from 04:00 every night. We're thinking
of the following approach.

1. Take the box down.
2. In the scsi host util run verify media on all disks to identify & 
   mark any bad sectors and make them unavailable.
3. Reboot & remount /var ro
4. Rsync a new backup.
5. Run smartctl to see if it identifies any issues.
6. Format /var?
7. Recreate /var from backup.

Any suggestions/additions, other approaches?
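
For steps 3 to 5, the commands could be written down ahead of time so they
can be reviewed before the box is taken down. The sketch below only builds
and prints them; it runs nothing. The device name /dev/sda and the backup
target are assumptions, not our real setup, and smartctl may not be able to
see the physical disks behind the PERC controller at all (it may need a
controller-specific -d option or the vendor's own tool instead).

```python
import shlex


def recovery_commands(mount_point="/var", device="/dev/sda",
                      backup_target="backuphost:/backup/var/"):
    """Build (but do not run) the commands for steps 3 to 5 of the plan.

    device and backup_target are hypothetical placeholders.
    """
    return [
        # Step 3: remount read-only. This fails while anything still has
        # files open for writing, so the mail services must be stopped first.
        ["mount", "-o", "remount,ro", mount_point],
        # Step 4: take a fresh copy; -a preserves permissions, owners, times.
        ["rsync", "-a", mount_point.rstrip("/") + "/", backup_target],
        # Step 5: print all SMART information smartctl can get for the device.
        ["smartctl", "-a", device],
    ]


if __name__ == "__main__":
    for command in recovery_commands():
        print(" ".join(shlex.quote(part) for part in command))
```

Keeping the commands as lists rather than one shell string makes it easy to
hand them to subprocess later without quoting surprises.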

Some questions:
1. Do you think we could continue to trust these disks or should we just
   forget it and replace them?
2. Does anyone have any hints from the admittedly little information as to
   whether this might be just filesystem corruption or dead disks?
3. There were a lot of servers in the server room which all experienced
   this slow cooking but none have shown any obvious problems so far.
   Should we be doing something as a precaution for them?
4. Is it safe to assume that this failure is probably a direct result of 
   the heat? 
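
On question 2, one rough check is whether the orphaned inodes fall into the
same ext3 block groups each time, since a tight cluster points at one
region of the filesystem rather than scattered damage. On ext2/ext3 an
inode's block group is (inode - 1) // inodes_per_group, where the
"Inodes per group" figure comes from tune2fs -l or dumpe2fs -h. The inode
numbers and the per-group figure below are made up for illustration; the
real ones would come from inodes.txt and from the /var filesystem itself.

```python
from collections import Counter


def block_group(inode, inodes_per_group):
    """Return the ext2/ext3 block group that holds the given inode."""
    # Inode numbers start at 1, so subtract 1 before dividing.
    return (inode - 1) // inodes_per_group


def group_histogram(inodes, inodes_per_group):
    """Count how many of the given inodes fall into each block group."""
    return Counter(block_group(i, inodes_per_group) for i in inodes)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    inodes_per_group = 16384
    orphans = [1622017, 1622020, 1622101, 1638402, 4915201]
    histogram = group_histogram(orphans, inodes_per_group)
    for group, count in sorted(histogram.items()):
        print(f"block group {group}: {count} orphaned inode(s)")
```

Because the array is RAID 5, a cluster in one block group still does not
say which physical disk is involved; it only narrows down the region of the
logical device worth checking.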

Other info.

Dell Poweredge 2650
Kernel 2.6.3-29mdksmp
RAID 5
Red Hat/Adaptec aacraid driver (1.1.2-lk1 Nov 28 2005)
AAC0: kernel 2.8.4 build 6089
AAC0: monitor 2.8.4 build 6089
AAC0: bios 2.8.0 build 6089
AAC0: serial 171830d3fafaf001
scsi0 : percraid
 Vendor: DELL      Model: PERCRAID RAID5    Rev: V1.0
 Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 1146866176 512-byte hdwr sectors (587195 MB)
[it at dubmail it]$ df
Filesystem            Size  Used Avail Use% Mounted on
/dev/scsi/host0/bus0/target0/lun0/part5
                      15G  5.7G  8.0G  42% /
/dev/scsi/host0/bus0/target0/lun0/part6
                     522G   32G  464G   7% /var
/var hosts the cyrus-imap spool.




