Just a bit more info on this.
Sample NFS line in /etc/fstab:
node01:/home/cian /home/cian nfs nolock,hard,intr,rsize=8192,wsize=8192,timeo=20 1 2
Right. Now here's something *really* weird. I noticed this problem a day
or two ago with one particular run of files - none of them would read.
However, I just remembered - I back up to an external disk here. My /home
on the cluster is Samba-mounted on my Windows box, and I use a Cygwin
build of rsync to back up the files. I backed up last Tuesday evening
(31st October). I haven't touched any of my files since, on either the
cluster or the external disk. I've just tried, and the backups on
the external disk work perfectly - even if I copy them back to the
cluster and read them in from there. Just as a BTW, the rsync would have
yanked the files over NFS, and when I copied the backup set to the
cluster a few minutes ago, it would have been sent over NFS to my home
dir. They read fine.
Investigations: Plonk corrupt set and working set into directory.
-backup is the working set and -corrupt is, well, the corrupt set. I've
taken the smallest corrupt set.
cian@master:~/corrupt$ stat -t *
Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-backup.cas.gz 10580022 20704 81b4 1000 100 e 2448004 1 0 0 1153945019 1153945019 1162895571 4096
Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-backup.dat.gz 28584075 55896 81b4 1000 100 e 2448003 1 0 0 1153945025 1153945025 1162895576 4096
Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz 10580022 20704 81b4 1000 100 e 2448006 1 0 0 1153948619 1153948619 1162895621 4096
Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.dat.gz 28584075 55896 81b4 1000 100 e 2448005 1 0 0 1153948625 1153948625 1162895613 4096
cian@master:~/corrupt$ md5sum *
624afff87d49b32ed699aaa476d87fbf  Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-backup.cas.gz
ae06e44a97dd849bdd2fb85ab7625f36  Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-backup.dat.gz
b8e4c40b29af3a9c9bca5d58e0da4659  Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz
1fe725b04fc01cfea19141c43e0dbe3f  Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.dat.gz
cian@master:~/corrupt$ gunzip Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz
gunzip: Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz: invalid compressed data--crc error
gunzip: Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz: invalid compressed data--length error
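
Since the backup and corrupt files have the same size but different md5sums, the next useful fact is where the bytes differ. The following is only a rough sketch of one way to list the differing byte ranges; the function name differing_ranges and the chunk size are my own choices for illustration, not something from the tools above.

```python
def differing_ranges(path_a, path_b, chunk_size=1 << 20):
    """Return (start, end) byte ranges, end exclusive, where two files differ."""
    ranges = []
    start = None   # start offset of the run of differing bytes currently open
    offset = 0     # file offset of the chunk being compared
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Identical chunk: close any run that was still open.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
                offset += len(a)
                continue
            n = min(len(a), len(b))
            for i in range(n):
                if a[i] != b[i]:
                    if start is None:
                        start = offset + i
                elif start is not None:
                    ranges.append((start, offset + i))
                    start = None
            if len(a) != len(b) and start is None:
                # One file is longer: count its tail as differing.
                start = offset + n
            offset += max(len(a), len(b))
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

Calling differing_ranges("…-backup.cas.gz", "…-corrupt.cas.gz") with the real file names would show whether the damage is a single block or scattered, and whether the offsets and lengths line up with something like the 8192-byte rsize/wsize or the 4096-byte block size. The byte-by-byte loop is slow in pure Python but should cope with files of this size.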
I'm going to keep jabbing at them to see what caused the corrupt set to
change. According to stat, the sizes are identical and the modification
times differ only by a constant 3600 seconds between the two sets.
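
To make the stat comparison less error-prone than reading the numbers by eye, here is a small sketch that splits the terse stat lines into named fields. It assumes the field order matches the 14 values printed above (size, blocks, raw mode in hex, uid, gid, device in hex, inode, links, major, minor, atime, mtime, ctime, block size); other coreutils versions may print extra fields. The names FIELDS, parse_terse, mtime_delta and as_utc are made up for this example.

```python
from datetime import datetime, timezone

# Assumed field order of `stat -t`, taken from the 14 values in the output above.
FIELDS = ("size", "blocks", "mode_hex", "uid", "gid", "device_hex", "inode",
          "links", "major", "minor", "atime", "mtime", "ctime", "blksize")


def parse_terse(line):
    """Split one terse stat line into (file name, dict of field strings)."""
    # Split from the right so a file name containing spaces stays intact.
    parts = line.rsplit(None, len(FIELDS))
    return parts[0], dict(zip(FIELDS, parts[1:]))


def mtime_delta(line_a, line_b):
    """Seconds between the mtime fields of two terse stat lines (b minus a)."""
    _, a = parse_terse(line_a)
    _, b = parse_terse(line_b)
    return int(b["mtime"]) - int(a["mtime"])


def as_utc(epoch):
    """Convert an epoch-seconds string to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc)


# The two .cas.gz lines from the stat output above.
backup_line = ("Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-backup.cas.gz "
               "10580022 20704 81b4 1000 100 e 2448004 1 0 0 "
               "1153945019 1153945019 1162895571 4096")
corrupt_line = ("Bent_Plate_Model_VGeom_RNG_Solved_1e-05Res_4040Its-corrupt.cas.gz "
                "10580022 20704 81b4 1000 100 e 2448006 1 0 0 "
                "1153948619 1153948619 1162895621 4096")

print(mtime_delta(backup_line, corrupt_line))
print(as_utc(parse_terse(backup_line)[1]["mtime"]).isoformat())
```

For the two .cas.gz lines the mtime values are 1153945019 and 1153948619, so the sizes match and the timestamps are exactly one hour apart. That may be a timezone or daylight-saving offset picked up on the trip through the Windows box rather than a sign of a later write, but that is a guess.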
It's been suggested that I install munin to keep an eye on stuff so I'll
do that.
Thanks for everyone's help.
Regards,
Cian
Cian Davis wrote:
> Hi,
>
> I have a weird, frustrating problem and would appreciate the insights
> of anyone on this list. Please bear with me, it's a long mail but the
> problem needs to be described.
>
> Most of us use software called Fluent and one person in the group uses
> CFX. All our desktop machines are Windows and we use the Windows
> version but we have a cluster of 9 Fujitsu-Siemens dual processor Xeons.
>
> When the cluster was initially delivered, it was running RedHat 6.
> After a few months, some of the Fluent users found that their files
> wouldn't read because they were corrupted.
>
> Regards,
> Cian