LINUX.IE, website of the Irish Linux Users' Group
[ILUG] Corrupt files

Frank Duignan frank.duignan at gmail.com
Mon Nov 6 19:50:50 GMT 2006


Any clues in this link?
http://www.beowulf.org/archive/2002-May/007151.html
f

On 11/6/06, Cian Davis <davisc at skynet.ie> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Hi,
> I have a weird, frustrating problem and would appreciate the insights
> of anyone on this list. Please bear with me, it's a long mail but the
> problem needs to be described.
>
> Our research group focuses on CFD
> (http://en.wikipedia.org/wiki/Computational_fluid_dynamics for those
> interested)
>
> Most of us use software called Fluent and one person in the group uses
> CFX. All our desktop machines are Windows and we use the Windows
> version but we have a cluster of 9 Fujitsu-Siemens dual processor Xeons.
>
> When the cluster was initially delivered, it was running RedHat 6.
> After a few months, some of the Fluent users found that their files
> wouldn't read because they were corrupted.
>
> Fluent files are made up of descriptive text at the top, a binary blob
> of information in the middle, and text again at the bottom. Fluent has
> support for gzip so I told people to gzip the files and that helped
> for a while but it came back. The occurrences seemed random and only
> affected about 2 out of the 5 people using Fluent on the cluster. We
> would find that the modification date on a corrupted data set would be
> the same as a backup that was working.
>
> The CFX user had no problem and 2 years later continues to have no
> problem.
>
> In short, I couldn't pin it down to anything but suspected that the
> versions of software offered by RedHat 6 were old and possibly dodgy.
> So about a year ago, I wiped all the machines and put Debian sarge on
> them. It's not a supported platform for either Fluent or CFX, but I've
> managed to get both working from the tarball each vendor provides.
>
> It's started happening again and specifically, it's started happening
> to my files. Considering that each of these datasets generally takes
> about 12 hours to solve, it's more than a bit of a pain in the arse
> that stuff is screwing up. One of the machines, the master node, faces
> the network and runs Kerberos, NIS, Nagios, NFS, DNS, Squid and ntpd.
> The other nodes have the Fluent and CFX software NFS mounted from the
> master node.
>
> Now, don't moan about this bit - it's the only way I could do it. The
> master only had 50GB of disk free. Each of the nodes had about 20GB
> free. To give everyone enough space for the thing to be useful, the
> /home of the heaviest user was put on the master node and the other
> users were given a /home on one of the nodes, which was NFS mounted to
> the master (as /home/$user). Generally, a job is set running on more
> than 1 node from the master - Fluent uses rsh to contact the other
> nodes. As far as possible, no heavy computation is done on the master
> node.
>
> I don't think it's an NFS problem - the user with the home on the
> master node was the first to go tits up. I don't think it's a Debian
> problem because the same happened with RedHat. I don't think it's a
> Linux problem because no other software seems to have a problem.
> Nothing in logs or dmesg. I'm leaning towards a Fluent problem or a
> hardware problem, but I can't think of any way to test either. The
> problem is sufficiently random that I can't provide good data to the
> software maker to investigate - and the fact that we're running on an
> unsupported platform doesn't help. And also, if it's a hardware
> problem, why is it only files read and written with this software
> that are affected?
>
> So, can anyone suggest something to try or troubleshooting steps to go
> through?
>
> Any help much appreciated.
>
> Regards,
> Cian
>
>
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFFT4/S2yUma7R/3b8RAj23AKCABCbCv/8c542nEkjZ/FdcJ2z0vwCeOx+L
> Fo7gokVSzUyaWj3avxnJwTg=
> =IbIz
> -----END PGP SIGNATURE-----
>

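Cian notes that a corrupted dataset carries the same modification date as a backup that still works. One way to produce concrete data for the vendor is to look at exactly where the two copies differ. The following is a minimal Python 3 sketch, not something from the thread. The function names `diff_runs` and `compare_files` are hypothetical, and it assumes both files fit in memory (the pure-Python loop is also slow on very large files). As a rough heuristic only, a few isolated single-bit differences might point towards memory or bus errors. Whole runs of changed bytes starting at round offsets, such as multiples of 4096, might instead point towards the storage or NFS path.

```python
import sys


def diff_runs(a, b):
    """Compare two byte strings over their common length.

    Returns (runs, flipped_bits): runs is a list of (offset, length) tuples for
    each contiguous stretch of differing bytes, flipped_bits the total number of
    differing bits in those bytes.
    """
    runs = []
    flipped_bits = 0
    run_start = None
    n = min(len(a), len(b))
    for i in range(n):
        x = a[i] ^ b[i]  # indexing bytes gives ints in Python 3
        if x:
            flipped_bits += bin(x).count("1")
            if run_start is None:
                run_start = i
        elif run_start is not None:
            runs.append((run_start, i - run_start))
            run_start = None
    if run_start is not None:  # close a run that reaches the end of the common part
        runs.append((run_start, n - run_start))
    return runs, flipped_bits


def compare_files(path_a, path_b):
    """Read both files whole (assumes they fit in memory) and diff them."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    runs, flipped_bits = diff_runs(a, b)
    return len(a), len(b), runs, flipped_bits


if __name__ == "__main__" and len(sys.argv) == 3:
    size_a, size_b, runs, bits = compare_files(sys.argv[1], sys.argv[2])
    print(f"sizes: {size_a} vs {size_b} bytes")
    print(f"{len(runs)} differing run(s), {bits} differing bit(s) in the common part")
    for start, length in runs[:20]:  # the first few runs are usually enough to spot a pattern
        print(f"  offset {start:#x}, length {length}, offset mod 4096 = {start % 4096}")
```

It would be run with the corrupted copy and the good backup as its two arguments. Differences in total size are reported separately, because only the common prefix is compared byte by byte.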

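The corrupted files apparently changed without their modification time changing. It could therefore help to establish whether results change while simply sitting on disk. The hypothetical checksum-manifest script below is a sketch, not a tool mentioned in the thread. It records the SHA-256, size and mtime of every file under a directory, and on a later run it lists files whose content no longer matches even though the mtime is unchanged. Running it separately over the /home on the master and over a /home exported from one of the nodes might show whether one storage path is affected more than the other.

```python
import hashlib
import json
import os
import sys


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large result files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest, size and mtime."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            manifest[os.path.relpath(path, root)] = {
                "sha256": sha256_of(path),
                "size": st.st_size,
                "mtime": st.st_mtime,
            }
    return manifest


def changed_without_mtime(old, new):
    """Files present in both manifests whose content differs but whose mtime does not."""
    return sorted(
        rel
        for rel, before in old.items()
        if rel in new
        and new[rel]["sha256"] != before["sha256"]
        and new[rel]["mtime"] == before["mtime"]
    )


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical script name): manifest.py record|verify DIR MANIFEST.json
    # Keep MANIFEST.json outside DIR, otherwise the manifest would checksum itself.
    mode, root, manifest_path = sys.argv[1:4]
    if mode == "record":
        with open(manifest_path, "w") as f:
            json.dump(build_manifest(root), f, indent=1)
    elif mode == "verify":
        with open(manifest_path) as f:
            old = json.load(f)
        for rel in changed_without_mtime(old, build_manifest(root)):
            print("content changed but mtime did not:", rel)
    else:
        sys.exit("mode must be 'record' or 'verify'")
```

Recording a manifest right after each 12-hour run finishes, then verifying it before the results are next opened, would at least separate "written wrongly" from "changed afterwards".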

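Cian asks how to tell whether the hardware could be at fault when only Fluent's files seem affected. A repeated write and read-back test that doesn't involve Fluent is a common way to probe this. The sketch below is a hypothetical script that needs Python 3.9+ for `random.Random.randbytes`. It writes seeded pseudo-random data, pushes it out with `os.fsync`, reads it back and compares SHA-256 digests. Any mismatching file is renamed and kept for inspection. The read-back may be served from the page cache, so a clean run does not prove the disk itself is healthy. Running it from several nodes against both local and NFS-mounted directories gives a broader picture.

```python
import hashlib
import os
import random
import sys


def make_block(seed, size):
    """Deterministic pseudo-random bytes: the same seed always regenerates the same data."""
    return random.Random(seed).randbytes(size)  # randbytes needs Python 3.9+


def write_and_verify(path, seed, size):
    """Write a seeded block, force it out with fsync, read it back and compare digests."""
    data = make_block(seed, size)
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    return actual == expected


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Usage (hypothetical script name): stress_io.py DIR [PASSES]
    target_dir = sys.argv[1]
    passes = int(sys.argv[2]) if len(sys.argv) > 2 else 100
    size = 64 * 1024 * 1024  # 64 MiB per pass: an arbitrary choice
    path = os.path.join(target_dir, "stress_io.bin")
    failures = 0
    for seed in range(passes):
        if not write_and_verify(path, seed, size):
            failures += 1
            bad = f"{path}.bad-seed{seed}"
            os.replace(path, bad)  # keep the bad copy; make_block(seed, size) recreates the expected data
            print(f"pass {seed}: MISMATCH, kept as {bad}", flush=True)
    if os.path.exists(path):
        os.remove(path)
    print(f"{failures} failure(s) in {passes} pass(es)")
```

The data comes from a seed rather than from a random source, so the expected contents of any kept bad file can be regenerated exactly and compared byte for byte.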