Hi All,
Apologies for the significant delay in responding; we delivered late!
I got some fantastic advice on this topic from ILUG.
In the final solution I used lockfiles, and since the number of files
I was dealing with wasn't so great, there was only a small performance
hit.
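
For anyone finding this in the archives, here is a minimal sketch of what a lockfile-guarded append can look like. It is not the script from the project: it assumes flock(1) from util-linux is installed, and the names OUT, LOCK and append_cksums are made up for illustration.

```shell
#!/bin/sh
# Hypothetical names: OUT is the shared results file, LOCK its lockfile.
OUT=${OUT:-checksums.txt}
LOCK=${LOCK:-$OUT.lock}

# Checksum the given files, then append the results to $OUT while
# holding an exclusive lock, so concurrent callers never interleave.
append_cksums() {
    # Do the slow part (reading the files) outside the lock.
    sums=$(cksum -- "$@") || return 1
    (
        # Block until we own the lock on file descriptor 9.
        flock -x 9 || exit 1
        printf '%s\n' "$sums" >> "$OUT"
    ) 9> "$LOCK"    # the lock is released when the subshell exits
}
```

Several workers can then call `append_cksums file1 file2 ...` at the same time. Only the short append is serialised, which fits with the small performance hit mentioned above.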
Pádraig's idea to sort the files gave a significant performance
improvement (double-figure % differences) on development boxes,
where there was only one disk, and a smaller improvement when
connected to NFS, which used a SAN behind the scenes.
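
A rough sketch of how the two orderings can be compared, assuming GNU find (for -printf) and file names without embedded newlines. The function names are invented for this example and no timings are implied.

```shell
#!/bin/sh
# Regular files directly under directory $1, in inode order, NUL-terminated.
files_by_inode() {
    find "$1" -maxdepth 1 -type f -printf '%i\t%p\n' |
        sort -k1,1n |
        cut -f2- |
        tr '\n' '\0'
}

# The same files in whatever order find happens to return them.
files_unsorted() {
    find "$1" -maxdepth 1 -type f -print0
}

# Checksum a NUL-terminated list of files read from stdin,
# starting as few cksum processes as possible.
cksum_list() {
    xargs -r0 cksum
}
```

In bash, `time (files_by_inode . | cksum_list > /dev/null)` and the same with `files_unsorted` give the two figures to compare. The page cache will flatter whichever run comes second, so drop it between runs (as root: `sync; echo 3 > /proc/sys/vm/drop_caches`) or test on a freshly mounted filesystem.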
Thanks to everyone for their ideas and help.
Cheers,
Oisin
On 7/13/07, Oisin Kim <oisinkim at gmail.com> wrote:
> Thanks for the responses, I promise I'll update all with benchmarking
> results for Pádraig's inode sorting suggestion.
> I'll have a good read of the links Pádraig gave too and give my 2c on it.
> Thanks all,
> Cheers,
> Oisin
>
> On 7/13/07, Pádraig Brady <P at draigbrady.com> wrote:
> > Efficiently checksumming files is something I've thought a bit about¹.
> > The biggest bottleneck I've found is disk head seeking, so to
> > minimise that, the handiest thing I've found is to sort by inode
> > (sorting by path is nearly as efficient). One modern CPU should be more
> > than enough to checksum data as fast as most disks can throw it out.
> >
> > Also you do not want the overhead of starting a cksum process per file.
> >
> > As a first pass can you compare the running speed of the following:
> >
> > find . -maxdepth 1 -type f -printf "%i\t%f\n" |
> > sort -k1,1n |
> > cut -f2 |
> > tr '\n' '\0' |
> > xargs -r0 cksum
> >
> > Now for multiple spindles it would be worth having multiple
> > checksum processes (especially if you have multiple CPUs).
> > So to answer your original question, how do you synchronize
> > writes to a single file in this case?
> >
> > Well, when you open a file with O_APPEND set (as the shell
> > does when you `>> file`), on each write, the file offset
> > private to each process is automatically set to the current
> > file size. All you have to worry about is that cksum does
> > not write a partial line the whole way to the kernel
> > before it is scheduled out. I think this is OK, but I leave it
> > as an exercise for the reader to verify there are
> > no issues with buffering².
> >
> > ¹ http://www.pixelbeat.org/fslint/
> > ² http://www.pixelbeat.org/programming/stdio_buffering/
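
The O_APPEND approach described in the quoted mail can be sketched as below. This is an illustration only, with a made-up function name. It relies on each cksum process emitting its single short line in one write, which is the assumption the quoted mail asks the reader to verify. Note also that the Linux open(2) man page warns that O_APPEND may lead to corrupted files on NFS when more than one process appends at once, so this sketch is only meant for local filesystems.

```shell
#!/bin/sh
# Hypothetical helper: checksum each file in its own background
# process, with every process appending to the same output file.
# Usage: append_in_parallel OUTFILE FILE...
append_in_parallel() {
    out=$1
    shift
    for f in "$@"; do
        # '>>' opens $out with O_APPEND, so the kernel positions every
        # write at the current end of file.
        cksum -- "$f" >> "$out" &
    done
    # Wait for all background cksum processes to finish.
    wait
}
```

Starting one process per file is exactly the overhead the quoted mail warns about, so for real use each background job would handle a batch of files (for example one batch per spindle) rather than a single file.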