On Wed, Jun 14, 2000 at 05:08:29AM +0100, Paul Jakma mentioned:
> > Direct I/O, i.e. I/O to a filesystem which bypasses the OS
> > filesystem cache,
> what do you mean by "filesystem cache"?
> block/page/buffer cache? -> data cache.
> an optimised list describing mappings between inodes (or vnodes),
> directory entries, blocks, extents, etc.. ?? -> metadata cache.
> (dentry's in linux *i think*.. maybe kate can enlighten me)
Indeed. Basically, Direct I/O means that none of the filesystem metadata
is cached, nor is the data that's going to the filesystem. But you still
respect the semantics of the filesystem, esp. if it's running on top of an
LVM type thing.
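
To make "bypassing the data cache" concrete, here is a minimal sketch in
Python. It assumes Linux's O_DIRECT open flag, which is not something this
thread mentions and is not how VxFS exposes its feature; the helper names
and the 4096-byte block size are hypothetical. It only addresses the data
cache, not metadata, and it falls back to ordinary buffered I/O where the
filesystem refuses the flag.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; a real device may need a different size


def _write_once(path, buf, extra_flags):
    """Open the file, write the whole buffer at offset 0, close it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flags, 0o600)
    try:
        os.pwrite(fd, buf, 0)
    finally:
        os.close(fd)


def write_block_direct(path, payload):
    """Write one zero-padded block, asking the kernel to skip its data cache.

    Returns True if the O_DIRECT write succeeded, False if the sketch
    fell back to a normal buffered write.
    """
    if len(payload) > BLOCK:
        raise ValueError("payload must fit in one block")
    # O_DIRECT wants an aligned buffer; anonymous mmap memory is page-aligned.
    buf = mmap.mmap(-1, BLOCK)
    try:
        buf.write(payload.ljust(BLOCK, b"\0"))
        direct = getattr(os, "O_DIRECT", 0)  # flag is absent on some platforms
        if direct:
            try:
                _write_once(path, buf, direct)
                return True
            except OSError:
                pass  # e.g. a filesystem that does not support O_DIRECT
        _write_once(path, buf, 0)
        return False
    finally:
        buf.close()
```

Calling write_block_direct("/some/file", b"hello") leaves a 4096-byte file
either way; the return value only says which path was taken.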
> I've been arguing that the metadata cache is not the cause of
> slowness, the db is probably in one or two gigantic files so the
> metadata cache has an easy task at choosing what info to cache. The
> only thing it can really do is to try keep the file laid out as
> contiguously as possible.
Unless it's to do all that fancy inode redirection and the like. Also,
filesystems are generally optimised to handle 4k writes, although more
modern ones, like XFS, ReiserFS and VxFS, are designed to get around that.
> The slowness is in the data cache. From the point of view of it, it
> sees that within a range of blocks ( range*blocksize >> allowed data
> cache) the usage pattern is extremely complex (big database). In
> order for the data cache to correctly predict that usage pattern it
> must have unacceptably complex heuristics.. better then for the data
> cache to get completely out of the way -> raw I/O.
The database should be organising its writes so that inserts (much more
common than updates) are clumped together, and reorganised when it's not
too busy. It'll pretty much write those sequentially. And indexes take up
more disk space than the data itself, and are by definition clumped.
> > It was developed because, while raw disks are the ultimate in
> > performance, they are more work to administer than filesystems,
> > for obvious reasons.
> never having worked with raw I/O: in what way is it more difficult to
> maintain? i would have thought easier. You just point oracle at a raw
> I/O logical volume and forget about it until oracle starts telling
> you that it's running short, at which point you either extend the LV
> or give it a fresh lv.
Compare giving it ten or a hundred raw disks, vs. making one big LVM
partition, and sitting it on that. That said, no one would do that anyway,
they'd make up loads of different partitions, ones for indexing, data,
system and tempspace etc., so you'd wonder whether managing disks for a
database is ever easy.
> i can imagine programming an app to use raw I/O would be a
> big/difficult job though.
Well, you do have to write your own "filesystem" of sorts, as you don't
have access to anything bar read(), write() and seek() (slight
exaggeration here, but you certainly don't have fread() and the like).
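
As a rough illustration of that "filesystem of sorts", here is a minimal
sketch in Python of a fixed-size record store built on nothing but the
seek/read/write primitives. The RecordStore class, the 64-byte record size
and the slot numbering are all hypothetical; a real database's on-disk
layout is far more involved, and this uses an ordinary file to stand in
for a raw device.

```python
import os

RECORD_SIZE = 64  # hypothetical fixed record length, in bytes


class RecordStore:
    """Fixed-size records addressed by slot number on a flat file."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def put(self, slot, payload):
        """Store payload in the given slot, padded with NUL bytes."""
        if len(payload) > RECORD_SIZE:
            raise ValueError("record too large")
        # The application, not the OS, decides where each record lives.
        os.lseek(self.fd, slot * RECORD_SIZE, os.SEEK_SET)
        # Short writes are ignored here for brevity.
        os.write(self.fd, payload.ljust(RECORD_SIZE, b"\0"))

    def get(self, slot):
        """Return the slot's contents with the NUL padding stripped."""
        os.lseek(self.fd, slot * RECORD_SIZE, os.SEEK_SET)
        data = os.read(self.fd, RECORD_SIZE)
        # Caveat: payloads that genuinely end in NUL bytes lose them.
        return data.rstrip(b"\0")

    def close(self):
        os.close(self.fd)
```

Everything a filesystem would normally give you (names, free-space
tracking, variable-length files) has to be layered on top of something
like this by the application itself.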
> > All the VxFS feature does, if you turn it on, is look at a request,
> > and if it's smaller than X, bypass the filesystem cache. That's all.
> if it's aimed at vlarge db's: that's still a futile hack to try get
> the fs to second guess an extremely complicated app - which it won't
> get right, and it still won't perform like raw I/O.
No, but they want to get performance close to that of raw I/O, without
the headaches. There has to be some reason people pay £16k a machine for
a Veritas filesystem & LVM.
"The fool must be beaten with a stick, for an intelligent person
the merest hint is sufficient" -- Zen Master Greg