On Thu, Jun 17, 1999 at 04:24:43PM +0100, kevin lyda mentioned:
> i think the bad times are due more to fs design and the layout of the
> metadata than a lack of a db. in fact the metadata is the db, and if
> it were laid out in a smaller area of the disk (or on another disk)
> it would be quicker. the be fs is done like a db iirc. but the way
> unix accesses files, and authenticates access to files/dirs is probably
> also an issue. locate runs as nobody and therefore doesn't even
> catalog all the files - what would your daemon do?
Well, there are a few things, interconnected. Databases are designed for
reporting & stuff like that. They assume fields are small, and that it's
the metadata you are going through most of the time. Filesystems have a
small amount of metadata, and are mainly concerned with stuff like speeding
up throughput. Dcache, the directory caching code for Linux, was only added
around 2.1.70 - it wasn't considered important till then.
At first, I want an fsdb daemon running from cron, doing something
like locate. Say, every day. Then, you can do simple SQL style queries on
the metadata: names, sizes, permissions (want to find all the setuid/gid
progs on the system - takes 3 seconds on a system with 15,000 files).
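
A rough sketch of what that cron job and the setuid/gid query could look like, using Python and SQLite rather than any real fsdb code. The table name `files`, its columns, and the functions `build_index` and `find_setid` are all made up for illustration.

```python
import os
import sqlite3
import stat

def build_index(conn, root):
    """Walk root and record one row of lstat() metadata per file."""
    conn.execute("DROP TABLE IF EXISTS files")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, name TEXT, size INTEGER,"
        " mode INTEGER, uid INTEGER, gid INTEGER, atime REAL, mtime REAL)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable: skip it
            conn.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (path, name, st.st_size, st.st_mode, st.st_uid, st.st_gid,
                 st.st_atime, st.st_mtime),
            )
    conn.commit()

def find_setid(conn):
    """Return paths whose mode has the setuid or setgid bit set."""
    mask = stat.S_ISUID | stat.S_ISGID
    rows = conn.execute(
        "SELECT path FROM files WHERE (mode & ?) != 0 ORDER BY path", (mask,)
    )
    return [row[0] for row in rows]
```

Like locate, this only sees what the user running it is allowed to read, so the question about which user the daemon runs as still stands.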
You could do stuff like
% mkdir wavs ; cd wavs
% ln -s `dbquery --list name=\*.wav` .
And then, all your wav files would be in the current directory.
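
Roughly what a `dbquery --list` plus `ln -s` pair would have to do, sketched in Python. It assumes a hypothetical `files` table with `path` and `name` columns, and `query_names` and `link_into` are illustrative names, not an existing tool. One thing the shell one-liner glosses over: two wav files with the same name in different directories collide, and this sketch just keeps the first.

```python
import os
import sqlite3

def query_names(conn, pattern):
    """Return indexed paths whose file name matches a shell-style pattern."""
    rows = conn.execute(
        "SELECT path FROM files WHERE name GLOB ? ORDER BY path", (pattern,)
    )
    return [row[0] for row in rows]

def link_into(paths, dest):
    """Symlink each path into dest; skip names that are already taken."""
    os.makedirs(dest, exist_ok=True)
    linked = []
    for path in paths:
        target = os.path.join(dest, os.path.basename(path))
        if os.path.lexists(target):
            continue  # another indexed file already used this name
        os.symlink(os.path.abspath(path), target)
        linked.append(target)
    return linked
```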
Eventually, I want VFS hooks that update the SQL database every so often,
telling it when new files are created & stuff. I'm kinda scared about doing
stuff like this - I am storing atimes and mtimes - so this could slow things
down a lot (though there shouldn't be a problem buffering it - after all,
we don't mind if the fsdb is lost in a crash).
With this, and the VFS to DB link there, I could add stuff like
automounting a directory, based on SQL Views. "Make up a directory with all
my C code in it, and mount it on ~/code/c" etc.
You know the way that Win95 has a "recent documents" list - imagine if you
had a directory of .txt files that were owned by you, and written to less
than a day ago !
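
The "recent documents" idea maps onto an SQL view fairly directly. A minimal sketch, assuming a hypothetical `files` table with `path`, `name`, `uid` and `mtime` (seconds since the epoch) columns. The view name `recent_txt` and the function are invented here, and the mounting side is not shown at all. SQLite does not allow bound parameters inside a view, so the uid and age are formatted in as integers.

```python
import sqlite3

def create_recent_txt_view(conn, uid, max_age_seconds=86400):
    """Define a view of .txt files owned by uid and modified recently."""
    conn.execute("DROP VIEW IF EXISTS recent_txt")
    conn.execute(
        "CREATE VIEW recent_txt AS SELECT path, mtime FROM files"
        " WHERE name GLOB '*.txt'"
        f" AND uid = {int(uid)}"
        # 'now' is evaluated each time the view is queried, not at creation
        " AND mtime > CAST(strftime('%s', 'now') AS INTEGER)"
        f" - {int(max_age_seconds)}"
    )

def recent_txt(conn):
    """List the view's contents, newest first."""
    rows = conn.execute("SELECT path FROM recent_txt ORDER BY mtime DESC")
    return [row[0] for row in rows]
```

Because the cutoff is computed at query time, the "directory" would change as files age out, with no need to rebuild the view.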
> you might want to consider why this problem hasn't been solved.
> i think a lot of people haven't found it necessary since it would slow
> down operations done 99% of the time to speed up operations done 1%
> of the time.
I don't think it'll take much of a hit - apart from when it's doing the
complete sync of the database. Though some kernel side stuff would scare
me...
> news servers actually try to *reduce* the amount of metadata written
> (the noatime option) and that's an application that made the most use
> of the fs as db concept. i suspect the best answer would lie in writing
> an fs that could quickly respond to find requests.
The Squid and ReiserFS people are working on a filesystem for doing
caching quick - shit fast directory lookup, and lossy disk writing - as in,
they do a checksum on the URL, and store it with that name. If another file
is there, shit happens. Write on it anyway.
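
To make the "checksum the URL, write on it anyway" idea concrete, here is a toy version in Python. This is only a sketch of the general technique, not how Squid or ReiserFS actually do it. The CRC32 checksum, the flat cache directory and the function names are all assumptions made for the example.

```python
import os
import zlib

def cache_path(cache_dir, url):
    """Name the cache file after a checksum of the URL (8 hex digits)."""
    checksum = zlib.crc32(url.encode("utf-8"))
    return os.path.join(cache_dir, format(checksum, "08x"))

def store(cache_dir, url, body):
    """Write the body under the checksum name, clobbering whatever is there."""
    with open(cache_path(cache_dir, url), "wb") as f:
        f.write(body)

def fetch(cache_dir, url):
    """Return the cached bytes, or None if nothing is stored under that name.

    If two URLs share a checksum this returns the other URL's body:
    that is the lossy part.
    """
    try:
        with open(cache_path(cache_dir, url), "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

The lookup is a single open() on a computed name, with no directory search and no collision handling, which is where the speed would come from.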
> i'm not even 75% sure of some of these things, so it would be interesting
> to know exactly what slows down find.
It's the fact that the metadata is all over the place. This was a conscious
decision. Do we store files near the metadata, and assume that if someone
does a readdir() they are likely to open a file in that directory, or do
we store all the directory stuff together, and have the files in another
place, and make looking at directories faster, but file-directory-file
access slower ? Of course you mix the files with their own directories...
> find / -user 288 -o -size +10k -o -user 20 -o -size +100k -print
>> find / \( -user 288 -and -size +10k \) -o \( -user 20 -and -size +100k \) -print
Cool...now how the fuck to implement that logic...
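
One possible way to get that logic, sketched in Python: build the expression as nested (SQL, arguments) pairs and let SQL's own AND/OR do the work. The helper names and the `files` table with `path`, `uid` and `size` columns are hypothetical. `size > k * 1024` is meant to approximate find's `-size +Nk`, which rounds sizes up to whole kilobytes before comparing.

```python
import sqlite3

def user(uid):
    """Term matching files owned by uid (like find -user)."""
    return ("uid = ?", [uid])

def size_over_kib(k):
    """Term matching files bigger than k KiB (like find -size +Nk)."""
    return ("size > ?", [k * 1024])

def _join(op, terms):
    # Parenthesise the group so nesting keeps the intended precedence.
    sql = "(" + f" {op} ".join(t[0] for t in terms) + ")"
    args = [a for t in terms for a in t[1]]
    return (sql, args)

def all_of(*terms):
    """Every term must match (find's -and)."""
    return _join("AND", terms)

def any_of(*terms):
    """At least one term must match (find's -o)."""
    return _join("OR", terms)

def run(conn, term):
    """Return the paths of all indexed files matching the term."""
    sql, args = term
    rows = conn.execute(
        "SELECT path FROM files WHERE " + sql + " ORDER BY path", args
    )
    return [row[0] for row in rows]
```

The parenthesised find expression above would then be `run(conn, any_of(all_of(user(288), size_over_kib(10)), all_of(user(20), size_over_kib(100))))`.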
Kate
--
Microsoft: One of the best reasons in the world to drink beer