Re: [ILUG] scaling Sendmail

From: Nick Hilliard (nick at domain iol.ie)
Date: Fri 17 Sep 1999 - 18:48:30 IST


: Could you define "properly designed"? I'm just curious what kind of
: problems you see at your level of throughput.

Sendmail (and indeed most current free Unix MTAs) does this:

        1) receive mail over SMTP
        2) write mail to queue using unique filename
        3) either:
                3.1) attempt to deliver mail immediately or
                3.2) leave mail in queue for another process to manage
        4) rescan queue periodically and deliver what can be delivered.

sendmail isn't preforking, which means that every SMTP connection means
loading up another instance of the program. This leads to bad latency
problems if there's a memory load or a high disk I/O load on the machine in
question.

When these MTAs try to open up unique filenames, it's partly a hit-and-miss
thing. You guess a particular filename (using, say, sendmail's paradigm of
time-of-day + PID), and then if the file doesn't exist, then you're ok. But
sometimes, you're going to take a performance hit on this (admittedly a
small one).
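
As a rough illustration of the guess-and-retry approach, here is a minimal
Python sketch. The function name create_queue_file and the "qf<time>.<pid>.<n>"
name format are made up for this example and are not sendmail's actual naming
scheme; the only point is that an exclusive create either succeeds or tells
you to guess again.

```python
import os
import time

def create_queue_file(queue_dir, max_tries=100):
    """Create a new, uniquely named queue file and return (fd, path).

    The name format (time of day + PID + retry counter) is illustrative
    only; it is an assumption, not sendmail's real scheme.
    """
    pid = os.getpid()
    for attempt in range(max_tries):
        name = "qf%d.%d.%d" % (int(time.time()), pid, attempt)
        path = os.path.join(queue_dir, name)
        try:
            # O_EXCL makes "create only if absent" a single atomic step,
            # so there is no separate exists() check to race against.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
            return fd, path
        except FileExistsError:
            continue  # the guessed name was taken: this is the "miss" case
    raise RuntimeError("no free queue file name after %d tries" % max_tries)
```

Every miss costs one extra failed open(), which is the small performance hit
mentioned above.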

More importantly, creating and opening up queue files every time you need to
write out what you've got from the network is a pretty inefficient thing to
do. If you run your spool in async mode, then you run the risk of really,
really nasty filesystem corruption if the machine crashes (see
http://www.shub-internet.org/brad/papers/sendmail-tuning/sld033.html). If
you run the spool in sync mode, then the latency associated with creating,
opening and deleting files is huge. A good compromise is BSD softupdates
which give you the advantages of async speeds with sync-style metadata
safety. Perhaps even better would be Brad Knowles's idea of having a bucket
of available pre-made spool files.
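
The details of that proposal aren't given here, so the following is only one
guess at what a bucket of pre-made spool files could look like. The free/ and
busy/ directory layout and the names make_pool, claim and release are all
assumptions made for this sketch. A file is claimed by renaming it, and it is
truncated and handed back after delivery rather than being deleted.

```python
import os

def make_pool(pool_dir, count):
    """Pre-create `count` empty spool files under pool_dir/free."""
    free = os.path.join(pool_dir, "free")
    busy = os.path.join(pool_dir, "busy")
    os.makedirs(free, exist_ok=True)
    os.makedirs(busy, exist_ok=True)
    for i in range(count):
        open(os.path.join(free, "spool%06d" % i), "wb").close()

def claim(pool_dir):
    """Move one free file to busy/ and return its path, or None if empty."""
    free = os.path.join(pool_dir, "free")
    busy = os.path.join(pool_dir, "busy")
    for name in os.listdir(free):
        dst = os.path.join(busy, name)
        try:
            # rename() within one filesystem is atomic, so only one
            # process can win a given file.
            os.rename(os.path.join(free, name), dst)
            return dst
        except FileNotFoundError:
            continue  # another process claimed this one first
    return None

def release(pool_dir, path):
    """Empty a delivered message's file and return it to the pool."""
    with open(path, "wb"):
        pass  # truncate instead of unlinking
    os.rename(path, os.path.join(pool_dir, "free", os.path.basename(path)))
```

This only shows the claim/release cycle. Whether it actually beats
create-and-delete on a given filesystem is something you would have to
measure.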

So now, you've got a queue full of data files -- let's say 100,000 of them,
if a system goes down for a few hours. Filesystem directory structures on
almost all systems at the moment (except for hash-based systems like SGI XFS
and NetApp OnTAP, but you'd never want to have an NFS mail spool) are
essentially linked lists, and if you've got a large directory full of files,
scan times are abysmal. Now, if you've got this large number of queue
files, most destined to a large number of different places on the net, then
you're going to need piles and piles of sendmail processes to deliver them,
so you start off a couple of instances of "sendmail -q", or you depend on
"sendmail -q5m" or something (although, in reality, the latter is a really
bad idea because it can easily lead to catastrophic failure modes). Each
sendmail process does a full queue sweep, stat()'ing every file before
opening the q* files. This leads to vast amounts of disk I/O, which causes
major performance loss. There are ways of limiting the number of active
sendmail queue processes running, but they aren't included in the standard
sendmail distribution.
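
To put a number on the full-sweep cost, here is a small Python sketch that
counts stat() calls. The helper names queue_sweep and total_stats are invented
for the example, and it assumes one data file (df*) and one control file (qf*)
per message; it is not a model of sendmail's real queue runner.

```python
import os

def queue_sweep(queue_dir):
    """One queue run: stat() every entry, return (stat_calls, control_files)."""
    stat_calls = 0
    control_files = []
    for name in os.listdir(queue_dir):
        os.stat(os.path.join(queue_dir, name))  # one stat() per directory entry
        stat_calls += 1
        if name.startswith("q"):
            control_files.append(name)  # only these would get opened
    return stat_calls, control_files

def total_stats(queue_dir, runners):
    """stat() calls made when `runners` independent processes each sweep."""
    return sum(queue_sweep(queue_dir)[0] for _ in range(runners))
```

The work is (files in queue) x (number of runners). Under the two-files-per-
message assumption, 100,000 queued messages and, say, 20 runners comes to
4,000,000 stat() calls per round of sweeps, before a single message has been
delivered.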

And then customer with domain xyz.ie has just logged in and wants all of the
mail to be delivered within 10 seconds. Or even better, you wait until the
top of the hour, when 30 people log in simultaneously and want their mail to
be delivered at the same time. Each of these delivery requests is going to
take one full queue run. Yikes!
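
For contrast, a hypothetical sketch of the difference between finding one
domain's mail by scanning everything and finding it through a per-domain
index. The queue here is just an in-memory dict of message id to destination
domain, and find_by_scan and build_index are made-up names; none of this
exists in sendmail.

```python
from collections import defaultdict

def find_by_scan(queue, domain):
    """Full queue run: inspect every message to find one domain's mail."""
    examined = 0
    hits = []
    for msg_id, dest in queue.items():
        examined += 1
        if dest == domain:
            hits.append(msg_id)
    return hits, examined

def build_index(queue):
    """Map destination domain -> message ids (kept up to date on arrival)."""
    index = defaultdict(list)
    for msg_id, dest in queue.items():
        index[dest].append(msg_id)
    return index
```

With the scan, 30 simultaneous customers means 30 passes over the whole
queue. With an index, each request touches only that customer's own messages.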

In addition, it's difficult to control sendmail when the machine is
stressed. Ok, you can tell it not to do anything if the load goes above
whatever, but often by that stage, the machine is pretty crippled anyway,
and then you start losing SMTP connections and relying on MX backup
systems.

Or if you're getting a DOS attack from a single IP, you can't tell sendmail
to only allow n connections before refusing them from that IP address. The
net result when someone does this (and people often do it accidentally), is
that you get hundreds and hundreds of sendmail processes hanging around in
the background, waiting for nothing to happen. You can help this by
changing some of the sendmail timer values, but it doesn't solve the
problem.
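
The missing control is not complicated in principle. A minimal sketch of a
per-IP connection cap follows; the class name PerIPLimiter and its methods are
invented for illustration, and a real listener would also need locking and
some handling for connections that never close cleanly.

```python
from collections import Counter

class PerIPLimiter:
    """Refuse new connections from an IP that already has max_per_ip open."""

    def __init__(self, max_per_ip):
        self.max_per_ip = max_per_ip
        self.active = Counter()  # ip -> currently open connections

    def accept(self, ip):
        """Return True and count the connection, or False to refuse it."""
        if self.active[ip] >= self.max_per_ip:
            return False
        self.active[ip] += 1
        return True

    def close(self, ip):
        """Record that one connection from `ip` has ended."""
        if self.active[ip] > 0:
            self.active[ip] -= 1
```

The listener would call accept() before handing a connection to a new
process and close() when that process exits, so one misbehaving host can tie
up at most max_per_ip processes.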

Also, the controls for delivering mail to remote systems are poor -- you
can't specify how many concurrent queue-run or other sorts of processes
you want to max out at; you can't specify the max amount of bandwidth you
want to use, and so forth. It's all a little inflexible.
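
As an example of the kind of knob that is missing, here is a sketch of a
bandwidth cap using a token bucket. This is a generic rate-limiting technique
chosen for the illustration, not something any of the mailers discussed here
is known to provide, and the TokenBucket class and its parameters are
assumptions. The clock is passed in explicitly to keep the example
deterministic.

```python
class TokenBucket:
    """Allow sends at `rate_bytes_per_sec` on average, with a bounded burst."""

    def __init__(self, rate_bytes_per_sec, burst_bytes, now=0.0):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = now

    def try_send(self, nbytes, now):
        """Return True if `nbytes` may be sent at time `now` (in seconds)."""
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A delivery process would call try_send() before each write to the network
and back off when it returns False.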

Some other mailers address some of these problems, but they all have the
same basic queueing problem: namely, that there is no decent queue
management system. MTAs like PP and MMDF come some way towards addressing
the problem, but they are so hideous in so many other ways that it's just not
worth even considering them. Hey - just ask some old-timer in UCD computing
services about PP (or MMTA as it was later called), and see what the
reaction is :-)

Anyway, I could go on, but you get the idea. All of the standard mailers
are deficient in lots of different but infuriating ways. It would have been
nice to think that, starting from a clean slate, Wietse Venema might
have created Postfix with a really nice queueing system, but alas....

Nick


