On Wed, 14 Apr 2004, Ronan Cunniffe wrote:
> The whole point of their trick is to provide a statistically
> useless message body.
Doesn't matter really... the point is to detect statistically
_meaningful_ words that indicate the spamminess or non-spamminess of a
mail. The fluff doesn't (or at least shouldn't) matter.
If the spammers 'stuff' their spam with random text, then all that
happens is that a Bayesian filter will tend to score the random text as
neutral, i.e. a probability of about 0.5. A decent Bayesian filter will
only use phrases with indicative probabilities (i.e. high or low
probabilities) to construct the Bayesian probability for the mail, and
will discard the neutral ones.
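A minimal Python sketch of the "discard the neutral tokens" idea, assuming a Graham-style naive-Bayes combiner. The helper names score_message and combine, the min_dev cut-off (how far from 0.5 a token must be to count) and the max_tokens limit are illustrative assumptions, not taken from any particular filter:

```python
from math import prod


def combine(probs):
    """Naive-Bayes combination of per-token spam probabilities."""
    p_spam = prod(probs)
    p_ham = prod(1.0 - p for p in probs)
    return p_spam / (p_spam + p_ham)


def score_message(tokens, token_probs, unknown=0.5, min_dev=0.1, max_tokens=15):
    """Score a mail using only its most indicative tokens (hypothetical helper)."""
    # Look up each distinct token; tokens never seen in training are neutral.
    probs = [token_probs.get(t, unknown) for t in set(tokens)]
    # Discard near-neutral tokens (e.g. random stuffing text): no evidence either way.
    interesting = [p for p in probs if abs(p - 0.5) >= min_dev]
    # Keep only the tokens furthest from 0.5, i.e. the most indicative ones.
    interesting.sort(key=lambda p: abs(p - 0.5), reverse=True)
    interesting = interesting[:max_tokens]
    if not interesting:
        return 0.5  # nothing indicative left: the mail itself scores as neutral
    return combine(interesting)


# Made-up token probabilities, purely for illustration.
probs = {"viagra": 0.99, "cheap": 0.9, "meeting": 0.05,
         "lattice": 0.48, "aardvark": 0.52}
payload = ["cheap", "viagra"]
stuffed = payload + ["lattice", "aardvark", "zzyzx"]  # random filler words
print(score_message(payload, probs))
print(score_message(stuffed, probs))
```

Both calls print the same score (about 0.999): the filler words sit near 0.5, fall below the min_dev cut-off and never reach the combiner, so stuffing neither dilutes nor boosts the spam evidence.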
So text-stuffing won't really affect things much, at least not when
every spammer does it. What _will_ hurt Bayesian filtering is if the
spammers include only the most minimal of spam payloads, e.g. just one
URL, especially if they do not reuse URLs (and spammers register lots
of throwaway domains), since a never-before-seen URL has no history in
the filter's training data and so scores as neutral.
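A minimal sketch of why a one-URL payload on a fresh throwaway domain is hard to catch, assuming per-token probabilities are estimated from training counts with a simplified frequency ratio (not any specific filter's exact formula). The helpers token_prob and score, the counts, and the .example URLs are all hypothetical:

```python
from math import prod


def token_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Per-token spam probability from training counts (hypothetical helper)."""
    s = spam_counts.get(token, 0)
    h = ham_counts.get(token, 0)
    if s + h == 0:
        return 0.5  # never seen in training: no evidence either way
    spam_freq = s / n_spam
    ham_freq = h / n_ham
    p = spam_freq / (spam_freq + ham_freq)
    return min(0.99, max(0.01, p))  # clamp away from 0 and 1 so no token is "certain"


def score(tokens, spam_counts, ham_counts, n_spam, n_ham, min_dev=0.1):
    """Naive-Bayes combination of the indicative token probabilities of a mail."""
    probs = [token_prob(t, spam_counts, ham_counts, n_spam, n_ham)
             for t in set(tokens)]
    interesting = [p for p in probs if abs(p - 0.5) >= min_dev]
    if not interesting:
        return 0.5  # no usable evidence: the filter cannot decide
    p_spam = prod(interesting)
    p_ham = prod(1.0 - p for p in interesting)
    return p_spam / (p_spam + p_ham)


# Made-up training history: 100 spams, 100 hams; one spam URL seen 40 times.
spam_counts = {"http://cheap-pills.example/": 40}
ham_counts = {}
n_spam, n_ham = 100, 100

reused = ["http://cheap-pills.example/"]       # throwaway domain that was reused
fresh = ["http://fresh-domain-0414.example/"]  # brand-new throwaway domain
print(score(reused, spam_counts, ham_counts, n_spam, n_ham))  # high
print(score(fresh, spam_counts, ham_counts, n_spam, n_ham))   # 0.5
```

With a single token in the body there is nothing else to fall back on: once the spammer stops reusing a domain, the only token is unknown and the mail lands exactly on the neutral 0.5.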
regards,
--
Paul Jakma    paul at clubi.ie    paul at jakma.org    Key ID: 64A2FF6A
warning: do not ever send email to spam at dishone.st
Fortune:
No wonder Clairol makes so much money selling shampoo.
Lather, Rinse, Repeat is an infinite loop!