On Sat, 3 Jan 2004, Niall O Broin wrote:
> The thought occurs then that perhaps some kind of addition to a
> Bayesian filter is necessary, so that if a message has above a
> certain threshold of unknown words, it scores highly. Even better
> might be to compare every word against a dictionary, and again give
> high score to a mail which contained lots of non-dictionary words.
> Lots of problems with this, of course, not least being the CPU cost
> of such an approach.
One approach for a Bayesian filter to take is to score on phrases,
rather than, or as well as, individual words, e.g. where a phrase
is a run of n consecutive words. Random dictionary insertions are then
far less likely to match known phrases (at the cost of having to
remember and sort through far more phrases).
spamprobe takes this approach.
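To make the idea concrete, here is a rough sketch in Python of what
phrase tokenising could look like. It is not spamprobe's actual
tokeniser; the function name phrase_tokens, the max_n parameter and
the word-splitting regex are all made up for illustration.

```python
import re

def phrase_tokens(text, max_n=2):
    """Return single words plus every run of 2..max_n adjacent words.

    Hypothetical helper for illustration only, not taken from any
    real filter.
    """
    # Lower-case and split on anything that is not a letter, digit or apostrophe.
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens = []
    for n in range(1, max_n + 1):
        # Slide a window of n words across the message.
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens
```

A filter would then keep spam/ham counts per token exactly as it does
for single words. Randomly inserted dictionary words still produce
known single-word tokens, but the two-word tokens they form are
mostly ones the filter has never seen, which is the point above. The
price is the larger token table already mentioned.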
regards,
--
Paul Jakma	paul at clubi.ie	paul at jakma.org	Key ID: 64A2FF6A
warning: do not ever send email to spam at dishone.st
Fortune:
"Our vision is to speed up time, eventually eliminating it."
-- Alex Schure