LINUX.IE, website of the Irish Linux Users' Group

bayes/hash poison (was Re: [ILUG] cheeap sooftware avaailable ! uqdrrs )


Paul Jakma paul at clubi.ie
Sun Jan 4 06:34:45 GMT 2004


On Sat, 3 Jan 2004, Niall O Broin wrote:

> The thought occurs then that perhaps some kind of addition to a
> Bayesian filter is necessary, so that if a message has above a
> certain threshold of unknown words, it scores highly. Even better
> might be to compare every word against a dictionary, and again give
> high score to a mail which contained lots of non-dictionary words.  
> Lots of problems with this, of course, not least being the CPU cost
> of such an approach.

One approach for a Bayesian filter to take is to score on phrases,
rather than, or as well as, individual words, e.g. where a phrase
consists of n consecutive words. Random dictionary insertions are then
far less likely to match known phrases (at the cost of having to
remember and sort through far more phrases).

spamprobe takes this approach.
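
As a rough sketch of the idea (this is not spamprobe's actual code; the
function names, the sample messages and the two-word phrase length are
all made up for illustration), phrase tokenising might look something
like this in Python:

```python
import re

def phrase_tokens(text, max_n=2):
    """Return single words plus every run of up to max_n adjacent words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens

def known_fraction(text, known, n):
    """Fraction of the n-word phrases in text that appear in the known set."""
    phrases = [t for t in phrase_tokens(text, max_n=n) if t.count(" ") == n - 1]
    if not phrases:
        return 0.0
    return sum(p in known for p in phrases) / len(phrases)

# Hypothetical "good mail" the filter has already been trained on.
ham = ["the kernel build failed on my laptop",
       "please send the patch to the list"]
known = set()
for msg in ham:
    known.update(phrase_tokens(msg, max_n=2))

# Poison text: words taken from the good-mail vocabulary, in random order.
poison = "list laptop patch kernel send build"

word_hits = known_fraction(poison, known, 1)    # every single word is known
phrase_hits = known_fraction(poison, known, 2)  # no two-word phrase is known
```

With these toy messages every poison word matches as a single token,
but none of the two-word phrases do, so the inserted words stop lending
the mail any "good" evidence at the phrase level. The token table grows
roughly in proportion to max_n, which is the storage cost mentioned
above.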

regards,
-- 
Paul Jakma	paul at clubi.ie	paul at jakma.org	Key ID: 64A2FF6A
	warning: do not ever send email to spam at dishone.st
Fortune:
"Our vision is to speed up time, eventually eliminating it."
		-- Alex Schure


