LINUX.IE, website of the Irish Linux Users' Group

[ILUG] More SA Bayes questions

Niall O Broin niall at makalumedia.com
Tue Feb 8 02:01:03 GMT 2005


I sent this the other day, but nobody bit. In the hopes that it was the 
weekend lull, I'm repeating myself :-)

Thanks to contributions from a couple of people the other day, I came up
with this little script to produce a small report on the Bayes DB:

echo Spam Assassin Bayes Statistics
echo ""
echo Bayes Token Count
echo "Total     Ham     Spam"
# Field 1 of each dump line is the token's spam probability
sa-learn --dump |awk '{count += 1; if ($1 > 0.5) spam+=1; \
if ($1 < 0.5) ham+=1} END {print count "\t" ham "\t" spam}'
echo ""
echo -n "Number of ham messages learnt from: "
sa-learn --dump magic |awk '/nham/ {print $3}'
echo -n "Number of spam messages learnt from: "
sa-learn --dump magic |awk '/nspam/ {print $3}'

which runs at the end of a script which sa-learns spam placed in folders
by humans during the day. After doing its nightly run, it reported as
follows:

Spam Assassin Bayes Statistics

Bayes Token Count
Total   Ham     Spam
140114  78443   61671

Number of ham messages learnt from: 2109
Number of spam messages learnt from: 1387

I then fed sa-learn something over 1000 pieces of ham, and now the same
script gives me:

Spam Assassin Bayes Statistics

Bayes Token Count
Total   Ham     Spam
153518  10      153508

Number of ham messages learnt from: 2850
Number of spam messages learnt from: 0

AARGH! What the hell has happened there? It has apparently forgotten
about ALL the spam messages it ever learnt from and, conversely, 78000
ham tokens have become spam tokens. Did SA somehow choke on all that ham?
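
A note on the token columns above: the new total (153518) is exactly 10
more than the ntokens value in the dump below (153508), and plain
`sa-learn --dump` prints the non-token "magic" lines as well as the
tokens. A minimal sketch, assuming the counting logic of the script above
with the first field taken as the probability (the helper names
`classify` and `magic_lines` are made up for illustration), shows those
ten lines being counted as ham because their first field is 0.000:

```shell
# classify: count dump lines read from stdin by the probability in field 1.
# Hypothetical helper; it mirrors the awk logic of the report script.
classify() {
    awk '{ count += 1
           if ($1 > 0.5) spam += 1
           if ($1 < 0.5) ham += 1 }
         END { printf "%d\t%d\t%d\n", count, ham, spam }'
}

# The ten non-token lines of the current database, as quoted in this message.
magic_lines() {
    cat <<'EOF'
0.000    0          3          0  non-token data: bayes db version
0.000    0          0          0  non-token data: nspam
0.000    0       2850          0  non-token data: nham
0.000    0     153508          0  non-token data: ntokens
0.000    0 1091609393          0  non-token data: oldest atime
0.000    0 1107564300          0  non-token data: newest atime
0.000    0 1107564852          0  non-token data: last journal sync atime
0.000    0 1107564590          0  non-token data: last expiry atime
0.000    0    1382400          0  non-token data: last expire atime delta
0.000    0      17827          0  non-token data: last expire reduction count
EOF
}

# Prints total, ham and spam counts for the magic lines alone.
magic_lines | classify
```

If that reading is right, the Ham column of 10 holds no real tokens at
all, and every actual token is now scoring above 0.5. Piping
`sa-learn --dump data` instead of the plain dump would be one way to keep
the magic lines out of the count.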

Straight sa-learn --dump magic now gives

0.000    0          3          0  non-token data: bayes db version
0.000    0          0          0  non-token data: nspam
0.000    0       2850          0  non-token data: nham
0.000    0     153508          0  non-token data: ntokens
0.000    0 1091609393          0  non-token data: oldest atime
0.000    0 1107564300          0  non-token data: newest atime
0.000    0 1107564852          0  non-token data: last journal sync atime
0.000    0 1107564590          0  non-token data: last expiry atime
0.000    0    1382400          0  non-token data: last expire atime delta
0.000    0      17827          0  non-token data: last expire reduction count

whereas sa-learn --dump magic from the databases as of 19:00 last night
(retrieved from the warm standby box) gives

0.000    0          3          0  non-token data: bayes db version
0.000    0       1342          0  non-token data: nspam
0.000    0       2096          0  non-token data: nham
0.000    0     138010          0  non-token data: ntokens
0.000    0 1106096390          0  non-token data: oldest atime
0.000    0 1107544172          0  non-token data: newest atime
0.000    0 1107538029          0  non-token data: last journal sync atime
0.000    0 1107478750          0  non-token data: last expiry atime
0.000    0    1382400          0  non-token data: last expire atime delta
0.000    0       5589          0  non-token data: last expire reduction count
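
The two dumps differ most visibly in nspam (1342 last night, 0 now). A
minimal sketch of a guard the nightly script could run to flag that kind
of drop is below. It assumes each `sa-learn --dump magic` output has been
saved to a file; the function names `magic_value` and `dropped` are
hypothetical and not part of the original script:

```shell
# magic_value NAME: read a magic dump on stdin and print the value
# (third column) of the line "non-token data: NAME".
magic_value() {
    awk -v name="$1" 'NF == 7 && $5 == "non-token" && $7 == name { print $3 }'
}

# dropped OLD_DUMP NEW_DUMP: report each learnt-message counter that is
# lower in NEW_DUMP than in OLD_DUMP. Prints nothing if neither fell.
dropped() {
    for name in nspam nham; do
        old=$(magic_value "$name" < "$1")
        new=$(magic_value "$name" < "$2")
        if [ "$new" -lt "$old" ]; then
            echo "$name fell from $old to $new"
        fi
    done
}
```

Typical use would be something like `dropped magic.yesterday magic.today`
after the learning run, with the file names chosen to suit the setup.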

Can anyone shed any light on this?

--
Niall



