LINUX.IE, website of the Irish Linux Users' Group

[ILUG] SA statistics again

Niall O Broin niall at linux.ie
Sat Feb 5 01:19:24 GMT 2005


Thanks to contributions from a couple of people the other day, I came up with 
this little script to produce a small report on the Bayes DB:



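#!/bin/bash
# Print a short report on the SpamAssassin Bayes database.
echo Spam Assassin Bayes Statistics
echo ""
echo Bayes Token Count
echo "Total     Ham     Spam"
# Field 1 of each dump line is the token's spam probability: above 0.5
# counts as spam, below 0.5 as ham. A plain --dump also lists the
# non-token "magic" lines (probability 0.000), so they land in Ham.
sa-learn --dump | awk '{ count += 1
    if ($1 > 0.5) spam += 1
    if ($1 < 0.5) ham += 1 }
    END { printf "%d\t%d\t%d\n", count, ham, spam }'
echo ""
# In the magic dump the value is in field 3.
echo -n "Number of ham messages learnt from: "
sa-learn --dump magic | awk '/nham/ {print $3}'
echo -n "Number of spam messages learnt from: "
sa-learn --dump magic | awk '/nspam/ {print $3}'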

which runs at the end of a script that sa-learns the spam users have moved 
into folders by hand during the day. After its nightly run, it reported:

Spam Assassin Bayes Statistics

Bayes Token Count
Total   Ham     Spam
140114  78443   61671

Number of ham messages learnt from: 2109
Number of spam messages learnt from: 1387
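For what it's worth, the same figures can be pulled from a single dump. This is only a sketch, not the script above: bayes_summary is a made-up name, and it assumes the column layout visible in the magic output further down (probability in field 1, value in field 3 on lines labelled "non-token data:"). Unlike the script above, it keeps those non-token lines out of the token counts.

```shell
#!/bin/bash
# Hypothetical helper: summarise a full `sa-learn --dump` read from stdin.
# Assumes field 1 is the probability and, on "non-token data:" (magic)
# lines, field 3 holds the value.
bayes_summary() {
    awk '
        # Magic lines: pick out nham/nspam, then skip them.
        /non-token data:/ {
            if ($NF == "nham")  nham = $3
            if ($NF == "nspam") nspam = $3
            next
        }
        # Token lines: field 1 is the spam probability.
        {
            total++
            if ($1 > 0.5) spam++
            else if ($1 < 0.5) ham++
        }
        END {
            print "Total\tHam\tSpam"
            printf "%d\t%d\t%d\n", total, ham, spam
            printf "Number of ham messages learnt from: %d\n", nham
            printf "Number of spam messages learnt from: %d\n", nspam
        }'
}

# Usage (assumes sa-learn is on the PATH):
#   sa-learn --dump | bayes_summary
```

Tokens with a probability of exactly 0.5 are counted in the total but in neither column, as in the original.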


I then fed sa-learn a little over 1000 ham messages, and now the same script 
gives me:

Spam Assassin Bayes Statistics

Bayes Token Count
Total   Ham     Spam
153518  10      153508

Number of ham messages learnt from: 2850
Number of spam messages learnt from: 0


AARGH! What the hell has happened there? It has apparently forgotten ALL the 
spam messages it ever learnt from, and yet, conversely, some 78000 ham tokens 
have become spam tokens.

A straight sa-learn --dump magic now gives:

0.000    0          3          0  non-token data: bayes db version
0.000    0          0          0  non-token data: nspam
0.000    0       2850          0  non-token data: nham
0.000    0     153508          0  non-token data: ntokens
0.000    0 1091609393          0  non-token data: oldest atime
0.000    0 1107564300          0  non-token data: newest atime
0.000    0 1107564852          0  non-token data: last journal sync atime
0.000    0 1107564590          0  non-token data: last expiry atime
0.000    0    1382400          0  non-token data: last expire atime delta
0.000    0      17827          0  non-token data: last expire reduction count

whereas sa-learn --dump magic on the databases as of 19:00 last night 
(retrieved from the warm standby box) gives:

0.000    0          3          0  non-token data: bayes db version
0.000    0       1342          0  non-token data: nspam
0.000    0       2096          0  non-token data: nham
0.000    0     138010          0  non-token data: ntokens
0.000    0 1106096390          0  non-token data: oldest atime
0.000    0 1107544172          0  non-token data: newest atime
0.000    0 1107538029          0  non-token data: last journal sync atime
0.000    0 1107478750          0  non-token data: last expiry atime
0.000    0    1382400          0  non-token data: last expire atime delta
0.000    0       5589          0  non-token data: last expire reduction count
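To line the two dumps up field by field, something like the following sketch may help. It assumes each dump has been saved to a file first; the function name compare_magic and the file names in the usage comment are made up for illustration. It relies only on the layout shown above: the value in the third column and the label after "non-token data:".

```shell
#!/bin/bash
# Hypothetical helper: print two saved `sa-learn --dump magic` outputs side
# by side, keyed on the label after "non-token data:". Values are field 3.
compare_magic() {
    local before=$1 after=$2
    awk '
        { label = $0; sub(/.*non-token data: /, "", label) }
        # First file: remember each value under its label.
        FNR == NR { old[label] = $3; next }
        # Second file: print label, old value, new value.
        { printf "%-30s %12s %12s\n", label, old[label], $3 }
    ' "$before" "$after"
}

# Usage, with hypothetical file names:
#   sa-learn --dump magic > magic-now.txt      # on each box
#   compare_magic magic-standby.txt magic-now.txt
```

The atime fields are Unix timestamps, so on a system with GNU date something like `date -u -d @1107564300` will show them as ordinary dates.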



Can anyone shed any light on this?




-- 
Niall


