LINUX.IE, website of the Irish Linux Users' Group
[ILUG] More SA Bayes questions


Justin Mason jm at jmason.org
Tue Feb 8 02:20:35 GMT 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


alright, I'll bite...

it looks like somehow the locking failed, and it allowed you to overwrite
parts of the db while another process wrote at the same time; obv. this
isn't supposed to be possible. ;)  (is this in 3.0.x? or 2.6x?)

- --j.
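
If overlapping writers are the cause, one way to rule them out from the learning side is to run every sa-learn invocation under a single external lock. The following is only a sketch: it assumes util-linux's flock(1) is installed, the lock file path and the with_lock helper name are made up for illustration, and it does nothing about other processes (spamd auto-learning, for instance) that open the Bayes DB without going through the wrapper.

```shell
#!/bin/sh
# Hypothetical lock file; any path writable by the learning user will do.
LOCKFILE="${LOCKFILE:-/tmp/sa-learn.lock}"

# Run a command while holding an exclusive lock on $LOCKFILE.
# A second caller blocks until the first one has finished, and the
# wrapped command's exit status is passed through unchanged.
with_lock() {
    flock -x "$LOCKFILE" "$@"
}

# Usage (not run here; the folder path is a placeholder):
#   with_lock sa-learn --ham /path/to/ham-folder
```

This only serialises the commands that are started through with_lock; it is a way to test the theory, not a replacement for SA's own locking.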

Niall O Broin writes:
> I sent this the other day, but nobody bit. In the hopes that it was the 
> weekend lull, I'm repeating myself :-)
> 
> Thanks to contributions from a couple of people the other day I came up with
> this little script to produce a small report on the Bayes DB:
> 
> echo Spam Assassin Bayes Statistics
> echo ""
> echo Bayes Token Count
> echo "Total     Ham     Spam"
> sa-learn --dump |awk '{count += 1; if ($0 > 0.5) spam+=1; \
> if ($0 < 0.5) ham+=1} END {print count "\t" ham "\t" spam}'
> echo ""
> echo -n "Number of ham messages learnt from: "
> sa-learn --dump magic |awk '/nham/ {print $3}'
> echo -n "Number of spam messages learnt from: "
> sa-learn --dump magic |awk '/nspam/ {print $3}'
> 
> which runs at the end of a script which sa-learns spam placed in folders by
> humans during the day. After doing its nightly run, it reported as follows:
> 
> Spam Assassin Bayes Statistics
> 
> Bayes Token Count
> Total   Ham     Spam
> 140114  78443   61671
> 
> Number of ham messages learnt from: 2109
> Number of spam messages learnt from: 1387
> 
> I then fed sa-learn something over 1000 pieces of ham, and now the same script
> gives me:
> 
> Spam Assassin Bayes Statistics
> 
> Bayes Token Count
> Total   Ham     Spam
> 153518  10      153508
> 
> Number of ham messages learnt from: 2850
> Number of spam messages learnt from: 0
> 
> AARGH! What the hell has happened there? It has forgotten about ALL the spam
> messages it ever learnt from, apparently, but conversely, 78000 ham tokens
> have become spam tokens. Did SA somehow choke on all that ham?
> 
> Straight sa-learn --dump magic now gives
> 
> 0.000    0          3          0  non-token data: bayes db version
> 0.000    0          0          0  non-token data: nspam
> 0.000    0       2850          0  non-token data: nham
> 0.000    0     153508          0  non-token data: ntokens
> 0.000    0 1091609393          0  non-token data: oldest atime
> 0.000    0 1107564300          0  non-token data: newest atime
> 0.000    0 1107564852          0  non-token data: last journal sync atime
> 0.000    0 1107564590          0  non-token data: last expiry atime
> 0.000    0    1382400          0  non-token data: last expire atime delta
> 0.000    0      17827          0  non-token data: last expire reduction count
> 
> whereas sa-learn --dump magic from the databases as of 19:00 last night
> (retrieved from the warm standby box) gives
> 
> 0.000    0          3          0  non-token data: bayes db version
> 0.000    0       1342          0  non-token data: nspam
> 0.000    0       2096          0  non-token data: nham
> 0.000    0     138010          0  non-token data: ntokens
> 0.000    0 1106096390          0  non-token data: oldest atime
> 0.000    0 1107544172          0  non-token data: newest atime
> 0.000    0 1107538029          0  non-token data: last journal sync atime
> 0.000    0 1107478750          0  non-token data: last expiry atime
> 0.000    0    1382400          0  non-token data: last expire atime delta
> 0.000    0       5589          0  non-token data: last expire reduction count
> 
> Can anyone shed any light on this?
> 
> --
> Niall
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCCCHzMJF5cimLx9ARAkU2AKCbjvM2dcQV8hlI+gcyAsluwsBosgCguDl2
9o6oDPnBhL7/SEdlgQrw8ME=
=vyx2
-----END PGP SIGNATURE-----
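
A note on the report script quoted above: its awk program tests $0 (the whole line) rather than the probability in the first column, and plain `sa-learn --dump` also prints the non-token "magic" lines, each of which starts with 0.000 and so lands in the ham column. That fits the second report: there are 10 magic lines, the ham count is 10, and 153518 = 153508 + 10. A hedged rewrite, assuming the dump format shown in this thread (probability in column one) and using a made-up function name, bayes_token_counts, could look like this:

```shell
#!/bin/sh
# Summarise Bayes token lines read from stdin.
# Assumes each token line starts with the spam probability (column 1),
# as in the dump output quoted in this thread.
bayes_token_counts() {
    awk '
        /non-token data/ { next }   # skip the "magic" lines if present
        NF == 0          { next }   # skip blank lines
        {
            total++
            if ($1 > 0.5)      spam++
            else if ($1 < 0.5) ham++
            else               neutral++   # exactly 0.5
        }
        # Prints: total, ham, spam, neutral (tab-separated)
        END { printf "%d\t%d\t%d\t%d\n", total, ham, spam, neutral }
    '
}

# Usage (not run here):
#   sa-learn --dump data | bayes_token_counts
```

Feeding it `--dump data` rather than the full dump keeps the magic lines out in the first place; the pattern match is only a safety net.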



