I have just revamped my spam-filtering techniques
to include the usage of SpamAssassin (v3.1.8).
the Bayesian filter was trained with c.40,000 of
the spams I've received, and with c.20,000 hams;
both the hams and spams cover the last c.5 years.
the training was done in `--local' mode (i.e., no
internet access).
to-date, and using essentially the shipped defaults,
I've had no false-spams, but only c.50% of the spams
are being caught. that is notably lower than I was
hoping for, but we'll see what happens with time
and tweaks. (possibly relevant here: spamd(1) is
currently being run `--local'; this could, perhaps,
be changed?)
one of the key changes I made was, when refiling a
false-ham as spam, to run `sa-learn --spam' on the
misclassified-spam. and this is where my current
issue is: it's a bit slow, and on occasion, takes
a really really long time (multiple minutes whilst
consuming a great deal of CPU). (this training is
also done `--local', so internet access is not the
problem here.)
re-reading the sa-learn(1) man page, I note there
is a `--no-sync' option which _sounds_ like it may
deal with one or both slowness issues. however,
the manual page is mostly opaque about the
consequences of using this option, and seems to
suggest that after a series of `sa-learn --no-sync's,
an `sa-learn --sync' ought to be done. IF that is
true, it's an issue: it doesn't fit into my nominal
routine; and IF it's required, then I cannot ensure
it will "always" be done. (besides, I've no clear
idea when or why it's required?)
• so just what is `--no-sync' about?
• is a `--sync' subsequently required? (if so, why?)
• what are the consequences of not (always) doing a
`--sync' afterwards (whether required or not)?
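
for concreteness, the routine the man page seems to be
hinting at would look something like the sketch below.
this is only my reading of sa-learn(1), not a known-good
recipe: the function names and the SA_LEARN variable are
made up for illustration, and whether the deferred `--sync'
is actually required is exactly what I'm asking.

```shell
#!/bin/sh
# Sketch only. SA_LEARN, learn_spam_deferred and sync_journal are
# hypothetical names for illustration; they are not part of SpamAssassin.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Learn one misfiled message as spam, skipping the sync step
# that sa-learn would otherwise perform after learning.
learn_spam_deferred() {
    "$SA_LEARN" --local --no-sync --spam "$1"
}

# Run the deferred sync once, e.g. from a nightly cron job,
# instead of after every single message.
sync_journal() {
    "$SA_LEARN" --sync
}
```

the refiling hook would then call `learn_spam_deferred' on the
message file, and `sync_journal' would run separately at some
quiet time. the catch, as noted above, is that I cannot
guarantee the second step always happens.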
and b.t.w., how safe is it to interrupt (^C) or
suspend (^Z) an überlong `sa-learn --spam'?
cheers!
-blf-
--
Experienced (>25 yrs) kernel/software Eng: | Brian Foster Montpellier,
• Unix, embedded, &tc; • Linux; • doc; | blf at utvinternet.ie FRANCE
• IDL, automated testing, process, &tc. | Stop E$$o (ExxonMobile)!
Résumé (CV) http://www.blf.utvinternet.ie | http://www.stopesso.com