LINUX.IE, website of the Irish Linux Users' Group

[ILUG] [Q] `sa-learn --no-sync': what does it (not?) do, and is `--sync' then needed?


Justin Mason jm at jmason.org
Tue Apr 17 09:45:27 IST 2007


Brian Foster writes:
>  I have just revamped my spam-filtering techniques
>  to include the usage of SpamAssassin (v3.1.8).
>  the Bayesian filter was trained with c.40,000 of
>  the spams I've received, and with c.20,000 hams;
>  both the hams and spams cover the last c.5 years.
>  the training was done in `--local' mode (i.e., no
>  internet access).

Wow, that's quite a lot!  I'd suggest that it'd be fine (and faster) with
just the most recent 1000 or so.
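To act on the "most recent 1000 or so" suggestion, the newest messages could be picked by file modification time. This is a minimal Python sketch. It assumes the spams are kept in a Maildir (`cur/` and `new/` subdirectories) and that a file's mtime roughly tracks when the message arrived. The function name `newest_messages` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def newest_messages(maildir: str, n: int = 1000) -> list[Path]:
    """Return the n most recently modified message files in a Maildir.

    Assumes a Maildir layout: delivered mail lives in cur/ and new/
    (tmp/ holds in-progress deliveries and is skipped).
    """
    files: list[Path] = []
    for sub in ("cur", "new"):
        d = Path(maildir) / sub
        if d.is_dir():
            files.extend(p for p in d.iterdir() if p.is_file())
    # Newest first by modification time, then keep only the first n.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:n]
```

The resulting paths could then be passed to `sa-learn --spam` as file arguments.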

>  to date, and using essentially the shipped defaults,
>  I've had no false-spams, but only c.50% of the spams
>  are being caught.  that is notably lower than I was
>  hoping for, but we'll see what happens with time
>  and tweaks.  (possibly relevant here, spamd(1) is
>  currently being run `--local' (this could, perhaps,
>  be changed?).)

Yep, definitely change this -- unfortunately spam has evolved to really
require it.  Allowing network lookups will have a much greater
effect on accuracy than training will.

>  one of the key changes I made was, when refiling a
>  false-ham as spam, to run `sa-learn --spam' on the
>  misclassified-spam.  and this is where my current
>  issue is:  it's a bit slow, and on occasion, takes
>  a really really long time (multiple minutes whilst
>  consuming a great deal of CPU).  (this training is
>  also done `--local', so internet access is not the
>  problem here.)
> 
>  re-reading the sa-learn(1) man page, I note there
>  is a `--no-sync' option which _sounds_ like it may
>  deal with one or both slowness issues.  however,
>  the manual page is mostly opaque about the
>  consequences of using this option, and seems to
>  suggest that after a series of `sa-learn --no-sync's,
>  an `sa-learn --sync' ought to be done.  IF that is
>  true, it's an issue:  it doesn't fit into my nominal
>  routine; and IF it's required, then I cannot ensure
>  it will "always" be done.  (besides, I've no clear
>  idea when or why it's required?)
> 
>   • so just what is `--no-sync' about?
>   • is a `--sync' subsequently required?  (if so, why?)
>   • what are the consequences of not (always) doing a
>      `--sync' afterwards (whether required or not)?

--no-sync will write changes to a journal; you do, then, need
to run --sync later to synchronise the journal to the DB.
If you don't run --sync, the changes will not be reflected
in your scan results.
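Because the later `--sync` is easy to forget, one way to make it hard to skip is to wrap a refiling session so the sync always runs at the end. This is a minimal Python sketch. It assumes `sa-learn` is on the PATH, and it uses only the `--spam`, `--no-sync` and `--sync` options discussed in this thread. `learn_spam_batch` and its `run` parameter are hypothetical names.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence


def _run_checked(cmd: list[str]) -> None:
    # Raise CalledProcessError if sa-learn exits non-zero.
    subprocess.run(cmd, check=True)


def learn_spam_batch(
    paths: Sequence[str],
    run: Callable[[list[str]], object] = _run_checked,
) -> None:
    """Learn each message as spam without syncing, then sync exactly once."""
    try:
        for path in paths:
            # --no-sync: changes go to the journal, not yet to the Bayes DB.
            run(["sa-learn", "--no-sync", "--spam", path])
    finally:
        # Always merge the journal into the DB, even if a learn step
        # raised or the loop was interrupted with Ctrl-C.
        run(["sa-learn", "--sync"])
```

Each per-message learn skips the sync step. The `finally` block then issues one `--sync` whether the batch completes, a learn fails, or the session is interrupted with ^C. The injectable `run` argument lets the command sequence be checked without SpamAssassin installed.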

>  and b.t.w., how safe is it to interrupt (^C) or
>  suspend (^Z) an überlong `sa-learn --spam'?

Both are safe -- just don't "kill -9", or it'll leave lock
files in your ~/.spamassassin dir.
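The difference between ^C and `kill -9` comes down to cleanup code. SIGINT can be caught and turned into an orderly exit, but SIGKILL cannot be caught at all. The sketch below is illustrative only and is not SpamAssassin's code. It assumes a hypothetical lock file, created with `O_EXCL` and removed in a `finally` block. In this sketch, ^Z simply suspends the process, so the lock stays held until the process is resumed with `fg` and finishes.

```python
import os


def with_lock(lock_path: str, work) -> None:
    """Run work() while holding a hypothetical lock file at lock_path."""
    # O_EXCL makes creation fail if a (possibly stale) lock already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    try:
        work()
    finally:
        # Runs on normal return and on Ctrl-C (SIGINT -> KeyboardInterrupt).
        # SIGKILL (kill -9) cannot be caught, so this never runs and the
        # lock file is left behind.
        os.unlink(lock_path)
```

A lock left behind by `kill -9` makes the next `os.open` fail with `FileExistsError` until the stale file is removed by hand.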

--j.


