Flutterby™! : Spam solutions

Spam solutions

2003-07-30 01:37:26.65951+00 by Dan Lyke 8 comments

As of this morning on the ferry I decided it was time to get serious about unsolicited email. I've installed SpamAssassin, but that's just rule-based. Igor pointed out CRM114, the Controllable Regex Mutilator, which comes with a couple of learning algorithms. Looks like it might be a good "in addition to".

[ related topics: Spam ]

comments in ascending chronological order:

#Comment Re: Spam solutions made: 2003-07-30 01:49:29.555615+00 by: ghasty

Yes, I have SpamAssassin running here... but still get the crap. Have a friend who loves ASSP (http://assp.sourceforge.net/), but I haven't had a chance to play with it...

#Comment Re: Spam solutions made: 2003-07-30 02:41:29.332826+00 by: Pete

Turn on the RBL stuff; it's good. So's the AWL. The defaults on the Bayes are beyond unhelpful, so treat Bayes with great suspicion. Jack the base64-encoded text penalty through the roof. Kill the user agent bonuses (the spammers have wised up), and kill the quoted-text and In-Reply-To tests. Also jack up the web bugs penalty, but supercharge the protection afforded to evites (missing parties because of your SA setup is pretty fucking wrong).

My threshold is at 3.7, but I'm looking at dropping it to 2.0 (remember that SA scores do go negative). I've had exactly two false spam taggings out of maybe 3000 actual spams. One was an evite before increasing the credit for them, and one was from somebody's horribly misconfigured edu mail server.
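To make the scoring idea above concrete, here is a rough Python sketch of additive scoring against a threshold. The rule names and weights are made up for illustration; they are not SpamAssassin's real rule identifiers or default scores, and treating "at or above the threshold" as spam is an assumption of this sketch.

```python
# Hypothetical rule names and weights, for illustration only. These are
# NOT SpamAssassin's actual rule identifiers or default scores.
WEIGHTS = {
    "base64_text": 3.0,           # penalty: base64-encoded text part
    "web_bug": 2.5,               # penalty: tracking image
    "listed_on_rbl": 2.0,         # penalty: sender on a blocklist
    "known_correspondent": -3.0,  # credit: AWL-style history
    "evite": -4.0,                # credit: protect party invitations
}


def score(hits, weights=WEIGHTS):
    """Sum the weights of every rule that fired; the total can go negative."""
    return sum(weights[name] for name in hits)


def is_spam(hits, threshold=3.7, weights=WEIGHTS):
    """Tag a message as spam when its total reaches the threshold."""
    return score(hits, weights) >= threshold
```

With these invented weights, a message that trips `base64_text` and `web_bug` scores 5.5 and gets tagged, while an evite carrying a web bug scores -1.5 and stays in the inbox even at a threshold of 2.0.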

#Comment Re: Spam solutions made: 2003-07-30 03:33:19.127812+00 by: dws

A fairly simple set of rules catches 85% of my spam. HTML-only email gets set aside (no false positives here in over 6,000 emails). Email that's got a base64-encoded text/plain or text/html part gets set aside (no false positives in over 1,000). A handful of procmail rules--a combination of blacklist rules and subject patterns--catch the final 15%, with only a few false positives. That said, the remaining 15% is a nuisance that's been getting worse to the point of maybe following your lead.
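The two "set aside" rules above could be sketched roughly like this in Python, using the standard `email` package. This is only an illustration of the idea, not dws's actual procmail recipes; the function names are invented here.

```python
from email import message_from_string


def is_html_only(msg):
    """True if the message has a text/html part and no text/plain part."""
    types = [part.get_content_type()
             for part in msg.walk() if not part.is_multipart()]
    return "text/html" in types and "text/plain" not in types


def has_base64_text(msg):
    """True if any text/plain or text/html part is base64-encoded."""
    for part in msg.walk():
        if part.is_multipart():
            continue
        encoding = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if encoding == "base64" and part.get_content_type() in ("text/plain", "text/html"):
            return True
    return False


def set_aside(raw_message):
    """Apply both rules to a raw RFC 822 message string."""
    msg = message_from_string(raw_message)
    return is_html_only(msg) or has_base64_text(msg)
```

As noted further down the thread, the base64 rule would misfire on legitimate base64-encoded UTF-8 mail, so it only suits mailboxes that do not receive any.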

#Comment Re: [Entry #6391] Re: Spam solutions made: 2003-07-30 09:26:03.900296+00 by: Unknown, from NNTP

In article <flutterbycomweblogcomment$18260@mail.flutterby.com>, dws <prefersanonymity_@flutterby.com> wrote:
>A fairly simple set of rules catches 85% of my spam. HTML-only email gets set
>aside (no false positives here in over 6,000 emails). Email that's got a
>base64-encoded text/plain or text/html part gets set aside (no false positives
>in over 1,000). A handful of procmail rules--a combination of blacklist rules
>and subject patterns--catch the final 15%, with only a few false positives.
>That said, the remaining 15% is a nuisance that's been getting worse to the
>point of maybe following your lead.

I can really recommend Bogofilter.

A sizable number of the log emails I get from various machines at work are base64-encoded UTF-8, so I can't just junk on that basis. Bogofilter decodes HTML and base64 encodings first, before breaking the email into words for its spamminess decisions.

Bogofilter catches something like 99.5% of the spam I get at work, with no false positives so far. (Excepting the Amanda backup logs, which I hadn't told it about in the first place: too many numbers and no good words, I suspect. Now that it knows about them it's happy again.)
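The decode-first idea described above can be illustrated with a short Python sketch: undo the transfer encoding, strip the markup, and only then split into words. This is not Bogofilter's actual implementation; the tokenizer pattern and the helper names are assumptions made for the example.

```python
import re
from email import message_from_string
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between HTML tags and discards the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


def tokens(raw_message):
    """Return lowercase word tokens from every text part of a message."""
    msg = message_from_string(raw_message)
    words = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and attachments
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label
            text = payload.decode("utf-8", errors="replace")
        if part.get_content_subtype() == "html":
            text = strip_html(text)
        # Token pattern is an assumption of this sketch.
        words.extend(re.findall(r"[a-z0-9$'-]+", text.lower()))
    return words
```

Because decoding happens before tokenizing, a base64-encoded HTML body yields the same words as its plain-text equivalent, so the encoding itself carries no weight in the decision.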

Phil -- http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt

#Comment Re: Spam solutions made: 2003-07-30 13:29:57.975173+00 by: meuon

http://www.tmda.net - works well. Under rapid development, but worth using.

#Comment Re: Spam solutions made: 2003-07-30 14:57:10.134179+00 by: ebradway

Whenever I get around to rebuilding Scrounge, it'll provide TMDA for the users. I'm on a tirade against the faster/better/cheaper mentality. I can deal with having my email delayed a little bit, and if someone gets annoyed at having to deal with my authorization responses, then I don't need email from them anyway.

Filtering, I think, starts to become a game for a while. Why can't the filter work 1/100th as well as my eyes can? I can scan 50 emails in about 3 seconds. It takes me longer to select/deselect the real exceptions. I can get 95% of the spam from the subject and From fields (what I see in the folder view). If I could see the full header, I could probably eliminate another 95% of what's left, and the body of the text would pick up another 95% of that. I can deal with letting 5% of 5% of 5% of the spam through...
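For a sense of scale, the "5% of 5% of 5%" figure works out as below. The sketch assumes the three passes (folder view, full header, body) each miss 5% of whatever reaches them, independently of one another, which is how the estimate above reads.

```python
from fractions import Fraction

miss_rate = Fraction(5, 100)  # each pass lets 5% of its input through
passes = 3                    # folder view, full header, body text

leak = miss_rate ** passes    # fraction of spam surviving all three passes
caught = 1 - leak

print(leak)            # fraction of spam that slips through
print(float(caught))   # fraction of spam caught overall
```

That comes to one spam in 8,000 slipping through, or 99.9875% caught.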

I'll have to remember this the next time someone tries to convince me that computers are almost as smart as people.

#Comment Re: Spam solutions made: 2003-07-30 14:59:59.650753+00 by: topspin

Not being a geek, I take a low tech approach to this... using easily created mail rules.

Folks who occasionally send emails are filtered to an "occasional emailer" folder. That's a big ol' email rule that gets added to regularly. Folks/groups who are frequent or important senders get their own folder and rule. As I develop a conversation with someone not in the above group, they are filtered to the "current conversations" folder.... another big rule that occasionally gets edited to drop folks I no longer swap emails with.

Basically, my low tech answer is to filter known stuff outta the inbox, then scan the inbox for garbage (and 99+% is, on a given day) and delete. It ain't geeky, but it works quickly and easily for me. 'Course I also have the excellent work of meuon and crew on my side.
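For anyone who would rather script the same approach than maintain it in a mail client, a minimal Python sketch follows. The folder names mirror the description above, but the addresses and the `folder_for` helper are invented for illustration.

```python
from email.utils import parseaddr

# Hypothetical addresses; each folder maps to the senders filed there.
# Folders are checked in the order listed, and the first match wins.
RULES = {
    "frequent": {"alice@example.com"},
    "current conversations": {"bob@example.org"},
    "occasional emailer": {"carol@example.net"},
}


def folder_for(from_header, rules=RULES):
    """Return the folder for a From: header, or 'inbox' for unknown senders."""
    address = parseaddr(from_header)[1].lower()
    for folder, senders in rules.items():
        if address in senders:
            return folder
    return "inbox"  # unknown senders stay here to be scanned and deleted
```

Known mail never reaches the inbox, so whatever is left there can be skimmed and deleted in bulk.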

#Comment Re: Spam solutions made: 2003-07-30 17:55:06.767118+00 by: Shawn

We've been using POPFile (http://popfile.sourceforge.net/) here for about the last month and I am absolutely sold on it. It's basically a proxy and web admin interface run by Perl scripts. Works on both *nix and Windows. It's easy to set up, simple to use, and I don't worry about spam any more.

The only thing I don't have it set up to handle gracefully is SSH tunneling, but it shouldn't be too involved to work something like that out.