nanog mailing list archives

Re: Comment spammers chewing blogger bandwidth like crazy


From: "Alexander Harrowell" <a.harrowell () gmail com>
Date: Tue, 16 Jan 2007 09:42:32 +0000


Frisvold: How does this make his assumption incorrect?  Spam is spam and DNSBLs
will likely be very effective when it comes to stopping comment spam.
There are, of course, some severe problems with using a DNSBL as a
blocklist for comments...

  But there's a major problem here...  A DNSBL is a source blocklist.
Since the current trend in spam (comment and smtp) is to use botnets,
then by blocking the bots, you also block the users who would make
meaningful comments.

Especially as bots are usually found in customer dynamic-IP pools.
Assigning a value for relative spamminess by country would work up to
a point (Italy, Ukraine, we mean you) but the false positive rate is
unacceptable. Anyway, it's very anti-Internet and hardly appropriate
for a blog whose declared mission is pan-European opinion.
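
For anyone who hasn't looked at the mechanics, a source-IP DNSBL check
reverses the octets of the commenter's IPv4 address and looks the
result up under the list's zone. A rough Python sketch, assuming the
common convention that any A record in the answer means "listed". The
function names are illustrative, and the zone in the usage note is
only the list named later in this message:

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL lookup would query for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an address for this IP.

    Assumption: resolution failure is treated as "not listed", so a
    DNS outage fails open and lets the comment through.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

Calling is_listed("192.0.2.10", "bsb.spamlookup.net") would query
10.2.0.192.bsb.spamlookup.net. Note that the check only ever sees the
source address, which is exactly why a real user on the same dynamic
pool as a bot gets caught too.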

The argument there is that those users don't deserve to comment if
they can't keep their computers clean, but let's get real.  Some of
this stuff is getting pretty advanced and it's getting tougher for
general users to keep their computers clean.

I think a far better system is something along the lines of a SURBL
with word filtering.  I believe that Akismet does something along
these lines.

We had a word filter plus lookups of bsb.spamlookup.net. Our
experience in the last few months was not good - the rate of false
positives was high (essentially all genuine comments had to be
individually approved and, worse, they usually went into the spamtrap
rather than into a moderation queue) and the rate of false negatives
was nontrivial.

We have recently implemented Akismet. It's a major improvement - the
false positives have been nearly eliminated and the false negatives
are down to a couple a week. Multi-layered defence is a "must" - for
example, most comment spam is very self-similar, so you could run a
Bayesian filter comparing the stuff rejected by the blocklist with the
content of the trap in order to sort between "spam" and "hold for
approval".
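
To make the Bayesian second pass concrete, here is a minimal sketch of
a naive Bayes classifier trained on the trap contents (label "spam")
and on approved comments (label "ham"), then applied to whatever the
blocklist rejected. The class name, tokeniser, add-one smoothing and
the 0.9 threshold are all assumptions for illustration, not what
Akismet or any particular plugin does:

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; deliberately crude."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            total = sum(self.counts[label].values())
            # Class prior and token likelihoods, both add-one smoothed.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in tokens(text):
                score += math.log(
                    (self.counts[label][tok] + 1) / (total + len(vocab) + 1)
                )
            log_score[label] = score
        # Normalise in log space to avoid underflow on long comments.
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight["spam"] / (weight["spam"] + weight["ham"])


def triage(model, text, threshold=0.9):
    """Sort a blocklist-rejected comment into the two buckets."""
    if model.spam_probability(text) >= threshold:
        return "spam"
    return "hold for approval"
```

The point of the threshold is that only comments the model is fairly
sure about go straight to the trap; everything else lands in the queue
for a human, which is the failure mode you want for genuine comments
from a blocklisted address.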

Mind you, some of the Bayesian-beating techniques used for SMTP spam
are now showing up in comments - for example, delivering the
beneficiary link and a paragraph of news scraped from news.bbc.co.uk,
which looks a lot like a real (but dull:-)) comment. Perhaps a better
filter would work on the links the comments contain (some domains
come up again, and again, and again).
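
A link-based filter along those lines could be sketched as follows:
pull the hostnames out of every comment already in the trap, count how
often each one recurs, and flag new comments that link to a repeat
offender. The regex, the function names and the min_count of 3 are
assumptions chosen for illustration:

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_hosts(comment):
    """Return the set of hostnames linked from a comment."""
    hosts = set()
    for url in URL_RE.findall(comment):
        try:
            host = urlparse(url).hostname  # already lowercased
        except ValueError:
            continue  # malformed URL, ignore it
        if host:
            if host.startswith("www."):
                host = host[4:]
            hosts.add(host)
    return hosts


def repeat_offenders(spam_comments, min_count=3):
    """Hostnames that appear in at least min_count trapped comments."""
    counts = Counter()
    for comment in spam_comments:
        counts.update(link_hosts(comment))  # one vote per comment
    return {host for host, n in counts.items() if n >= min_count}


def has_known_spam_link(comment, bad_hosts):
    return not link_hosts(comment).isdisjoint(bad_hosts)
```

Counting per comment rather than per link means the scraped
news.bbc.co.uk paragraph does not get the BBC flagged unless it keeps
turning up, and in practice you would want a whitelist for hosts like
that anyway.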

Then again, once you're doing anything like that, it's already hit
your server and is costing cycles if nothing else. In the future,
someone will lose the vote through being mistaken for a spambot.

Alex

