Interesting People mailing list archives

Re: Comcast blocking mail to its customers


From: David Farber <dave () farber net>
Date: Thu, 16 Oct 2008 04:19:34 -0400



Begin forwarded message:

From: Joel Snyder <Joel.Snyder () Opus1 COM>
Date: October 16, 2008 1:39:50 AM EDT
To: dave () farber net
Cc: ip <ip () v2 listbox com>, johnl () iecc com, tilghman () mail jeffandtilghman com
Subject: Re: [IP] Re: Comcast blocking mail to its customers

John Levine's point is spot-on: ISPs don't have the luxury of building their services to support that 1% of technology-savvy folks, and anyone who forwards mail has to deal with 'backpressure' from other ISPs when they forward mail that the destination ISP doesn't want to accept, even if it has already been cleaned of spam! (For example, Qwest won't accept mail if the envelope FROM domain name is not in DNS, which does indeed block spam, but also has a fairly high false positive rate.)

But I'd like to respond to Tilghman's point: yes, he's also right (to summarize, he says "you should spam scan at SMTP time"), BUT...

The reality is that his approach, while technically superior, is not a particularly scalable one. Most anti-spam gateways filter for both spam and viruses, and that takes a lot of CPU time.

Doing reputation-based message refusal (what some folks are calling blacklists or RBLs) at SMTP time is a HUGE benefit and does follow Tilghman's philosophy of block-at-smtp-time (rather than accept-and-blackhole, which is industry standard practice). However, doing the spam filtering, virus filtering, content analysis, and any necessary message splitting, all at SMTP time, only really works if you have massively fast CPUs and a fairly low trickle of a mail stream.
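
As a rough illustration of why the reputation check is cheap enough to do at SMTP time, here is a minimal Python sketch of a DNS blacklist lookup. It relies on the usual DNSBL convention (reverse the IPv4 octets, append the list's zone, and treat any answer as "listed"). The zone name dnsbl.example.org, the function names, and the reply text are placeholders for illustration, not any particular vendor's or list's setup.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip, zone):
    """Build the DNS name to query for an IPv4 client in a DNSBL zone."""
    octets = ipaddress.IPv4Address(client_ip).exploded.split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(client_ip, zone):
    """Return True if the DNSBL answers for this client.

    Any lookup failure (including a DNS timeout) is treated as "not listed"
    in this sketch, so a broken resolver fails open.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, zone))
        return True
    except OSError:
        return False


def reply_for(listed):
    """Refuse during the SMTP session instead of accepting and discarding."""
    if listed:
        return "554 5.7.1 Connection refused: sending host is on a blocklist"
    return None  # no objection; carry on with the SMTP dialogue


# Hypothetical zone name, used only to show the query format.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

One DNS query per connection is tiny next to a full content scan, which is the asymmetry the paragraph above is pointing at.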

The problem is that mail tends to be bursty, and while the average arrival rate may be modest enough for whatever appliance you're using, the peaks are both frequent and high. If you start returning 4xx responses whenever load gets high, you'll have a lot of dissatisfied users, because you'll go into resource conservation mode fairly quickly and frequently. And, given that there is no predictability about when the other side will retry (some in seconds, others in hours), that doesn't work very well in practice.
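
To make the burstiness point concrete, here is a small Python sketch with made-up numbers (the arrival pattern and the capacity figure are assumptions chosen for illustration, not measurements). It counts how many messages would get a 4xx if a gateway can scan only a fixed number of messages per time slot and defers everything above that. Retries are deliberately ignored, since their timing is up to the sender.

```python
def count_deferred(arrivals_per_tick, capacity_per_tick):
    """Count messages that would receive a 4xx because they exceed capacity."""
    deferred = 0
    for arrivals in arrivals_per_tick:
        deferred += max(0, arrivals - capacity_per_tick)
    return deferred


# Hypothetical traffic: mostly quiet, with two short bursts.
arrivals = [2, 2, 30, 2, 2, 2, 30, 2]
capacity = 10  # messages the scanner can handle per tick (assumed)

average = sum(arrivals) / len(arrivals)
deferred = count_deferred(arrivals, capacity)

print("average arrivals per tick:", average)
print("deferred with 4xx:", deferred, "of", sum(arrivals))
```

In this toy case the average rate sits below capacity, yet more than half of the messages are deferred, because nearly all of them arrive during the bursts.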

Some appliance vendors don't follow his advice because they have poor architectures and cannot; others don't because they want to sell the cheapest hardware possible to keep their margins up. But some don't just because it doesn't work that well in real enterprise mail streams.

Obviously, a middle-ground approach would be to block when you can at SMTP time, and if you start falling behind, then fall back to 'old school' and queue the mail for later scanning. There are huge hosted services that do this, but the anti-spam appliance vendors (who control most of the enterprise commercial market) haven't embraced that approach, likely for complexity reasons.
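
The middle-ground approach could look something like the following Python sketch. Everything here is hypothetical (the function name, the scanner-slot accounting, the reply strings); it only shows the decision: scan and reject inline while there is spare capacity, otherwise accept and queue the message for later scanning.

```python
def handle_at_smtp_time(message, busy_scanners, max_scanners, scan, later_queue):
    """Decide the SMTP reply for one message.

    scan is any callable returning True when the message looks like spam.
    later_queue is a list that collects messages accepted without scanning.
    """
    if busy_scanners >= max_scanners:
        # Falling behind: go 'old school' and scan after accepting.
        later_queue.append(message)
        return "250 2.0.0 Accepted, queued for scanning"

    if scan(message):
        # Capacity available: the sender learns about the refusal right away.
        return "550 5.7.1 Message refused as spam"
    return "250 2.0.0 Accepted"


def toy_scan(message):
    """Stand-in for a real content scanner (assumption for this example)."""
    return "cheap pills" in message


queue = []
print(handle_at_smtp_time("cheap pills here", 0, 4, toy_scan, queue))
print(handle_at_smtp_time("cheap pills here", 4, 4, toy_scan, queue))
```

The cost of the fallback is visible in the second call: the same spam that was refused inline is now accepted, so whatever happens to it later happens without the sender being told during the session.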

Message splitting also complicates the picture. Sometimes a message will be destined for many recipients (this is not uncommon in spam) and each will have a different policy for sensitivity and action. Having to figure out what the policy is and then applying different actions based on whether all recipients are the same or not, all at SMTP time, is another bit that many anti-spam vendors have avoided chewing.
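
A small Python sketch of the splitting problem, using an invented policy table (the addresses, thresholds, and action names are assumptions). Standard SMTP gives one reply after DATA that covers every recipient of the message, so an outright rejection is only possible when every recipient's policy agrees; otherwise the gateway has to accept and handle each group separately.

```python
from collections import defaultdict


def split_by_action(recipients, policies, spam_score):
    """Group recipients by what their own policy says to do with this message.

    policies maps recipient -> (threshold, action_when_over_threshold).
    """
    groups = defaultdict(list)
    for rcpt in recipients:
        threshold, action = policies[rcpt]
        decided = action if spam_score >= threshold else "deliver"
        groups[decided].append(rcpt)
    return dict(groups)


def reply_after_data(groups):
    """Pick the single reply that must cover all recipients."""
    if set(groups) == {"reject"}:
        return "550 5.7.1 Message refused as spam"
    # Mixed outcomes: accept, then deliver/quarantine/drop per group.
    return "250 2.0.0 Accepted"


# Hypothetical per-recipient policies.
policies = {
    "a@example.com": (5.0, "reject"),
    "b@example.com": (8.0, "quarantine"),
    "c@example.com": (5.0, "reject"),
}

groups = split_by_action(list(policies), policies, spam_score=6.0)
print(groups)
print(reply_after_data(groups))
```

With a score of 6.0, two recipients want the message refused and one wants it delivered, so the only reply available at SMTP time is an acceptance and the per-recipient handling has to happen afterwards.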

I think that everyone who is (sane and) active in this space agrees that Tilghman's approach is the best, but it's easy for us guys who use and deploy the products to advocate it; I have found it more difficult to get the developers who write the code to go down that path. This doesn't mean that some vendors aren't doing it, but certainly the dominant appliance vendors are not and probably won't be anytime soon.

jms

David Farber wrote:
Begin forwarded message:
From: Tilghman Lesher <tilghman () mail jeffandtilghman com>
Date: October 15, 2008 7:32:17 PM EDT
To: johnl () iecc com
Cc: dave () farber net
Subject: Re: [IP] Re: Comcast blocking mail to its customers
On Wednesday 15 October 2008 15:32:50 David Farber wrote:
Begin forwarded message:

From: John Levine <johnl () iecc com>
Date: October 15, 2008 3:12:36 PM EDT
To: dave () farber net
Subject: Re: [IP] Re: Comcast blocking mail to its customers

My view is that an appropriate AUP for email should be similar to
that of a common carrier or the USPS.  It's a critical service these
days.  Using robotic methods or wholesale IP shutoffs to dump
presumptive spam into the trash is not acceptable for such a
service.

The mail stream that ISPs see is typically 95% spam these days.  That
means 20 spams for every real message, so if they were to accept and
store all the spam, that's more than an order of magnitude increase in
the size and cost of their mail system, which would be passed through
to the customers, most of whom don't want it.  And even if they did,
how much confidence do you have that you could manually sort it
correctly?  I've seen plausible studies that say that mechanical
filters are if anything better than humans at sorting large mail
streams, since mechanical filters' eyes don't glaze over.

I think you missed the part which I consider to be most important, that of dumping presumptive spam into the trash. The most correct method of filtering is to do it at SMTP time and reject the email then, rather than either a) accepting all email and bouncing the stuff that is undeliverable (this is arguably what is most wrong with some MTAs, such as qmail, as it causes the secondary problem of backscatter) or b) accepting all email and tossing the stuff that a mechanical filter thinks is spam (which means that a sender may never be notified that their message was falsely flagged as spam).

Rejecting at SMTP time guarantees that minimal backscatter bounces are generated, and when an email is rejected as a false positive, the sender has immediate feedback of the problem and can work to address the issue.

It is really no more computationally expensive than current operations (which have to scan all email anyway, so they might as well do it at SMTP time). In the case of a flood of email causing problems with scanning (the prime argument against scanning at SMTP time, that it does not scale), that is easily addressed within the mail protocol, simply by sending a 400-level error, indicating a temporary issue, which good MTAs use as an indication to try the delivery again later. Oddly, a 400-level error stops many spam bots in their tracks, which will never reattempt delivery of the same message upon receiving the first error.

--
Joel M Snyder, 1404 East Lind Road, Tucson, AZ, 85719
Senior Partner, Opus One       Phone: +1 520 324 0494
jms () Opus1 COM                http://www.opus1.com/jms





