nanog mailing list archives

Re: help needed - state of california needs a benchmark - beware bufferbloat


From: Jim Gettys <jg () freedesktop org>
Date: Mon, 31 Jan 2011 09:58:53 -0500

On 01/29/2011 01:00 PM, Mike wrote:
Hello,

My company is a small CLEC / broadband provider serving rural communities
in northern California, and we are the recipient of a small grant from
the state through our public utilities commission. We went out to the
'middle of nowhere' and deployed ADSL2+ (chalk one up for the good
guys!), and now that we're done, our state PUC wants to gather
performance data to evaluate the result of our project and ensure we
delivered what we said we were going to. Bigger picture, our state is
actively attempting to map broadband availability and service levels,
and this data will factor into that overall picture, to be used for
future grant/loan programs and other support mechanisms, so this really
is going to touch every provider who serves end users in the state.

The rub is that they want to legislate that the web-based 'speedtest.com'
is the ONLY and MOST AUTHORITATIVE metric, one that trumps all other
considerations, and that the provider is 100% at fault and responsible
for making fraudulent claims if speedtest.com doesn't agree. No
discussion is permitted about sync rates, packet loss, internet
congestion, provider route diversity, end user computer performance
problems, far end congestion issues, far end server issues or CPU
loading, latency/RTT, or the like. They are going to decide that the
quality of any provider's service rests solely and exclusively on the
numbers returned from 'speedtest.com', period.

All of you in this audience, I think, probably immediately understand
the various problems with such an assertion. It's one of those situations
where, to the uninitiated, it SEEMS LIKE this is the right way to do
this, and it SEEMS LIKE there's some validity to what's going on; but in
practice, we engineering types know it's a far different animal and
should not be used for real live benchmarking of any kind where there is
a demand for statistical validity.

My feeling is that if there is a need for the state to do benchmarking,
then it ought to be using statistically sound methodologies, along the
same lines as any other benchmark or test done by government agencies
and national standards bodies: reproducible and dependable. The question
is, as a hot-button issue, how do we go about getting 'the message'
across, how do we go about engineering something that could be
considered statistically relevant, and most importantly, how do we get
this accepted by non-technical legislators and regulators?

Mike,

For general tests of most things an ISP does, ICSI's Netalyzr tests can't be beat.

http://netalyzr.icsi.berkeley.edu/

There are also tests at M-Lab that may be useful: http://www.measurementlab.net/

As with all software, these may have bugs; Netalyzr was under-detecting bufferbloat on high-bandwidth links until recently; this should be fixed now, I hope.

And SamKnows is doing the FCC broadband tests.

The speedtest.net tests (and pingtest.net) are good as far as they go (and you can host them someplace yourself; as others have noted, having an endpoint at a place you control is wise), but they don't tell the whole story: they miss a vital issue that has been hidden.

Here's the rub:

Most tests have focused on bandwidth (now misnamed "speed" by marketing, which it isn't).

Some tests have tested latency.

But there have been precious few that test latency under load, which is how we've gotten into a world of hurt on broadband over the last decade: a large fraction of broadband connections now have latencies under load measured in *seconds*. (See: http://gettys.wordpress.com/ and bufferbloat.net.) These make for fuming retail customers as well as lots of service calls (I know; I generated quite a few myself over the years). This is a killer for lots of applications: VoIP, teleconferencing, gaming, remote desktop hosting, etc.
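
For the curious, latency under load is easy to measure by hand: saturate the link and watch what happens to your ping times. Here's a rough Python sketch of the idea, nothing more; the target host and upload sink are placeholders you would replace with machines you control, and the real tests mentioned above are far more careful than this:

    # Rough sketch only: compare idle RTT with RTT while a bulk upload
    # saturates the uplink.  TARGET_HOST and UPLOAD_URL are placeholders.
    import subprocess, threading, urllib.request

    TARGET_HOST = "192.0.2.1"          # e.g. a host just past your modem
    UPLOAD_URL  = "http://192.0.2.2/"  # an HTTP sink you control

    def avg_ping_ms(count=10):
        # Parse the "min/avg/max/..." summary line of standard ping output.
        out = subprocess.run(["ping", "-c", str(count), TARGET_HOST],
                             capture_output=True, text=True).stdout
        return float(out.rsplit("=", 1)[-1].split("/")[1])

    def saturate_uplink():
        # Push a few megabytes upstream to fill the bottleneck queue.
        urllib.request.urlopen(UPLOAD_URL, data=b"x" * (8 * 1024 * 1024))

    idle = avg_ping_ms()
    t = threading.Thread(target=saturate_uplink)
    t.start()
    loaded = avg_ping_ms()
    t.join()
    print("idle RTT %.0f ms, RTT under load %.0f ms" % (idle, loaded))

If the second number is many times the first, you are looking at bufferbloat somewhere on the path.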

Netalyzr tries to test for excessive buffering, as does at least one of the M-Lab tests.

Dave Clark and I have been talking to SamKnows and Ookla to try to get latency under load tests added to the mix. I think we're getting some traction at having such tests added, but it's slightly too soon to tell.

We also need tests to identify ISPs failing to run queue management internal to their networks, as both research and anecdotal data show that this is much more common than it should be. Some ISPs do a wonderful job, and others don't; Van Jacobson believes this is because the classic RED algorithm he and Sally Floyd developed is buggy, and tuning it has scared many operators off; I believe his explanation.
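
For those who haven't looked at RED in a while, the pain is in the knobs. Here is a stripped-down Python sketch of the classic drop decision (the real algorithm also counts packets since the last drop and handles idle periods); the parameter values are purely illustrative, not recommendations:

    # Simplified classic RED drop decision, shown only to make the tuning
    # knobs visible.  W_Q, MIN_TH, MAX_TH and MAX_P all have to be chosen
    # per link; bad choices make RED useless or harmful.
    import random

    W_Q, MIN_TH, MAX_TH, MAX_P = 0.002, 5, 15, 0.1
    avg = 0.0                       # EWMA of the instantaneous queue length

    def red_drop(queue_len):
        """Return True if this arriving packet should be dropped or marked."""
        global avg
        avg = (1 - W_Q) * avg + W_Q * queue_len
        if avg < MIN_TH:
            return False            # short queue: never drop
        if avg >= MAX_TH:
            return True             # long queue: always drop
        # In between, drop probability rises linearly from 0 toward MAX_P.
        return random.random() < MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)

Getting those four numbers right for every link speed and traffic mix is exactly the tuning problem that has scared operators off.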

So far, so bad.

Then there is the home router/host disaster:

As soon as you move the bottleneck link from the broadband hop to the 802.11 link usually beyond it these days (either because broadband bandwidth has gone up, or because something like the several chimneys in my house drops the wireless bandwidth), you run into the fact that home routers, and even our operating systems, sometimes have even worse buffering than the broadband gear, sometimes measured in hundreds or even thousands of *packets*.
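
To put rough numbers on that (back-of-the-envelope, not measurements of any particular device): a full buffer of 1500-byte packets takes packets x 1500 x 8 / link_rate seconds to drain, so even modest buffers turn into large fractions of a second, or whole seconds, of delay:

    # Back-of-the-envelope queueing delay: how long a full buffer of
    # 1500-byte packets takes to drain at a given link rate.
    # The numbers below are illustrative, not measurements.
    def drain_seconds(packets, link_mbps, pkt_bytes=1500):
        return packets * pkt_bytes * 8 / (link_mbps * 1e6)

    print(drain_seconds(256, 1.0))    # ~3.1 s:  256 packets, 1 Mb/s uplink
    print(drain_seconds(1000, 20.0))  # ~0.6 s: 1000 packets, degraded 802.11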

We're going to need to fix the home routers and users' operating systems. For the 802.11 case, this is hard; Van says RED won't hack it, and we need better algorithms, whether Van's unpublished nRED algorithm or Doug Leith's recent work.

So you need to ensure the regulators understand that doing testing carefully enough to know what you are looking at is hard. Tests not run directly at the broadband gear may mix this problem in with the broadband connection itself.

This is not to say tests should not be done: we're not going to get this swamp drained without the full light of day on the issue. It's just that the current speedtest.net tests miss this entire issue (though they may detect it in the future), and the tests today aren't something you "just run" and get a simple answer from, since the problem can be anywhere in a path.

Maybe there will be tests that "do the right thing" for regulators in a year or two, but not now: the tests today don't identify which link is at fault, and the problem can easily be entirely inside the customer's house, if they test for bufferbloat at all.

I think it very important we get tests together that not only detect bufferbloat (which is very easy to detect, once you know how), but also point to where in the network the problem is occurring, so that the rate of complaints drops to something manageable and nobody has to field calls for problems they aren't responsible for (and are unable to fix).

You can look at a talk about bufferbloat I gave recently at:
http://mirrors.bufferbloat.net/Talks/BellLabs01192011/

Let me know if I can be of help. People who want to help with the bufferbloat problem, please also note that we recently opened the bufferbloat.net web site to support collaboration on this problem.

                        Best regards,
                                Jim Gettys
                                Bell Labs


On 01/06/2011 01:50 PM, Van Jacobson wrote:
> Jim,
>
> Here's the Doug Leith paper I mentioned. As I said on the phone I
> think there's an easier, more robust way to accomplish the same
> thing but they have running code and I don't. You can get their
> mad-wifi implementation at
> http://www.hamilton.ie/tianji_li/buffersizing.html
>
>   - van


