nanog mailing list archives

RE: 95th Percentile again!


From: woods () weird com (Greg A. Woods)
Date: Sun, 3 Jun 2001 13:55:36 -0400 (EDT)


[ On Saturday, June 2, 2001 at 23:59:17 (-0700), David Schwartz wrote: ]
Subject: RE: 95th Percentile again!

      I don't agree that this is so for 95th percentile. Exactly which
five-minute interval a packet is counted in will affect the results. There
is no way to totally agree on which such interval a packet belongs in.
Similarly, where the five-minute intervals begin and end is arbitrary and
affects the final numbers.

Perhaps you should sit down with a table of numbers and compare the
results by hand.  I think you'll find that you are gravely mistaken.

(I can provide you with some raw numbers that are guaranteed to have
been sampled out-of-sync at the ends of the same pipe if you'd like.)

The only time there can ever be a discrepancy is at the "edge".  I.e. if
during the last sample time in the billing period the ISP sees a huge
count of bytes, but the customer (because his last full sample was five
minutes less one second before the end of the period) sees zero bytes,
*AND* iff this one large sample throws the Nth percentile calculation
for the entire billing period up over the next billing increment, then
the lack of synchronisation will cause a "problem" (for the customer in
this case :-).  However the chances of this kind of error happening in
real life are so tiny as to be almost impossible (at least if the
billing period is orders of magnitude larger than the sample period,
which of course is what we're supposing here).  I count over three
orders of magnitude difference between a 30-day billing period and a
5-minute sample period (8,640 samples per period).
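
The argument above can be sketched in a few lines of Python. This is a
minimal illustration, not any vendor's billing code: the percentile
indexing convention and the sample values are invented, and the single
10,000-unit sample stands in for the out-of-sync "edge" reading.

```python
import random

# Hypothetical sketch: a 30-day billing period sampled every 5 minutes
# gives 8640 samples, so one extra out-of-sync sample at the boundary
# almost never moves the 95th percentile.

SAMPLES = 30 * 24 * 12           # 8640 five-minute samples in 30 days

def billable_rate(samples):
    """Discard the top 5% of samples; bill at the highest remaining one."""
    ordered = sorted(samples)
    return ordered[int(len(ordered) * 0.95) - 1]

random.seed(1)
samples = [random.randint(10, 100) for _ in range(SAMPLES)]

base = billable_rate(samples)
# One huge sample lands in the ISP's counters but not the customer's:
skewed = billable_rate(samples + [10_000])
print(base, skewed)   # with ~8640 samples the two billed rates match
```

With this many samples, one extra reading shifts the 95th-percentile
index by at most one position, so the billed rate is unchanged; the
"problem" case requires that single sample to straddle a billing
increment exactly.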

For the customer it's easy to avoid too -- just unplug your network
(scheduled down time) during the 10-minute period between billing cycle
roll-overs.  :-)

      The interface byte counters won't tell you where the packets went.

Clearly if the ISP is at one end of the pipe and the customer is at the
other then the out/in (and in/out at the other end) counters are an
extremely accurate count of where the packets went!

Obviously such a scheme "limits" in some ways the viable alternatives
for connecting customers, and it certainly forces you to do your data
collection at specific points.

So any
such billing scheme would be based ultimately upon statistical sampling.

Please try to talk sense, man!  Regardless of what you're buying or
selling there's absolutely NOTHING "statistical" about byte counting!

It's pure accounting, plain and simple.  It's 100% auditable and
100% verifiable too!
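
To make the "pure accounting" point concrete, here is a minimal sketch
of how per-interval byte counts fall out of successive interface counter
readings. The counter values are invented; in practice they would come
from something like the SNMP ifInOctets/ifOutOctets objects, whose
classic 32-bit form wraps at 2^32.

```python
# Hypothetical sketch: turning polled octet-counter readings into
# per-interval byte counts. There is nothing statistical here -- every
# byte the interface forwarded is accounted for.

WRAP = 2**32   # classic 32-bit SNMP octet counters wrap at 4 GiB

def deltas(readings):
    """Per-interval byte counts from successive counter readings,
    allowing for one counter wrap between polls."""
    return [(cur - prev) % WRAP for prev, cur in zip(readings, readings[1:])]

readings = [4294967000, 200, 5000]   # counter wraps between the first two polls
print(deltas(readings))              # [496, 4800]
```

Both ends of the pipe can run the same arithmetic against their own
counters and audit each other's invoices, which is exactly the sense in
which the scheme is 100% verifiable.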

The
provider would determine that typically some of your packets are local and
cost very little and some are remote and may cost much more. Rather than
counting each packet and figuring out its cost, the provider relies upon
prior statistical sampling to come up with some 'average' cost which he
bills you on the basis of.

The only way to do that is to count flows instead of bytes and the only
way I know of doing that is indeed based only on statistical sampling.

Any customer who'd be willing to suffer under such a scheme is either
not very clueful or getting one heck of a deal on their pricing....

      Sometimes what happens in this case is that the customer or the provider
realizes that this particular traffic pattern does not match the statistical
sample on which the billing was based. Richard Steenbergen told me a story
about a company that colocated all their servers at POPs of the same provider
and paid twice for traffic between their machines. Needless to say, they had
to negotiate new pricing. Why? Because their traffic pattern made the
statistical sampling upon which their billing was based inappropriate.

You're talking apples and oranges -- please stop misdirecting the topic
in an apparent attempt to "call the kettle black".

      If a billing scheme were not based upon statistical sampling, it would
require the provider to somehow accurately determine how much each packet
cost him to get to you or handoff from you and bill you based upon that on
something like a cost plus basis.

Iff.  But that's not what we're talking about here.

      I agree, but all of the alternatives are ultimately based upon statistical
sampling. NetFlow, for example, loses a certain percentage of the packets
because it's UDP based. The provider compensates for this by raising his
rates. If he expects 3% of his accounting records to be lost, he raises his
rates to 103% hoping that he'll get a fair statistical sample. If this
assumption is violated, for example if packets are more likely to drop at
peak times and a particular customer passes most of their traffic at peak
times, then the statistical assumptions upon which the billing is based will
be violated, and the ISP will get taken advantage of.
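
The quoted compensation scheme is simple arithmetic, worth spelling out.
This is an illustration of the claim being quoted, not an endorsement of
it; the per-unit rate is invented, and exact compensation for a 3% loss
is rate/(1-0.03), i.e. roughly the "103%" figure.

```python
# Hypothetical sketch of the rate inflation described above: the
# provider expects to lose 3% of NetFlow accounting records (UDP), so
# he inflates the per-unit rate to keep expected revenue whole.

loss = 0.03
rate = 0.10                      # invented per-unit price
adjusted = rate / (1 - loss)     # ~= 0.103, the quoted "103%"

# Billing for 1000 units of traffic of which only 97% is recorded:
billed = 1000 * (1 - loss) * adjusted
print(round(billed, 6))          # matches 1000 * rate
```

Which, as the quoted text notes, only works on average: a customer whose
traffic concentrates in lossy peak periods breaks the assumption.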

Duh.  But this isn't what we're talking about.

      If he counts bytes out an Ethernet port, he'll be billing you for some
broadcast traffic that costs him nothing. He'll be billing you for some
local traffic that costs him nothing. He'll be billing you for some
short-range traffic that costs him very little. But he uses statistical
sampling to come up with some 'per byte' cost. If, for example, most of a
particular customer's traffic is from another customer in the same POP,
again the statistical assumptions upon which the billing is based will be
violated, and the customer will likely have to negotiate some other billing
mechanism.

I don't see the problem.  It's a very simple matter to adjust the
pricing to fit.  You can do some "statistical sampling" to set the
price, just like anyone might do in any form of cost estimation, but
what's on the invoice in the end is a pure accounting of the actual
traffic.  You can do the same for packet loss too.  It's only the
price/unit that's based on statistical sampling and cost estimates.  Why
is this so difficult for some people to understand?
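
The distinction drawn above can be shown in a few lines. All the figures
here are invented: the traffic-mix estimate and unit costs are the
one-time "statistical" step, and the invoice itself is nothing but
actual bytes times the resulting fixed price.

```python
# Hypothetical sketch: sampling sets the unit price once; the invoice
# is a pure accounting of measured traffic.

# One-time cost estimation (the only "statistical" step):
sampled_mix = {"transit": 0.7, "local": 0.3}      # estimated traffic mix
cost_per_gb = {"transit": 0.08, "local": 0.01}    # provider's unit costs
unit_price = sum(sampled_mix[k] * cost_per_gb[k] for k in sampled_mix)

# Monthly invoice: actual measured traffic times that fixed price.
actual_gb = 1234
invoice = actual_gb * unit_price
print(round(invoice, 2))
```

If a customer's real mix drifts from the estimate, the fix is to re-do
the one-time pricing step -- the per-byte accounting on the invoice
never changes.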

      Every billing scheme I have ever seen has been based upon statistical
sampling. The closest to an exception I've seen is Level3's distance-based
scheme.

You've obviously never looked beyond the silly schemes you're apparently
stuck on talking about.  I know of many billing systems that are based
on pure bulk-throughput accounting and several that are based on true
Nth percentile usage.  None of them, not a single one, are based on
statistical samples of anything -- *ALL* are pure 100% byte-counting and
all of them count each and every byte.

-- 
                                                        Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods () acm org>     <woods () robohack ca>
Planix, Inc. <woods () planix com>;   Secrets of the Weird <woods () weird com>
