nanog mailing list archives

Re: What does 95th %tile mean?


From: "David P. Maynard" <dpm () flametree com>
Date: Fri, 20 Apr 2001 10:16:19 -0500



[ On Friday, April 20, 2001 at 00:52:39 (-0400), Charles Sprickman wrote: ]
Subject: RE: What does 95th %tile mean?

On Thu, 19 Apr 2001, Greg A. Woods wrote:

Neither MRTG nor Cricket (nor anything with RRDtool or anything similar
underlying it), in their standard released form, are truly suitable for
accounting purposes since they both can introduce additional averaging
errors.  You need to keep all of the original sample data.

This actually works pretty well:

http://www.seanadams.com/95/

If you read that page carefully you'll note that he's using a modified
version of MRTG that doesn't average its samples.  As it says:

   This is a patch to add 95th percentile metering to MRTG. This is not as
   simple a feature as one might think. MRTG normally saves only one day
   worth of 5-minute samples. It is not possible to accurately calculate the
   95th percentile without having all of the samples for a one month period.
   In order to calculate the 95th percentile for a 30-day period, it is
   necessary to save an entire 30 days worth of the 5-minute samples.

MRTG does not do that by default, nor does Cricket, nor will any tool
using RRDtool as an underlying database.

You need to use the "old" MRTG without RRDTOOL to avoid the averaging.  It maintains an accurate timestamp for the
previous sample so that the data stored in the table is accurate even if there was some jitter in the collection
interval.  You do still need to maintain backup logs so that you have the entire month's worth of 5-minute samples.
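
To make the averaging point concrete, here is a rough sketch of the calculation.  It is not taken from the MRTG patch or from any billing system; the nearest-rank percentile, the pair-averaging helper and the traffic numbers are all assumptions chosen for illustration.

```python
def percentile_95(samples):
    """95th percentile by nearest rank: sort, then take the value at the 95% position."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def average_pairs(samples):
    """Average each adjacent pair, halving the number of stored samples."""
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples) - 1, 2)]


# A 30-day month of 5-minute samples: 12 per hour * 24 hours * 30 days.
SAMPLES_PER_MONTH = 12 * 24 * 30

# Made-up traffic in Mbit/s: mostly 10, with a one-sample burst to 100 every 16th sample.
raw = [100 if i % 16 == 0 else 10 for i in range(SAMPLES_PER_MONTH)]

p95_raw = percentile_95(raw)
p95_averaged = percentile_95(average_pairs(raw))
print(p95_raw, p95_averaged)
```

With this made-up data the raw samples give 100 and the pair-averaged samples give 55, so the billable figure changes substantially once the original samples are gone.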

I have tried arguing against the "corrections" that RRDTOOL makes to data, but the only suggested "fix" is to lie to
RRDTOOL about the timestamp.  I understand that the old MRTG database is "wrong" since the timestamp it stores in the
database is not the actual sample collection time.  However, for most of the things I want to do, I prefer to know what
the real data was at the collection point closest to the time of interest instead of what the data "should" have been
if it had been collected at precisely the right time.
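
A small sketch of the difference between the two approaches.  The function names and sample values are hypothetical, and plain linear interpolation is used only as a stand-in for "what the data should have been"; it is not RRDTOOL's actual algorithm.

```python
from bisect import bisect_left


def nearest_sample(samples, t):
    """Return the (timestamp, value) pair collected closest to time t.

    samples must be sorted by timestamp.
    """
    if not samples:
        raise ValueError("need at least one sample")
    times = [ts for ts, _ in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0]
    if i == len(samples):
        return samples[-1]
    before, after = samples[i - 1], samples[i]
    # Ties go to the earlier sample.
    return before if t - before[0] <= after[0] - t else after


def interpolated_value(samples, t):
    """Estimate the value at exactly time t by linear interpolation."""
    if not samples:
        raise ValueError("need at least one sample")
    times = [ts for ts, _ in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical samples: the poll that was due at t=300 actually ran at t=310.
samples = [(0, 10.0), (310, 50.0), (600, 20.0)]

actual = nearest_sample(samples, 300)        # the real reading, taken at t=310
estimate = interpolated_value(samples, 300)  # a value that was never measured
```

The first lookup returns something that was really observed, along with when it was observed; the second returns a plausible number that no poll ever collected.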

As for the original topic, we used Alex's (max(in,out)) definition of 95th percentile billing.  I always thought that
the in+out method was a little "sleazy" since the explanation is usually buried in some fine print and people who
aren't careful can be easily tricked into making an invalid provider comparison.
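
To show why the two definitions are not comparable, here is a minimal sketch.  It assumes that max(in,out) means "take the 95th percentile of each direction separately and bill the larger" and that in+out means "add the two directions sample by sample, then take the 95th percentile"; providers may define either differently.  The function names and traffic numbers are made up.

```python
def percentile_95(samples):
    """95th percentile by nearest rank: sort, then take the value at the 95% position."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def bill_max_in_out(inbound, outbound):
    """Assumed max(in,out): 95th percentile of each direction, bill the larger."""
    return max(percentile_95(inbound), percentile_95(outbound))


def bill_in_plus_out(inbound, outbound):
    """Assumed in+out: sum both directions per sample, then take the 95th percentile."""
    return percentile_95([i + o for i, o in zip(inbound, outbound)])


# Made-up samples in Mbit/s: outbound is always twice inbound.
inbound = list(range(1, 101))
outbound = [2 * x for x in inbound]

print(bill_max_in_out(inbound, outbound))   # 190
print(bill_in_plus_out(inbound, outbound))  # 285
```

The same traffic yields 190 under one definition and 285 under the other, so a per-Mbit price quoted by an in+out provider cannot be compared directly with one quoted by a max(in,out) provider.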

For the journal of meaningless statistics, we found that over time, the average (mean) usage for our "typical" 
colocation customer was 69-72% of the 95% value.

The 95% measure definitely isn't the answer to all problems, but it does address some that "actual usage" billing
doesn't.  Mainly, if you bill based on actual usage, customers can get very nervous that things like smurf attacks,
which are out of their control, will send their bill through the roof.  Depending on your customers and your business
model, there are other ways to deal with that problem.  FWIW, I expect that the 95% model will slowly be phased out as
the industry matures.

-dpm

-- 
 David P. Maynard, CTO
 OutServ.net, Inc. -- The e-Business Operations Solution [TM]
 EMail: dmaynard () outserv net,  Tel: +1 512 977 8918,  Fax: +1 512 977 0986
--



