Interesting People mailing list archives

Re: The Iron Laws of Network Cost Scaling


From: David Farber <dave () farber net>
Date: Wed, 26 May 2010 20:05:11 -0400



Begin forwarded message:

From: Karl Auerbach <karl () cavebear com>
Date: May 26, 2010 3:57:44 PM EDT
To: dave () farber net
Cc: craig () trader name
Subject: Re: [IP] The Iron Laws of Network Cost Scaling
Reply-To: karl () cavebear com

On 05/26/2010 11:37 AM, David Farber wrote:

From: "W. Craig Trader"<craig () trader name>
Date: May 26, 2010 2:25:20 PM EDT
Subject: For IP: The Iron Laws of Network Cost Scaling

2. Network costs scale primarily with the number of troubleshooters
required to run them, not with capacity.

One might think, therefore, that the industry would be concerned about the quality of networking software and the 
dearth of test tools and test points.

I test network software at my company (InterWorking Labs - http://www.iwl.com/), not so much for conformance as for 
robustness.

We see a lot of bad code.  It makes us look at the internet and perceive it as a highway filled with a lot of Trabants 
and Yugos, their wheels wobbling and doors ready to fall off.

On the internet we often don't architect in things that telcos have long found to be important - things like remote 
loopback testing.

Way back in the early 1990s I developed a tool, an internet "buttset" that was intended to be used by field 
troubleshooters to get in, get working, and get done.  Yet even today I see network installers come out to the field 
with nothing more than a laptop running Windows (with a long boot-up and configuration time) using nothing more than 
the same tools we've had since ancient times (ICMP-echo/ping, traceroute, etc.)

And I am amazed that even those tools haven't evolved much - for example it would save a lot of time if there were an 
option on a typical traceroute command to say "skip the first N hops" (I had that in my 1993 buttset.)
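The skip-the-first-N-hops idea amounts to starting the probe schedule at TTL N+1 instead of TTL 1, so the well-known near hops aren't re-measured on every run.  A minimal sketch of that scheduling logic (the probe mechanics themselves are elided; this is an illustration, not the buttset's actual code):

```python
# Sketch: build the TTL schedule for a traceroute that skips the
# first N hops.  A traceroute loop would then, for each TTL, set
# IP_TTL on a probe socket, send, and wait for the ICMP
# time-exceeded reply -- exactly as classic traceroute does.

def ttl_schedule(skip_hops, max_ttl=30):
    """Return the TTLs to probe, skipping the first skip_hops hops."""
    if not 0 <= skip_hops < max_ttl:
        raise ValueError("skip_hops must be in [0, max_ttl)")
    return list(range(skip_hops + 1, max_ttl + 1))
```

With, say, five well-known hops between the troubleshooter and the interesting part of the path, `ttl_schedule(5)` starts probing at TTL 6 and saves five round-trip waits per pass.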

(And it is surprising how few people realize that on a broadcast domain, such as an access link, it is often better to 
try an initial reachability test via "arp" rather than ICMP echo.)

More recently I've been working on an internet demarcation device - essentially a bit of trustable code that can be 
placed at the point that divides ISP responsibility from customer responsibility.  This device would contain some 
semi-autonomous mechanisms to monitor the path status (particularly when the link is perceived to be 'down') and allow 
providers to know what their net access looks like from the customer point of view without asking Grannie to open a 
Windows command window and run "tracert".
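The core of such a semi-autonomous monitor is deciding when the link is actually 'down' rather than momentarily lossy.  One common way to do that is a small state machine with hysteresis, fed by periodic probe results; a sketch (names and thresholds are invented for illustration, not from any real product):

```python
# Sketch: link up/down detection with hysteresis.  A single lost
# probe does not flip the state; only a run of consistent results
# does, which keeps the demarc device from flapping on normal loss.

class LinkMonitor:
    def __init__(self, down_after=3, up_after=2):
        self.down_after = down_after   # consecutive failures before declaring DOWN
        self.up_after = up_after       # consecutive successes before declaring UP
        self.state = "UP"
        self._streak = 0

    def observe(self, probe_ok: bool) -> str:
        """Feed one probe result (e.g. a ping of the first provider hop)."""
        if probe_ok == (self.state == "UP"):
            self._streak = 0           # result agrees with current state
        else:
            self._streak += 1
            limit = self.up_after if self.state == "DOWN" else self.down_after
            if self._streak >= limit:
                self.state = "DOWN" if self.state == "UP" else "UP"
                self._streak = 0
        return self.state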

Security and troubleshooting are often at odds - Security tries to hide and block; troubleshooting and repair require 
synoptic views, often involving the correlation of different kinds of events at different levels of abstraction.

Net troubleshooting, like surgery, involves sharp instruments that, if wielded by the wrong people or incorrectly, 
could cause harm.  We don't take scalpels away from surgeons; we need to let net troubleshooting and repair people have 
equally sharp tools of their trade.

I don't want the concern for security to drive network troubleshooting and repair to become a priesthood of a few 
people who are denominated as "trustworthy" (but not necessarily competent), but it may come to that.

I've long advocated that we take a cue from the medical community and begin to create a discipline of network pathology 
- so that we can at least begin to understand the relationship between symptoms and causes, even if only on a gross 
probabilistic level.

Even back in 1995 I had code that reasoned from symptoms to suggest possible causes.  More recently I designed and 
implemented a prototype engine that would exercise protocol implementations and generate a list of possible 
causative factors.  The next step would have been to instigate tests to explore those potentially causative factors.  
It has been disappointing how little progress has been made in the intervening 15+ years.
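At a gross probabilistic level, symptom-to-cause reasoning can be as simple as a weighted rule table: each rule says how strongly a symptom suggests a cause, and candidate causes are ranked by accumulated weight.  A sketch (the symptoms, causes, and weights here are invented for illustration; a real pathology discipline would derive them from measured data):

```python
from collections import defaultdict

# Sketch: rank candidate causes by the total weight of the
# observed symptoms that suggest them.

RULES = {
    "dns_timeout":  {"resolver_down": 0.6, "upstream_loss": 0.3},
    "arp_no_reply": {"host_down": 0.7, "wrong_vlan": 0.2},
    "high_rtt_var": {"congestion": 0.5, "wifi_interference": 0.4},
    "tcp_resets":   {"middlebox_interference": 0.6},
}

def rank_causes(symptoms):
    """Return (cause, score) pairs, highest-scoring first."""
    score = defaultdict(float)
    for s in symptoms:
        for cause, weight in RULES.get(s, {}).items():
            score[cause] += weight
    return sorted(score.items(), key=lambda kv: -kv[1])
```

The ranking is only a hint list; the next step the prototype engine aimed at - instigating targeted tests to confirm or eliminate each candidate cause - is where the real diagnostic value lies.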

Because networks suffer from the Rashomon effect - the network looks different to different people at different places 
and different times - a more refined network troubleshooting system might need a means to ask other parts of the net 
for information about "what does thing X look like from your point of view".

To that end I have suggested that those of us who build network tools ought to consider a kind of software/protocol 
interchange bus to support diagnostics and testing - it would have to be very asynchronous and security is a big issue.
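One message on such an interchange bus might be an asynchronous "what do you see?" query carrying a correlation id, since replies from remote vantage points arrive whenever they arrive.  A sketch of the shape such a message could take (all field names are invented for illustration; a real design would also carry authentication, since security is the hard part of such a bus):

```python
import json
import time
import uuid

# Sketch: an asynchronous view-query asking a remote vantage point
# what a target looks like from there.  The id field lets the
# eventual reply be correlated with this request.

def make_view_query(target, probe="reachability", requester="tool-A"):
    return json.dumps({
        "msg": "view-query",
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "requester": requester,
        "target": target,      # "what does thing X look like from your point of view"
        "probe": probe,        # e.g. reachability, dns, route-view
    })
```

A vantage point would answer with a matching "view-reply" carrying the same id, whenever its own probing completes - hence the need for the bus to be very asynchronous.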

We have a hint of such a system with the informal array of traceroute and route view servers that people (mainly ISPs) 
have deployed around the net.

The problem of internet maintenance and troubleshooting is going to get worse as we move into new technologies and 
cloud computing - even things as simple as internationalized domain names are going to cause pain to the eyeballs of 
people who try to elicit meaning from host/router names.  And with cloud computing, with applications moving, 
splitting, and recombining, it's going to be harder to know if we are actually looking at the part that is causing 
trouble.  (Just consider how "anycast" routing has already made it more difficult to understand whether domain name 
servers are running accurately or not.)  And virtual networks between virtual machines are annoyingly difficult to watch 
for troubleshooting purposes.

And all of this is happening as users increasingly believe that the internet is a lifeline grade utility.

It's going to be fun.

                --karl--






-------------------------------------------
Archives: https://www.listbox.com/member/archive/247/=now
RSS Feed: https://www.listbox.com/member/archive/rss/247/
Powered by Listbox: http://www.listbox.com

