Re: latency (was: RE: cooling door)


From: Adrian Chadd <adrian () creative net au>
Date: Sun, 30 Mar 2008 13:03:18 +0800


On Sun, Mar 30, 2008, Mikael Abrahamsson wrote:

> On Sat, 29 Mar 2008, Frank Coluccio wrote:
>
>> Please clarify. To which network element are you referring in connection
>> with extended lookup times? Is it the collapsed optical backbone switch,
>> or the upstream L3 element, or perhaps both?

> I am talking about the fact that the following topology:
>
>   server - 5 meter UTP - switch - 20 meter fiber - switch -
>   20 meter fiber - switch - 5 meter UTP - server
>
> has worse NFS performance than:
>
>   server - 25 meter UTP - switch - 25 meter UTP - server
>
> Imagine bringing this into metro with 1-2 ms of delay instead of
> 0.1-0.5 ms.
>
> This is one of the issues that the server/storage people have to
> deal with.
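
To put rough numbers on that comparison (illustrative arithmetic only, not
measurements from this thread): a synchronous protocol with one request
outstanding at a time is capped at one transfer per round trip, so RTT
dominates as soon as it dwarfs serialisation time. A quick sketch, with the
transfer size assumed:

    # Throughput ceiling for a synchronous, one-request-in-flight protocol:
    # each operation costs a full round trip, so a single thread can never
    # move more than (transfer size / RTT), no matter how fast the link is.
    xfer_kb = 32  # assumed NFS-style read size, for illustration
    for rtt_ms in (0.1, 0.5, 1.0, 2.0):
        mb_per_sec = (xfer_kb / 1024.0) / (rtt_ms / 1000.0)
        print("RTT %.1f ms -> at most ~%.0f MB/s per thread" % (rtt_ms, mb_per_sec))

At 0.1 ms the ceiling is ~312 MB/s per thread; at a metro-style 2 ms it
collapses to ~16 MB/s, which is roughly the degradation Mikael describes.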

That's because the LAN protocols need to be re-jiggled a little to start
looking less like LAN protocols and more like WAN protocols. Similar
things need to happen for applications.
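
Concretely, "looking like a WAN protocol" mostly means keeping a full
bandwidth-delay product of data in flight instead of ping-ponging single
requests. A minimal sketch of that arithmetic (10GE link speed assumed):

    # Bandwidth-delay product: the amount of unacknowledged data that must
    # be in flight to keep the pipe full. A window (or outstanding-request
    # budget) smaller than this caps throughput below line rate.
    link_bytes_per_sec = 10e9 / 8.0  # assumed 10GE link
    for rtt_ms in (0.1, 1.0, 2.0):
        bdp_kb = link_bytes_per_sec * (rtt_ms / 1000.0) / 1024.0
        print("RTT %.1f ms -> ~%.0f KB must be in flight" % (rtt_ms, bdp_kb))

At 1 ms on 10GE that's ~1.2 MB in flight; an application that stalls after
one 32 KB request at a time never gets close.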

I helped a friend debug an NFS throughput issue between some Linux servers
running Fortran-77-based numerical analysis code and a 10GE storage backend.
The storage backend could push 10GE without too much trouble, but the
application wasn't poking the kernel in the right way (large fetches and
prefetching, basically) to fully utilise the infrastructure.
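
As one concrete (hypothetical) example of "poking the kernel in the right
way": on Linux an application can ask for aggressive readahead with
posix_fadvise(), which Python exposes directly. The path and chunk size
below are made up for illustration:

    import os

    # Tell the kernel we'll read this file sequentially (bigger readahead
    # window) and that we want the range fetched now (start prefetching).
    # A length of 0 means "to end of file" for posix_fadvise.
    fd = os.open("/mnt/nfs/dataset.bin", os.O_RDONLY)  # hypothetical path
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    while True:
        chunk = os.read(fd, 1 << 20)  # large 1 MB reads, not tiny ones
        if not chunk:
            break
    os.close(fd)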

Oh, and kernel HZ tickers can have similar effects on network traffic, if the
application does dumb stuff. If you're (un)lucky you may see 1 or 2 ms of
delay between packet input and the processing being scheduled. This doesn't
matter so much over links with 250 ms+ of latency, but it does matter on
0.1-1 ms links.
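
One way to see that tick effect is to request a sub-millisecond sleep and
measure what you actually get; on a tick-based kernel the wakeup is
quantised to the HZ period. A rough sketch, not a rigorous benchmark:

    import time

    # Request a 0.1 ms sleep and measure the real delay. The gap between
    # requested and delivered shows the scheduling granularity that also
    # applies to waking a process when a packet arrives.
    samples = []
    for _ in range(1000):
        t0 = time.monotonic()
        time.sleep(0.0001)  # ask for 0.1 ms
        samples.append((time.monotonic() - t0) * 1000.0)
    samples.sort()
    print("median %.3f ms, 99th %.3f ms" % (samples[500], samples[990]))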

(Can someone please apply some science to this and publish best practices?)



adrian

