nanog mailing list archives

Re: Thousands of hosts on a gigabit LAN, maybe not


From: Lamar Owen <lowen () pari edu>
Date: Sat, 9 May 2015 17:06:22 -0400

On 05/08/2015 02:53 PM, John Levine wrote:
...
Most of the traffic will be from one node to another, with
considerably less to the outside.  Physical distance shouldn't be a
problem since everything's in the same room, maybe the same rack.

What's the rule of thumb for number of hosts per switch, cascaded
switches vs. routers, and whatever else one needs to design a dense
network like this?  TIA

You know, I read this post and immediately thought 'SGI Altix'... scalable to 512 CPUs per "system image" and 20 images per cluster (NASA's Columbia supercomputer had 10,240 CPUs in that configuration... twelve years ago, using 1.5GHz 64-bit RISC CPUs running Linux... my, how we've come full circle... at least today's equivalent uses less power). The NUMA interconnect in those Altix systems is a de facto 'memory-area network' and can support some interesting topologies.

Clusters can also be built from nodes with at least two NICs each and no switching at all. With four or eight ports per node you can do some nice mesh topologies. This wouldn't be L2 bridging, either; you'd build an L3 mesh, which can be quite efficient with no switches as long as each node has at least three ports. With only two NICs per node you can still do something reasonably efficient with a switch or two and some chains of nodes. Routing at L3 keeps each broadcast domain small, so broadcast overhead stays small.
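To make the no-switch idea concrete, here's a minimal sketch in Python. Everything in it is an assumption for illustration only: the node names, the 10.255.0.0/16 link range, the even node count, and the three-NIC "ring plus chord" layout are mine, not anything from the thread. It just assigns a /31 point-to-point subnet to every link; you'd still need static routes or a routing protocol (OSPF, say) on the nodes to actually carry traffic across the mesh.

    import ipaddress

    def build_l3_mesh(num_nodes):
        """Return (node_a, node_b, ip_a, ip_b) tuples for point-to-point links.

        Assumes an even number of nodes and three NICs per node.
        """
        links = []
        # Ring links: each node to its next neighbour (uses two NICs per node).
        for i in range(num_nodes):
            links.append((i, (i + 1) % num_nodes))
        # Chord links: each node to the node halfway around the ring (third NIC).
        for i in range(num_nodes // 2):
            links.append((i, i + num_nodes // 2))
        # Carve a /31 out of an assumed link range for each point-to-point link.
        subnets = ipaddress.ip_network("10.255.0.0/16").subnets(new_prefix=31)
        addressed = []
        for (a, b), net in zip(links, subnets):
            ip_a, ip_b = list(net)          # a /31 has exactly two addresses
            addressed.append((f"node{a}", f"node{b}", str(ip_a), str(ip_b)))
        return addressed

    if __name__ == "__main__":
        for a, b, ip_a, ip_b in build_l3_mesh(8):
            print(f"{a} {ip_a}/31 <--> {ip_b}/31 {b}")

With 8 nodes that yields 12 links, i.e. every node ends up with degree three (two ring neighbours plus one chord), and no frame ever crosses a switch.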

If you only have one NIC per node, well, time to get some seriously high-density switches... but even then, how many nodes go in a 42U rack? A top-of-rack switch may only need 192 ports, and that's only 4U with 1U 48-port switches; in 8U you can do 384 ports, and three racks will get you a bit over 1,000. Octopus cables going from RJ21 to 8P8C modular plugs are available, so you could use high-density blades; Cisco claims you could do 576 10/100/1000 ports in a 13-slot 6500. That's half the rack space for the switching. If 10/100 is enough, you could use 12 of the WS-X6196-21AF cards (or the RJ-45 'two-ports-per-plug' WS-X6148X2-45AF) and in theory get 1,152 ports in a 6513 (with one SUP; drop 96 ports from that to get a redundant SUP).
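The rack math above works out like this (a throwaway Python sketch; the 48-port 1U switch, the 576-port 6500, and the 96-port WS-X6196-21AF figures are the ones from the paragraph, the helper itself is just arithmetic):

    def ports_per_rack(rack_units, units_per_switch, ports_per_switch):
        """How many switch ports fit in a given number of rack units."""
        return (rack_units // units_per_switch) * ports_per_switch

    # 1U 48-port switches: 4U -> 192 ports, 8U -> 384 ports
    print(ports_per_rack(4, 1, 48))        # 192
    print(ports_per_rack(8, 1, 48))        # 384
    print(3 * ports_per_rack(8, 1, 48))    # 1152 ports across three racks

    # Chassis option: 12 line cards x 96 10/100 ports each in a 6513
    print(12 * 96)   # 1152 ports with a single SUP
    print(11 * 96)   # 1056 ports if a slot goes to a redundant SUP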

Looking at another post in the thread, those Moonshot rigs sound interesting... 45 server blades in 4.3U. 4.3U?!? Heh, some custom rails, I guess, to get ten chassis into 47U. They claim a quad-server blade, so 1,800 servers (with networking) in a 47U rack. Yow. At a cost of several hundred thousand dollars for that setup.

The effective limit on subnet size would, of course, be broadcast overhead; 1,000 nodes on a single /22 would likely be painfully slow due to broadcast traffic alone.
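For a rough feel of what that overhead looks like, here's a hedged back-of-the-envelope sketch; the per-host broadcast rate and packet size are assumptions picked only to make the arithmetic concrete, not measurements from any real cluster:

    def broadcast_load(hosts, bcasts_per_host_per_sec=1.0, bytes_per_bcast=100):
        """Estimate the broadcast traffic every host on a flat subnet must absorb."""
        pps = hosts * bcasts_per_host_per_sec   # broadcast packets/sec hitting every NIC
        bps = pps * bytes_per_bcast * 8         # bits/sec of broadcast traffic
        return pps, bps

    pps, bps = broadcast_load(1000)
    print(f"{pps:.0f} broadcast packets/sec per host, "
          f"~{bps / 1e6:.1f} Mb/s of each gigabit link")

Under those assumptions the raw bandwidth eaten is small; the pain comes from every host taking an interrupt and an ARP-cache hit for every one of those packets, which is why keeping the broadcast domains small (as with the L3 mesh above) matters.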

