Re: Thousands of hosts on a gigabit LAN, maybe not


From: Brandon Martin <lists.nanog@monmotha.net>
Date: Fri, 08 May 2015 15:41:31 -0400

On 05/08/2015 02:53 PM, John Levine wrote:
> Some people I know (yes really) are building a system that will have
> several thousand little computers in some racks.  Each of the
> computers runs Linux and has a gigabit ethernet interface.  It occurs
> to me that it is unlikely that I can buy an ethernet switch with
> thousands of ports, and even if I could, would I want a Linux system
> to have 10,000 entries or more in its ARP table.
>
> Most of the traffic will be from one node to another, with
> considerably less to the outside.  Physical distance shouldn't be a
> problem since everything's in the same room, maybe the same rack.
>
> What's the rule of thumb for number of hosts per switch, cascaded
> switches vs. routers, and whatever else one needs to design a dense
> network like this?  TIA

Unless you have some dire need to get these all on the same broadcast domain, numbers like that on a single L2 would send me running for the hills, for lots of reasons, some of which you've identified.
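
On the ARP point specifically: the Linux neighbor table is bounded by the gc_thresh sysctls, and the usual defaults are far below 10,000 entries. A minimal sketch for checking them, assuming the stock /proc layout:

  # Minimal sketch: read the Linux IPv4 neighbor (ARP) table limits.
  # Assumes the stock /proc layout found on modern kernels.
  from pathlib import Path

  base = Path("/proc/sys/net/ipv4/neigh/default")
  for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
      print(name, "=", (base / name).read_text().strip())
  # gc_thresh3 is the hard cap on table entries; the usual default (1024)
  # would have to be raised a long way to hold ~10,000 neighbors on one L2.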

I'd find a good L3 switch, put no more than ~200-500 IPs on each L2, and let the switch handle gluing it all together at L3. With the proper hardware, this is a fully line-rate operation and should have no real downsides aside from splitting up the broadcast domains (if you do need multicast, make sure your gear can handle it). With a divide-and-conquer approach, you shouldn't have problems fitting the L2+L3 tables into even a pretty modest L3 switch.
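
As a back-of-the-envelope illustration of that split (the 10.0.0.0/16 base, /24 segments, and 250 hosts per segment are assumptions, not recommendations):

  # Sketch: carve an assumed 10.0.0.0/16 into /24 L2 segments,
  # each routed by the L3 switch.  Uses only the Python stdlib.
  import ipaddress

  base = ipaddress.ip_network("10.0.0.0/16")    # assumed addressing plan
  segments = list(base.subnets(new_prefix=24))  # 256 x /24, 254 hosts each

  nodes = 10_000
  per_segment = 250                             # assumed hosts per L2 segment
  needed = -(-nodes // per_segment)             # ceiling division -> 40 segments

  for net in segments[:needed]:
      print(net, "gateway", net.network_address + 1)

Forty-odd /24s keeps each broadcast domain small while the L3 switch's routing table stays trivial.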

The densest chassis switches I know of will get you about 96 ports per RU (48 ports each on a half-width blade, though you need breakout panels to get standard RJ45 8P8C connectors since the blades have MRJ21s), less rack overhead for power supplies, management, etc. That should get you ~2000 ports per rack [1]. Such switches can be quite expensive. The trend seems to be toward stacking pizza boxes these days, though. Get the number of ports you need per rack (you're presumably not putting all 10,000 nodes in a single rack) and aggregate up one or two layers; that also gives you a pretty natural place to make the L2/L3 split.
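
To put rough numbers on the pizza-box approach (the 48-port leaves and 4x10G uplinks below are assumed figures, not a recommendation):

  # Sketch: leaf/aggregation arithmetic for stacked pizza boxes.
  # All figures are illustrative assumptions.
  leaf_ports = 48           # assumed 1G access ports per leaf switch
  uplink_gbps = 4 * 10      # assumed 4x10G uplinks per leaf
  nodes = 10_000

  leaves = -(-nodes // leaf_ports)          # ceiling division -> 209 leaves
  oversub = leaf_ports * 1 / uplink_gbps    # 48G of access vs 40G up

  print(leaves, "leaf switches,", f"{oversub:.1f}:1 oversubscription each")

Given that most traffic is node-to-node, the oversubscription ratio at the aggregation layer is the number to watch.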

[1] Purely as an example, you can cram 3x Brocade MLX-16 chassis into a 42U rack (with 0RU to spare). That gives you 48 slots for line cards. Leaving one slot in each chassis for 10Gb or 100Gb uplinks to something else leaves 45 slots of 48-port cards: 45x48 = 2160 1000BASE-T ports (electrically) in a 42U rack, and you'll need 45 more RU somewhere for breakout patch panels!
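
A quick arithmetic check on that footnote:

  # Verify the MLX-16 example: 3 chassis, 16 slots each, 48-port cards.
  chassis = 3
  slots = chassis * 16        # 48 line-card slots total
  cards = slots - chassis     # one uplink slot reserved per chassis
  print(cards, "x 48 =", cards * 48, "ports,", cards, "RU of breakout panels")
  # -> 45 x 48 = 2160 ports, 45 RU of panels
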
--
Brandon Martin

