nanog mailing list archives

Re: scaling linux-based router hardware recommendations

From: Jim Shankland <nanog () shankland org>
Date: Tue, 27 Jan 2015 08:31:09 -0800

On 1/26/15 11:33 PM, Pavel Odintsov wrote:

Hello!

Looks like somebody wants to build a Linux soft router!) Nice idea for
routing 10-30 Gbps. I route about 5+ Gbps on a Xeon E5-2620v2 with 4
10GE Intel 82599 cards and Debian Wheezy 3.2 (but it's a really terrible
kernel; everyone should use modern kernels, 3.16 or later, because of the
"buggy linux route cache"). My current processor load on the server is about
15%, thus I can route about 15 GE on my Linux server.

I looked into the promise and limits of this approach pretty intensively a few years back before abandoning the effort abruptly due to other constraints. Underscoring what others have said: it's all about pps, not aggregate throughput. Modern NICs can inject packets at line rate into the kernel, and distribute them across per-processor queues, etc. Payloads end up getting DMA-ed from NIC to RAM to NIC. There's really no reason you shouldn't be able to push 80 Gb/s of traffic, or more, through these boxes. As for routing protocol performance (BGP convergence time, ability to handle multiple full tables, etc.): that's just CPU and RAM.

The part that's hard (as in "can't be fixed without rethinking this approach") is the per-packet routing overhead: the cost of reading the packet header, looking up the destination in the routing table, decrementing the TTL, and enqueueing the packet on the correct outbound interface. At the time, I was able to convince myself that being able to do this in 4 us, average, in the Linux kernel, was within reach. That's not really very much time: you start asking things like "will the entire routing table fit into the L2 cache?"
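To make the per-packet steps above concrete, here is a toy sketch in Python. It is not how the Linux kernel does it; the routing table, interface names, and the `lookup`/`forward` helpers are all hypothetical, and the linear-scan longest-prefix match stands in for the trie-style structures a real forwarding path would use.

```python
import ipaddress

# Hypothetical toy routing table: prefix -> outbound interface name.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "eth0",
    ipaddress.ip_network("10.0.0.0/8"): "eth1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2",
}

def lookup(dst):
    """Longest-prefix match by linear scan over the toy table."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

def forward(packet, queues):
    """Run the per-packet steps on a dict with 'dst' and 'ttl' keys.

    Returns the chosen interface name, or None if the packet is dropped.
    """
    # Step 1: read the header fields we care about.
    if packet["ttl"] <= 1:
        return None  # TTL would expire; a real router also signals the sender
    # Step 2: look up the destination in the routing table.
    iface = lookup(packet["dst"])
    if iface is None:
        return None  # no route
    # Step 3: decrement the TTL.
    packet["ttl"] -= 1
    # Step 4: enqueue on the outbound interface.
    queues.setdefault(iface, []).append(packet)
    return iface

queues = {}
print(forward({"dst": "10.1.2.3", "ttl": 64}, queues))
```

Every one of these steps touches memory, which is why the question of whether the table fits in cache matters so much at a 4 us budget.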

4 us to "think about" each packet comes out to 250Kpps per processor; with 24 processors, it's 6Mpps (assuming zero concurrency/locking overhead, which might be a little bit of an ... assumption). With 1500-byte packets, 6Mpps is 72 Gb/s of throughput -- not too shabby. But with 40-byte packets, it's less than 2 Gb/s. Which means that your Xeon E5-2620v2 will not cope well with a DDoS of 40-byte packets. That's not necessarily a reason not to use this approach, depending on your situation; but it's something to be aware of.
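The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it keeps the same assumption of perfect scaling across processors with no locking overhead.

```python
def pps_per_core(us_per_packet):
    """Packets per second one processor handles at a fixed per-packet cost."""
    return 1_000_000 / us_per_packet

def throughput_gbps(pps, packet_bytes):
    """Aggregate throughput in Gb/s for a given packet rate and packet size."""
    return pps * packet_bytes * 8 / 1e9

PER_PACKET_US = 4   # per-packet cost from the text
CORES = 24          # processor count from the text

# Assumes zero concurrency/locking overhead, as in the text.
total_pps = pps_per_core(PER_PACKET_US) * CORES

for size in (1500, 40):
    gbps = throughput_gbps(total_pps, size)
    print(f"{size}-byte packets: {total_pps / 1e6:.0f} Mpps, {gbps:.2f} Gb/s")
```

The packet rate is the same in both cases; only the bytes carried per packet change, which is why the small-packet figure collapses.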

I ended up convincing myself that OpenFlow was the right general idea: marry fast, dumb, and cheap switching hardware with fast, smart, and cheap generic CPU for the complicated stuff.

My expertise, such as it ever was, is a bit stale at this point, and my figures might be a little off. But I think the general principle applies: think about the minimum number of x86 instructions, and the minimum number of main memory accesses, to inspect a packet header, do a routing table lookup, and enqueue the packet on an outbound interface. I can't see that ever getting reduced to the point where a generic server can handle 40-byte packets at line rate (for that matter, "line rate" is increasing a lot faster than "speed of generic server" these days).
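One way to see the gap is to turn the question around and ask how much time each processor gets per packet at line rate. The sketch below assumes a single 10 Gb/s link (the 10GE cards mentioned in the quoted message), 24 processors sharing the load perfectly, and no Ethernet framing overhead, which would lower the real packet rate somewhat. The function names are illustrative only.

```python
def line_rate_pps(link_gbps, packet_bytes):
    """Packets per second needed to fill a link, ignoring framing overhead."""
    return link_gbps * 1e9 / (packet_bytes * 8)

def per_core_budget_ns(link_gbps, packet_bytes, cores):
    """Nanoseconds each processor may spend per packet, assuming perfect sharing."""
    return cores * 1e9 / line_rate_pps(link_gbps, packet_bytes)

LINK_GBPS = 10   # assumption: one 10GE port
CORES = 24       # processor count used earlier in the text

for size in (1500, 40):
    pps = line_rate_pps(LINK_GBPS, size)
    budget = per_core_budget_ns(LINK_GBPS, size, CORES)
    print(f"{size}-byte packets: {pps / 1e6:.2f} Mpps, {budget:.0f} ns per packet per core")
```

Under these assumptions the 40-byte case leaves well under a microsecond per packet per processor on one port, against the 4 us estimate above, and the budget shrinks further with every additional port.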

Jim
