nanog mailing list archives

Re: Mikrotik Cloud Core Router and BGP real life experiences?


From: Jim Shankland <nanog () shankland org>
Date: Fri, 27 Dec 2013 08:26:28 -0800

On 12/27/13 6:40 AM, matt kelly wrote:
> They cannot handle a full routing table. The load balancing doesn't work.
> They cannot properly reassemble fragmented packets, and therefore drop all
> but the first "piece". They cannot reliably handle traffic loads over
> maybe 200 Mbps; we needed 4-6 Gbps capacity. They cannot hold a GRE tunnel
> connection.


Can't say anything about MikroTik specifically, but I've used Linux as a routing platform for many years, off and on, and took a reasonably close look at performance about a year ago, at a previous job, using relatively high-end (but pre-Sandy Bridge) generic hardware. We were looking to support ca. 8 x 10 GbE ports with several full tables, and the usual suspects wanted the usual 6-figure amounts for boxes that could do that (the issue being the full routes -- 8 x 10 GbE with minimal routing is a triviality these days).

Routing table size was simply not an issue in our environment; we were looking at a number of concurrent flows in the high-5 to low-6-digit range, and since Linux uses a route cache, it was that number, rather than the number of full tables we carried, that mattered. Doing store-and-forward packet processing, as opposed to cut-through switching, took about 5 microseconds per packet and consumed about that much CPU time. The added latency was not an issue for us; but at 5 us per packet, that's 200 Kpps per CPU. With 1500-byte packets, that's about 2.4 Gb/s of total throughput; with 40-byte packets, it's only 64 Mb/s (!).

But that's per CPU. Our box had 24 CPUs (if you count a hyperthreaded pair as 2), and this work is eminently parallelizable. So a theoretical upper bound on throughput with this box would have been 4.8 Mpps -- 57.6 Gb/s with 1500-byte packets, 1.5 Gb/s with 40-byte packets.
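
For the curious, the arithmetic above fits in a few lines of Python (just a back-of-envelope sketch; the 5 us/packet and 24-CPU figures are the measurements quoted above):

    # Back-of-envelope: per-packet CPU cost -> forwarding throughput.
    SERVICE_US = 5.0            # measured CPU time per forwarded packet
    CPUS = 24                   # counting each hyperthread as a CPU

    pps_per_cpu = 1e6 / SERVICE_US          # 200 Kpps per CPU
    pps_total = pps_per_cpu * CPUS          # 4.8 Mpps theoretical bound

    for size in (1500, 40):                 # packet size in bytes
        per_cpu = pps_per_cpu * size * 8 / 1e9
        total = pps_total * size * 8 / 1e9
        print(f"{size:5d}B: {per_cpu:6.3f} Gb/s per CPU, "
              f"{total:6.2f} Gb/s across {CPUS} CPUs")

That prints 2.4 and 57.6 Gb/s at 1500 bytes, and 0.064 and 1.5 Gb/s at 40 bytes -- the same numbers as above.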

The Linux network stack (plus RSS on the NICs) seemed to do quite a good job of input-side parallelism -- but we saw a lot of lock contention on the output side. At that point we abandoned the project, as it was incidental to the organization's mission. I think that with a little more work, we could have gotten within, say, a factor of 2 of the limits above, which would have been good enough for us (though surely not for everybody). Incrementally faster hardware would have given incrementally better performance.

OpenFlow, which marries cheap, fast, and dumb ASICs with cheap, slower, and infinitely flexible generic CPU and RAM, seemed, and still seems, clearly the right approach. At the time, it didn't seem ready for prime time, either in the selection of OpenFlow-capable routers or in the software stack. I imagine there's been some progress since then. Whether the market will allow it to flourish is another question.

Below a certain maximum throughput, routing with generic boxes is actually pretty easy. Today, I'd say that maximum is roughly in the low-single-gigabit range. Higher is possible, but gets progressively harder to get right (and it's not a firm bound, anyway, as it depends on traffic mix and other requirements). Whether it's worth doing really depends on your goals and skill. Most people will probably prefer a canned solution from a vendor. People who grow and eat their own food surely eat better, and more cheaply, than those who buy at the supermarket; but it's not for everybody.

Jim Shankland


