nanog mailing list archives
Re: maximum ipv4 bgp prefix length of /24 ?
From: Saku Ytti <saku () ytti fi>
Date: Sat, 30 Sep 2023 09:26:47 +0300
On Fri, 29 Sept 2023 at 23:43, William Herrin <bill () herrin us> wrote:
> My understanding of Juniper's approach to the problem is that instead of employing TCAMs for next-hop lookup, they use general purpose CPUs operating on a radix tree, exactly as you would for an all-software
They use proprietary NPUs with a proprietary IA, called 'Trio'. A single Trio can have hundreds of PPEs (packet processing engines), all identical. Packets are sprayed across the PPEs, and because PPEs do not run in constant time, reordering always occurs.

Juniper is a pioneer in FIB in DRAM, and has patent-gated it to a degree. With the FIB in DRAM, it takes a very, very long time to get an answer from memory. To amortise this, PPEs have a lot of threads, and while one is waiting for memory, another packet is worked on. But there is no pre-emption, and there is no moving of registers/memory around or cache misses here as a function of FIB size. The PPE does all the work it has, then requests an answer from memory, goes to sleep, comes back when the answer arrives and again does all the work it has, never pre-empted.

But there is a lot more complexity here. In the original Trio the memory was RLDRAM, which was a fairly simple setup. Once they changed to HMC, they added a cache in front of the memory, a proprietary chip called CAE. Each IFL was dynamically allocated one of multiple CAEs, which it would use to access memory. A single CAE wouldn't have 'wire rate' performance. So if you had a pathological setup, like 2 IFLs, and got unlucky, on some boots both IFLs would be assigned to the same CAE instead of being spread across two CAEs, and on those boots you would see lower PPS performance than on others, because you were hot-banking the CAE. This is the only type of cache problem I can recall related to Juniper. But these devices are entirely proprietary, things move relatively fast and complexity increases all the time.
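As a rough illustration of the latency hiding described above, here is a minimal sketch of a run-to-completion engine with several threads. It is not Juniper's actual design: the function names and the cycle counts (20 cycles of work per memory request, 380 cycles of memory latency) are made-up assumptions, chosen only to show why many threads are needed when memory is slow.

```python
import math


def ppe_utilisation(threads: int, work_cycles: int, memory_cycles: int) -> float:
    """Fraction of time a run-to-completion engine is busy.

    Model (an assumption, not vendor data): each thread alternates between
    `work_cycles` of compute and `memory_cycles` of waiting for memory.
    The engine runs one thread at a time and only switches when the
    running thread blocks on memory, never by pre-emption.
    """
    return min(1.0, threads * work_cycles / (work_cycles + memory_cycles))


def threads_to_hide_latency(work_cycles: int, memory_cycles: int) -> int:
    """Smallest thread count that keeps the engine busy all the time."""
    return math.ceil((work_cycles + memory_cycles) / work_cycles)


# Hypothetical numbers, for illustration only.
WORK, MEM = 20, 380

for t in (1, 4, 20):
    print(f"{t:2d} threads -> utilisation {ppe_utilisation(t, WORK, MEM):.2f}")

print("threads needed to hide latency:", threads_to_hide_latency(WORK, MEM))
```

Note that FIB size does not appear anywhere in this model: under these assumptions only the ratio of memory latency to per-request work decides how many threads are needed.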
> router. This makes each lookup much slower than a TCAM can achieve. However, that doesn't matter much: the lookup delays are much shorter than the transmission delays so it's not noticeable to the user. To
In DRAM lookups, like what Juniper does, most of the time you're waiting for the memory. With DRAM, FIB size is a trivial engineering problem; memory bandwidth and latency are the hard problems. Juniper does not do TCAMs on its service provider class devices.

--
++ytti