nanog mailing list archives

Re: Whats so difficult about ISSU


From: Jonathan Lassoff <jof () thejof com>
Date: Thu, 8 Nov 2012 21:15:04 -0800

On Thu, Nov 8, 2012 at 8:13 PM, Mikael Abrahamsson <swmike () swm pp se> wrote:
On Thu, 8 Nov 2012, Phil wrote:

The major vendors have figured it out for the most part by moving to
stateful synchronization between control plane modules and implementing
non-stop routing.


NSR isn't ISSU.

ISSU contains the wording "in service". 6 seconds of outage isn't "in
service". 0.5 seconds of outage isn't "in service". I could accept a few
microseconds of outage as being "ISSU", but tenths of seconds isn't in
service.


The main remaining hurdle is updating microcode on linecards, they still
need to be rebooted after an upgrade.


... and as long as this is the case, there is no ISSU. There is only
"shorter outages during upgrade compared to a complete reboot".

This.
There is some wonderfully reconfigurable router hardware out in the
world, and platforms that can dynamically program their forwarding
hardware make this seem possible.

It's possible to build things such that portions of a single box can
be upgraded one at a time. With multiple links, or forwarding paths, out
to a remote destination, it seems to me that the upgrade process could
just coordinate things: update each piece of forwarding hardware in
turn, letting traffic cut over and waiting for it to come back before
moving on.
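
To make that coordination loop concrete, here is a rough sketch in
Python. Everything in it is hypothetical: `Element`, `rolling_upgrade`
and `simulated_wait` are names made up for illustration, not any
vendor's API, and "upgrading" is just a version string changing. The
only point is the ordering: drain one element, upgrade it, wait for it
to forward again, and only then touch the next one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for one piece of forwarding hardware."""
    name: str
    version: str
    forwarding: bool = True


def simulated_wait(el: Element) -> bool:
    """Pretend the element rebooted and resumed forwarding.

    A real hook would block on hardware/protocol state and return
    False on timeout; this one always succeeds (assumption).
    """
    el.forwarding = True
    return True


def rolling_upgrade(
    elements: List[Element],
    new_version: str,
    wait_until_forwarding: Callable[[Element], bool],
    log: Optional[List[str]] = None,
) -> List[str]:
    """Upgrade elements one at a time, never taking the last one down."""
    log = [] if log is None else log
    for el in elements:
        # Refuse to proceed unless something else can carry the traffic.
        if not any(o.forwarding for o in elements if o is not el):
            raise RuntimeError(f"no backup forwarding path for {el.name}")
        el.forwarding = False          # traffic cuts over to the others
        log.append(f"drain {el.name}")
        el.version = new_version       # stand-in for the real upgrade
        log.append(f"upgrade {el.name}")
        if not wait_until_forwarding(el):
            raise RuntimeError(f"{el.name} did not come back; stopping")
        log.append(f"restore {el.name}")
    return log
```

Calling `rolling_upgrade([Element("fpc0", "1.0"), Element("fpc1", "1.0")],
"2.0", simulated_wait)` walks both elements in order; with a single
element it raises instead of taking the box dark.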

I could envision a Juniper M/TX box, where MPLS FRR or an "ae"
interface across FPCs could take backup traffic while a PFE is
upgraded.
Of course, every possible path would need to be able to survive an FPC
being down, and the process would have to have hooks into protocols to
know when everything is switched back.
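
The "every possible path must survive an FPC being down" condition is
easy to state as a pre-flight check. A minimal sketch in Python,
assuming you can somehow extract, per destination, the set of FPCs it
can egress through (the `paths` mapping and the `unprotected` function
are made up for this example; a real box would have to derive this
from its FIB and protection state):

```python
from typing import Dict, List, Set


def unprotected(paths: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each FPC to the destinations lost if that FPC alone goes down.

    `paths` maps a destination to the set of FPCs it can egress through.
    An empty result means any single FPC can be taken out of service.
    """
    fpcs: Set[str] = set().union(*paths.values()) if paths else set()
    result: Dict[str, List[str]] = {}
    for fpc in sorted(fpcs):
        # A destination is lost if it has no egress FPC other than this one.
        lost = sorted(dest for dest, via in paths.items() if via <= {fpc})
        if lost:
            result[fpc] = lost
    return result


if __name__ == "__main__":
    example = {
        "10.0.0.0/8": {"fpc0", "fpc1"},   # protected across two FPCs
        "192.0.2.0/24": {"fpc1"},         # single-homed on fpc1
    }
    print(unprotected(example))  # {'fpc1': ['192.0.2.0/24']}
```

An upgrade coordinator would refuse to start (or skip that FPC) whenever
this returns anything non-empty.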

