nanog mailing list archives

Re: ROV Deployment (was LDPv6 Census Check)


From: Christopher Morrow <morrowc.lists () gmail com>
Date: Tue, 16 Jun 2020 12:00:53 -0400

On Tue, Jun 16, 2020 at 11:51 AM Randy Bush <randy () psg com> wrote:

> router implementations; i.e. every step in the chain.  the only reason
> the mess is not blatantly visible is the fail soft design, aka notFound.
> the problem with fail soft is that you think you are protected when you
> are not.

I don't see how we would have reasonably found these problems without
large-scale, actually operating deployments. To me this seems like:
  ipv6 rollouts
  dnssec rollouts
  any other large system change

we expected things to work like X; in reality they work a little differently, AND
we have software / systems problems which SEEM like non-problems (or even
features!) which under stress/scale prove to be complications to be filed down.

> my inner naggumite is starting to wonder if fail soft was a mistake.

would be hard to argue: "Sure! you should deploy; worst case, when things go
wrong in your deployment (which happens, always), you fall off the net!"

fail soft, at least for a while, is ok... and helps get systems/people/scale.
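The "fail soft, aka notFound" behavior discussed in this thread can be made concrete with a minimal sketch of the three origin-validation states defined in RFC 6811 (Valid, Invalid, NotFound). It assumes an already-fetched list of validated ROA payloads (VRPs) and ignores AS_SET handling and AS0 ROAs. The names `VRP`, `validate`, and `accept` are hypothetical and are not any router's or validator's API. The prefixes and ASNs come from the documentation ranges (RFC 5737, RFC 5398). The last demo line shows the concern raised above: if no VRPs reach the router (because any step in the chain broke), a route with the wrong origin is simply notFound and is still accepted.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Iterable, Union

Network = Union[IPv4Network, IPv6Network]


class State(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "notFound"


@dataclass(frozen=True)
class VRP:
    """Hypothetical validated ROA payload: (prefix, maxLength, origin AS)."""
    prefix: Network
    max_length: int
    asn: int


def validate(route_prefix: str, origin_as: int, vrps: Iterable[VRP]) -> State:
    """Classify a route per the RFC 6811 covered/matched logic (simplified)."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        if vrp.prefix.version != route.version:
            continue  # subnet_of() raises TypeError across IPv4/IPv6
        # A VRP covers the route if the route equals or is more specific than it.
        if route.subnet_of(vrp.prefix):
            covered = True
            # It matches if the origin AS agrees and maxLength is not exceeded.
            if vrp.asn == origin_as and route.prefixlen <= vrp.max_length:
                return State.VALID
    # Covered but never matched -> Invalid; never covered -> NotFound.
    return State.INVALID if covered else State.NOT_FOUND


def accept(state: State) -> bool:
    """Fail-soft policy (assumed): drop only Invalid, accept notFound like Valid."""
    return state is not State.INVALID


if __name__ == "__main__":
    vrps = [VRP(ip_network("192.0.2.0/24"), 24, 64500)]
    print(validate("192.0.2.0/24", 64500, vrps))     # State.VALID
    print(validate("192.0.2.0/24", 64511, vrps))     # State.INVALID (wrong origin)
    print(validate("192.0.2.0/25", 64500, vrps))     # State.INVALID (exceeds maxLength)
    print(validate("198.51.100.0/24", 64511, vrps))  # State.NOT_FOUND
    # With no VRPs at all, the wrong-origin route is notFound and accepted.
    print(accept(validate("192.0.2.0/24", 64511, [])))  # True
```

Dropping only Invalid is what lets a partly broken deployment keep forwarding traffic, and it is also why a broken chain goes unnoticed: nothing is rejected, so nothing looks wrong.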

