nanog mailing list archives

Re: Software Bugs


From: Kasper Adel <karim.adel () gmail com>
Date: Mon, 21 Feb 2011 01:44:44 +0200

Thanks Valdis.

On Sun, Feb 20, 2011 at 9:43 PM, <Valdis.Kletnieks () vt edu> wrote:

On Sun, 20 Feb 2011 18:05:44 +0200, Kasper Adel said:

(Disclaimer - I've never filed a bug report with Cisco or Juniper,
but I've spent 3 decades filing bugs with almost everybody else in
the computer industry, it seems...  Questions like the ones you asked
are almost always pointless unless the asker and answerer are sharing
a set of base assumptions.  In other words, "which one is best/worst?"
is a meaningless question unless you either tell us what *your* criteria
are in detail, or are willing to listen to advice that uses other
criteria, without stating how they're different from yours.)


I tried to put details and criteria below, and yes, I am mainly interested in
Juniper, Cisco, Alcatel and Huawei routers and switches, mostly high-end
equipment. And yes, I am willing to listen to advice on criteria - why
wouldn't I :) ?


1) Which vendor has more bugs than others, what are the top 3

More actual bugs, more known and acknowledged bugs, or more serious bugs that
actually affect day-to-day operations in a major manner?


What I wanted to ask is, from the field experience of the experts on this
list, whether there is a clear winner: which vendor has throughout history
shown more bugs impacting operation and interrupting traffic, whether from
poorly written code or bad internal testing? Can we make some sort of general
assumption here, or is that not possible?


The total number of actual bugs for each vendor is probably unknowable, other
than "there's at least one more in there".  The vendor can probably produce a
number representing how many bug reports they've accepted as valid. The
vendor's number is guaranteed to be different than the customer's number - how
divergent, *and why*, probably tells you a lot about the vendor and the
customer base. The vendor may be difficult about accepting a bug report, or
the customer base may be clueless about what the product is supposed to be
doing and calling in a lot of non-bugs - almost every trouble ticket closed
with RTFM status is one of these non-bugs. If there's a lot of non-bugs, it
usually indicates a documentation/training issue, not an actual software
quality issue.

And of course, bug severity *has* to be considered.  "Router falls over if
somebody in Zimbabwe sends it a christmas-tree packet" is different than "the
CLI insists on a ;; where a ; should suffice".  You may be willing to tolerate
or work around dozens or even hundreds of the latter (in fact, there's
probably hundreds of such bugs in your current vendor that you don't know
about simply because they don't trigger in your environment), but it only
takes 2 or 3 of the former to render the box undeployable.

2) Who is doing a better job fixing them

Again, see the above discussion of severity.  If a vendor is good about fixing
the real show-stoppers in a matter of hours or days, but has a huge backlog of
fixes for minor things, is that better or worse than a vendor that fixes half
of both serious and minor things?

In addition, the question of how fixes get deployed matters too.  If a vendor
is consistently good about finding a root cause, fixing it, and then saying
"we'll ship the fix in the next dot-rev release", is that good or bad?
Remember that if they ship a new, updated, more-fixed image every week, that
means you get to re-qualify a new image every week....


What you have mentioned is an operations headache, so one question comes to
mind here: what are the issues a vendor will never be able to find in their
internal testing? I mean, are there issues that will definitely only be
discovered on customer networks, or can we assume that software should come
out with fewer sev1/sev2 bugs, and that internal testing is simply not doing
a good job?

thanks

