nanog mailing list archives
Re: Journal of Internet Disasters
From: Marc Slemko <marcs () znep com>
Date: Fri, 13 Nov 1998 17:58:02 -0800 (PST)
On Fri, 13 Nov 1998, Michael Dillon wrote:
- f.root-servers.net and NSI's servers reacted differently. What are the differences between them (BIND versions, in-house source code changes, operating systems/run-time libraries/compilers)?

Whatever was causing the Internic link to be congested could have disrupted NSI's server. Wasn't Vixie's server acting properly by answering lame for the zones it could not retrieve? It seems like all the problems revolve around NSI's server and network. Vixie's problems were merely a symptom.

On the other hand, I would classify the inability of AXFR to transfer the zone as a weakness in BIND that could be addressed. Additionally, since it is known that zone transfers require a certain amount of bandwidth, Vixie could improve his operations by implementing a system that monitors the bandwidth with pathshow prior to initiating AXFR. He could also monitor the progress of the AXFR and raise an alarm if it was taking too long. This would have allowed a fallback to ftp sooner and, operationally, such a fallback might even be something that could be automated.

Of course, none of this means Vixie was at fault, and I'd argue that NSI is at fault for not being able to detect the problem sooner and not being able to swap in a backup server sooner. Vixie knows that he is one of 13 root nameservers. But NSI knows that they are the one and only master root nameserver, which puts more responsibility on them.
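The "monitor the AXFR, alarm if it takes too long, fall back to ftp" idea in the quoted text can be sketched roughly as below. This is only an illustration under stated assumptions: `transfer_with_fallback`, `primary`, `fallback`, `deadline_s` and `alarm` are hypothetical names, the two transfer steps are passed in as plain callables, and nothing here reflects how BIND or f.root-servers.net actually operated.

```python
import concurrent.futures as cf


def transfer_with_fallback(primary, fallback, deadline_s, alarm=print):
    """Run primary(); if it fails or exceeds deadline_s, alarm and run fallback().

    primary and fallback are hypothetical callables standing in for
    "fetch the zone via AXFR" and "fetch the zone via ftp".
    Returns a (method, result) tuple.
    """
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary)
    try:
        return "primary", future.result(timeout=deadline_s)
    except cf.TimeoutError:
        alarm(f"transfer exceeded {deadline_s}s; falling back")
    except Exception as exc:
        alarm(f"transfer failed ({exc!r}); falling back")
    finally:
        # Do not block on a stuck transfer. Note that the worker thread is
        # not cancelled; a real system would have to abort the transfer itself.
        pool.shutdown(wait=False)
    return "fallback", fallback()
```

A caller would pass its own transfer routines and a pager or logging hook as `alarm`. The point of the sketch is only that the deadline and the fallback are decided by the machine rather than by someone noticing hours later.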
There have been no even remotely logical claims that f.root-servers.net caused any problems at all. If Paul's server had been working correctly and had transferred the zone properly, the impact of NSI's screwups would have been almost exactly the same.

What you are discussing is a problem, but not "the" problem and not a problem that causes a significant impact over the short term. It is important to keep that clear in messages; NSI has already spread enough lies, so any confusion about the issue isn't wise.

In fact, the fact that at least three of NSI's servers were giving false NXDOMAINs isn't really the issue either, from nanog's perspective. It needs to be figured out, it is a major problem in BIND, and so on, but it isn't necessarily something they could have or should have been able to prevent before it happened: that is very difficult to figure out from the outside, and I can certainly imagine situations where, despite the best operations anywhere, they could not predict such things.

The big issue that needs to be addressed is why the heck it took NSI over two hours after they were notified to fix it, especially in the middle of the day, and why they didn't have any automated system that detected it and notified them in minutes.

Whatever the exact problem was is important and needs to be addressed, but addressing each instance is pointless without knowing why NSI's operations procedures are so flawed. In fact, they are so flawed that the VP of engineering either had no idea what was going on or chose to lie. The problem is that NSI currently has no accountability (not even to their customers), and doesn't even make a token effort to follow up on their screwups.

The organization that controls the root nameservers should have one of the best operations departments, not one of the worst.
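For what it's worth, the kind of automated check described above does not need to be elaborate. The following is a minimal sketch, not a description of anything NSI ran or should have run: `find_false_nxdomain` is a hypothetical helper, and `query` is an assumed callable (server, name) -> response-code string that a real deployment would implement with an actual DNS client.

```python
def find_false_nxdomain(servers, canary_names, query):
    """Return {server: [names]} for canary names wrongly answered NXDOMAIN.

    canary_names are names known to exist (for example, delegations that
    must be present in the zone), so an NXDOMAIN for any of them is false.
    query is a hypothetical callable: query(server, name) -> rcode string.
    """
    bad = {}
    for server in servers:
        failures = [name for name in canary_names
                    if query(server, name) == "NXDOMAIN"]
        if failures:
            bad[server] = failures
    return bad
```

Run every minute or so against each server, with a non-empty result paging someone, a check like this would turn "notified by outsiders" into "notified by your own monitoring".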