nanog mailing list archives

Re: A case against vendor-locking optical modules


From: Chuck Anderson <cra () WPI EDU>
Date: Sat, 6 Dec 2014 08:37:01 -0500

On Sat, Dec 06, 2014 at 11:51:56AM +0200, Saku Ytti wrote:
a) One particular optic had a slow I2C interface, and the vendor polled it more
aggressively than it could respond. The vendor's polling code didn't handle
I2C read errors; instead, it crashed the whole linecard control plane.
The vendor claimed it wasn't a bug, because it didn't happen with their own
optic. I tried to explain that they cannot guarantee I2C reads won't fail on
their own optics either, and that this is a serious problem, but I was unable
to convince them to fix it.
Now I have a good bunch of SFPs that I could stick into your routers in a
colo to make them crash, and you wouldn't have any clue why they crashed.
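Case a) is the classic failure of treating a bus read as infallible. Below is a minimal Python sketch of the defensive alternative: retry a failed read with exponential backoff and, after a few consecutive failures, report the optic as unreadable instead of letting the error escape. `read_ddm`, `I2CReadError`, `make_slow_optic`, and the retry limits are hypothetical stand-ins, not any vendor's actual API.

```python
import time
from typing import Callable, Optional


class I2CReadError(Exception):
    """Hypothetical error raised when an optic does not answer an I2C read."""


def poll_ddm(read_ddm: Callable[[], bytes],
             max_failures: int = 5,
             base_delay: float = 0.01) -> Optional[bytes]:
    """Poll one optic's DDM data, tolerating slow or failed I2C reads.

    read_ddm is a hypothetical callable standing in for the platform's
    I2C access routine. Each failed read is retried after an exponentially
    growing delay; after max_failures consecutive failures the optic is
    reported as unreadable (None) and no exception propagates upward.
    """
    delay = base_delay
    for _ in range(max_failures):
        try:
            return read_ddm()
        except I2CReadError:
            time.sleep(delay)  # give the slow optic time to respond
            delay *= 2         # back off: poll less aggressively next time
    return None                # caller can log or alarm; nothing crashes


def make_slow_optic(failures_before_ok: int) -> Callable[[], bytes]:
    """Hypothetical fake optic that fails a fixed number of reads first."""
    state = {"calls": 0}

    def read() -> bytes:
        state["calls"] += 1
        if state["calls"] <= failures_before_ok:
            raise I2CReadError("no ACK from optic")
        return b"\x19\x80"  # example raw temperature bytes

    return read


print(poll_ddm(make_slow_optic(2)))   # b'\x19\x80'
print(poll_ddm(make_slow_optic(10)))  # None
```

The important design choice is that the polling loop owns the failure: a slow optic costs a few retries and perhaps a missing data point, not the linecard.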

b) A particular vendor had a bug in their SFP microcontroller: after 2**31
hundredths of a second had passed, it started writing its uptime to the
location where DDM temperature measurements are read. This was obvious from
the graphs, because the temperature rose linearly from -127 to 127, then
jumped back to -127.
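A rising counter shows up as a sawtooth temperature because the DDM temperature is decoded as a signed value. The sketch below assumes the SFF-8472 layout, where the internally measured temperature is a big-endian signed 16-bit two's-complement number at bytes 96 and 97 of address A2h, in units of 1/256 degC. Which bytes of the uptime actually landed there is not known; `counter_as_temperature` taking the low 16 bits is a hypothetical model of the bug.

```python
import struct


def decode_temperature(msb: int, lsb: int) -> float:
    """Decode a DDM temperature (assumed SFF-8472 bytes 96-97 of A2h).

    The value is a big-endian signed 16-bit integer in 1/256 degC units,
    so the MSB alone is the signed whole-degree part (-128 .. +127).
    """
    (raw,) = struct.unpack(">h", bytes([msb, lsb]))
    return raw / 256.0


def counter_as_temperature(counter: int) -> float:
    """Hypothetical model: the low 16 bits of an ever-increasing uptime
    counter are written into the two temperature bytes."""
    low16 = counter & 0xFFFF
    return decode_temperature(low16 >> 8, low16 & 0xFF)


print(decode_temperature(0x19, 0x80))  # 25.5
print(counter_as_temperature(0x7FFF))  # 127.99609375
print(counter_as_temperature(0x8000))  # -128.0
```

As the counter increases, the decoded "temperature" climbs to just under +128 and then jumps to -128, roughly the ramp-and-jump shape described above.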
When seated in Vendor1's equipment, these optics caused no problems; when
seated in Vendor2's, they caused link flapping, even two boxes away! (In
A-B-C, with A holding the problematic optic, the B-C link might flap.)
Coincidentally, Vendor2 is the same vendor as in case a), and they didn't
consider this a bug in their code.
This was particularly funny: if you rebooted 100 boxes in one maintenance
window, the bug would trigger on all of them at the same moment, 2**31
hundredths of a second later, potentially causing a major outage.
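For a sense of scale, 2**31 hundredths of a second is about 248.55 days, so every optic powered up in the same window reaches that mark within the same span of time roughly eight months later. The sketch below just does that arithmetic with Python's `datetime`; the window start date and the one-minute spacing between reboots are made up for illustration.

```python
from datetime import datetime, timedelta

# 2**31 ticks of 1/100 s, as described in case b). Each tick is
# 10,000 microseconds, so integer microseconds keep the arithmetic exact.
ROLLOVER = timedelta(microseconds=2**31 * 10_000)


def trigger_time(boot: datetime) -> datetime:
    """When an optic powered up at `boot` hits the 2**31-centisecond mark."""
    return boot + ROLLOVER


# Hypothetical maintenance window: 100 boxes rebooted one minute apart.
window_start = datetime(2014, 12, 6, 2, 0)
boots = [window_start + timedelta(minutes=i) for i in range(100)]
triggers = [trigger_time(b) for b in boots]

print(ROLLOVER)                     # 248 days, 13:13:56.480000
print(triggers[-1] - triggers[0])   # 1:39:00
```

The spread of the failures is exactly the spread of the reboots, which is why a single maintenance window can turn into a single synchronized outage.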

Who is Vendor2?

