nanog mailing list archives

Re: Did your BGP crash today?


From: Thomas Mangin <thomas.mangin () exa-networks co uk>
Date: Sun, 29 Aug 2010 22:12:35 +0200

> It would seem to me that there should actually be a better option, e.g.
> recognizing the malformed update, and simply discarding it (and sending the
> originator an error message) instead of resetting the session.
>
> Resetting of BGP sessions should only be done in the most dire of
> circumstances, to avoid a widespread instability incident.


I had the same thought before giving up on it. 

Negotiating a new error message could be a per-peer option; BGP has capabilities for exactly this reason.
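As a rough illustration of the per-peer idea, here is a minimal sketch, assuming the RFC 5492 layout (OPEN Optional Parameter type 2 carrying code/length/value capabilities). The capability code `SOFT_ERROR_CAP` (0xF0) and the "discard malformed UPDATE" behaviour it stands for are hypothetical, picked only for illustration. The point it shows is that the new behaviour would only switch on when both ends advertise it, so peers running old code keep today's reset semantics.

```python
# Minimal sketch: encode/parse BGP capabilities (RFC 5492 layout) and decide,
# per peer, whether a HYPOTHETICAL "discard malformed UPDATE" mode is enabled.
import struct

CAP_OPT_PARAM_TYPE = 2   # OPEN Optional Parameter type for Capabilities (RFC 5492)
SOFT_ERROR_CAP = 0xF0    # HYPOTHETICAL capability code, for illustration only

def encode_capabilities(codes):
    """Build one Capabilities optional parameter holding zero-length capabilities."""
    caps = b"".join(struct.pack("!BB", code, 0) for code in codes)
    return struct.pack("!BB", CAP_OPT_PARAM_TYPE, len(caps)) + caps

def parse_capability_codes(opt_params):
    """Return the set of capability codes found in a block of optional parameters."""
    codes = set()
    i = 0
    while i < len(opt_params):
        ptype, plen = opt_params[i], opt_params[i + 1]
        body = opt_params[i + 2:i + 2 + plen]
        if ptype == CAP_OPT_PARAM_TYPE:
            j = 0
            while j < len(body):
                code, clen = body[j], body[j + 1]
                codes.add(code)
                j += 2 + clen  # skip this capability's value
        i += 2 + plen          # skip to the next optional parameter
    return codes

def soft_error_enabled(local_codes, peer_opt_params):
    """The new behaviour applies only if BOTH sides advertised the capability."""
    return (SOFT_ERROR_CAP in local_codes
            and SOFT_ERROR_CAP in parse_capability_codes(peer_opt_params))
```

For example, with code 2 (Route Refresh, RFC 2918, which has an empty value) alongside it, `soft_error_enabled({SOFT_ERROR_CAP}, encode_capabilities([2, SOFT_ERROR_CAP]))` is true, while a peer sending only `encode_capabilities([2])` falls back to the usual NOTIFICATION and reset.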

However, for this to make sense you would need to find a resynchronisation point, so that only the one faulty message is excluded.
Initially I thought that the last received KEEPALIVE (from the point of view of the receiver of the error message) could do, but you find
yourself with race conditions, so perhaps two KEEPALIVEs back?
Each TCP packet can contain multiple messages, so the messages would then have to be split and ACKed individually to find
the faulty one. EOR (End-of-RIB) could be used for that purpose.
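To make the splitting and resynchronisation idea concrete, here is a minimal sketch. The header parsing follows RFC 4271 (16-byte all-ones marker, 2-byte length including the header, 1-byte type, length between 19 and 4096). The `resync_index` function, which picks the point just after the Nth KEEPALIVE before the faulty message, is purely hypothetical and only models the "two KEEPALIVEs back" suggestion; it is not part of any BGP specification.

```python
# Minimal sketch: split a received TCP byte stream into BGP messages (RFC 4271
# header layout) and pick a HYPOTHETICAL resynchronisation point N KEEPALIVEs
# before a faulty message. This is not a real BGP mechanism.
import struct

HEADER_LEN = 19        # 16-byte marker + 2-byte length + 1-byte type
MARKER = b"\xff" * 16
UPDATE = 2             # message type codes from RFC 4271
KEEPALIVE = 4

def split_messages(stream):
    """Return ([(type, message_bytes), ...], leftover_bytes) for a byte stream."""
    messages = []
    offset = 0
    while len(stream) - offset >= HEADER_LEN:
        marker = stream[offset:offset + 16]
        length, mtype = struct.unpack("!HB", stream[offset + 16:offset + HEADER_LEN])
        if marker != MARKER or not HEADER_LEN <= length <= 4096:
            raise ValueError(f"bad header at offset {offset}")
        if len(stream) - offset < length:
            break  # partial message: wait for more TCP data
        messages.append((mtype, stream[offset:offset + length]))
        offset += length
    return messages, stream[offset:]

def resync_index(messages, faulty, keepalives_back=2):
    """HYPOTHETICAL: index just after the Nth KEEPALIVE before `faulty` (0 if too few)."""
    seen = 0
    for i in range(faulty - 1, -1, -1):
        if messages[i][0] == KEEPALIVE:
            seen += 1
            if seen == keepalives_back:
                return i + 1
    return 0
```

Note that a single TCP read may end in the middle of a message, which is why the leftover bytes are returned; the race conditions mentioned above come from the fact that the two speakers do not agree on where in each other's streams an error report was generated.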

Still, it adds a lot of complexity to the conversation: are we not going to introduce bugs in that rarely used and poorly
tested code path as well?
Alternatively, you could have a new "ACK" capability for each message, another idea, but those are clearly discussions for outside
NANOG.
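A per-message "ACK" scheme would at least make the resynchronisation point explicit. The sketch below is entirely hypothetical (per-message ACKs are not part of BGP, and the class and method names are invented for illustration); it only shows the extra state a sender would have to keep: every message stays in an unacknowledged window until the peer either ACKs it or rejects it, and a rejection drops just that one message instead of tearing down the session.

```python
# HYPOTHETICAL per-message ACK bookkeeping; nothing like this exists in BGP.
from collections import OrderedDict

class AckingSender:
    def __init__(self):
        self.next_seq = 0
        self.unacked = OrderedDict()  # seq -> message awaiting ACK or error

    def send(self, message):
        """Record the message; the sequence number would travel with it on the wire."""
        seq = self.next_seq
        self.unacked[seq] = message
        self.next_seq += 1
        return seq

    def on_ack(self, seq):
        """Peer accepted this message: forget it."""
        self.unacked.pop(seq, None)

    def on_error(self, seq):
        """Peer rejected one message: drop only that one (returned for logging)."""
        return self.unacked.pop(seq, None)

    def pending(self):
        """Sequence numbers still waiting for an answer, oldest first."""
        return list(self.unacked)
```

Even this toy version needs a window, sequence numbers and a policy for what happens when the window fills, which is exactly the kind of rarely exercised code path the concern above is about.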

Thomas




