nanog mailing list archives

RE: Illegal header length in BGP error


From: Matthew Huff <mhuff () ox com>
Date: Tue, 24 Feb 2009 12:29:29 -0500

We were using PMTUD. However:

1) The link was iBGP over a crossover cable, with both ends at the default MTU
2) I tried disabling PMTUD with no difference
3) Cisco admitted it was a known bug, and downgrading to 12.4(15)T
resolved the issue.



----
Matthew Huff       | One Manhattanville Rd
OTA Management LLC | Purchase, NY 10577
http://www.ox.com  | Phone: 914-460-4039
aim: matthewbhuff  | Fax:   914-460-4139



-----Original Message-----
From: Paul Cosgrove [mailto:paul.cosgrove () heanet ie]
Sent: Tuesday, February 24, 2009 12:26 PM
To: Mills, Charles
Cc: Renaud RAKOTOMALALA; Matthew Huff; nanog () nanog org
Subject: Re: Illegal header length in BGP error

Are you using PMTUD?

We saw this on a couple of our route reflectors and on one occasion
picked it up in a capture.  So I can say that the issue is due to bad
packets being sent, rather than an inaccurate error.  It can be reported
differently according to where the corruption occurs (e.g. unsupported
message type, update malformed etc.).

Two production BGP sessions were affected at different times, and one
showed errors every few days, the other weeks apart.  Both sessions were
from route reflectors to other routers receiving full tables, and both
traversed multiple hops. All other sessions of these routers were fine.
Whilst investigating we identified that different MTUs were being used
on the device interfaces at each end of the sessions.  The session on
which we saw most errors also had lower MTUs on intervening links, so
PMTUD was suspected to be a factor.
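
To illustrate why mismatched MTUs along the path matter here, the arithmetic can be sketched as below. This is only a rough sketch assuming IPv4 with no IP or TCP options; the MTU values and the function names are hypothetical, not taken from the network described above.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def path_mtu(link_mtus):
    """Smallest MTU along the path, which is what PMTUD should converge to."""
    return min(link_mtus)


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


# Hypothetical path: large MTUs at each end, a 1500-byte link in between.
hops = [4470, 1500, 4470]
print(path_mtu(hops), tcp_mss(path_mtu(hops)))  # 1500 1460
```

With identical MTUs on a direct link, the minimum equals the interface MTU and PMTUD has nothing to adjust.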

I replaced one of the paths with a direct link, using identical MTUs,
and that stopped the errors on that session (since PMTUD had nothing to
do anymore).  Just to be sure we recreated a multiple hop topology from
our production route reflectors to isolated lab routers, with low
intervening link MTUs and ACLs to keep out other unwanted traffic -
which also produced the same error on those sessions (but only once each
over three months).

After correcting all the MTUs in the production network the errors
ceased completely.  Our test routers shared these links, but also used
an additional link with a low MTU which we deliberately did not fix; as
it turned out we did not see it again there either, so the trigger was not
entirely clear.

One other thing to note is that, at the time, we were seeing some other
problems with these production routers, which Cisco believed may have
been due to SNMP polling of BGP stats.  If you have been changing that
recently I would also consider it a possibility.

Paul.



Mills, Charles wrote:
I ran into exactly the same thing during a code upgrade a few weeks ago.

I wrote it off as a bug in BGP and backed off the code until a new
release was out.  I was also running 12.4(22)T on an NPE-G2.

Chuck

-----Original Message-----
From: Renaud RAKOTOMALALA [mailto:renaud () rakotomalala com]
Sent: Tuesday, February 24, 2009 10:49 AM
To: Matthew Huff; 'nanog () nanog org'
Subject: Re: Illegal header length in BGP error

Hello Matthew,

We changed the motherboard of one of our Cisco routers from 7206VXR (NPE-G1)
to 7206VXR (NPE-G2).

Due to an incompatibility with IOS 12.3(4r)T3 we upgraded the IOS to
12.4(12.2r)T. In the end we got the same problem as you between one
of our 7200s on 12.3 and the new one on 12.4 ....

We solved the problem by upgrading the Cisco's IOS from
12.4(12.2r) to 12.4(4)XD10, and the BGP session came back alive ....

So now everything works fine between our 7200 (IOS 12.3) and the other
7200 on IOS 12.4(4)XD10.

I hope this helps ...

Cheers,
Renaud


Matthew Huff wrote:

One of our upstream providers flapped this morning, and since then they are
sending corrupted BGP data. I'm running 12.4(22)T on Cisco 7200s. I'm
getting no BGP errors from that provider, and the number of routes and basic
sanity checks look okay. However, when it tries to redistribute the BGP
routes via iBGP to our other border routers, we get:

003372: Feb 24 09:17:13.963 EST: %BGP-5-ADJCHANGE: neighbor x.x.x.x Down BGP Notification sent
003373: Feb 24 09:17:13.963 EST: %BGP-3-NOTIFICATION: sent to neighbor x.x.x.x 1/2 (illegal header length) 2 bytes
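
For reference, the "1/2" in that notification is error code 1 (Message Header Error), subcode 2 (Bad Message Length) in RFC 4271, and the 2 bytes of data are the offending Length field. A minimal Python sketch of the header checks a receiver performs is shown below. It follows the RFC text, not IOS internals; the function name and the sample message are hypothetical.

```python
import struct

HEADER_LEN = 19      # 16-byte marker + 2-byte length + 1-byte type
MAX_MSG_LEN = 4096   # maximum message size in RFC 4271
# Minimum total length per type: OPEN, UPDATE, NOTIFICATION, KEEPALIVE
MIN_LEN_BY_TYPE = {1: 29, 2: 23, 3: 21, 4: 19}


def check_header(header: bytes):
    """Return None if the header is acceptable, otherwise the
    (error code, subcode, data) a NOTIFICATION would carry."""
    if len(header) != HEADER_LEN:
        raise ValueError("expected exactly 19 header bytes")
    marker, length, msg_type = struct.unpack("!16sHB", header)
    if marker != b"\xff" * 16:
        return (1, 1, b"")                          # Connection Not Synchronized
    if length < HEADER_LEN or length > MAX_MSG_LEN:
        return (1, 2, struct.pack("!H", length))    # Bad Message Length
    if msg_type not in MIN_LEN_BY_TYPE:
        return (1, 3, bytes([msg_type]))            # Bad Message Type
    if length < MIN_LEN_BY_TYPE[msg_type]:
        return (1, 2, struct.pack("!H", length))    # too short for this type
    if msg_type == 4 and length != HEADER_LEN:
        return (1, 2, struct.pack("!H", length))    # KEEPALIVE must be 19 bytes
    return None


# Hypothetical corrupted UPDATE header claiming a 5000-byte message.
bad = b"\xff" * 16 + struct.pack("!HB", 5000, 2)
print(check_header(bad))  # (1, 2, b'\x13\x88')
```

If the corruption lands in the type byte or the marker instead of the length, the same stream produces a different subcode, which matches the observation that the error can be reported in different ways.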


All routers have identical hardware and IOS versions. My Google and Cisco
search-fu leads me to the AS path length bug, but the interesting thing is
that since we have "bgp maxas-limit 75" configured and a recent IOS, we
haven't had the problem before when other people were reporting issues. I've
also looked at the path MTU issue, and although we haven't had a problem
before, I disabled BGP path MTU discovery, but still have the same issue.

Anyone seeing something like this today, and/or does anyone have a
suggestion for finding out more specific info (which AS path, for example, so I
can filter it)?
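
One way to hunt for an overlong AS path offline is sketched below. It assumes the routes have already been exported as (prefix, AS path string) pairs by some other means; the function name, the input format, and the sample data (documentation prefixes and ASNs) are all hypothetical, and the count simply treats every whitespace-separated token as one ASN.

```python
def long_as_paths(routes, limit=75):
    """Yield (prefix, asn_count) for routes whose AS path holds more than
    `limit` ASNs. `routes` is an iterable of (prefix, as_path) pairs."""
    for prefix, as_path in routes:
        count = len(as_path.split())
        if count > limit:
            yield prefix, count


# Hypothetical sample: one normal path, one path prepended 80 times.
sample = [
    ("192.0.2.0/24", "64500 64501"),
    ("198.51.100.0/24", " ".join(["64500"] * 80)),
]
print(list(long_as_paths(sample)))  # [('198.51.100.0/24', 80)]
```

The default of 75 mirrors the "bgp maxas-limit 75" setting mentioned above; any prefix this flags would be a candidate for an inbound filter.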










