nanog mailing list archives

Re: IPv6 oddness in Comcast land...


From: Casey Russell <crussell () kanren net>
Date: Mon, 20 Mar 2017 10:16:53 -0500

(I first sent this directly to Valdis instead of the list, so my apologies
to Valdis for getting this twice)

Greetings,

I'm afraid I can't hand you the ultimate solution, but I can point you in
a direction.

     Sounds like you probably have an IPv6 neighbor discovery problem.
Most likely (since that's where the change occurred) it's between your WRT
and the Comcast CPE (I assume a cable modem) or the first active piece of
the upstream cable plant.  Either way, it'll be the first Comcast device
actually speaking IPv6 to your WRT.

     I've seen this happen several times on new (or changed) peering links
with other providers (where dissimilar equipment or new ACLs are
involved).  Typically what's happening is that an ACL or firewall rule on
one device isn't allowing that device's interface to speak fully over the
new link, and that's preventing IPv6 neighbor discovery from completing
properly between the two adjacent devices.  (In this case those devices are
likely your WRT and the first upstream Comcast device speaking IPv6.)

     Since it's your device that changed, you likely won't have a lot of
luck convincing Comcast to dig too deep into this issue, especially since
their device "worked" before and these providers have few engineers
on staff who really understand v6.  It's not that there's no one at
Comcast who can fix it, it'll just take you a while to find them.

     So without knowing your equipment, I can only offer a few general
tips.  Look for troubleshooting commands that will show you the IPv6
neighbor discovery status on your device interfaces.  See what the status
is before a traceroute (when things are broken) and after a traceroute
(when things are fixed).  If it appears I'm right, go to that interface and
create ACLs or firewall rules to allow the actual IPv6 address(es) on that
interface to speak (outward) to their local subnet.
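
     As a rough sketch of that before/after comparison: on a Linux-based
router the neighbor table can usually be dumped with "ip -6 neigh show"
(I'm assuming that command is available on your build).  The Python below
only parses two saved snapshots of that output and reports entries whose
state changed.  The interface names, addresses and MACs in the sample
snapshots are made up for illustration, not taken from your network.

```python
# Sketch: compare two saved snapshots of `ip -6 neigh show` output.
# BROKEN and WORKING below are hypothetical sample data, not real captures.

NUD_STATES = {"INCOMPLETE", "REACHABLE", "STALE", "DELAY", "PROBE",
              "FAILED", "NOARP", "PERMANENT"}


def parse_neigh(text):
    """Map (address, device) -> neighbor state from `ip -6 neigh show` text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: <addr> dev <ifname> [lladdr <mac>] [router] <STATE>
        if len(fields) < 4 or fields[1] != "dev":
            continue
        state = fields[-1]
        if state in NUD_STATES:
            table[(fields[0], fields[2])] = state
    return table


def changed_entries(before, after):
    """Return entries whose state differs between the two snapshots."""
    b, a = parse_neigh(before), parse_neigh(after)
    return {key: (b.get(key, "absent"), a.get(key, "absent"))
            for key in sorted(set(b) | set(a))
            if b.get(key) != a.get(key)}


# Snapshot taken while IPv6 is broken (hypothetical).
BROKEN = """\
fe80::201:5cff:fe00:1 dev eth1  FAILED
fe80::aa:bbff:fecc:dd01 dev br-lan lladdr 02:aa:bb:cc:dd:01 REACHABLE
"""

# Snapshot taken right after a traceroute (hypothetical).
WORKING = """\
fe80::201:5cff:fe00:1 dev eth1 lladdr 00:01:5c:00:00:01 router REACHABLE
fe80::aa:bbff:fecc:dd01 dev br-lan lladdr 02:aa:bb:cc:dd:01 REACHABLE
"""

if __name__ == "__main__":
    for (addr, dev), (old, new) in changed_entries(BROKEN, WORKING).items():
        print(f"{addr} on {dev}: {old} -> {new}")
```

     If the upstream router's entry on the WAN interface is the one that
flips between a failed and a reachable state, that would be consistent with
a neighbor discovery problem on that link.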

     Be sure to remember you may need to create a rule for the global
(permanent, public) address, and also for the link-local address.  Some
vendors will put the link-local address in the ND solicitation and others
will use the global unicast (if it's already been assigned).  The RFC
(RFC 4861) suggests the link-local, but also says that the source address
in the advertisement need only be "An address assigned to the interface
from which the advertisement is sent."
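
     To make the "both addresses" point concrete, here is a small sketch
using Python's standard ipaddress module.  It sorts an interface's
addresses into link-local and global unicast, and also derives the
solicited-node multicast group (ff02::1:ffXX:XXXX, built from the low 24
bits of the address) that neighbor solicitations for each address are sent
to, since a rule set that forgets that destination can block ND as well.
The link-local address below is hypothetical; the global one is simply hop
1 from the traceroute in the quoted message, used as an example.

```python
import ipaddress

# 2000::/3 is the currently allocated global unicast range.
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")
# Solicited-node multicast prefix ff02::1:ff00:0/104.
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def classify(addr):
    """Return 'link-local', 'global', or 'other' for an IPv6 address."""
    ip = ipaddress.IPv6Address(addr)
    if ip.is_link_local:
        return "link-local"
    if ip in GLOBAL_UNICAST:
        return "global"
    return "other"


def solicited_node(addr):
    """Return the solicited-node multicast address for a unicast address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return str(ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24))


if __name__ == "__main__":
    # Example addresses only: the first is made up, the second is from
    # the quoted traceroute.
    for a in ("fe80::21c:10ff:fe12:3456", "2601:5c0:c001:69e2::1"):
        print(f"{a}: {classify(a)}, solicited-node {solicited_node(a)}")
```

     Each address that comes back as link-local or global is a candidate
source for ND traffic, so each one may need its own permit rule.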

     If that does help, remember to tighten those new ACLs as much as you
can while still having things work.  If it doesn't, you'll likely have to
engage Comcast about the issue, since neighbor discovery may not be the
cause at all.

:-)  good luck




Sincerely,
Casey Russell
Network Engineer
KanREN <http://www.kanren.net>
785-856-9809
2029 Becker Drive, Suite 282
Lawrence, Kansas 66047
Need support? <support () kanren net>

On Sun, Mar 19, 2017 at 6:16 PM, <valdis.kletnieks () vt edu> wrote:

Trying to figure out what the heck is going on here.  Any good
explanations cheerfully accepted.

Background:  Home internet router is a Linksys WRT1200AC that had been
running OpenWRT 15.05.01. IPv6 worked just fine - Comcast handed me a /60
via DHCP-PD and no issues.  I reflashed it to Lede 17.01, and after doing
all the reconfig, I'm hitting a really strange IPv6 issue.

Symptoms - IPv6 still configures correctly, but IPv6 packets appear to go
out and disappear into the ether when they leave the Linksys.  Doing a
traceroute to any IPv6 destination makes things work again - for a while
(from 15 minutes to an hour or two).

As seen from my laptop (I have the matching tcpdump from the outbound
interface on the Linksys):

[~] ping -6 -c 3 listserv.vt.edu
PING listserv.vt.edu(listserv.ipv6.vt.edu (2001:468:c80:2105:211:43ff:feda:d769)) 56 data bytes

--- listserv.vt.edu ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2070ms

[~] traceroute -6 listserv.vt.edu
traceroute to listserv.vt.edu (2001:468:c80:2105:211:43ff:feda:d769), 30 hops max, 80 byte packets
 1  2601:5c0:c001:69e2::1 (2601:5c0:c001:69e2::1)  2.417 ms  3.077 ms  5.358 ms
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * hu-0-10-0-7-pe04.ashburn.va.ibone.comcast.net (2001:558:0:f5c1::2)  31.478 ms  31.975 ms
 7  2001:559::d16 (2001:559::d16)  32.406 ms  17.102 ms  24.751 ms
 8  2001:550:2:2f::a (2001:550:2:2f::a)  23.245 ms  23.519 ms  22.185 ms
 9  2607:b400:f0:2003::f0 (2607:b400:f0:2003::f0)  29.782 ms  28.604 ms  29.891 ms
10  2607:b400:90:ff05::f1 (2607:b400:90:ff05::f1)  30.423 ms *  30.680 ms
11  * * *
12  listserv.ipv6.vt.edu (2001:468:c80:2105:211:43ff:feda:d769)  34.562 ms  39.072 ms  24.633 ms
[~] ping -6 -c 3 listserv.vt.edu
PING listserv.vt.edu(listserv.ipv6.vt.edu (2001:468:c80:2105:211:43ff:feda:d769)) 56 data bytes
64 bytes from listserv.ipv6.vt.edu (2001:468:c80:2105:211:43ff:feda:d769): icmp_seq=1 ttl=53 time=33.3 ms
64 bytes from listserv.ipv6.vt.edu (2001:468:c80:2105:211:43ff:feda:d769): icmp_seq=2 ttl=53 time=24.3 ms
64 bytes from listserv.ipv6.vt.edu (2001:468:c80:2105:211:43ff:feda:d769): icmp_seq=3 ttl=53 time=46.0 ms

--- listserv.vt.edu ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 24.334/34.595/46.093/8.927 ms

So it looks like something times out somewhere and fails to pass packets
back.

TCP connections don't keep IPv6 alive. I have a browser window that
auto-updates every 5 minutes, and a SmokePing process on a Raspberry Pi
uploads to a server at work every few minutes, and those eventually drop
back to IPv4 when the IPv6 TCP fails to connect. And normal UDP doesn't
seem to keep it alive - NTP pointing at IPv6 peers loses connectivity as
well.

But a traceroute wakes it up. It's almost like some router is losing the
route to me out of the FIB, and fixes it when it has to handle a packet
on the CPU slow path (like send back a 'time exceeded').  But I'm mystified
why this started when I reflashed my router.



