nanog mailing list archives

Re: link monitoring


From: Eric Kuhnke <eric.kuhnke () gmail com>
Date: Thu, 29 Apr 2021 14:44:20 -0700

One thing I forgot to add, which this post reminded me of: the link in the
original question was probably a short-distance 100G CWDM4 link. When
monitoring a longer-distance 100G coherent link (QPSK, 16QAM, whatever), be
absolutely sure to poll all of the SNMP OIDs for it, the same as if it were
a point-to-point microwave link.

Depending on the exact line card and optic, it may behave much like a faded
or misaligned radio link when the fiber or the lasers degrade. In particular
I'm thinking of coherent 100G linecards that can switch on the fly between
'low FEC' and 'high FEC' modes, changing the ratio of payload to FEC
overhead (much as an ACM-capable 18 or 23 GHz band radio would). Any such
switch should absolutely trigger an alarm. Also poll the FEC decode stress
percentage and similar values.
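
A minimal sketch of alarming on a FEC mode switch, assuming the mode has
already been polled into (timestamp, mode) pairs. The mode names "low-fec"
and "high-fec" and the sample data are hypothetical; the actual OID and its
value encoding depend on the line card and are not shown here.

```python
from typing import Iterable, List, Tuple


def fec_mode_changes(samples: Iterable[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Return (timestamp, old_mode, new_mode) for each change between consecutive polls."""
    changes = []
    previous = None
    for timestamp, mode in samples:
        # Only report when a previous poll exists and the mode differs from it.
        if previous is not None and mode != previous:
            changes.append((timestamp, previous, mode))
        previous = mode
    return changes


# Hypothetical poll results: seconds since start, and the reported FEC mode.
polls = [(0, "low-fec"), (60, "low-fec"), (120, "high-fec"),
         (180, "high-fec"), (240, "low-fec")]

for ts, old, new in fec_mode_changes(polls):
    print(f"ALARM t={ts}s: FEC mode changed {old} -> {new}")
```

Reporting every transition, including the return to the low-FEC mode, keeps
a record of how long the link spent degraded.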

On Thu, Apr 29, 2021 at 2:37 PM Lady Benjamin Cannon of Glencoe, ASCE <
lb () 6by7 net> wrote:

We monitor light levels and FEC values on all links and have thresholds
for early-warning and pre-failure analysis.

Short answer: yes, we see links lose packets before failing completely, and
for dozens of reasons that’s still a good thing, but you need to monitor
every part of a resilient network.

Ms. Lady Benjamin PD Cannon of Glencoe, ASCE
6x7 Networks & 6x7 Telecom, LLC
CEO
lb () 6by7 net
"The only fully end-to-end encrypted global telecommunications company
in the world."

FCC License KJ6FJJ

Sent from my iPhone via RFC1149.

On Apr 29, 2021, at 2:32 PM, Eric Kuhnke <eric.kuhnke () gmail com> wrote:


The Junipers on both sides should have discrete SNMP OIDs that respond
with a FEC stress value or FEC error value. See the blue highlighted part
about FEC at the link below. Depending on what version of JunOS you're
running, the MIB for it may or may not exist.


https://kb.juniper.net/InfoCenter/index?page=content&id=KB36074&cat=MX2008&actp=LIST

In other equipment it's sometimes found in an SNMP sub-tree adjacent to the
optical DOM values. Once you can acquire and poll that value, set it up as
a custom item to graph, and alert on chosen threshold values, in your
choice of NMS.
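
As a rough sketch of the "poll, then alert on a threshold" step: if the FEC
value is exposed as a monotonically increasing error counter (an
assumption; some platforms may expose a gauge or percentage instead), the
useful number is the rate between two polls. The function names, counter
width and threshold below are hypothetical, and the SNMP polling itself is
left to your NMS.

```python
def counter_rate(prev: int, curr: int, interval_s: float, bits: int = 64) -> float:
    """Per-second rate between two polls of an increasing counter.

    Handles a single wrap of a counter that is `bits` wide.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    if delta < 0:
        # Counter wrapped past its maximum between the two polls.
        delta += 2 ** bits
    return delta / interval_s


def should_alert(rate: float, threshold: float) -> bool:
    """True when the error rate exceeds the configured threshold."""
    return rate > threshold


# Hypothetical values: two polls 60 seconds apart.
rate = counter_rate(prev=1000, curr=1600, interval_s=60)
print(rate, should_alert(rate, threshold=5.0))
```

Alerting on the rate rather than the raw counter avoids an alarm that stays
raised forever after a single past burst of errors.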

Additionally, signs of a failing optic may show up in some of the optical
DOM MIB items you can poll:
https://mibs.observium.org/mib/JUNIPER-DOM-MIB/

It helps to have some similar linecards and optics that are not
misbehaving, which you can poll while configuring the custom graph/OID to
establish a baseline 'no problem' value. Exceeding that baseline by the
margin you choose then trips the threshold you set in your monitoring
system.
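
One possible way to turn readings from healthy links into a threshold,
sketched under assumptions: the sample values are made up, and "mean plus k
standard deviations" is just one common heuristic, not something the
equipment or any particular NMS prescribes.

```python
from statistics import mean, pstdev
from typing import Sequence


def baseline_threshold(healthy_samples: Sequence[float], k: float = 3.0) -> float:
    """Threshold = mean of the healthy readings plus k population standard deviations."""
    if len(healthy_samples) < 2:
        raise ValueError("need at least two healthy samples")
    return mean(healthy_samples) + k * pstdev(healthy_samples)


# Hypothetical FEC error readings polled from known-good linecards and optics.
healthy = [10, 12, 11, 9, 13]
threshold = baseline_threshold(healthy)

# A later reading from the suspect link is compared against the baseline.
suspect_reading = 40
print(f"threshold={threshold:.2f} alert={suspect_reading > threshold}")
```

A larger k gives fewer false alarms at the cost of later detection; the
right value depends on how noisy the healthy links are.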

On Thu, Apr 29, 2021 at 1:40 PM Baldur Norddahl <baldur.norddahl () gmail com>
wrote:

Hello

We had a 100G link that started to misbehave, and customers noticed bad
packet loss. The optical values were just fine, but we had packet loss and
latency. The interface showed FEC errors on one end and carrier transitions
on the other end. Otherwise the link stayed up, and our monitoring system
completely failed to warn about the failure. We had to find the bad link by
traceroute (mtr), observing where the packet loss started.

The link was between a Juniper MX204 and Juniper ACX5448. Link length 2
meters using 2 km single mode SFP modules.

What is the best practice to monitor links to avoid this scenario? What
options do we have to do link monitoring? I am investigating BFD but I am
unsure if that would have helped the situation.

Thanks,

Baldur



