nanog mailing list archives

Re: link monitoring


From: Alain Hebert <ahebert () pubnix net>
Date: Fri, 30 Apr 2021 08:47:52 -0400

    Yes the JNP DOM MIB is what you are looking for.

    It also has traps for warning and alarm thresholds you can use, which are driven by the optic's own parameters.
    ( Human Interface: show interfaces diagnostics optics <interface> )

    TLDR:

        Realtime: Traps;
        Monitoring: DOM MIB;

    PS: I suggest you join [ juniper-nsp () puck nether net ] mailing list.

-----
Alain Hebert                                ahebert () pubnix net
PubNIX Inc.
50 boul. St-Charles
P.O. Box 26770     Beaconsfield, Quebec     H9W 6G7
Tel: 514-990-5911  http://www.pubnix.net    Fax: 514-990-9443

On 4/29/21 5:32 PM, Eric Kuhnke wrote:
The Junipers on both sides should have discrete SNMP OIDs that respond with a FEC stress value or FEC error value. See the blue highlighted part about FEC here. Depending on what version of JunOS you're running, the MIB for it may or may not exist.

https://kb.juniper.net/InfoCenter/index?page=content&id=KB36074&cat=MX2008&actp=LIST

In other equipment it is sometimes found in an SNMP sub-tree adjacent to the optical DOM values. Once you can poll that value, set it up in your NMS of choice as a custom item to graph, and alert when it crosses certain threshold values.
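As a rough illustration of the "poll it, then alert on a threshold" step, here is a minimal Python sketch. It assumes the polled FEC value is a monotonically increasing 64-bit counter (that may not hold for every OID or JunOS version), and it does no SNMP itself: you feed it two raw readings from whatever poller you already use. The names Sample, error_rate and should_alert are made up for this example.

```python
from dataclasses import dataclass

# Assumption: the polled object is an SNMP Counter64. Use 2**32 for Counter32.
COUNTER64_MODULUS = 2**64


@dataclass
class Sample:
    timestamp: float  # seconds (e.g. time.time()) when the poll was taken
    value: int        # raw counter value returned by the poll


def error_rate(prev: Sample, curr: Sample, modulus: int = COUNTER64_MODULUS) -> float:
    """Return errors per second between two polls, tolerating one counter wrap."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # Modular subtraction gives the right delta even if the counter wrapped once.
    delta = (curr.value - prev.value) % modulus
    return delta / elapsed


def should_alert(rate: float, threshold: float) -> bool:
    """True when the observed error rate is above the configured threshold."""
    return rate > threshold


if __name__ == "__main__":
    previous = Sample(timestamp=0.0, value=100)
    current = Sample(timestamp=60.0, value=700)
    rate = error_rate(previous, current)
    print(f"rate={rate} errors/s alert={should_alert(rate, threshold=5.0)}")
```

Alerting on the rate rather than the raw counter means a link that accumulated errors long ago does not keep alarming forever.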

Additionally, signs of a failing optic may show up in some of the optical DOM MIB items you can poll: https://mibs.observium.org/mib/JUNIPER-DOM-MIB/

It helps if you have some similar, non-misbehaving linecards and optics that can be polled during custom graph/OID configuration to establish a baseline 'no problem' value. The threshold you set in your monitoring system should then trigger when that baseline is exceeded.
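One way to turn readings from known-good links into a threshold is sketched below in Python. The "mean plus a few standard deviations" rule is just an assumption for illustration, not something Juniper or any particular NMS prescribes, and the function names are invented for this example.

```python
import statistics
from typing import Sequence


def baseline_threshold(healthy_samples: Sequence[float],
                       sigmas: float = 3.0,
                       floor: float = 0.0) -> float:
    """Derive an alert threshold from values polled on known-good links.

    Assumption: mean + sigmas * sample standard deviation is a reasonable
    'no problem' ceiling; 'floor' keeps the threshold from being set too low.
    """
    if len(healthy_samples) < 2:
        raise ValueError("need at least two healthy samples")
    mean = statistics.fmean(healthy_samples)
    stdev = statistics.stdev(healthy_samples)
    return max(floor, mean + sigmas * stdev)


def exceeds_baseline(value: float, threshold: float) -> bool:
    """True when a polled value is above the baseline-derived threshold."""
    return value > threshold


if __name__ == "__main__":
    # Hypothetical error rates (per second) polled from healthy optics.
    healthy = [0.0, 2.0, 1.0, 1.0]
    limit = baseline_threshold(healthy)
    print(f"threshold={limit:.3f} suspect={exceeds_baseline(50.0, limit)}")
```

If the healthy links report exactly zero errors, the floor argument is what keeps a single stray error from paging you.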

On Thu, Apr 29, 2021 at 1:40 PM Baldur Norddahl <baldur.norddahl () gmail com> wrote:

    Hello

    We had a 100G link that started to misbehave and caused the
    customers to notice bad packet loss. The optical values are just
    fine but we had packet loss and latency. Interface shows FEC
    errors on one end and carrier transitions on the other end. But
    otherwise the link would stay up and our monitoring system completely
    failed to warn about the failure. Had to find the bad link by
    traceroute (mtr) and observe where packet loss started.

    The link was between a Juniper MX204 and Juniper ACX5448. Link
    length 2 meters using 2 km single mode SFP modules.

    What is the best practice to monitor links to avoid this
    scenario? What options do we have to do link monitoring? I am
    investigating BFD but I am unsure if that would have helped the
    situation.

    Thanks,

    Baldur



