nanog mailing list archives

Re: looking for hostname router identifier validation


From: Large Hadron Collider <large.hadron.collider () gmx com>
Date: Mon, 29 Apr 2019 18:18:53 -0700

I legit guffawed.

On 19-04-29 13 h 13, Eric Kuhnke wrote:
I would caution against putting much faith in the validity of
geolocation or site ID by reverse DNS PTR records. There are a vast
number of unmaintained, ancient, stale, erroneous or wildly wrong PTR
records out there. I can name at least a half dozen ISPs that have
absorbed other ASes, some of which had themselves acquired other ASes
earlier in their history, forming a turducken of obsolete PTR records
that still contains ISP domain names last in use in 2002.



On Mon, Apr 29, 2019 at 6:15 AM Matthew Luckie <mjl () luckie org nz> wrote:

    Hi NANOG,

    To support Internet topology analysis efforts, I have been working on
    an algorithm to automatically detect router names inside hostnames
    (PTR records) for router interfaces, and build regular expressions
    (regexes) to extract them.  By "router name" inside the hostname, I
    mean a substring, or set of non-contiguous substrings, that is common
    among interfaces on a router.  For example, suppose we had the
    following three routers in the savvis.net domain suffix, each with
    two interfaces:

    das1-v3005.nj2.savvis.net
    das1-v3006.nj2.savvis.net

    das1-v3005.oc2.savvis.net
    das1-v3007.oc2.savvis.net

    das2-v3009.nj2.savvis.net
    das2-v3012.nj2.savvis.net

    We might infer the router names are das1|nj2, das1|oc2, and das2|nj2,
    respectively, as captured by the regex:
    ^([a-z]+\d+)-[^\.]+\.([a-z]+\d+)\.savvis\.net$
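
For readers who want to try the regex quoted above, here is a minimal Python sketch that applies it to the six example hostnames and groups interfaces by the extracted router name. It only illustrates how the two capture groups combine into a name such as das1|nj2; it is not the inference algorithm described in the message, and the helper names (router_name, group_by_router) are made up for this example.

```python
import re
from collections import defaultdict

# The regex quoted in the message; the two capture groups together
# form the router name.
ROUTER_NAME_RE = re.compile(r"^([a-z]+\d+)-[^\.]+\.([a-z]+\d+)\.savvis\.net$")

# The six example interface hostnames from the message.
HOSTNAMES = [
    "das1-v3005.nj2.savvis.net",
    "das1-v3006.nj2.savvis.net",
    "das1-v3005.oc2.savvis.net",
    "das1-v3007.oc2.savvis.net",
    "das2-v3009.nj2.savvis.net",
    "das2-v3012.nj2.savvis.net",
]


def router_name(hostname):
    """Return the captured substrings joined with '|', or None if no match.

    Hypothetical helper for illustration only.
    """
    match = ROUTER_NAME_RE.match(hostname)
    if match is None:
        return None
    return "|".join(match.groups())


def group_by_router(hostnames):
    """Group hostnames by extracted router name, skipping non-matches."""
    groups = defaultdict(list)
    for hostname in hostnames:
        name = router_name(hostname)
        if name is not None:
            groups[name].append(hostname)
    return dict(groups)


routers = group_by_router(HOSTNAMES)
for name, interfaces in sorted(routers.items()):
    print(name, interfaces)
```

With the example hostnames, each of the three inferred router names should end up with two interfaces. Note that das1-v3005 appears under both nj2 and oc2, which is why the first capture group alone is not enough to identify a router.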

    After much refinement based on smaller sets of ground truth, I'm
    asking for broader feedback from operators.  I've placed a webpage at
    https://www.caida.org/~mjl/rnc/ that shows the inferences my algorithm
    made for 2523 domains.  If you operate one of the domains in that
    list, I would appreciate it if you could comment (private is probably
    better but public is fine with me) on whether the regex my algorithm
    inferred represents your naming intent.  In the first instance, I am
    most interested in feedback for the suffix / date combinations for
    suffixes that are colored green, i.e. appear to be reasonable.

    Each suffix / date combination links to a page that contains the
    naming convention and corresponding inferences.  The colored part of
    each hostname is the inferred router name.  The green hostnames appear
    to be correct, at least as far as the algorithm determined. Some
    suffixes have errors due to either stale hostnames or incorrect
    training data, and those hostnames are colored red or orange.

    If anyone is interested in the sets of hostnames the algorithm may
    have inferred as 'stale' for their network, I can provide that
    information.  For some operators the stale hostnames were an
    oversight, and they were grateful to learn about them.

    Thanks,

    Matthew

