nanog mailing list archives

Re: NTP Sync Issue Across Tata (Europe)


From: Tom Beecher <beecher () beecher cc>
Date: Wed, 16 Aug 2023 10:50:09 -0400

Thanks for that link.

This is jumping out at me though:

Their interior routing protocol used amongst their mesh of routers was
IS-IS which was using authentication.  The authentication [section 4.19]
was described having a "password validity start date" of 01 July 2012.
Thus, any routers which had picked up the time from the faulty source no
longer had valid IS-IS authentication and were thus isolated.


It's been a while, but the last time I remember diving into the IS-IS weeds,
the time of the transmitting system wasn't part of a Hello.  Is this a
Cisco-specific option they toss into a TLV?

On Wed, Aug 16, 2023 at 9:04 AM Matthew Richardson via NANOG <
nanog () nanog org> wrote:

Mel Beckman wrote:-

Do you have a citation for your Jersey event? I doubt GPS caused the
problem, but I’d like to see the documentation.

The event took place on the evening of Sunday 12 July 2020, and seems NOT
to have been due to an issue caused directly by GPS, but rather to
misbehaviour of a GPS NTP server relating to week numbers.  Our regulator
subsequently issued the following comprehensive document:-


https://www.jcra.je/media/598397/t-027-jt-july-2020-outage-decision-directions.pdf

By way of summary, JT operated two GPS derived NTP servers, with all of
their routers pointing to both.  On the evening in question, one of
the two reset its clock back to 27 November 2000.

Their interior routing protocol used amongst their mesh of routers was
IS-IS which was using authentication.  The authentication [section 4.19]
was described having a "password validity start date" of 01 July 2012.
Thus, any routers which had picked up the time from the faulty source no
longer had valid IS-IS authentication and were thus isolated.
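
To make the failure mode concrete, here is a minimal Python sketch of a
key-lifetime check.  It assumes, as with typical key-chain configurations,
that each router evaluates a key's validity window against its own local
clock, so no timestamp would need to travel in the IS-IS Hello itself.  The
report does not spell out the vendor mechanism; the function name, the use
of UTC and the 20:00 time of day are illustrative assumptions.

```python
from datetime import datetime, timezone

# "Password validity start date" quoted from the report (section 4.19).
KEY_VALID_FROM = datetime(2012, 7, 1, tzinfo=timezone.utc)


def key_is_usable(local_clock, valid_from=KEY_VALID_FROM, valid_until=None):
    """Return True if the key is inside its validity window.

    The decision depends only on the router's own clock (hypothetical
    helper, not any vendor's API).
    """
    if local_clock < valid_from:
        return False  # key is "not yet valid" from this router's viewpoint
    return valid_until is None or local_clock < valid_until


# A router with the correct time on the evening of the incident.
good_clock = datetime(2020, 7, 12, 20, 0, tzinfo=timezone.utc)
# A router that followed the faulty server back to 27 November 2000.
bad_clock = datetime(2000, 11, 27, 20, 0, tzinfo=timezone.utc)

print(key_is_usable(good_clock))  # True
print(key_is_usable(bad_clock))   # False
```

Under that assumption, a router whose clock jumped back before 01 July 2012
would treat the key as not yet valid and drop out of the IS-IS mesh, while
its neighbours with correct clocks carried on.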

Whilst only 15% of their routers were affected, this was enough to cause an
almost total failure in their network, affecting telephony (fixed & mobile)
and Internet.  For foreign readers (this is NANOG!) "999" calls refer to
the emergency services in these parts, where any failures attract the
attention of our regulator.

The details of why the clock "failed" start at section 4.23, and seem to
relate to a GPS week number rollover.
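
As a rough check of the rollover arithmetic: the legacy GPS navigation
message carries the week number in a 10-bit field, so it wraps every 1024
weeks.  The sketch below simply subtracts one such period from the incident
date.  It ignores time of day and any firmware specifics, and the function
name is illustrative.

```python
from datetime import date, timedelta

GPS_WEEK_MODULUS = 1024  # 10-bit week number field wraps every 1024 weeks


def date_after_missed_rollover(true_date, rollovers=1):
    """Date a receiver would report if it lost `rollovers` week-number wraps."""
    return true_date - timedelta(weeks=GPS_WEEK_MODULUS * rollovers)


incident = date(2020, 7, 12)
wrong = date_after_missed_rollover(incident)
print(wrong)                    # 2000-11-26
print((incident - wrong).days)  # 7168
```

That lands within a day of the 27 November 2000 quoted above, which is
consistent with a 1024-week error, although the sketch does not explain the
exact date the faulty server settled on.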

So, probably not a failure "caused by GPS", rather one caused by poor
design (only two clock sources) combined with unsupported and buggy
devices.

One curious aspect is that some routers followed the "bad" time, which is
alluded to in section 4.31.

Something not discussed in that report is that JT's email failed during the
incident despite its being hosted on Office365.  The reason was that the
two authoritative DNS servers for jtglobal.com were hosted in Jersey inside
their network.  As that network was wholly disconnected, there was no DNS
and hence no email.  Despite my having raised this since with their senior
management, their DNS remains hosted in this way:-

matthew@m88:~$ dig +norec +noedns +nocmd +nostats -t ns jtglobal.com @ns1.jtibs.net
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20462
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 4

;; QUESTION SECTION:
;jtglobal.com.                 IN      NS

;; ANSWER SECTION:
jtglobal.com.          60      IN      NS      ns2.jtibs.net.
jtglobal.com.          60      IN      NS      ns1.jtibs.net.

;; ADDITIONAL SECTION:
ns1.jtibs.net.         60      IN      A       212.9.0.135
ns2.jtibs.net.         60      IN      A       212.9.0.136
ns1.jtibs.net.         60      IN      AAAA    2a02:c28::d1
ns2.jtibs.net.         60      IN      AAAA    2a02:c28::d2

Ridiculously (and again despite my agitation to their management) our
government domain gov.je has similar DNS fragility:-

matthew@m88:~$ dig +norec +noedns +nocmd +nostats -t ns gov.je @ns1.gov.je
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4249
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2

;; QUESTION SECTION:
;gov.je.                               IN      NS

;; ANSWER SECTION:
gov.je.                        3600    IN      NS      ns2.gov.je.
gov.je.                        3600    IN      NS      ns1.gov.je.

;; ADDITIONAL SECTION:
ns2.gov.je.            3600    IN      A       212.9.21.137
ns1.gov.je.            3600    IN      A       212.9.21.9
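
A quick way to see the lack of diversity in the answers above is to group
the nameserver addresses by prefix.  The Python sketch below uses only the
IPv4 addresses from the two dig outputs.  Sharing a /24 or /16 is just a
crude proxy for "same network"; the real question is whether the servers
share an AS and upstream connectivity, which this does not check.  The
function names are illustrative.

```python
import ipaddress
from collections import defaultdict

# (name, IPv4 address) pairs copied from the dig output above.
JTGLOBAL = [("ns1.jtibs.net", "212.9.0.135"), ("ns2.jtibs.net", "212.9.0.136")]
GOV_JE = [("ns1.gov.je", "212.9.21.9"), ("ns2.gov.je", "212.9.21.137")]


def group_by_prefix(servers, prefix_len=24):
    """Map each covering prefix to the nameservers whose address falls in it."""
    groups = defaultdict(list)
    for name, addr in servers:
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(net)].append(name)
    return dict(groups)


def shares_single_prefix(servers, prefix_len=24):
    """True if every nameserver sits inside one prefix of the given length."""
    return len(group_by_prefix(servers, prefix_len)) == 1


print(group_by_prefix(JTGLOBAL))                  # both in 212.9.0.0/24
print(group_by_prefix(GOV_JE))                    # both in 212.9.21.0/24
print(shares_single_prefix(JTGLOBAL + GOV_JE, 16))  # True
```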

--
Best wishes,
Matthew

 ------
From: Mel Beckman <mel () beckman org>
To: Matthew Richardson <matthew-l () itconsult co uk>
Cc: Nanog <nanog () nanog org>
Date: Tue, 8 Aug 2023 15:12:29 +0000
Subject: Re: NTP Sync Issue Across Tata (Europe)

Until the Internet NTP network can be made secure, no. Do you have a
citation for your Jersey event? I doubt GPS caused the problem, but I’d
like to see the documentation.

Using GPS for time sync is simple risk management: the risk of Internet
NTP with known, well documented vulnerabilities and many security
incidents, versus the risk of some theoretical GPS-based vulnerability, for
which mitigations such as geographic diversity are readily available. Sure,
you could use Internet NTP as a last resort should GPS fail globally
(perhaps due to a theoretical — but conceivable — meteor storm). But that
would be a fall-back. I would not mix the systems.

-mel

On Aug 8, 2023, at 1:36 AM, Matthew Richardson <
matthew-l () itconsult co uk> wrote:

Mel Beckman wrote:-

It's a problem that has received a lot of attention in both NTP and
aviation navigation circles. What is hard to defend against is total
signal suppression via high powered jamming. But that you can do with a
geographically diverse GPS NTP network.

Whilst looking forward to being corrected, GPS (even across multiple
locations) seems to be a SINGLE source of time.  You seem (have I
misunderstood?) to be a proponent of using GPS exclusively as the
external clock source.

Might it be preferable to have a mixture of GPS (perhaps with another
GNSS) together with carefully selected Internet-based NTP servers?

I recall an incident over here in Jersey (the one they named New Jersey
after!) where our primary telco had a substantial time shift on one of
their two GPS synced servers.  This managed to adjust the clock on
enough of their routers that the certificate-based OSPF authentication
considered the certificates invalid, and caused a failure of almost
their whole network.

This is, of course, not to say that GPS is not a very good clock source,
but rather to wonder whether more diversity would be preferable to
using it as a single source.

--
Best wishes,
Matthew

------
From: Mel Beckman <mel () beckman org>
To: "Forrest Christian (List Account)" <lists () packetflux com>
Cc: Nanog <nanog () nanog org>
Date: Mon, 7 Aug 2023 14:03:30 +0000
Subject: Re: NTP Sync Issue Across Tata (Europe)

Forrest,

GPS spoofing may work with a primitive Raspberry Pi-based NTP server,
but commercial industrial NTP servers have specific anti-spoofing
mitigations. There are also antenna diversity strategies that vendors
support to ensure the signal being relied upon is coming from the right
direction. It's a problem that has received a lot of attention in both NTP
and aviation navigation circles. What is hard to defend against is total
signal suppression via high powered jamming. But that you can do with a
geographically diverse GPS NTP network.

-mel

On Aug 7, 2023, at 1:39 AM, Forrest Christian (List Account) <
lists () packetflux com> wrote:

The problem with relying exclusively on GPS to do time distribution is
the ease with which one can spoof the GPS signals.

With a budget of around $1K, not including a laptop, anyone with
decent technical skills could convince a typical GPS receiver that it was at
any position in the world, at any time.   All it takes is a decent
directional antenna, some SDR hardware, and depending on the location and
directivity of your antenna maybe a smallish amplifier.   There is much
discussion right now in the PNT (Position, Navigation and Timing) community
as to how best to secure the GNSS network, but right now one should
consider the data from GPS to be no more trustworthy than some random NTP
server on the internet.

In order to build a resilient NTP server infrastructure you need
multiple sources of time distributed by multiple methods - typically both
via satellite (GPS) and by terrestrial (NTP) methods.   NTP does a pretty
good job of sorting out multiple time servers and discarding sources that
are lying.  But to do this you need multiple time sources.  A common
recommendation is to run a couple/few NTP servers which only get time from
a GPS receiver and only serve time to a second tier of servers that pull
from both those in-house GPS-timed-NTP servers and other trusted NTP
servers.   I'd recommend selecting the time servers to gain geographic
diversity, e.g. poll NIST servers in Maryland or Colorado, and possibly
both.
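
The "discarding sources that are lying" step can be sketched in Python along
the lines of Marzullo's interval-intersection algorithm, on which NTP's own
selection is based.  This is a simplified illustration, not ntpd's actual
implementation; the source names, offsets and error bounds are made up, with
the bad source set 1024 weeks (619,315,200 seconds) in the past.

```python
def best_agreement(intervals):
    """Marzullo's algorithm: (count, lo, hi) of the range most intervals cover."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # interval starts
        edges.append((hi, +1))  # interval ends
    edges.sort()                # at equal values, starts sort before ends
    best = count = 0
    best_lo = best_hi = None
    for i, (value, kind) in enumerate(edges):
        count -= kind           # +1 on a start, -1 on an end
        if count > best:
            best, best_lo, best_hi = count, value, edges[i + 1][0]
    return best, best_lo, best_hi


def classify(sources):
    """sources: name -> (offset_s, max_error_s).

    Returns (trusted, rejected) name lists, or None when no strict
    majority of sources agree on a common range.
    """
    bounds = {n: (off - err, off + err) for n, (off, err) in sources.items()}
    count, lo, hi = best_agreement(list(bounds.values()))
    if count * 2 <= len(sources):
        return None
    trusted = sorted(n for n, (b_lo, b_hi) in bounds.items()
                     if b_lo <= lo and hi <= b_hi)
    rejected = sorted(set(sources) - set(trusted))
    return trusted, rejected


four = {
    "ntp-1": (0.000, 0.050),
    "ntp-2": (0.030, 0.050),
    "gps-a": (-0.005, 0.030),
    "gps-b": (-619_315_200.0, 0.050),  # clock thrown back by 1024 weeks
}
two = {"gps-a": four["gps-a"], "gps-b": four["gps-b"]}

print(classify(four))  # (['gps-a', 'ntp-1', 'ntp-2'], ['gps-b'])
print(classify(two))   # None
```

With four sources the outlier is outvoted and rejected.  With only two
sources that disagree there is no majority, so nothing in the data says
which one is lying.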

Note that NIST will exchange (via mail) a set of keys with you so that you
can talk authenticated NTP with its servers.   See
https://www.nist.gov/pml/time-and-frequency-division/time-services/nist-authenticated-ntp-service



On Sun, Aug 6, 2023 at 8:36 PM Mel Beckman <mel () beckman org> wrote:
GPS Selective Availability did not disrupt the timing chain of GPS,
only the ephemeris (position information).  But a government-disrupted
timebase scenario has never occurred, while hackers are a documented threat.

DNS has DNSSEC, which while not deployed as broadly as we might like,
at least lets us know which servers we can trust.

Your own atomic clocks still have to be synced to a common standard to
be useful. To what are they sync'd? GPS, I'll wager.

I sense hand-waving :)

-mel via cell

On Aug 6, 2023, at 7:04 PM, Rubens Kuhl <rubensk () gmail com> wrote:


On Sun, Aug 6, 2023 at 8:20 PM Mel Beckman <mel () beckman org> wrote:
Or one can read recent research papers that thoroughly document the
incredible fragility of the existing NTP hierarchy and soberly consider
their recommendations for remediation:

The paper suggests the compromise of critical infrastructure. So,
besides not using NTP, why not stop using DNS ? Just populate a hosts file
with all you need.

BTW, the stratum-0 source you suggested is known to have been
manipulated in the past (https://www.gps.gov/systems/gps/modernization/sa/),
so you need to bet on that specific state actor not returning to old habits.

OTOH, 4 of the 5 servers I suggested have their own atomic clock, and
you can keep using GPS as well. If GPS goes bananas on timing, that source
will just be disregarded (one of the features of the NTP architecture that
has been pointed out over and over in this thread and you keep ignoring it).

Rubens



