Re: NTP Sync Issue Across Tata (Europe)


From: "Forrest Christian (List Account)" <lists () packetflux com>
Date: Mon, 14 Aug 2023 01:07:14 -0600

I've responded in bits and pieces to this thread and haven't done an
excellent job expressing my overall opinion.   This is probably because my
initial goal was to point out that GPS-transmitted time is no less subject
to being attacked than your garden variety NTP-transmitted time. Since this
thread has evolved, I'd like to lay out my overall position a bit more
clearly.

To start, we need a somewhat simplified version of how UTC is created so I
can refer to it later:

Across the globe, approximately 85 research and standards institutions run
a set of freestanding atomic clocks that contribute to UTC.   The number of
atomic clocks across all these institutions totals around 450.   Each
institution also produces a version of UTC based on its own set of
atomic clocks.  In the international timekeeping world, this is designated
as UTC(Laboratory), where Laboratory is replaced with the abbreviation for
the lab producing that version of UTC.   So UTC(NIST) is the version that
NIST produces at Boulder, Colorado, NICT produces UTC(NICT) in Tokyo, and
so on.

Because no clock is perfectly accurate, all of these versions of UTC drift
in relation to each other, and you could have significant differences in
time between different labs.   As a result, there has to be a way to
synchronize them.  Each month, the standards organization BIPM collects
relative time measurements and other statistics from each
institution described above.  This data is then used to determine the
actual value of UTC. BIPM then produces a report detailing each
organization's difference from the correct representation of UTC.   Each
institution uses this data to adjust its UTC representation, and the cycle
repeats the next month. In this way, all of the representations of UTC end
up being pretty close to each other.   The document BIPM produces is titled
"Circular T."  The most recent version indicates that most of the
significant standards institutions maintain a UTC version that differs by
less than 10ns from the official version of UTC.

Note that 10ns is far more accurate than we need for NTP, so most of the
UTC representations can be considered identical as far as this discussion
goes. Still, it is essential to realize that UTC(NIST) is generated
separately from UTC(USNO) or other UTC implementations.  For example, a
UTC(NIST) failure should not cause UTC(USNO) to fail as they utilize
separate hardware and systems.

Each of these versions of UTC is also disseminated in various ways.
UTC(NIST) goes out via the "WWV" radio stations, NTP, and other esoteric
methods.   GPS primarily distributes UTC(USNO), which is also available
directly via NTP.  UTC(SU) is the timescale for GLONASS.  And so on.

So, back to NTP and the accuracy required:

Most end users (people running everyday web applications or streaming video
or similar) don't need precisely synchronized time.   The most sensitive
application I'm aware of in this space is likely TOTP, which often needs
time on the server and time on the client (or hardware key) within 90
seconds of each other.   In addition, having NTP time fail usually isn't
the end of the world for these users.  The best way to synchronize their
computers (including desktop and server systems) to UTC is to point their
computer time synchronization service (whatever that is) at pool.ntp.org,
time.windows.com, their ISP's time server, or similar.  Or, with
modern OSes, you can leave the time configured to whatever server the OS
vendor preconfigured.   As an aside, note that the Windows clock
historically ticked at roughly 15ms intervals, so trying to synchronize a
Windows system more tightly than 15ms was futile.
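
For illustration, a minimal client-side setup along these lines might look
like the following chrony configuration (a sketch only; any of the sources
mentioned above would work in place of the pool):

    # /etc/chrony.conf -- minimal end-user client (illustrative sketch)
    # Four servers from the public pool; iburst speeds up initial sync.
    pool pool.ntp.org iburst
    # Step the clock if it's more than 1 second off during the first
    # three updates, then only slew (adjust gradually) afterward.
    makestep 1.0 3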

On the other hand, large ISPs or other service providers (including content
providers) see real benefits to having systems synchronized to fractions of
seconds of UTC.   Comparing logs and traces becomes much easier when you
know that something logged at 10:02:23.1 on one device came before
something logged at 10:02:23.5 on another.   Various server-to-server
protocols and software implementations need time to be synchronized to
sub-second intervals since they rely on timestamps to determine the latest
copy of data, and so on.   In addition, as an ISP, you'll often provide
time services to downstream customers who demand more accuracy and
reliability than is strictly necessary.

As a result, one wants to ensure that all time servers are synchronized
within some reasonable standard of accuracy.   Within 100ms is acceptable
for most applications, but a goal of under 50ms is better.   If you have
local GPS receivers, accuracy down to around 1ms is achievable with careful
design.  Beyond that, you're chasing unnecessary accuracy.  Note that loss
of precision is somewhat cumulative here: a time server that is itself only
within 100ms of UTC means that no client of that server can be synchronized
to better than roughly 100ms.
time server to be synchronized much better than needed to avoid the time
server being the limiting factor.
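
If you run chrony, checking how well a given server is actually tracking
is straightforward; a quick sanity check might be (commands only, output
omitted):

    # Estimated offset of this host from its reference:
    chronyc tracking      # the "System time" line shows the offset
    # Per-source offsets, jitter, and reachability:
    chronyc sources -v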

In a perfect world with no bad actors and where all links ran perfectly,
one could set up an NTP server that pulled from pool.ntp.org or used GPS
and essentially acted as a proxy.   Unfortunately, we don't live in this
world.   So one has to ask how to build a system that meets at least the
following goals:

* Synchronized to UTC within 50ms, with lower being better.
* Not subject to a reasonable set of attacks (typical DoS attacks, RF
signal attacks, spoofing, etc).
* Able to be run by typical network operations staff

In addition, an ideal server setup would be made up of redundant servers in
case one piece of hardware fails.  I will ignore this part, as it's usually
just setting up multiple copies of the same thing.

The two most straightforward options are using a GPS-based NTP appliance or
installing an NTP server and pointing it at pool.ntp.org.   Under normal
circumstances, both options will be synchronized to UTC with enough
accuracy for most applications, and both are easy to run by typical network
operations staff.  This assumes reasonably consistent network latency in
the NTP case and a good sky view in the GPS case.  The GPS-based appliance
is, however, subject to spoofing or jamming, as I've discussed earlier.
 The NTP server is at the mercy of the quality of the servers it picked
from pool.ntp.org and is also subject to various outside attacks (spoofing,
etc.).   One must decide how critical time is to one's operation before
deciding whether either option is acceptable.

The other end of the scale is the "develop your own offline version of UTC
using atomic clocks" methodology.  This fixes the attack issue but
introduces several others.   The main one is that you are now relying on
the clock's accuracy.  Admittedly, rubidium and especially cesium clocks
tend to be sufficiently reliable and stable.   However, one has to ensure
the frequency is accurate initially and stays that way. You must also wire
the clock to an NTP server and calibrate the initial UTC offset.   If the
clock goes haywire or is less accurate than is required, your in-house
version of UTC will drift in relation to real UTC.   This means you may
need 2 or 3 or more atomic clocks to be sufficiently reliable.  You'll then
need to regularly take an average, compare it to UTC, and adjust if it's
drifted too much.   This quickly becomes more of a science project than
something you want network operations staff to deal with on an ongoing
basis.    To be clear:  If you need robust time not subject to outside
forces and have or can obtain the skill set to pull this off internally, I
won't argue that this is a bad option.  However, I feel this isn't the type
of service most providers want to run internally.
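
For the curious, a very rough sketch of what the NTP-server end of such a
setup could look like with chrony, assuming each atomic clock delivers a
1PPS signal to the server (the device paths here are hypothetical):

    # /etc/chrony.conf -- sketch: three local atomic clocks via 1PPS
    refclock PPS /dev/pps0 refid AT0
    refclock PPS /dev/pps1 refid AT1
    refclock PPS /dev/pps2 refid AT2
    # PPS pulses carry no time-of-day; chronyd still needs a coarse
    # source (or a manual initial set) to number the seconds.
    server time.nist.gov iburst
    # Require agreement among at least two sources before serving time.
    minsources 2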

So, looking at some middle-ground options that trade a bit of robustness
for ease of use is reasonable.

My lowest-cost preference has always been to use a set of in-house NTP
servers pointed at a carefully curated collection of NTP servers.    Your
curation strategy should depend on network connectivity, the reliability of
the time sources, etc.   In North America, picking one or two NIST servers
from each NIST location is a good starting point.  That is, one or two from
each of Maryland, Fort Collins, Boulder, and the University of Colorado.
 One may want to add some servers from other timekeeping organizations
(such as USNO).   Note that there is one commonality:  These time servers
are run by organizations listed in Circular T as contributing to UTC, and
the servers are tied directly to the atomic clocks. That way, we ensure the
servers are not subject to the inaccuracies introduced by an extra hop of
time transfer from an authoritative source for UTC.   What is left is any
potential attack on the
time transfer over NTP itself.   I would argue that with a curated list of
enough NTP servers, this risk can be pushed down to where it is low enough
for many use cases.   A lot will depend on the quantity and quality of NTP
servers you select and the robustness of the network path to those
servers.  If the packets between your NTP server and the NTP servers you
choose traverse a relatively secure and short path with plenty of
bandwidth, and the paths to differing NTP servers are diverse, many attacks
will become harder to implement.   In addition, the more NTP servers you
add, the more likely it is that NTP will be able to correctly pick the
servers providing the correct time, even if an attacker is successfully
spoofing one or more sources.  In some cases, it may make sense to add
servers run by third parties if doing so gains additional robustness based
on your network architecture.  This is especially true if
you're closely connected network-wise with the third party and they run a
good quality NTP service as well.
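
As an illustrative sketch only (hostnames drawn from NIST's and USNO's
published server lists; verify against their current documentation before
deploying), such a curated configuration might look like:

    # /etc/chrony.conf -- sketch: curated stratum-1 sources
    server time-a-g.nist.gov    iburst  # NIST Gaithersburg, MD
    server time-a-wwv.nist.gov  iburst  # NIST Fort Collins, CO
    server time-a-b.nist.gov    iburst  # NIST Boulder, CO
    server utcnist.colorado.edu iburst  # University of Colorado
    server tick.usno.navy.mil   iburst  # USNO
    server tock.usno.navy.mil   iburst  # USNO
    # Require at least three agreeing sources before trusting the time.
    minsources 3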

As I've mentioned, a good middle-of-the-road solution is adding various
sources of time derived via GPS.   Note I said, "to add."    Start with the
carefully curated NTP server set, then install one or more GPS-based NTP
servers polled by your NTP server.   Adding these GPS time sources to your
NTP servers does three things:  First, it provides another source of time
NTP can use to determine the correct time.   Second, we're now using a
different time transmission method with different vulnerabilities.   And
finally, it will significantly improve the accuracy of the time the NTP
server produces, as ntpd will generally prefer it for the final trimming
to UTC.   Combining the robustness of terrestrially transmitted NTP time
with the precision of RF-transmitted GPS time helps ensure that time
is both correct and precise.  There are still attack vectors here, but as
you add more time sources, the complexity of pulling off a successful
attack increases.  This is especially true if you can monitor the NTP
server for signs of stress, such as time servers that are not telling the
correct time or GPS signals which are inconsistent with the NTP-derived
time.   A successful attack would require simultaneous NTP (network) and
GPS (RF) attacks.
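
Concretely, with a gpsd-fed receiver attached to the same host, adding the
GPS source to a curated server set might look like this sketch (SHM
segment 0 is where gpsd conventionally publishes NMEA time; the PPS device
path is an assumption):

    # Added alongside the curated server list (illustrative sketch)
    # Coarse NMEA time from gpsd via shared memory; used only to
    # number the PPS pulses, never selected on its own.
    refclock SHM 0 refid NMEA noselect
    # The precise pulse-per-second signal from the receiver.
    refclock PPS /dev/pps0 lock NMEA refid GPS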

Other options or blends of options are also possible.   With a reasonably
large network, putting enough GPS receivers into place would significantly
reduce the possibility of a spoofer or jammer taking out your entire GPS
infrastructure.  Reducing or eliminating external NTP time sources might be
reasonable in that case.   The theory is that attacking GPS receivers at
one location is easy.  Doing it at dozens simultaneously is much more
difficult.   To use an exaggeration to make a point:  If you had 100
different GPS receivers spread across 100 widely geographically diverse
locations, and all of your NTP servers were able to poll all of them for
time, the chances that an attacker would be able to take out or spoof
enough GPS receivers to make a difference would be close to zero.  Your
failure point becomes UTC(USNO) and the GPS constellation itself. The same
argument would apply to NTP servers regarding quantity and diversity.
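
In configuration terms, each NTP server would simply list every GPS-backed
box as a source; a fragment of such a setup might look like this (all
hostnames hypothetical):

    # Fragment: poll GPS-backed NTP boxes at many sites
    server gps-ntp.site01.example.net iburst
    server gps-ntp.site02.example.net iburst
    # ... one line per site ...
    server gps-ntp.site99.example.net iburst
    # A handful of spoofed or jammed sites get outvoted.
    minsources 5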

Other options involve adding additional technologies.   For example, some
appliances use GPS to discipline (adjust) an internal atomic clock. Once
the atomic clock is locked to UTC, the GPS can fail for extended periods
without affecting NTP output.   In addition, some of these will filter
updates from the GPS based on the appliance's internal atomic time.   That
way, a spoofer would be ignored, jammers would have to continue for hours
or days, and so on.   Of course, these solutions' reliability depends on
the implementation quality.   If I had the budget to implement something
like this in a network, I'd likely scatter a few of these around the
network and then still use garden-variety ntpd servers pointed at these
appliances.  I might even consider buying solutions from
multiple vendors to ensure a bug in one implementation was filtered out and
ignored.
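
For example, the downstream garden-variety servers could mix appliances
from two vendors so that a single-implementation bug gets outvoted
(hostnames hypothetical):

    # Sketch: point ordinary servers at mixed-vendor appliances
    server clockbox-a1.example.net iburst  # vendor A
    server clockbox-a2.example.net iburst  # vendor A
    server clockbox-b1.example.net iburst  # vendor B
    server clockbox-b2.example.net iburst  # vendor B
    minsources 3  # one buggy appliance can't win the vote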

I can't cover every option here, but balancing security, cost, operational
complexity, and application needs is the key.   Some solutions are cheap
and easy but not robust.  Some are highly robust but expensive and not
easy.   Somewhere in the middle is probably where most real implementations
should lie.

Now, to address a couple of specific items:

1) Additional GPS and commercial time distribution systems will likely
improve reliability.   However, only GPS and GALILEO are available for free
in the US.   I'm ignoring GLONASS for various legal and political reasons.
 GALILEO is a valid option but it lives in the same band as GPS, so jamming
GPS will usually also jam GALILEO.  Utilizing GNSS receivers that use the
civilian signals in the newer bands would also help.  Some commercial
solutions are available that don't require GNSS, but they're relatively new
and not as commonly available as one would like.

2) For running my own time servers in a service-provider environment, I'd
rather specifically designate the exact NTP server I want to utilize and
not rely on a third party to give me a pool of servers.   It's about
ensuring that every server I use is a trusted one, and if I delegate
server selection, I lose that ability.  On the other hand, where I'm
not running an NTP server that is critical for many clients, I'll just point
it at pool.ntp.org, or north-america.pool.ntp.org and skip all of the
recommendations that I've made above.   I would be cautious about
requesting pool.ntp.org add entries for "stratum of server" or "origin of
time" as this seems like it would tend to overload the stratum one servers
in the pool with people "optimizing" their configuration to use only
stratum one servers.   Remember that pool.ntp.org is generally intended as
an end-user-device service, and providing methods by which end users can
bypass the robustness of a fully distributed pool is probably not a
great idea.

3) This all should hopefully sort itself out over the next few years.   GPS
and GALILEO are flying new birds that have changes designed to improve
attack resilience by using cryptography to ensure authentic transmissions
(which may rely on ground transmission of cryptographic keys).   NTP
already supports manual cryptographic keys that work, but NTS is a pain in
the rear. Hopefully, NTPv5 will have a better security mechanism.   Other,
more secure, time sources are on the horizon as the cybersecurity crowd is
aware of the issues.
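
For reference, the client side of NTS is already mechanically simple in
chrony 4.x where the remote server supports it; time.cloudflare.com is one
public example:

    # NTS-authenticated NTP, per server (requires chrony 4.0+)
    server time.cloudflare.com iburst nts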

And finally, as a sort of tl;dr summary:  Each operator needs to decide
how critical time is to their network and pick a solution that works for
them and fits the organization's budget.   Some operators might point
everything at pool.ntp.org and not run their own servers.  Others might run
their own time lab and use that time to provide NTP time and precision time
and frequency via various methods.  Most will be somewhere in between. But
regardless of which you choose, please be aware that GPS isn't 100% secure,
and neither is NTP. If attack resilience matters to you, you should think
about all of the attack vectors and design something that is robust enough
to meet your use case.
