Re: NTP Sync Issue Across Tata (Europe)


From: "Forrest Christian (List Account)" <lists () packetflux com>
Date: Mon, 14 Aug 2023 04:55:01 -0600

We're going to have to somewhat disagree here...

I may not have been 100% clear about what I see as the most common risks
for GPS.  The reason I suggest that NTP risks and GPS risks are similar
is not primarily intentional time-injection attacks (although those are
a risk).  Instead, it's GPS failure modes, and the increasing prevalence
of GPS jamming as a cause of those failures.  I will 100% concede that
NTP carries far more spoofing or intentional DoS risk.  But GPS is far
more likely than NTP to suffer a failure in the absence of a bad actor.

The reason for this is that the GPS signal is incredibly weak, and it's
incredibly easy to break GPS reception.   Good antenna placement and
antennas that try to reject terrestrial signals help but don't always
prevent the failures from happening.

Because GPS is used more and more to track objects and people, people who
don't want to be tracked are starting to buy and use jammers.  In addition,
it's becoming increasingly common for gamers to spoof their GPS location
(and, as a result, time) via GPS injection.  So the kid down the street
trying to cheat at Pokémon Go or the truck driver not wanting to get in
trouble for speeding may unintentionally cause your time server to quit
working correctly.  Not to mention the random piece of electronic gear
which malfunctions and spews noise across the GPS band.

So, yes, I will 100% agree with you that NTP carries more intentional
hacking risk.   But I'm going to argue that GPS carries a significantly
higher risk of a jamming-related failure.   Without good statistics, it's
hard to tell which is more prevalent.  I see a lot more GPS failures
from my viewpoint, but that viewpoint is skewed: part of my job is
talking to customers who are having precision timing issues caused by
GPS failures.

My intuitive feeling is that in the absence of bad actors, NTP is
significantly more reliable than GPS.   In the presence of remote bad
actors, I'll grant that NTP is 100% the loser here.  When everything is
working, GPS will provide better time.  Adding a holdover oscillator to GPS
does help in marginal situations, but doesn't resolve all of the GPS issues.

In those situations where time is not critical, either NTP or GPS is a good
solution, and it largely comes down to which you prefer.   I deal with way
too many antennas, so I'd rather just harden an NTP server.  You might
deal with way too many hackers getting into your systems, so you might
prefer
relying on a GPS antenna.   Either way, most of the time you're going to
get decent time service.   We could go into a lot of details about how each
system can fail, but for non-time critical applications I'm not sure either
would come out a clear winner.  I know you believe GPS does, and I
believe it isn't 100% clear which one is better for those "just want
time that works most of the time" applications.  We could argue about
this all day and get nowhere beyond agreeing to disagree.

Once you get to more time-critical apps where actual budget is going to be
expended on ensuring reliable NTP services are available 24x7, then neither
a default configuration NTP server nor a single GPS receiver will provide
reliable time.   Selecting servers and hardening firewalls to limit the
likelihood of time injection can work wonders on NTP robustness.   GPS
works too if you provide enough GPS timing sources that multiple locations
would have to be jammed at once.   Providing a mix of these is even
better.  If I were to go GPS-only, I'd probably ensure a minimum of 3
different GPS receivers at 3 different locations, with internal NTP
servers pulling from each of the GPS-connected NTP servers.  Five would
be even better.  An even more robust option would be to go with 5 GPS
receivers
and 2 or 3 NTP-connected stratum 1 time sources.   In this last case, you
could spoof ALL of the NTP servers and the GPS would still be in control.
You could also have signal failures at 3 of the GPS sites and the NTP
connections would provide redundant time sources.   Only with GPS failures
at multiple sites AND NTP failures or spoofing happening at the same time
would one have an issue where the NTP servers could possibly fail to
receive correct time.
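
To make the voting math concrete, here's a toy Python sketch of the
interval-intersection idea behind NTP's clock selection (a
Marzullo-style algorithm).  To be clear, this is not ntpd's actual
implementation, just an illustration of why a majority of honest
sources beats a minority of spoofed ones:

    # Each source reports an offset and a maximum error; the best
    # estimate lies where the most intervals [offset-err, offset+err]
    # overlap.
    def best_overlap(sources):
        """sources: list of (offset_seconds, max_error_seconds)."""
        edges = []
        for offset, err in sources:
            edges.append((offset - err, -1))  # interval opens
            edges.append((offset + err, +1))  # interval closes
        edges.sort()
        count = best_count = 0
        best_point = None
        for point, kind in edges:
            count -= kind  # open adds one overlap, close removes one
            if count > best_count:
                best_count, best_point = count, point
        return best_count, best_point

    # Five honest sources near 0ms vs. three spoofed sources at +2s:
    honest = [(0.000, 0.010)] * 5
    spoofed = [(2.000, 0.010)] * 3
    print(best_overlap(honest + spoofed))  # -> (5, -0.01): majority wins

With 5 honest GPS-fed sources and 3 spoofed NTP sources, the largest
agreeing cluster is the honest one, which is the 5-and-3 arrangement
described above.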


On Mon, Aug 14, 2023 at 2:00 AM Mel Beckman <mel () beckman org> wrote:

Forrest,

I think you’re gilding the lily. My original recommendation was to use
GPS as primary, for its superior accuracy and resistance to attack, and
have an NTP backup.  If you want automatic failover, do that in an
intermediate server on your site that makes a conscious test and decision
to fail over to Internet NTP.

You’re mistaken to say that the vulnerability of GPS is remotely
comparable  to the vulnerability of Internet-based NTP. To interfere with a
GPS-derived clock, an attacker has to physically be present. That’s a huge
expense — and risk — that hackers are really not interested in undertaking.
They would much rather sit in Russia or China and attack NTP servers
remotely using any of the several attack methodologies I’ve cited
previously.

So curate Internet NTP or not (personally, that seems like just another
thing to monitor and maintain), but make GPS your primary time standard.
You’re much better off staying air-gapped from Internet NTP until you
detect a GPS failure.  All the other machinations are pointless while GPS
is working, because GPS gives you by far the best accuracy and security for
the buck.  Like I said, spend $400 on a commercial GPS time server and
timing problems are solved. Or use facility-provided GPS if you can’t get
an antenna up.

 -mel

On Aug 14, 2023, at 12:10 AM, Forrest Christian (List Account) <
lists () packetflux com> wrote:


I've responded in bits and pieces to this thread and haven't done an
excellent job expressing my overall opinion.   This is probably because my
initial goal was to point out that GPS-transmitted time is no less subject
to being attacked than your garden variety NTP-transmitted time. Since this
thread has evolved, I'd like to lay out my overall position a bit more
clearly.

To start, we need a somewhat simplified version of how UTC is created so I
can refer to it later:

Across the globe, approximately 85 research and standards institutions run
a set of freestanding atomic clocks that contribute to UTC.   The number of
atomic clocks across all these institutions totals around 450.   Each
institution also produces a version of UTC based on its own set of
atomic clocks.  In the international timekeeping world, this is designated
as UTC(Laboratory), where Laboratory is replaced with the abbreviation for
the lab producing that version of UTC.   So UTC(NIST) is the version that
NIST produces at Boulder, Colorado, NICT produces UTC(NICT) in Tokyo, and
so on.

Because no clock is perfectly accurate, all of these versions of UTC drift
in relation to each other, and you could have significant differences in
time between different labs.   As a result, there has to be a way to
synchronize them.  Each month, the standards organization BIPM collects
relative time measurements and other statistics from each
institution described above.  This data is then used to determine the
actual value of UTC. BIPM then produces a report detailing each
organization's difference from the correct representation of UTC.   Each
institution uses this data to adjust its UTC representation, and the cycle
repeats the next month. In this way, all of the representations of UTC end
up being pretty close to each other.   The document BIPM produces is titled
"Circular T."  The most recent version indicates that most of the
significant standards institutions maintain a UTC version that differs by
less than 10ns from the official version of UTC.

Note that 10ns is far more accurate than we need for NTP, so most of the
UTC representations can be considered identical as far as this discussion
goes. Still, it is essential to realize that UTC(NIST) is generated
separately from UTC(USNO) or other UTC implementations.  For example, a
UTC(NIST) failure should not cause UTC(USNO) to fail as they utilize
separate hardware and systems.

Each of these versions of UTC is also disseminated in various ways.
UTC(NIST) goes out via the "WWV" radio stations, NTP, and other esoteric
methods.   GPS primarily distributes UTC(USNO), which is also available
directly via NTP.  UTC(SU) is the timescale for GLONASS.  And so on.

So, back to NTP and the accuracy required:

Most end users (people running everyday web applications or streaming
video or similar) don't need precisely synchronized time.   The most
sensitive application I'm aware of in this space is likely TOTP, which
often needs time on the server and time on the client (or hardware key)
within 90 seconds of each other.   In addition, having NTP time fail
usually isn't the end of the world for these users.  The best way to
synchronize their computers (including desktop and server systems) to UTC
is to point their computer time synchronization service (whatever that is)
at pool.ntp.org, time.windows.com, their ISP's time server, or similar.
Or, with modern OSes, you can leave the time configured to whatever
server the OS manufacturer preconfigured.  As an aside, note that
historically the Windows clock ticked at 15ms or so, so trying to
synchronize most Windows machines closer than 15ms was futile.
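
As an illustration of how loose that TOTP requirement is, here is a
minimal RFC 6238-style sketch in Python (the key below is the RFC's
SHA-1 test key; a real verifier also has to worry about rate limiting
and replay):

    import hmac, hashlib, struct, time

    def totp(secret: bytes, unix_time: float, step: int = 30,
             digits: int = 6) -> str:
        # RFC 6238: the moving factor is the number of 30-second steps
        # since the Unix epoch, run through RFC 4226 HOTP truncation.
        counter = struct.pack(">Q", int(unix_time // step))
        mac = hmac.new(secret, counter, hashlib.sha1).digest()
        offset = mac[-1] & 0x0F
        code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    # A verifier that accepts the previous, current, and next step
    # tolerates roughly a minute and a half of client/server skew.
    key = b"12345678901234567890"  # RFC 6238 SHA-1 test key
    now = time.time()
    print({totp(key, now + skew) for skew in (-30, 0, 30)})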

On the other hand, large ISPs or other service providers (including
content providers) see real benefits from having systems synchronized
to within a fraction of a second of UTC.  Comparing logs and traces
becomes much
easier when you know that something logged at 10:02:23.1 on one device came
before something logged at 10:02:23.5 on another.   Various
server-to-server protocols and software implementations need time to be
synchronized to sub-second intervals since they rely on timestamps to
determine the latest copy of data, and so on.   In addition, as an ISP,
you'll often provide time services to downstream customers who demand more
accuracy and reliability than is strictly necessary.

As a result, one wants to ensure that all time servers are synchronized
within some reasonable standard of accuracy.   Within 100ms is acceptable
for most applications, but a goal of under 50ms is better.  If you
have local GPS receivers, accuracy down to around 1ms is achievable
with careful design.  Beyond that, you're chasing unnecessary accuracy.
Note that error is somewhat cumulative here: a time server that is
itself only synchronized to within 100ms guarantees that no client of
that server can be synchronized better than 100ms.  Generally, you'll
want your
time server to be synchronized much better than needed to avoid the time
server being the limiting factor.
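
For intuition on where the error comes from, this is the standard NTP
on-wire calculation (per RFC 5905) from the four timestamps of a
request/response exchange; the example numbers are made up:

    # t1 = client transmit, t2 = server receive,
    # t3 = server transmit, t4 = client receive (seconds)
    def ntp_offset_delay(t1, t2, t3, t4):
        offset = ((t2 - t1) + (t3 - t4)) / 2  # estimated clock offset
        delay = (t4 - t1) - (t3 - t2)         # round-trip network delay
        return offset, delay

    # A 40ms round trip with fully asymmetric routing can hide up to
    # ~delay/2 = 20ms of error, which is why short, consistent paths
    # matter as much as the accuracy of the server itself.
    offset, delay = ntp_offset_delay(1.000, 1.025, 1.026, 1.041)
    print(f"offset={offset * 1000:+.1f}ms, delay={delay * 1000:.1f}ms")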

In a perfect world with no bad actors and where all links ran perfectly,
one could set up an NTP server that pulled from pool.ntp.org or used GPS
and essentially acted as a proxy.   Unfortunately, we don't live in this
world.   So one has to ask how you build a system that meets at least the
following goals:

* Synchronized to UTC within 50ms, with lower being better.
* Not subject to a reasonable set of attacks (typical DoS attacks, RF
signal attacks, spoofing, etc).
* Able to be run by typical network operations staff

In addition, an ideal server setup would be made up of redundant servers
in case one piece of hardware fails.  I will ignore this part, as it's
usually just setting up multiple copies of the same thing.

The two most straightforward options are using a GPS-based NTP appliance
or installing an NTP server and pointing it at pool.ntp.org.   Under
normal circumstances, both options will be synchronized to UTC with enough
accuracy for most applications, and both are easy to run by typical network
operations staff.  This assumes reasonably consistent network latency in
the NTP case and a good sky view in the GPS case.  The GPS-based appliance
is, however, subject to spoofing or jamming, as I've discussed earlier.
 The NTP server is at the mercy of the quality of the servers it picked
from pool.ntp.org and is also subject to various outside attacks
(spoofing, etc.).   One must decide how critical time is to them before
deciding whether this option is valid.
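
For reference, here's roughly what's on the wire when you point
something at pool.ntp.org: a minimal (S)NTP client query in Python
using only the standard library.  This is a sketch for illustration,
not a substitute for a real NTP daemon, since it applies none of NTP's
filtering, selection, or clock discipline:

    import socket, struct, time

    NTP_UNIX_DELTA = 2208988800  # seconds between 1900 and 1970 epochs

    def sntp_query(server: str = "pool.ntp.org") -> float:
        packet = bytearray(48)
        packet[0] = 0x23  # LI=0, VN=4, Mode=3 (client)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(2.0)
            sock.sendto(packet, (server, 123))
            data, _ = sock.recvfrom(48)
        # Server transmit timestamp: bytes 40-47 (seconds, fraction)
        secs, frac = struct.unpack("!II", data[40:48])
        return secs - NTP_UNIX_DELTA + frac / 2**32

    print(time.ctime(sntp_query()))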

The other end of the scale is the "develop your own offline version of UTC
using atomic clocks" methodology.  This fixes the attack issue but
introduces several others.   The main one is that you are now relying on
the clock's accuracy.  Admittedly, rubidium and especially cesium
clocks tend to be sufficiently reliable and stable.  However, one has
to ensure the frequency is accurate initially and stays that way.  You
must also wire the clock to an NTP server and calibrate the initial
UTC offset.  If the
clock goes haywire or is less accurate than is required, your in-house
version of UTC will drift in relation to real UTC.   This means you may
need 2 or 3 or more atomic clocks to be sufficiently reliable.  You'll then
need to regularly take an average, compare it to UTC, and adjust if it's
drifted too much.   This quickly becomes more of a science project than
something you want network operations staff to deal with on an ongoing
basis.    To be clear:  If you need robust time not subject to outside
forces and have or can obtain the skill set to pull this off internally, I
won't argue that this is a bad option.  However, I feel this isn't the type
of service most providers want to run internally.
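
To put some rough numbers on the drift problem (the fractional
frequency accuracies below are illustrative ballpark figures, not any
particular product's spec):

    SECONDS_PER_DAY = 86400

    # Assumed, illustrative fractional frequency errors:
    for name, frac_error in (("cesium", 1e-13), ("rubidium", 1e-11)):
        drift_per_day = frac_error * SECONDS_PER_DAY  # seconds/day
        days_to_1ms = 1e-3 / drift_per_day
        print(f"{name}: ~{drift_per_day * 1e6:.3f}us/day, "
              f"~{days_to_1ms:,.0f} days to drift 1ms")

An uncorrected clock holding 1e-11 stays within a millisecond of UTC
for roughly three years, which is why this approach can work at all;
the hard part is knowing when it has stopped behaving that well.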

So, looking at some middle-ground options that trade a bit of robustness
for ease of use is reasonable.

My lowest-cost preference has always been to use a set of in-house NTP
servers pointed at a carefully curated collection of NTP servers.
Your curation strategy should depend on network connectivity, the
reliability of the time sources, etc.  In North America, picking one
or two NIST servers from each NIST location is a good starting point:
that is, one or two from each of Maryland, Fort Collins, Boulder, and
the University of Colorado.  One may want to add some servers from
other timekeeping organizations (such as USNO).

Note that there is one commonality: these time servers are run by
organizations listed in Circular T as contributing to UTC, and the
servers are tied directly to their atomic clocks.  That way, the
servers are not subject to inaccuracies introduced by transferring
time from an authoritative UTC source over the network.  What is left
is any potential attack on the time transfer over NTP itself.

I would argue that with a curated list of enough NTP servers, this
risk can be pushed down to where it is low enough for many use cases.
A lot will depend on the quantity and quality of the NTP servers you
select and the robustness of the network path to those servers.  If
the packets between your NTP server and the servers you choose
traverse a relatively secure and short path with plenty of bandwidth,
and the paths to differing NTP servers are diverse, many attacks
become harder to implement.  In addition, the more NTP servers you
add, the more likely it is that NTP will correctly pick the servers
providing the correct time, even if an attacker is successfully
spoofing one or more sources.  In some cases, it may make sense to add
servers run by third parties if doing so gains robustness based on
network architecture.  This is especially true if you're closely
connected network-wise with the third party and they run a
good-quality NTP service as well.
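
As a concrete starting point, here is a hypothetical sketch of the
kind of sanity check I'm describing, using the third-party ntplib
package (pip install ntplib).  The hostnames are examples of public
NIST/USNO servers; you'd substitute your own curated list and
thresholds:

    import statistics
    import ntplib  # third-party: pip install ntplib

    # Example curated list; substitute your own selections.
    SERVERS = [
        "time-a-g.nist.gov",      # NIST Gaithersburg, MD
        "time-a-wwv.nist.gov",    # NIST Fort Collins, CO
        "time-a-b.nist.gov",      # NIST Boulder, CO
        "utcnist2.colorado.edu",  # University of Colorado
        "tick.usno.navy.mil",     # USNO
    ]

    client = ntplib.NTPClient()
    offsets = {}
    for host in SERVERS:
        try:
            resp = client.request(host, version=4, timeout=2)
            offsets[host] = resp.offset  # local-clock offset, seconds
        except Exception as exc:
            print(f"{host}: unreachable ({exc})")

    if offsets:
        median = statistics.median(offsets.values())
        for host, off in sorted(offsets.items()):
            flag = "ok" if abs(off - median) < 0.05 else "SUSPECT"
            print(f"{host:24s} {off:+.4f}s {flag}")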

As I've mentioned, a good middle-of-the-road solution is adding various
sources of time derived via GPS.   Note I said, "to add."    Start with the
carefully curated NTP server set, then install one or more GPS-based NTP
servers polled by your NTP server.   Adding these GPS time sources to your
NTP servers does three things:  First, it provides another source of time
NTP can use to determine the correct time.   Second, we're now using a
different time transmission method with different vulnerabilities.   And
finally, it will significantly improve the accuracy of the time the
NTP server produces, as ntpd will generally prefer the GPS source for
the final trimming to UTC.  Combining terrestrially transmitted NTP
time with the precision of RF-transmitted GPS time helps ensure that
time is both correct and precise.  There are still attack vectors
here, but as
you add more time sources, the complexity of pulling off a successful
attack increases.  This is especially true if you can monitor the NTP
server for signs of stress, such as time servers that are not telling the
correct time or GPS signals which are inconsistent with the NTP-derived
time.   A successful attack would require simultaneous NTP (network) and
GPS (rf) attacks.

Other options or blends of options are also possible.   With a reasonably
large network, putting enough GPS receivers into place would significantly
reduce the possibility of a spoofer or jammer taking out your entire GPS
infrastructure.  Reducing or eliminating external NTP time sources might be
reasonable in that case.   The theory is that attacking GPS receivers at
one location is easy.  Doing it at dozens simultaneously is much more
difficult.   To use an exaggeration to make a point:  If you had 100
different GPS receivers spread across 100 widely geographically diverse
locations, and all of your NTP servers were able to poll all of them for
time, the chances that an attacker would be able to take out or spoof
enough GPS receivers to make a difference would be close to zero.  Your
failure point becomes UTC(USNO) and the GPS constellation itself. The same
argument would apply to NTP servers regarding quantity and diversity.
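
The back-of-the-envelope math bears this out.  If each site is
independently jammed with some probability at any given moment (the 1%
figure below is purely an assumption for illustration, and real
jamming events won't be perfectly independent), the odds of losing a
majority of sites at once collapse quickly as the site count grows:

    from math import comb

    def p_majority_out(n_sites: int, p_jam: float) -> float:
        """Probability that more than half of n_sites are jammed at
        once, assuming independent failures (a simplification)."""
        need = n_sites // 2 + 1
        return sum(comb(n_sites, k) * p_jam**k
                   * (1 - p_jam)**(n_sites - k)
                   for k in range(need, n_sites + 1))

    for n in (1, 3, 5, 100):
        print(f"{n:3d} sites: {p_majority_out(n, 0.01):.2e}")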

Other options involve adding additional technologies.   For example, some
appliances use GPS to discipline (adjust) an internal atomic clock. Once
the atomic clock is locked to UTC, the GPS can fail for extended periods
without affecting NTP output.   In addition, some of these will filter
updates from the GPS based on the appliance's internal atomic time.   That
way, a spoofer would be ignored, jammers would have to continue for hours
or days, and so on.   Of course, these solutions' reliability depends on
the implementation quality.   If I had the budget to implement something
like this in a network, I'd likely scatter a few of these around the
network and then still use garden-variety ntpd servers pointed at
these appliances.  I might even consider buying solutions from
multiple vendors to ensure a bug in one implementation was filtered out and
ignored.

I can't cover every option here, but balancing security, cost, operational
complexity, and application needs is the key.   Some solutions are cheap
and easy but not robust.  Some are highly robust but expensive and not
easy.   Somewhere in the middle is probably where most real implementations
should lie.

Now, to address a couple of specific items:

1) Additional GPS and commercial time distribution systems will likely
improve reliability.   However, only GPS and GALILEO are available for free
in the US.   I'm ignoring GLONASS for various legal and political reasons.
 GALILEO is a valid option but it lives in the same band as GPS, so jamming
GPS will usually also jam GALILEO.  Utilizing GNSS receivers that use the
civilian signals in the newer bands would also help.  Some commercial
solutions are available that don't require GNSS, but they're relatively new
and not as commonly available as one would like.

2) For running my own time servers in a service-provider environment, I'd
rather specifically designate the exact NTP server I want to utilize and
not rely on a third party to give me a pool of servers.  It's about
ensuring each server I use is run by a trusted operator, and if I
delegate the server selection, I lose that ability.  On the other
hand, where I'm
not running an NTP server that is critical for many clients, I'll just point
it at pool.ntp.org, or north-america.pool.ntp.org and skip all of the
recommendations that I've made above.   I would be cautious about
requesting pool.ntp.org add entries for "stratum of server" or "origin of
time" as this seems like it would tend to overload the stratum one servers
in the pool with people "optimizing" their configuration to use only
stratum one servers.  Remember that pool.ntp.org is generally intended
as an end-user-device service, and providing methods by which end
users can bypass the robustness a fully distributed pool provides is
probably not a great idea.

3) This all should hopefully sort itself out over the next few years.
 GPS and GALILEO are flying new birds that have changes designed to improve
attack resilience by using cryptography to ensure authentic transmissions
(which may rely on ground transmission of cryptographic keys).   NTP
already supports manual cryptographic keys that work, but NTS is a pain in
the rear. Hopefully, NTPv5 will have a better security mechanism.   Other,
more secure, time sources are on the horizon as the cybersecurity crowd is
aware of the issues.

And finally, as a sort of tl;dr summary:  Each operator needs to decide
how critical time is to their network and pick a solution that works for
them and fits the organization's budget.   Some operators might point
everything at pool.ntp.org and not run their own servers.  Others might
run their own time lab and use that time to provide NTP time and precision
time and frequency via various methods.  Most will be somewhere in between.
But regardless of which you choose, please be aware that GPS isn't 100%
secure, and neither is NTP. If attack resilience matters to you, you should
think about all of the attack vectors and design something that is robust
enough to meet your use case.





-- 
- Forrest
