nanog mailing list archives

Re: NTP Sync Issue Across Tata (Europe)


From: Mike Hammett <nanog () ics-il net>
Date: Mon, 14 Aug 2023 08:44:38 -0500 (CDT)

Forrest seems to have posted a good general overview and perspective on "good enough for the use case," while others
continue to be pedantic about nuances that don't seem relevant to most use cases.




----- 
Mike Hammett 
Intelligent Computing Solutions 
http://www.ics-il.com 

Midwest-IX 
http://www.midwest-ix.com 

----- Original Message -----

From: "Forrest Christian (List Account)" <lists () packetflux com> 
To: "nanog list" <nanog () nanog org> 
Sent: Monday, August 14, 2023 2:07:14 AM 
Subject: Re: NTP Sync Issue Across Tata (Europe) 



I've responded in bits and pieces to this thread and haven't done an excellent job expressing my overall opinion. This
is probably because my initial goal was to point out that GPS-transmitted time is no less subject to attack
than your garden-variety NTP-transmitted time. Since this thread has evolved, I'd like to describe my overall position
more clearly.


To start, we need a somewhat simplified version of how UTC is created so I can refer to it later: 


Across the globe, approximately 85 research and standards institutions run a set of freestanding atomic clocks that 
contribute to UTC. The number of atomic clocks across all these institutions totals around 450. Each institution also 
produces a version of UTC based on its own set of atomic clocks. In the international timekeeping world, this is 
designated as UTC(Laboratory), where Laboratory is replaced with the abbreviation for the lab producing that version of 
UTC. So UTC(NIST) is the version NIST produces in Boulder, Colorado; NICT produces UTC(NICT) in Tokyo; and so on.


Because no clock is perfectly accurate, all of these versions of UTC drift in relation to each other, and you could 
have significant differences in time between different labs. As a result, there has to be a way to synchronize them. 
Each month, the standards organization BIPM collects relative time measurements and other statistics from each 
institution described above. This data is then used to determine the actual value of UTC. BIPM then produces a report 
detailing each organization's difference from the correct representation of UTC. Each institution uses this data to 
adjust its UTC representation, and the cycle repeats the next month. In this way, all of the representations of UTC end 
up being pretty close to each other. The document BIPM produces is titled "Circular T." The most recent version 
indicates that most of the significant standards institutions maintain a UTC version that differs by less than 10ns 
from the official version of UTC. 
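
To make the monthly cycle concrete, here is a toy sketch in Python. It is heavily simplified and assumes that "UTC" is
just a weighted mean of the labs' timescales; the real BIPM computation (clock weighting, frequency steering, TAI and
leap seconds) is far more involved. The lab readings and weights are invented for illustration, and the output mimics
the UTC - UTC(k) differences that Circular T reports for each lab.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LabReading:
    name: str         # lab abbreviation, e.g. "NIST" (illustrative)
    offset_ns: float  # lab timescale minus a common reference, in ns (hypothetical data)
    weight: float     # relative weight given to this lab (hypothetical)


def circular_t_style_report(readings: List[LabReading]) -> Dict[str, float]:
    """Return UTC - UTC(k) in ns for each lab, using a weighted mean as 'UTC'.

    Toy model only; BIPM's real algorithm is far more involved.
    """
    total_weight = sum(r.weight for r in readings)
    utc_ns = sum(r.offset_ns * r.weight for r in readings) / total_weight
    # Positive result: the lab's timescale is behind the ensemble.
    return {r.name: utc_ns - r.offset_ns for r in readings}


labs = [
    LabReading("NIST", 3.0, 1.0),
    LabReading("USNO", -1.0, 1.0),
    LabReading("NICT", 1.0, 2.0),
]
for name, diff in circular_t_style_report(labs).items():
    print(f"UTC - UTC({name}) = {diff:+.1f} ns")
```

Each lab would then nudge its own timescale by its reported difference, and the cycle repeats.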


Note that 10ns is far more accurate than we need for NTP, so most of the UTC representations can be considered 
identical as far as this discussion goes. Still, it is essential to realize that UTC(NIST) is generated separately from 
UTC(USNO) or other UTC implementations. For example, a UTC(NIST) failure should not cause UTC(USNO) to fail as they 
utilize separate hardware and systems. 


Each of these versions of UTC is also disseminated in various ways. UTC(NIST) goes out via the "WWV" radio stations, 
NTP, and other esoteric methods. GPS primarily distributes UTC(USNO), which is also available directly via NTP. UTC(SU) 
is the timescale for GLONASS. And so on. 


So, back to NTP and the accuracy required: 


Most end users (people running everyday web applications or streaming video or similar) don't need precisely 
synchronized time. The most sensitive application I'm aware of in this space is likely TOTP, which often needs time on 
the server and time on the client (or hardware key) within 90 seconds of each other. In addition, having NTP time fail 
usually isn't the end of the world for these users. The best way to synchronize their computers (including desktop and 
server systems) to UTC is to point their computer's time synchronization service (whatever that is) at pool.ntp.org,
time.windows.com, their ISP's time server, or similar. Or, with modern OSes, you can leave time synchronization pointed
at whatever server the OS manufacturer preconfigured. As an aside, note that Windows historically ticked at
15ms or so, so trying to synchronize most Windows machines closer than 15ms was futile.
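
For a sense of why TOTP tolerates so much skew, here is a minimal Python sketch of RFC 6238 TOTP (HMAC-SHA1,
30-second steps) with a verifier that accepts codes from neighboring steps. The window of one step either side is an
assumption; each service chooses its own tolerance, and how much clock error is actually survivable depends on that
choice.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP using HMAC-SHA1."""
    counter = int(unix_time // step)  # number of whole steps since the Unix epoch
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation from RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, server_time: float,
                step: int = 30, window: int = 1, digits: int = 6) -> bool:
    """Accept a code generated up to `window` steps before or after the server's step.

    The window is an assumption; it is what lets a client clock be somewhat off.
    """
    return any(
        hmac.compare_digest(totp(secret, server_time + k * step, step, digits), submitted)
        for k in range(-window, window + 1)
    )


# RFC 6238 test secret; at Unix time 59 the 8-digit SHA-1 code is 94287082.
secret = b"12345678901234567890"
print(totp(secret, 59, digits=8))  # 94287082
# A client whose clock is 30 s behind still produces an accepted code.
print(verify_totp(secret, totp(secret, 1111111109 - 30), 1111111109))  # True
```

A wider window tolerates worse clocks but also accepts older codes, so services keep it small.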


On the other hand, large ISPs or other service providers (including content providers) see real benefits to having 
systems synchronized to within a fraction of a second of UTC. Comparing logs and traces becomes much easier when you know that
something logged at 10:02:23.1 on one device came before something logged at 10:02:23.5 on another. Various 
server-to-server protocols and software implementations need time to be synchronized to sub-second intervals since they 
rely on timestamps to determine the latest copy of data, and so on. In addition, as an ISP, you'll often provide time 
services to downstream customers who demand more accuracy and reliability than is strictly necessary. 
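
As a hypothetical illustration of the timestamp problem, the Python sketch below applies a naive last-writer-wins
rule. The node names and skew values are made up and this is not any particular product's conflict resolution; the
point is that a half-second clock error is enough to make an older write win.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Write:
    node: str
    value: str
    true_time: float   # when the write actually happened, in seconds (hypothetical)
    clock_skew: float  # how far this node's clock is off from UTC, in seconds

    @property
    def recorded_timestamp(self) -> float:
        # What the node stamps on the write: true time plus its clock error.
        return self.true_time + self.clock_skew


def last_writer_wins(writes: List[Write]) -> Write:
    """Naive LWW: keep the write with the latest recorded timestamp."""
    return max(writes, key=lambda w: w.recorded_timestamp)


# Node b really writes 0.3 s after node a, but a's clock runs 0.5 s fast.
a = Write("a", "old", true_time=10.0, clock_skew=0.5)
b = Write("b", "new", true_time=10.3, clock_skew=0.0)
winner = last_writer_wins([a, b])
print(winner.node, winner.value)  # prints "a old": the stale value wins
```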


As a result, one wants to ensure that all time servers are synchronized within some reasonable standard of accuracy. 
Within 100ms is acceptable for most applications, but a goal of under 50ms is better. If you have local GPS receivers,
accuracy down to around 1ms is achievable with careful design. Beyond that, you're chasing unnecessary accuracy. Note that
loss of accuracy is somewhat cumulative here: if a time server is only synchronized to within 100ms, none of its clients
can be reliably synchronized to UTC any better than that. Generally, you'll want your time server to be
synchronized much better than needed to avoid the time server being the limiting factor.
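
A rough way to see the cumulative effect is to add up worst-case error bounds hop by hop. The Python sketch below is
loosely modeled on NTP's root distance (half the round-trip delay plus dispersion at each hop), but it is a
simplification that ignores jitter and the other terms real implementations track; the numbers are illustrative.

```python
from typing import Iterable, Tuple


def error_bound_ms(upstream_bound_ms: float, hops: Iterable[Tuple[float, float]]) -> float:
    """Worst-case error bound after following a chain of NTP hops.

    upstream_bound_ms: how far the top of the chain may already be from UTC.
    hops: (round_trip_delay_ms, dispersion_ms) for each hop downstream.
    A client cannot measure path asymmetry, so each hop may add up to half
    its round-trip delay, plus its dispersion (simplified root-distance idea).
    """
    bound = upstream_bound_ms
    for delay_ms, dispersion_ms in hops:
        bound += delay_ms / 2 + dispersion_ms
    return bound


# A server already 100 ms off, then one LAN hop to a client.
print(error_bound_ms(100.0, [(2.0, 0.5)]))  # 101.5
```

However good the client's own link is, the server's 100 ms never goes away.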


In a perfect world with no bad actors and where all links ran perfectly, one could set up an NTP server that pulled 
from pool.ntp.org or used GPS and essentially acted as a proxy. Unfortunately, we don't live in that world. So one has
to ask how to build a system that meets at least the following goals:


* Synchronized to UTC within 50ms, with lower being better. 
* Not subject to a reasonable set of attacks (typical DoS attacks, RF signal attacks, spoofing, etc). 
* Able to be run by typical network operations staff.


In addition, an ideal server setup would be made up of redundant servers in case one piece of hardware fails. I will 
ignore this part, as it's usually just setting up multiple copies of the same thing. 


The two most straightforward options are using a GPS-based NTP appliance or installing an NTP server and pointing it at
pool.ntp.org. Under normal circumstances, both options will be synchronized to UTC with enough accuracy for most
applications, and both are easy for typical network operations staff to run. This assumes reasonably consistent network
latency in the NTP case and a good sky view in the GPS case. The GPS-based appliance is, however, subject to spoofing
or jamming, as I've discussed earlier. The NTP server is at the mercy of the quality of the servers it picked from
pool.ntp.org and is also subject to various outside attacks (spoofing, etc.). Operators must decide how critical time is
to them before deciding whether this option is valid.
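
The dependence on consistent latency comes from how NTP computes offset. The standard on-wire calculation, sketched
below in Python, assumes the outbound and return paths take equal time, so any asymmetry shows up as an offset error of
half the difference. The timestamps in the example are invented, in milliseconds.

```python
from typing import Tuple


def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    Returns (offset, delay): offset is server clock minus client clock,
    delay is the round trip excluding server processing time.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Clocks actually agree (true offset 0). Symmetric path: 20 ms each way.
print(ntp_offset_delay(0, 20, 20, 40))   # (0.0, 40)
# Asymmetric path: 30 ms out, 10 ms back. Same total delay, 10 ms offset error.
print(ntp_offset_delay(0, 30, 30, 40))   # (10.0, 40)
```

The client sees the same 40 ms round trip in both cases, so it has no way to detect the asymmetry on its own.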


The other end of the scale is the "develop your own offline version of UTC using atomic clocks" methodology. This fixes 
the attack issue but introduces several others. The main one is that you are now relying on the clock's accuracy. 
Admittedly, rubidium and especially cesium clocks tend to be sufficiently reliable and stable. However, one has to
ensure the frequency is accurate initially and stays that way. You must also wire the clock to an NTP server and
calibrate the initial UTC offset. If the clock goes haywire or is less accurate than required, your in-house version
of UTC will drift in relation to real UTC. This means you may need two, three, or more atomic clocks to be sufficiently
reliable. You'll then need to regularly take an average, compare it to UTC, and adjust if it has drifted too much. This
quickly becomes more of a science project than something you want network operations staff to deal with on an ongoing 
basis. To be clear: If you need robust time not subject to outside forces and have or can obtain the skill set to pull 
this off internally, I won't argue that this is a bad option. However, I feel this isn't the type of service most 
providers want to run internally. 
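
Two bits of arithmetic behind the "several clocks" point, as a Python sketch: a free-running clock with a constant
fractional frequency offset accumulates a time error of that offset times elapsed time, and an ensemble can use the
median to reject a clock that has gone haywire. The function names, offsets, and threshold are hypothetical, and the
UTC comparison is assumed to come from some external source. With only two clocks the median is simply their average,
so you can see that they disagree but not which one is wrong.

```python
from statistics import median
from typing import Dict, List, Tuple


def drift_seconds(fractional_freq_offset: float, elapsed_s: float) -> float:
    """Time error a free-running clock accumulates with a constant fractional
    frequency offset (ignores aging, temperature, and other effects)."""
    return fractional_freq_offset * elapsed_s


def ensemble_offset_ns(offsets_ns: Dict[str, float], outlier_ns: float) -> Tuple[float, List[str]]:
    """Average each clock's offset from an external UTC comparison, dropping any
    clock that differs from the median by more than outlier_ns.

    Returns (mean offset of the kept clocks, names of rejected clocks).
    """
    mid = median(offsets_ns.values())
    kept = {name: off for name, off in offsets_ns.items() if abs(off - mid) <= outlier_ns}
    if not kept:
        raise ValueError("no clocks agree with the median; investigate before steering")
    rejected = sorted(set(offsets_ns) - set(kept))
    return sum(kept.values()) / len(kept), rejected


# A constant 1e-11 frequency error adds up to roughly 864 ns per day.
print(drift_seconds(1e-11, 86_400))
# Hypothetical clock offsets in ns; "rb3" has gone haywire.
mean, bad = ensemble_offset_ns({"rb1": 12.0, "rb2": 15.0, "cs1": 9.0, "rb3": 400.0}, outlier_ns=50.0)
print(mean, bad)  # 12.0 ['rb3']
```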


So it is reasonable to look at some middle-ground options that trade a bit of robustness for ease of use.


My lowest-cost preference has always been to use a set of in-house NTP servers pointed at a carefully curated
collection of NTP servers. Your curation strategy should depend on network connectivity, the reliability of the time
sources, etc. In North America, picking one or two NIST servers from each NIST location is a good starting point; that
is, one or two from each of Maryland, Fort Collins, Boulder, and the University of Colorado. One may want to add some
servers from other timekeeping organizations (such as USNO). Note that these time servers have one thing in common: they
are run by organizations listed in Circular T as contributing to UTC, and the servers are tied directly to those
organizations' atomic clocks. That way, the servers are not subject to inaccuracies introduced by an intermediate time
transfer from an authoritative UTC source. What is left is any potential attack on the time transfer over NTP itself. I
would argue that with a curated
list of enough NTP servers, this risk can be pushed down to where it is low enough for many use cases. A lot will 
depend on the quantity and quality of NTP servers you select and the robustness of the network path to those servers. 
If the packets between your NTP server and the NTP servers you choose traverse a relatively secure and short path with 
plenty of bandwidth, and the paths to differing NTP servers are diverse, many attacks will become harder to implement. 
In addition, the more NTP servers you add, the more likely it is that NTP will be able to correctly pick the servers 
providing the correct time, even if an attacker is successfully spoofing one or more sources. In some cases, it may
make sense to add servers run by third parties if doing so gains robustness based on network architecture. This is
especially true if you're closely connected network-wise with the third party and they run a good-quality NTP service
as well.
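
To illustrate how more sources let NTP outvote a spoofed one, here is a simplified Python sketch of Marzullo-style
interval intersection, the idea NTP's selection algorithm descends from. ntpd's real implementation adds clustering,
combining, and other refinements; the source names and numbers here are hypothetical. Each source claims the true
offset lies within its offset plus or minus its error bound, and the sketch picks the region the most sources agree on.

```python
from typing import Dict, List, Optional, Tuple


def select_truechimers(
    sources: Dict[str, Tuple[float, float]]
) -> Optional[Tuple[float, float, List[str]]]:
    """Simplified Marzullo-style intersection.

    sources maps a name to (offset, error_bound); each source claims the true
    offset lies in [offset - error_bound, offset + error_bound]. Returns
    (low, high, truechimers) for the region the most sources agree on, or
    None when no strict majority agrees (better to refuse than to guess).
    """
    edges = []
    for offset, err in sources.values():
        edges.append((offset - err, 0))  # 0 = interval start; sorts before an end at the same value
        edges.append((offset + err, 1))  # 1 = interval end
    edges.sort()

    best = count = 0
    low = high = 0.0
    for i, (value, kind) in enumerate(edges):
        count += 1 if kind == 0 else -1
        if count > best:  # only happens right after a start, so edges[i + 1] exists
            best = count
            low, high = value, edges[i + 1][0]

    if best <= len(sources) / 2:
        return None
    truechimers = sorted(
        name for name, (off, err) in sources.items() if off - err <= low and off + err >= high
    )
    return low, high, truechimers


# Offsets and error bounds in ms; "spoofed" claims a time far from the rest.
servers = {"A": (10, 5), "B": (12, 4), "C": (11, 6), "spoofed": (500, 5)}
print(select_truechimers(servers))  # (8, 15, ['A', 'B', 'C'])
```

With only two sources that disagree there is no majority, and the sketch returns None rather than picking one.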


As I've mentioned, a good middle-of-the-road solution is adding various sources of time derived via GPS. Note that I
said "adding." Start with the carefully curated NTP server set, then install one or more GPS-based NTP servers polled by
your NTP server. Adding these GPS time sources to your NTP servers does three things: First, it provides another source
of time NTP can use to determine the correct time. Second, we're now using a different time transmission method with
different vulnerabilities. And finally, it will significantly improve the accuracy of the time the NTP server produces,
as ntpd will generally prefer it for the final trimming to UTC. Combining terrestrially transmitted time via NTP with
the precision of RF-transmitted GPS time helps ensure that time is both correct and precise.
There are still attack vectors here, but as you add more time sources, the complexity of pulling off a successful 
attack increases. This is especially true if you can monitor the NTP server for signs of stress, such as time servers 
that are not telling the correct time or GPS signals which are inconsistent with the NTP-derived time. A successful 
attack would require simultaneous NTP (network) and GPS (rf) attacks. 
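
One hypothetical way to watch for those signs of stress is a small check that compares each upstream NTP server against
the median of all of them and compares the GPS reference against that same consensus. The thresholds in this Python
sketch are illustrative assumptions, not recommendations, and how you obtain the offsets (for example, from your NTP
daemon's peer statistics) is left out.

```python
from statistics import median
from typing import Dict, List


def check_time_sources(ntp_offsets_ms: Dict[str, float], gps_offset_ms: float,
                       ntp_limit_ms: float = 50.0, gps_limit_ms: float = 10.0) -> List[str]:
    """Return alarm messages for a hypothetical monitoring job.

    ntp_offsets_ms: each upstream NTP server's offset as seen by our server.
    gps_offset_ms: the local GPS reference's offset as seen by our server.
    The default limits are illustrative assumptions, not recommendations.
    """
    alarms = []
    consensus = median(ntp_offsets_ms.values())
    for name, off in sorted(ntp_offsets_ms.items()):
        if abs(off - consensus) > ntp_limit_ms:
            alarms.append(f"NTP source {name} is {off - consensus:+.1f} ms from the NTP consensus")
    if abs(gps_offset_ms - consensus) > gps_limit_ms:
        alarms.append(f"GPS is {gps_offset_ms - consensus:+.1f} ms from the NTP consensus")
    return alarms


# Hypothetical readings: the NTP sources agree, but GPS has moved away from them.
print(check_time_sources({"ntp-a": 1.0, "ntp-b": 2.0, "ntp-c": 3.0}, gps_offset_ms=40.0))
# ['GPS is +38.0 ms from the NTP consensus']
```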


Other options or blends of options are also possible. With a reasonably large network, putting enough GPS receivers 
into place would significantly reduce the possibility of a spoofer or jammer taking out your entire GPS infrastructure. 
Reducing or eliminating external NTP time sources might be reasonable in that case. The theory is that attacking GPS 
receivers at one location is easy. Doing it at dozens simultaneously is much more difficult. To use an exaggeration to 
make a point: If you had 100 different GPS receivers spread across 100 widely geographically diverse locations, and all 
of your NTP servers were able to poll all of them for time, the chances that an attacker would be able to take out or 
spoof enough GPS receivers to make a difference would be close to zero. Your failure point becomes UTC(USNO) and the 
GPS constellation itself. The same argument would apply to NTP servers regarding quantity and diversity. 
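
The "close to zero" intuition can be put in numbers with a simple binomial model, sketched in Python below. It assumes
each receiver site is compromised independently with the same probability p, which is exactly what geographic diversity
tries to achieve and exactly what a wide-area jammer or a problem with the constellation itself would violate.

```python
from math import comb


def prob_majority_compromised(n: int, p: float) -> float:
    """Probability that more than half of n receivers are compromised when each
    one is compromised independently with probability p (binomial model)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(need, n + 1))


print(round(prob_majority_compromised(3, 0.1), 6))  # 0.028
print(prob_majority_compromised(100, 0.05))          # vanishingly small
```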


Other options involve adding additional technologies. For example, some appliances use GPS to discipline (adjust) an 
internal atomic clock. Once the atomic clock is locked to UTC, the GPS can fail for extended periods without affecting 
NTP output. In addition, some of these will filter updates from the GPS based on the appliance's internal atomic time. 
That way, a spoofer would be ignored, jammers would have to continue for hours or days, and so on. Of course, these 
solutions' reliability depends on the implementation quality. If I had the budget to implement something like this in a 
network, I'd likely scatter a few of these around the network and still use garden-variety ntpd servers pointed at
these appliances. I might even consider buying solutions from multiple vendors so that a bug in one implementation
would be filtered out and ignored.
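
A toy Python model of the filtering idea: the clock's own running estimate is used to reject GPS measurements that
jump too far, while accepted measurements steer the clock only slowly. This is a hypothetical sketch, not any vendor's
control loop; the gain and rejection threshold are made up. It only catches abrupt spoofing, and a spoofer who drags
time away slowly enough would still need other defenses.

```python
class DisciplinedClock:
    """Toy model of an atomic clock steered by GPS, with a sanity filter.

    Hypothetical sketch, not any vendor's algorithm. phase_ns is the clock's
    running estimate of (GPS time - clock time); it ignores the clock's own
    frequency drift between updates. Each GPS comparison nudges the estimate
    by a fraction (gain) of the observed error; a comparison that jumps more
    than reject_ns from the estimate is ignored. When GPS is jammed, no
    updates arrive and the clock simply free-runs (holdover).
    """

    def __init__(self, gain: float = 0.01, reject_ns: float = 100.0) -> None:
        self.phase_ns = 0.0
        self.gain = gain
        self.reject_ns = reject_ns
        self.rejected = 0

    def update(self, gps_minus_clock_ns: float) -> bool:
        """Feed one GPS-vs-clock comparison; return True if it was accepted."""
        error = gps_minus_clock_ns - self.phase_ns
        if abs(error) > self.reject_ns:
            self.rejected += 1  # implausible jump: possibly spoofed, so ignore it
            return False
        self.phase_ns += self.gain * error  # slow steering limits the effect of any one sample
        return True


clock = DisciplinedClock(gain=0.5, reject_ns=100.0)
print(clock.update(10.0), clock.phase_ns)    # True 5.0
print(clock.update(5000.0), clock.phase_ns)  # False 5.0: a sudden 5 µs jump is rejected
```

A real design also needs a way to recover if the atomic clock itself is the one that has drifted, which this sketch
does not attempt.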


I can't cover every option here, but balancing security, cost, operational complexity, and application needs is the 
key. Some solutions are cheap and easy but not robust. Some are highly robust but expensive and not easy. Somewhere in 
the middle is probably where most real implementations should lie. 


Now, to address a couple of specific items: 


1) Additional GPS and commercial time distribution systems will likely improve reliability. However, only GPS and 
GALILEO are available for free in the US. I'm ignoring GLONASS for various legal and political reasons. GALILEO is a 
valid option, but it lives in the same band as GPS, so jamming GPS will usually also jam GALILEO. Utilizing GNSS
receivers that use the civilian signals in the newer bands would also help. Some commercial solutions are available 
that don't require GNSS, but they're relatively new and not as commonly available as one would like. 


2) For running my own time servers in a service-provider environment, I'd rather specifically designate the exact NTP 
server I want to utilize and not rely on a third party to give me a pool of servers. It's about ensuring that the
server I use is a trusted one, and if I delegate server selection, I lose this ability. On the other hand, where I'm
not running an NTP server that is critical for many clients, I'll just point it at pool.ntp.org or
north-america.pool.ntp.org and skip all of the recommendations I've made above. I would be cautious about requesting
that pool.ntp.org add entries for "stratum of server" or "origin of time," as this seems like it would tend to
overload the stratum one servers in the pool with people "optimizing" their configuration to use only stratum one
servers. Remember that pool.ntp.org is generally intended as an end-user-device service, and providing methods by which
end users can bypass the robustness that a fully distributed pool provides is probably not a great idea.


3) This all should hopefully sort itself out over the next few years. GPS and GALILEO are flying new birds that have 
changes designed to improve attack resilience by using cryptography to ensure authentic transmissions (which may rely 
on ground transmission of cryptographic keys). NTP already supports manual cryptographic keys that work, but NTS is a 
pain in the rear. Hopefully, NTPv5 will have a better security mechanism. Other, more secure, time sources are on the 
horizon as the cybersecurity crowd is aware of the issues. 
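
For reference, the manual-key scheme amounts to appending a key ID and a digest to each packet. The Python sketch below
follows the legacy RFC 5905 construction (MD5 over the key concatenated with the packet) for a packet with no
extension fields. It is for illustration only: RFC 8573 recommends AES-CMAC instead of MD5, NTS (RFC 8915) replaces
manual keys with automatic key exchange, and the all-zero packet and key values are placeholders.

```python
import hashlib
import hmac
import struct
from typing import Dict

NTP_HEADER_LEN = 48  # NTPv4 header without extension fields
MD5_LEN = 16


def append_legacy_mac(packet: bytes, key_id: int, key: bytes) -> bytes:
    """Append a legacy RFC 5905-style MAC: 32-bit key ID + MD5(key || packet)."""
    digest = hashlib.md5(key + packet).digest()
    return packet + struct.pack(">I", key_id) + digest


def check_legacy_mac(data: bytes, keys: Dict[int, bytes]) -> bool:
    """Verify a header-only packet followed by a key ID and MD5 digest."""
    if len(data) != NTP_HEADER_LEN + 4 + MD5_LEN:
        return False
    packet = data[:NTP_HEADER_LEN]
    (key_id,) = struct.unpack(">I", data[NTP_HEADER_LEN:NTP_HEADER_LEN + 4])
    digest = data[NTP_HEADER_LEN + 4:]
    key = keys.get(key_id)
    if key is None:
        return False  # unknown key ID: treat as unauthenticated
    return hmac.compare_digest(hashlib.md5(key + packet).digest(), digest)


# Placeholder packet and key, not a real NTP request or a real key.
signed = append_legacy_mac(bytes(NTP_HEADER_LEN), key_id=7, key=b"example-key")
print(check_legacy_mac(signed, {7: b"example-key"}))  # True
print(check_legacy_mac(signed, {7: b"other-key"}))    # False
```

The operational pain is in distributing and rotating those keys by hand, which is what NTS was designed to remove.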


And finally, as a sort of tl;dr summary: Each operator needs to decide how critical time is to their network and
pick a solution that works for them and fits the organization's budget. Some operators might point everything at 
pool.ntp.org and not run their own servers. Others might run their own time lab and use that time to provide NTP time 
and precision time and frequency via various methods. Most will be somewhere in between. But regardless of which you 
choose, please be aware that GPS isn't 100% secure, and neither is NTP. If attack resilience matters to you, you should 
think about all of the attack vectors and design something that is robust enough to meet your use case. 






