nanog mailing list archives

Re: NTP Issues Today


From: Jared Mauch <jared () puck nether net>
Date: Tue, 20 Nov 2012 14:39:03 -0500


On Nov 20, 2012, at 2:28 PM, Jay Ashworth <jra () baylink com> wrote:

----- Original Message -----
From: "Leo Bicknell" <bicknell () ufp org>

To protect against two falseticking servers (tick and tock, as we saw on
the 19th) you need _FIVE_ servers minimum configured if they are both in
the list. More importantly, if you want to protect against a source
(GPS, CDMA, IRIG, WWVB, ACTS, etc) false ticking, you need a minimum of
_FOUR_ different source technologies in the list as well.

It's not hard: the box I posted the logs from peers with 18
servers using 8 source technologies, all freely available on the Internet...

I'm curious, Leo, what your internal setup looks like.  Do you have an
internal pair of masters, all slaved to those externals and one another, 
with your machines homed to them?  Full mesh?  Or something else?

In my last big gig, it was recommended to me that I have all the machines 
which had to speak to my DBMS NTP *to it*, and have only it connect to the
rest of my NTP infrastructure.  It coming unstuck was of less operational
impact than *pieces of it* going out of sync with one another...


Here's a sample NTP config from one of my systems.

-- snip --
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org

#
server 0.us.pool.ntp.org iburst maxpoll 9
server 1.us.pool.ntp.org iburst maxpoll 9
server 2.us.pool.ntp.org iburst maxpoll 9
server 129.250.35.250 iburst maxpoll 9
server 129.250.35.251 iburst maxpoll 9

-- snip --
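
As a rough sanity check on a config like the one above, here is a minimal Python sketch that counts the active `server` lines and applies the usual majority rule of thumb (n >= 2f + 1 servers to outvote f falsetickers, which matches the "five servers for two falsetickers" figure quoted earlier). The function names are made up for illustration, and the rule is a simplification of what ntpd's selection algorithm actually does.

```python
# Sketch only: count_servers() and tolerated_falsetickers() are
# hypothetical helper names, not part of any NTP tool.

SAMPLE_CONF = """\
# Use public servers from the pool.ntp.org project.
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org

#
server 0.us.pool.ntp.org iburst maxpoll 9
server 1.us.pool.ntp.org iburst maxpoll 9
server 2.us.pool.ntp.org iburst maxpoll 9
server 129.250.35.250 iburst maxpoll 9
server 129.250.35.251 iburst maxpoll 9
"""


def count_servers(conf_text):
    """Count uncommented `server` lines in ntp.conf-style text."""
    count = 0
    for line in conf_text.splitlines():
        # Drop comments and surrounding whitespace before inspecting the line.
        words = line.split("#", 1)[0].split()
        if words and words[0] == "server":
            count += 1
    return count


def tolerated_falsetickers(n_servers):
    """Largest f with n >= 2f + 1 (assumed majority rule of thumb)."""
    return max(0, (n_servers - 1) // 2)


n = count_servers(SAMPLE_CONF)
print(n, tolerated_falsetickers(n))  # prints "9 4"
```

This only looks at how many servers are configured. It says nothing about whether they share an upstream source, which is the other half of the advice quoted above.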

You can audit its operation like this:

nat:~$ ntpq -p -n -c ass
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-129.250.35.250  164.244.221.197  2 u   68  512  377   19.248   -0.135   3.195
+129.250.35.251  192.5.41.40      2 u  439  512  377   41.817    1.109  15.660
-206.57.44.17    204.123.2.5      2 u  126  512  377   37.133   -6.443   9.631
+4.53.160.75     209.81.9.7       2 u   48  512  377   25.209    1.551   8.804
-64.73.32.135    192.5.41.41      2 u  349  512  377   23.418   -0.703   1.721
*50.116.38.157   64.250.177.145   2 u  380  512  377   43.021    1.267   2.136
+208.87.221.228  10.0.22.49       2 u  517  512  377   92.000    0.974   0.678
-206.212.242.132 128.252.19.1     2 u  323  512  377   21.781   -2.873   1.304
+38.229.71.1     204.123.2.72     2 u  211  512  377   21.977   -0.055   2.274

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 39973  931a   yes   yes  none   outlyer    sys_peer  1
  2 39974  941a   yes   yes  none candidate    sys_peer  1
  3 39975  9324   yes   yes  none   outlyer   reachable  2
  4 39976  942a   yes   yes  none candidate    sys_peer  2
  5 39977  931a   yes   yes  none   outlyer    sys_peer  1
  6 39978  961a   yes   yes  none  sys.peer    sys_peer  1
  7 39979  9414   yes   yes  none candidate   reachable  1
  8 39980  931a   yes   yes  none   outlyer    sys_peer  1
  9 39981  941a   yes   yes  none candidate    sys_peer  1


Had any of the impacted clocks been in this list, it would have shown up
as a falseticker (tally code 'x' in the first column of the peer list).
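
If you want to automate that audit, something like the following Python sketch could work. It reads the first character of each peer line from `ntpq -p -n` (the tally code) and flags any peer marked 'x'. The helper names are hypothetical, the sample lines are copied from the output above, and it assumes the ten-column peer layout shown there.

```python
# Sketch only: parse_peers() and falsetickers() are hypothetical helpers.
# Tally codes as printed by ntpq in the first column of the peer list.
TALLY = {
    " ": "reject", "x": "falseticker", ".": "excess", "-": "outlier",
    "+": "candidate", "#": "backup", "*": "sys.peer", "o": "pps.peer",
}

SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-129.250.35.250  164.244.221.197  2 u   68  512  377   19.248   -0.135   3.195
+129.250.35.251  192.5.41.40      2 u  439  512  377   41.817    1.109  15.660
*50.116.38.157   64.250.177.145   2 u  380  512  377   43.021    1.267   2.136
"""


def parse_peers(ntpq_output):
    """Return (remote, role, offset_ms) for each peer line in the output."""
    peers = []
    for line in ntpq_output.splitlines():
        fields = line[1:].split()
        if line[:1] not in TALLY or len(fields) != 10:
            continue  # blank line, separator, or unexpected layout
        try:
            offset_ms = float(fields[8])
        except ValueError:
            continue  # header row: the offset column is not a number
        peers.append((fields[0], TALLY[line[0]], offset_ms))
    return peers


def falsetickers(peers):
    """Remote addresses that ntpd has marked as falsetickers."""
    return [remote for remote, role, _ in peers if role == "falseticker"]


peers = parse_peers(SAMPLE)
print(falsetickers(peers))  # prints "[]" for the sample above
```

In practice you would feed it the live output of `ntpq -p -n` rather than a pasted sample, and alert when the returned list is non-empty.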

This is a fairly reasonable setup.

I've also been looking at an item like this:

http://www.netburnerstore.com/ProductDetails.asp?ProductCode=PK70EX-NTP

which is about $300 + misc parts.

It should be well worth it to avoid the kind of 'major outage' some folks had, where they needed to reboot their servers, etc.

- Jared


