NTP Clock Synch Accuracy – It’s time for microseconds

Making accurate clock signals available has been an ongoing challenge for mankind for millennia (ref wikipedia). Over all those years we have gradually increased accuracy, from half hours (sundials) down to nanoseconds (atomic clocks).

But an accurate clock is worth little if it is not synchronized with the other clocks that matter in a given setting, e.g. a group of people with an upcoming meeting, or computer systems registering and sharing transactions.

Network Time Protocol (NTP) is a standardized Internet protocol (ref RFC 1305) for clock synchronization between clients and servers. Practically all modern desktops, tablets, smartphones, and front-end and back-end servers use NTP today.

Example NTP hierarchy

NTP infrastructure is hierarchical (see figure). The top units, “stratum 1”, synchronize with an external source of extreme accuracy, e.g. GPS or atomic clocks. Other units use the NTP protocol to synchronize their internal (crystal-based) clocks to a parent or neighbor unit.

Units send NTP requests upward in the hierarchy at regular intervals. The replies are used to smoothly adjust the local clock, taking into account the timing disturbances that the network between the peers may add to requests and replies.
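
The correction comes from four timestamps per request/reply exchange, as defined in the NTP specifications: t1 when the client sends the request, t2 when the server receives it, t3 when the server sends the reply, and t4 when the client receives it. Below is a minimal Python sketch of the offset and round-trip delay calculation. The function name and example timestamps are hypothetical, and a real ntpd filters and combines many such samples before it touches the clock.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    offset: float  # estimated server clock minus client clock, in seconds
    delay: float   # round-trip network delay, in seconds


def ntp_sample(t1: float, t2: float, t3: float, t4: float) -> Sample:
    """Offset/delay estimate from one NTP exchange (hypothetical helper).

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    # Averaging the outbound and return differences cancels the network
    # delay, provided both directions take equally long.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Total time spent on the wire, excluding server processing time.
    delay = (t4 - t1) - (t3 - t2)
    return Sample(offset, delay)


# Client 1 ms behind the server, 5 ms each way, 1 ms server processing:
print(ntp_sample(t1=0.000, t2=0.006, t3=0.007, t4=0.011))
# offset ≈ 0.001 s, delay ≈ 0.010 s (up to float rounding)
```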

The default request interval for Linux and FreeBSD servers and desktops is 2¹⁰ seconds, i.e. around 17 minutes. With this interval, a stratum 2 server at UNINETT stays within a 1 ms accuracy window, i.e. its local clock deviates from the stratum 1 server by up to ±0.5 ms. The offset oscillates a few times per hour (see figure).

Clock accuracy of a stratum 2 server relative to four of its stratum 1 clock sources.

In 2016, a request interval of 17 minutes seems rather conservative given today's CPU and network capacities. Hence, on the evening of April 21, UNINETT decided to shorten the request interval to 2⁶ seconds, i.e. from 17 minutes down to about 1 minute. As the figure above illustrates, this change significantly improved clock accuracy: the stratum 2 clock server in question now has stable sub-millisecond accuracy and stays within 200 microseconds of its stratum 1 servers.
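
NTP expresses the polling interval as a power-of-two exponent, so going from 2¹⁰ to 2⁶ seconds means 16 times as many requests. Here is a quick Python check of what that costs per client/server pair. The helper names are made up for this illustration, and the figures count plain request/reply exchanges only, ignoring burst modes and multiple servers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def poll_interval(exponent: int) -> int:
    """Seconds between requests for an NTP poll exponent (2**exponent)."""
    return 2 ** exponent


def requests_per_day(exponent: int) -> float:
    """Request/reply exchanges per day towards one server at that rate."""
    return SECONDS_PER_DAY / poll_interval(exponent)


for exp in (10, 6):
    secs = poll_interval(exp)
    print(f"2^{exp} s = {secs} s (~{secs / 60:.1f} min), "
          f"{requests_per_day(exp):.1f} requests/day")
```

This prints 1024 s and 84.4 requests/day for 2¹⁰, against 64 s and 1350.0 requests/day for 2⁶. Even at the higher rate, each exchange is a single small UDP request and reply per minute.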

Do we need this kind of accuracy? We believe so, first in scientific and back-end settings and later in more common applications. Synchronizing database transactions in distributed systems already relies on tight clock synchronization.
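
One way to see why the offset bound matters for distributed transactions: if every node's clock is known to be within ±ε of the reference, two timestamps taken on different nodes can only be ordered with certainty when they differ by more than 2ε. The Python sketch below illustrates that rule with the ±0.5 ms and 200 µs figures from above. The function and the 800 µs gap are hypothetical, and real databases use a variety of more elaborate schemes.

```python
def definitely_before(ts_a: float, ts_b: float, max_offset: float) -> bool:
    """True if event A certainly happened before event B.

    ts_a, ts_b: timestamps (seconds) read from two different node clocks.
    max_offset: assumed bound on each clock's error vs. the reference (±).
    """
    # A's true time is at most ts_a + max_offset; B's is at least
    # ts_b - max_offset. The order is only unambiguous if these
    # uncertainty intervals do not overlap.
    return ts_a + max_offset < ts_b - max_offset


gap = 800e-6  # two commits stamped 800 microseconds apart on different nodes
print(definitely_before(0.0, gap, max_offset=0.5e-3))  # ±0.5 ms: False
print(definitely_before(0.0, gap, max_offset=200e-6))  # ±200 µs: True
```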

2 thoughts on “NTP Clock Synch Accuracy – It’s time for microseconds”

  1. Mschwager

    What are you using to measure your accuracy? I have a GPS clock on my local switch, and a server on the same switch, and I’m syncing via NTP to the GPS device and getting 6 ms accuracy according to ntpstat. I don’t know what to tweak to get it better. I’m polling the server every 16s.

    1. Håvard Eidnes (post author)

      We probably do this slightly unscientifically… But, in general, to measure or get a reasonable estimate of accuracy you need something similarly stable or preferably better to compare with. In our case, we also use a local GPS device which gives us a 1pps signal as the preferred synchronization source, input on the DCD pin on a physical serial port on a PC/server. However, to assure ourselves that we’re reasonably on track, we have also configured the NTP server to use quite a number of other stratum-1 servers (and also measure towards some of our own stratum-2 servers) as potential synchronization sources, and let ntpd do the measurements. And … the network paths between our stratum-1 NTP server and those other servers are quite lightly loaded, so that the network itself introduces minimal RTT jitter due to queuing delays.

      We use collectd to collect the computed offset values over longer time periods (essentially taken from the “offset” column in the output of “ntpq -c pe”), and then eyeball the results using Grafana.
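
How the collectd setup extracts the offsets isn't shown here. As a hypothetical stand-alone alternative, the Python sketch below pulls the offset column out of `ntpq -c pe` text. It assumes ntpq's usual peer table layout: a one-character tally code in column 0, followed by remote, refid, st, t, when, poll, reach, delay, offset and jitter, with offset in milliseconds. Rows that don't match that layout are skipped.

```python
def parse_ntpq_offsets(output: str) -> dict[str, float]:
    """Map each remote in `ntpq -c pe` output to its offset (ms).

    Assumes the usual peer table: column 0 holds a one-character tally
    code (' ', '*', '+', '-', 'x', '.', '#', 'o'), followed by remote,
    refid, st, t, when, poll, reach, delay, offset, jitter.
    """
    offsets: dict[str, float] = {}
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header and the ===== separator.
        if not stripped or stripped.startswith(("remote", "=")):
            continue
        # Drop the tally code in column 0, then split on whitespace.
        fields = line[1:].split()
        if len(fields) != 10:
            continue  # not a peer row in the expected layout
        try:
            offsets[fields[0]] = float(fields[8])  # the "offset" column
        except ValueError:
            continue
    return offsets
```

Against a live ntpd, the input text could come from `subprocess.run(["ntpq", "-c", "pe"], capture_output=True, text=True).stdout`. Sampling that periodically gives the kind of per-server offset time series plotted below.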

      Below you’ll find copies of some of the plots we’re able to produce. All of them show the status over the last 7 days (Grafana of course lets you pick time periods of your own choosing).

      All NTP servers and PPS source
      The first one shows all the NTP servers (including the local PPS source), and clearly not every one of those servers gives time of good quality. The server at 158.38.2.4 is particularly bad: it swings upwards of 8 ms away from the rest of the bunch.

      Plot without 158.38.2.4
      In Grafana you can shift-click the legend entry for a series (in this case, a given server) to disable its plot. In the next one I’ve disabled the plot for 158.38.2.4.

      Next we see that some servers have a stable but systematic offset from the local clock. For some of them I have yet to determine the cause. It might be slightly asymmetric traffic patterns; we try our best to avoid that, but since multiple network operators are involved, getting it resolved is sometimes difficult if not impossible.
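
Path asymmetry shows up as a stable offset because of how NTP computes offset. The estimate assumes the request and the reply spend equally long in the network, so any difference between the two one-way delays leaks into the result as half that difference. The small Python simulation below runs one exchange between perfectly synchronized clocks to show the effect. The delay values are made up for illustration.

```python
def measured_offset(true_offset: float, d_out: float, d_back: float,
                    processing: float = 0.0) -> float:
    """Offset NTP would compute for one exchange with given one-way delays.

    true_offset: server clock minus client clock (seconds)
    d_out, d_back: one-way delays client->server and server->client (s)
    processing: time the server holds the request before replying (s)
    """
    t1 = 0.0                            # client sends (client clock)
    t2 = t1 + d_out + true_offset       # server receives (server clock)
    t3 = t2 + processing                # server replies (server clock)
    t4 = t3 - true_offset + d_back      # client receives (client clock)
    return ((t2 - t1) + (t3 - t4)) / 2


# Clocks perfectly in sync, but the outbound path is 0.4 ms slower:
err = measured_offset(0.0, d_out=2.4e-3, d_back=2.0e-3)
print(f"{err * 1e6:.0f} us")  # 200 us, i.e. (d_out - d_back) / 2
```

No amount of averaging removes this bias, because every exchange over the same asymmetric path errs in the same direction.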

      Anomalous servers removed
      After clicking away a number of other NTP servers that show anomalous deviations, we’re left with the third plot, which shows the GPS_PALISADE and PPS clock signals at the center and from which you can approximately read off the level of variation.
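
Here the misbehaving servers were hidden by eye in Grafana. As a rough automated analogue, one could compare each source's offset samples with a consensus value taken across all sources and flag those that stray too far. The Python sketch below does this. The 1 ms threshold, the function name and the data layout are hypothetical choices, not part of the setup described here.

```python
from statistics import median


def flag_outliers(offsets_ms: dict[str, list[float]],
                  threshold_ms: float = 1.0) -> set[str]:
    """Return sources with any sample too far from the consensus offset.

    offsets_ms: per-source offset samples in milliseconds, e.g. collected
                over time from the offset column of `ntpq -c pe`.
    threshold_ms: hypothetical cut-off for "anomalous".
    """
    # Each source's median is robust against a few spikes; the median of
    # those medians serves as the consensus reference.
    per_source = {src: median(v) for src, v in offsets_ms.items() if v}
    consensus = median(per_source.values())
    # Flag both large swings and stable systematic offsets.
    return {src for src, v in offsets_ms.items()
            if v and max(abs(x - consensus) for x in v) > threshold_ms}
```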

      Two of the remaining ones, sth1.ntp.se and sth2.ntp.se, are custom-built NTP servers at Netnod in Sweden. They are synchronized to local cesium standards and run an FPGA-implemented(!) NTP server. To read more about the Netnod setup (which is indeed quite impressive), see https://www.netnod.se/ntp.

      Best regards,

      – Håvard
