Clock Synchronization (2020) (signalsandthreads.com)
69 points by tosh 43 days ago | 16 comments



They mention it, but Chrony is really amazing as a time sync solution for most environments. I've used most open source (and some proprietary) implementations of NTP and PTP over the years. Chrony is more reliable, simpler to configure, and better documented than all of them.

These days, I would always choose to use Chrony when possible.

Quoting the Chrony FAQ [1]:

> When combined with local hardware timestamping, good network switches, and even shorter polling intervals, a sub-microsecond accuracy and stability of a few tens of nanoseconds might be possible.

This is a level of accuracy many people think is only achievable using PTP, but Chrony can do it using NTP. Not that most people need this level of accuracy, but some do, and it's just plain cool.

Once you set up time sync, the next step is to make sure you have sufficient monitoring in place, so that you can stop worrying about all the subtle issues caused by out-of-sync systems.

1. https://chrony.tuxfamily.org/faq.html


Poul-Henning Kamp has shown that you can do this with NTP, with source code that actually follows the standard well. PTP makes a great refclock for more typical NTP implementations, but is not a replacement.

I haven’t played around much with chrony and chronyd, so I would be hesitant to recommend it over anything else. I know of some other software I have experience with that I would recommend against, but I don’t want to start any political wars here.


Having read the article now, I think what they missed was that a good NTP server implementation doesn’t depend on the commercial time servers that you can buy off the rack. They may have good GPS hardware and software, but many times their NTP implementation is … suboptimal.

So, you instead get good GPS refclocks that give you a good direct PPS feed, which you then connect via serial port to your NTP time servers that will have a good implementation and configuration, because you’re making sure they do.

And you make sure that you’ve also got some local-only refclocks in there to help ensure that you remain locked into a good frequency domain, even if you do lose visibility to the GPS network. This is typically a rubidium clock, although there are other options.

You then synthesize the best information you can get from the GPS network as to what the exact time is at the moment you ask, with rubidium refclocks that can give you a very precise indication of exactly how long a second is, and you combine that with 40+ years worth of statistical knowledge and real-world experience through the NTPv3 protocol, and then you serve that time to your clients.

Some vendors, like Meinberg, get all of this right, out of the box. But most don’t.

Generally speaking, NTP easily handles single-digit millisecond resolution (with the right configuration and the right OS, even on pretty basic hardware), and can get down into the single-digit nanosecond range, but it gets harder and harder to push it into smaller and smaller time domains.

PTP was created more recently, and it depends on more modern hardware. PTP time sync won’t get off the local segment of the LAN without support from the switches and other network hardware, whereas NTP is designed around basic UDP/IP networking and doesn’t place any requirements of this sort on the network. The requirement for hardware timestamping of network packets can cut both ways there, as can the requirement for using multicast.

But PTP and NTP can be paired together for higher quality and higher resolution timekeeping than either protocol is capable of on its own.

Then you get down to the trick of monitoring all that, so that you can prove what level of time sync you’ve got across your fleet of machines. That seems to be one of the real innovations here, but they don’t talk much about how they achieved that.
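To make the monitoring point concrete, here is a minimal sketch of one way to check a fleet against a sync budget. It is not how the podcast's authors did it (they don't say). The 100 microsecond budget is taken from the episode intro; the host names, the numbers, and the idea of adding each measurement's error bound to its offset are my own assumptions for illustration.

```python
from typing import Dict, List, Tuple

BUDGET_S = 100e-6  # assumed budget: 100 microseconds


def unproven_hosts(
    measurements: Dict[str, Tuple[float, float]],
    budget_s: float = BUDGET_S,
) -> List[str]:
    """Return hosts whose worst-case offset exceeds the budget.

    Each measurement is (offset_s, error_s): the measured clock offset and
    an upper bound on the measurement error, both in seconds. A host only
    counts as "proven in sync" if |offset| + error stays within budget.
    """
    return sorted(
        host
        for host, (offset_s, error_s) in measurements.items()
        if abs(offset_s) + error_s > budget_s
    )


# Hypothetical fleet data, purely illustrative.
fleet = {
    "host-a": (12e-6, 20e-6),
    "host-b": (-70e-6, 45e-6),
    "host-c": (3e-6, 5e-6),
}
print(unproven_hosts(fleet))
```

The point of including the error bound is that a small measured offset proves little if the measurement itself is noisy: host-b's offset alone is inside the budget, but its worst case is not.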


+1 for Meinberg. I worked for the competition (Gorgy Timing) for a few years, and boy were we far behind in terms of hardware as well as software. They had a full NTPv3 while we marketed SNTP as NTP (i.e. we missed a good part of the statistical treatment). I have always found their time servers wonderful.


Link is to the transcript of a podcast, intro: Clock synchronization, keeping all of the clocks on your network set to the “correct” time, sounds straightforward: our smartphones sure don’t seem to have trouble with it. Next, keep them all accurate to within 100 microseconds, and prove that you did -- now things start to get tricky. In this episode, Ron talks with Chris Perl, a systems engineer at Jane Street about the fundamental difficulty of solving this problem at scale and how we solved it.


"the fundamental difficulty of solving this problem at scale and how we solved it"

Time is hard and the funny thing is - it's relative! Anyone who says they have "solved" time is delusional. The best you can do is reduce errors and problems related to timekeeping to an acceptable level.

I've "solved" time at work with three Raspberry Pis with GPS boards/hats and aerials. My requirements are that logs be correlatable and that Kerberos etc. work, so millisecond accuracy is enough. My Pis have a spread of around 0.002 ms, i.e. 2 µs, according to ntpq -p. It doesn't take much to make it far worse. When I update them, I do one at a time per day. Accuracy drops a bit.

I might put in three more at home. NTP needs a lot of sources to converge properly. Ideally five or more. I have quite a decent connection to work.

... reads transcript.

Oh well done - you discovered PTP.

Would anyone who actually knows about this stuff like to comment on NTP vs PTP?


The standard rule is that you need 2n + 1 upstream time sources to protect against n falsetickers, but this falls down for the case of n=1. If you want to protect yourself against just one falseticker, you need at least four upstream time sources, to handle the case of what you do after you kick the one falseticker off the island. We explain this at http://support.ntp.org/bin/view/Support/SelectingOffsiteNTPS....
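For anyone wondering how a falseticker gets spotted in the first place, here is a rough sketch using Marzullo's algorithm, which NTP's selection step is derived from. This is the textbook version, not the actual ntpd code. Each source is assumed to report an interval of plausible offsets (offset minus error, offset plus error); the function names and the sample numbers are made up for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lowest plausible offset, highest plausible offset)


def best_overlap(intervals: List[Interval]) -> Tuple[int, Interval]:
    """Return (count, (lo, hi)): the span covered by the most sources."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # an interval opens here
        edges.append((hi, +1))  # an interval closes here
    edges.sort()  # at equal positions, opens (-1) sort before closes (+1)

    best = count = 0
    best_span = (0.0, 0.0)
    for i, (pos, kind) in enumerate(edges):
        count -= kind  # open: count goes up; close: count goes down
        if count > best:
            best = count
            # A new maximum only occurs on an open edge, so a later edge exists.
            best_span = (pos, edges[i + 1][0])
    return best, best_span


def falsetickers(intervals: List[Interval]) -> List[int]:
    """Indices of sources whose interval misses the best overlap entirely."""
    _, (lo, hi) = best_overlap(intervals)
    return [i for i, (a, b) in enumerate(intervals) if b < lo or a > hi]


# Four hypothetical sources (offsets in ms); the last one is way off.
sources = [(-1.0, 1.0), (-0.5, 1.5), (0.0, 2.0), (40.0, 42.0)]
print(best_overlap(sources))
print(falsetickers(sources))
```

With four sources, three still agree after the bad one is discarded, which is the situation the "at least four" advice is aiming for.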

PTP is great for high resolution time synchronization on the local LAN, but doesn’t leave the local LAN segment without special network support in the switches and hubs. It is not designed for good stable time sync over the long term. If you’re running a High Frequency Trading operation and it’s really important to know precisely which nanosecond or which picosecond a particular transaction occurred, then you want PTP. If you’re concerned about keeping all your servers in good enough time sync that you can debug the order of operations as they might show up in the log files on different servers, then you want NTP. And NTP can be configured to use PTP as a good quality Stratum 0 refclock.

If you wish to learn more about PTP, the Network Time Foundation supports the Linux PTP project, see https://www.nwtime.org/projects/linuxptp/

Disclaimer: I have been involved in supporting the Public NTP Project since 2003, and related projects like the NTF, the nwtime.org site, etc….


... "local LAN" is a tad redundant. :)


You are correct. I meant the local LAN segment.

I think I may have tried to go back and edit that clarification into the message, but failed. And then failed to post a follow up clarification.

So, good catch! Thanks!


I understood it to mean within the same subnet or switch. If that’s what the author intended then I think it’s not redundant.


It's not hard to make something that works ok most of the time. It's much harder to make it work all the time, or at least to know for sure when it didn't work. It all depends on what is at stake: if your clocks are wrong, are you just mildly inconvenienced? Do you get a cached result that should have expired? Is it a security problem (are you accepting a certificate that has expired)? Is it a correctness problem (has the causal order of database updates been inverted by mistake)? There is a spectrum of severity of problems caused by clock sync, and whether the solutions are trivial or provably impossible (or anything in between) ultimately depends on what you're trying to do.


Not sure why you were downvoted. You have mentioned many valid points related to timesync.


Build or buy a few GPSDOs, put a couple of antennas outside (up and in the clear), distribute the 1PPS to a few of your servers, put NICs that support hardware timestamping in them, and make them your time servers (they'll run your NTP and PTP daemons).

A dedicated VLAN for PTP will be helpful, by the way.

Anything "critical" can run the PTP client in order to get time sync via PTP (on that dedicated VLAN). Everything else can run an NTP client.

(That Raspberry Pi with a cheap GPS module probably isn't nearly as accurate as you think. In particular, its PPS is likely full of jitter. If it's "good enough" for you, well, that's all that matters.)


"The idea is, I guess, you can do in some sense, the moral equivalent of what NTP does with the two middle timestamps. Where there are two timestamps in NTP that come from the server that’s reporting time. It’s like when it receives and then when it sends out and you get to subtract out from the added noise that gap between those two timestamps, and then the idea is you can do this over and over again across the network, and so delays and noise are introduced by, for example, queueing on the switch, would go away. Like you would essentially know how much those delays were and as a result, you could potentially be much more accurate."

In other words, every node of the network starts clapping, and listening to others. If it's off-beat, it'll adjust.
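For reference, the "two middle timestamps" from the quote are the server's receive and transmit times. A minimal sketch of the standard NTP on-wire calculation, with t1..t4 as the usual four timestamps (the example numbers are invented, and real NTP assumes the path delay is symmetric):

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Compute clock offset and round-trip delay from four timestamps.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    # Offset of the server clock relative to the client clock,
    # assuming the outbound and return paths take equally long.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Round-trip network delay with the server's processing time (t3 - t2)
    # subtracted out; this is the gap the quote talks about.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Made-up example: client is 50 ms behind the server, 10 ms each way on
# the wire, 1 ms of processing on the server. All values in seconds.
offset, delay = ntp_offset_and_delay(t1=0.000, t2=0.060, t3=0.061, t4=0.021)
print(f"offset ~ {offset * 1e3:.1f} ms, delay ~ {delay * 1e3:.1f} ms")
```

Subtracting t3 - t2 is what removes the server-side processing gap. The idea in the quote is to apply the same trick at each switch along the path, so queueing delay drops out too.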

Humans do the same kind of heartbeat synchronisation all the time: with hand clapping, finger clicking, drumming fingers, dancing to flashing lights... the unity makes us feel more connected and builds trust because we have something in common.

Thank our mothers for teaching us to clap when we were babies. And thank the Source of our universe's clock, whose power can place a breakpoint and pause the debugger whenever we call on His name.


Since NTP is being discussed: there is now a new RFC specifying NTS, Network Time Security for NTP:

https://datatracker.ietf.org/doc/html/rfc8915

Chrony, mentioned earlier here, supports NTS.

There are public servers supporting NTS, for example:

https://www.netnod.se/time-and-frequency/how-to-use-nts


Yup, NTS is a good thing.

Thanks for bringing that update to the conversation!



