Living Without Atomic Clocks (2016) (cockroachlabs.com)
94 points by bobbiechen on March 23, 2020 | 49 comments

My first long-term job involved collecting telemetry data from equipment and displaying it.

The display was technically challenging, but in the meaty way that developers often relish. The clock skew, however, was not.

For liability purposes it was sometimes necessary to know if Event A happens before Event B, which means you have to normalize all of the events across time zones and then correct for drift too.

That experience and a bunch of others (including statistics classes, and studying the Java Memory Model, which other languages have borrowed or stolen) have left me with a lingering doubt about how we record distributed activities.

I really kind of feel like we [all] need a new data model [like the one they mention here] where dependent events are recorded in that way. I don't know exactly what that would look like, but I think it would help a great deal in consensus situations where you have to resolve a conflict, or even just for displaying a sequence of events in proper order.

It feels like we keep trying to get the exact nanosecond when something happens, but the only thing I ever see humans use that information for is to reverse engineer a sequence of events that resulted in a peculiar state in the system.

[edit: tie-in to article]

I think you might find this seminal paper interesting and relevant: https://amturing.acm.org/p558-lamport.pdf

From the abstract:

> The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
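For anyone who wants the gist of the logical clock in that abstract before reading the paper, here is a minimal sketch. The class and method names are my own, not from the paper, and it leaves out the physical-clock part entirely.

```python
class LamportClock:
    """Logical clock in the style of Lamport's paper: a counter, not wall time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event or a message send advances the counter.
        self.time += 1
        return self.time

    def receive(self, sender_time):
        # On receipt, jump past the timestamp carried by the message.
        self.time = max(self.time, sender_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
t_send = a.tick()            # A sends a message stamped 1
b.tick()
b.tick()                     # B does unrelated local work, now at 2
t_recv = b.receive(t_send)   # max(2, 1) + 1 = 3
```

The only guarantee is that a send gets a smaller timestamp than the matching receive. Two events with no message path between them can get any relative order.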

I will read that, thank you.

You know, it seems like most of my quality of life improvements over the past 20 years have been due to my peers and me finally acting on much older information. The future is here, it's just unevenly distributed.

Only infrequently do I encounter something that still feels properly new under any kind of scrutiny, instead of revealing itself to be a refinement of something that was already known. Off the top of my head, I can think of escape analysis, the Burrows-Wheeler transform, and the object ownership semantics in Rust. I'll throw Raft on there since the joke is that only 12 people understood Paxos.

Sounds a lot like logical clocks https://en.wikipedia.org/wiki/Logical_clock, especially something like a vector clock https://en.wikipedia.org/wiki/Vector_clock.
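A rough sketch of why a vector clock fits the "record dependent events" idea: comparing two of them tells you whether one event could have influenced the other, or whether they are concurrent. The dict-of-counters representation and the `compare` helper are illustrative assumptions, not any particular library's API.

```python
def compare(a, b):
    """Relate two vector clocks, given as {node_name: counter} dicts."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


e1 = {"A": 1}             # event on node A
e2 = {"A": 1, "B": 1}     # event on node B after hearing from A
e3 = {"A": 2}             # A's next local event, unaware of B

rel_12 = compare(e1, e2)  # "before": e1 is in e2's causal past
rel_23 = compare(e2, e3)  # "concurrent": neither could have seen the other
```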

>It feels like we keep trying to get the exact nanosecond when something happens, but the only thing I ever see humans use that information for is to reverse engineer a sequence of events that resulted in a peculiar state in the system.

Lamport clocks do exist and can provide a partial or total order of events. Not sure if that would've helped with your problem but such algorithms do exist. But it feels like they are rarely used (subjective feeling).

It should be noted that the total ordering that Lamport clocks provide is fairly artificial (in fact it's best to say that there is no natural total ordering of events in a distributed system).
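To make the "artificial" part concrete: the usual way to get a total order is to sort by Lamport timestamp and break ties with a node identifier. The event records below are made up for illustration.

```python
events = [
    {"node": "B", "lamport": 2, "what": "write y"},
    {"node": "A", "lamport": 2, "what": "write x"},
    {"node": "A", "lamport": 1, "what": "start"},
]

# Ties on the logical timestamp are broken by node name, an arbitrary choice.
ordered = sorted(events, key=lambda e: (e["lamport"], e["node"]))
order = [e["what"] for e in ordered]
```

Here "write x" sorts ahead of "write y" only because "A" < "B". Rename the nodes and the order flips, even though nothing about the events changed.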

Agreed, though you do see them a lot in CRDTs: https://en.wikipedia.org/wiki/Conflict-free_replicated_data_.... Just about every non-trivial CRDT has something akin to a logical clock embedded inside it.
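As one small example of that, here is a sketch of a grow-only counter (G-Counter), where the state is essentially a per-node counter map that merges by element-wise max, much like a vector clock. The function names are my own.

```python
def increment(counter, node):
    # Each replica only ever bumps its own slot.
    new = dict(counter)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    # Element-wise max, the same join used for vector clocks.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def value(counter):
    return sum(counter.values())


replica_a = increment(increment({}, "A"), "A")   # {"A": 2}
replica_b = increment({}, "B")                   # {"B": 1}
merged = merge(replica_a, replica_b)
total = value(merged)
```

Because the merge is commutative, associative, and idempotent, replicas can exchange state in any order, any number of times, and still converge.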

> In essence, it provides a means to absolutely order events, regardless of which distributed node an event originated at.

Einstein said you can't absolutely order events -- and with widely enough distributed systems and small enough time quanta, sooner or later you're going to run into relativistic implications.

Probably not a problem for most applications we currently have to deal with, but one day -- soon enough that we're already giving new protocols names like "Interplanetary File System" -- our databases will spread out among the stars, and how will we handle time and event ordering then?

Depends what you mean. Any individual can absolutely order events. It's just they might not agree with someone else's ordering. So as long as you designate an observer, or a couple observers and have a conflict resolution protocol that accounts for whatever it needs to account for (not much if all observers are in the same datacenter, a bit more if they happen to be scattered across galaxies), there's no inherent problem.

Everyone on Earth is in basically the same reference frame. They can easily agree on a standard event ordering.

Even between stars the difference isn't that big, but offsets don't matter at that scale anyway. When it takes a decade to send an email from one system to another, it doesn't matter if their timelines are offset by a week.

You are several decades in the past. The fact that everyone on Earth is not in the same reference frame affects your everyday life nowadays, because it affects Atomic Time and that in turn affects everything that is based upon Atomic Time.

Since the 1970s, TAI construction has had to compensate for the differences between the physical locations of the atomic clocks in laboratories around the world and an ideal surface of equal gravitational potential around the world.

Nowadays, BIPM and other laboratories routinely talk about general relativistic corrections across the width of the measuring devices. To quote Appendix 2 of the SI Brochure:

> In 2013, the best of these primary standards produces the SI second with a relative standard uncertainty of some parts in 10^16. Note that at such a level of accuracy the effect of the non-uniformity of the gravitational field over the size of the device cannot be ignored. The standard should then be considered in the framework of general relativity in order to provide the proper time at a specified point, for instance a connector.

-- https://www.bipm.org/en/publications/si-brochure/

And despite that, we already have a single unified clock standard that we follow!

Being able to measure the drift at a specific point doesn't mean it's relevant to computers timestamping their calculations. If a computer is ten nanoseconds off, that's basically the same as it being one rack to the left, or having some slack in the cable. There's no real effect.

Just syncing your clock once a day is enough to let you completely ignore the effects of relativity.
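A back-of-the-envelope check of that claim, using the weak-field approximation that a clock raised by height h runs fast by a fraction of about g*h/c^2. The 1 km height difference is an arbitrary example of mine.

```python
G = 9.80665              # standard gravity, m/s^2
C = 299_792_458          # speed of light, m/s
SECONDS_PER_DAY = 86_400


def daily_gravitational_drift(height_m):
    """Seconds per day a clock gains over one height_m lower (weak-field approx)."""
    return G * height_m / C**2 * SECONDS_PER_DAY


drift_1km = daily_gravitational_drift(1000)   # on the order of 9 nanoseconds per day
```

So two machines a full kilometre apart in altitude disagree by roughly ten nanoseconds after a day, which is the same scale as the cable-slack example above.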

For most use cases, you’d just give up serializability and implement an eventually consistent model. For anything that requires something more strict, you just wait. Longer. Processing a check used to take several days. Processing an earth bank check from mars would need to take about half an hour on average. Though that’s perhaps not the best example, because even today we process checks using an eventual consistency model.

Edit: A strict consistency earth > mars transaction would actually take longer, because 30 minutes is just an average round trip, and you’d need more than one.
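For a feel of the numbers, here is the light-time arithmetic. The Earth-Mars distances are approximate figures I am assuming for illustration; the real value varies continuously with the orbits.

```python
C_KM_S = 299_792.458     # speed of light, km/s

# Approximate Earth-Mars distances in km (assumed round figures).
EARTH_MARS_KM = {"closest": 54.6e6, "average": 225e6, "farthest": 401e6}


def round_trip_minutes(distance_km):
    return 2 * distance_km / C_KM_S / 60


rtt = {name: round_trip_minutes(d) for name, d in EARTH_MARS_KM.items()}

# A protocol that needs two full round trips, at the average distance.
two_round_trips = 2 * rtt["average"]
```

With these figures a single round trip ranges from about 6 minutes to about 45, averaging around 25, so anything needing two round trips lands near 50 minutes on average.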

Note that even with Google Cloud Spanner (last I checked anyway), the distributed consistency model was only available among machines within the same datacenter, as orchestrating transactions across regions would be latency prohibitive. I think you can get cross-region read replicas but that's all. (And I may even be mixing that up with some other cloud database).

So interplanetary, yeah one planet is going to be the central transaction handler and everyone else is going to have to deal with it. So likely things that require transactions, you'll have to have enough cash in the bank in your Mars account if you want quick transactions, and expect wiring money from Earth to take a while.

Coming back to the current day, if you need low-latency global transactions just here on Earth, then that's probably how you'd have to design it. You prime each region with a certain account "limit" of whatever it is you're transacting to use in local transactions, and when that limit gets low you transact some more from your central data store back into your regional account, or something along those lines. It'd be a two-stage thing.
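A toy sketch of that two-stage idea, with hypothetical class names and thresholds. It ignores failures, concurrency, and returning unused allowance; the point is only that local spends never touch the central store directly.

```python
class CentralStore:
    def __init__(self, balance):
        self.balance = balance

    def grant(self, amount):
        # The slow, cross-region step: move funds into a regional allowance.
        granted = min(amount, self.balance)
        self.balance -= granted
        return granted


class RegionalAccount:
    def __init__(self, central, refill=100, low_water=20):
        self.central = central
        self.refill = refill
        self.low_water = low_water
        self.allowance = central.grant(refill)

    def spend(self, amount):
        # Fast local transaction against the pre-granted allowance only.
        if amount > self.allowance:
            return False
        self.allowance -= amount
        if self.allowance < self.low_water:
            # In practice this top-up would happen asynchronously.
            self.allowance += self.central.grant(self.refill)
        return True


central = CentralStore(250)
mars = RegionalAccount(central)    # allowance 100, central 150
ok_small = mars.spend(90)          # allowance drops to 10, then tops up to 110
ok_large = mars.spend(200)         # rejected locally: exceeds the allowance
```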

That is pretty much how it’s done today. Global transactions are authorised in real time and then settled periodically. If I remember correctly, Visa does a settlement every 3 or 4 hours, then each of the merchants and other processors in the chain run their own schedule, so it can take a few days for balances to correctly reflect a transaction.

I remember being very impressed by Spanner when I first heard about it, but as you can see in the OP, it does make a lot of consistency guarantees through sheer brute force. It simply throws power/resources at problems which would generally be considered impractical when normally designing a distributed system. An approach which does of course have its limits (not to say it isn’t still impressive, it’s a great system).

Interesting point! I guess the galactic version of UTC would have to include specification of the inertial frame the event occurred in. For the purposes of ordering times could be converted to the time in some standard reference frame, the galactic centre of mass maybe.

Libraries for handling time would need to include functions for converting times between frames.

There is an extension to linearizability that takes General Relativity into account. Paywalled unfortunately: https://link.springer.com/chapter/10.1007/978-3-662-45174-8_... I agree that whatever can be done in an eventually consistent fashion should be. For the rest, at least academia has an answer.

(I work at Cockroach Labs and gave an internal talk on this paper some time ago).

CuriousMarc made a video last week where he uses a vintage HP 5061A atomic clock and explains how it works. It's interesting to see that atomic clocks aren't just some giant laboratory thing, but a product you could buy, even in the 1960s.


Yeah I had always thought they would be incredibly expensive. The Spanner paper (which is how I ended up at the posted link) contains this amusing line -

> An atomic clock is not that expensive: the cost of an Armageddon master is of the same order as that of a GPS master


CDW has a Cesium reference clock for $92k. https://m.cdw.com/product/microsemi-5071a-high-performance-t...

Pretty similar form factor.

I believe this is the direct successor to the HP 5061A. HP spun off Agilent, who sold the time/frequency product division to Symmetricom, who was then acquired by Microsemi. It seems that the inflation-adjusted price of the Microsemi 5071A is about the same as the ~1970 catalog price of the HP 5061.

You can buy a used rubidium frequency standard on ebay [1] for less than $200.

Of course, it turns out there aren't that many home applications for an atomic clock, other than collecting precision metrology equipment. And if you're running a data centre and want higher precision than NTP, chances are you'll choose a PTP grandmaster clock at the high end, or a GPS receiver with a 1PPS output at the low end, rather than buying second-hand parts from ebay.

[1] https://www.ebay.com/sch/i.html?_nkw=rubidium

> Of course, it turns out there aren't that many home applications for an atomic clock

Of all the people who have a rubidium standard at home, 70% are using it for an electronics lab, amateur radio, or NTP at home. But the remaining 30% is another large userbase too: audiophiles. Many audiophiles claim jitter and phase noise in the clock signal significantly affect audio quality, and the most extreme audiophiles use a lab-grade frequency standard, rubidium or better, and feed it to all their Hi-Fi gear.

P.S: Given all the factors that affect audio quality, is the phase noise from a PLL synthesizer of reasonable quality really a major factor? Even if it is, does a rubidium standard really have any benefit beyond diminishing returns over a crystal oven? Well, of course, these are the questions that are never answered by audiophiles.

> You can buy a used rubidium frequency standard on ebay

There is a continuous supply of used rubidium standards coming from retired telecommunication and lab equipment; they are the cheapest atomic standards available. The catch is that the rubidium inside the discharge lamp will eventually get depleted during operation, usually within 10 years. Once its life ends, it's useless and needs to be rebuilt completely by the manufacturer. So read the manufacturing date if you are going to power it 24x7.

The really expensive ones are the cesium standards, such as the HP 5061A. Recently, hydrogen standards have also seen some use.

The rubidium does not deplete (it's a closed system), but does settle on the glass. Supposedly you can heat the glass to turn all the rubidium into gas again, giving you theoretically infinite lifetime. I picked up a used PRS-10 on ebay for $250. It's a small but great oscillator, still made today. The manufacturer SRS claims 20 years lifetime, and mine has had 13 years of runtime, so I should be able to get 7 years out of it hopefully. The 11W1 DSub connector they used on the other hand is stupidly expensive.

Cesium oscillators on the other hand do deplete their cesium (by design), as mentioned by CuriousMarc in the video. The tube in his unit was probably replaced at least twice, at a cost of >80k. The high performance tubes fare even worse because they "burn" through their cesium at a 3x rate.

The 5071 successor by HP-Agilent-Symmetricom-Microsemi is the most frequently used COTS clock that contributes to UTC.

> Supposedly you can heat the glass to turn all the rubidium into gas again, giving you theoretically infinite lifetime.

Thanks for the tip, that's interesting. I never knew that they could be renewed in this way.

> Cesium oscillators on the other hand do deplete their cesium (by design), as mentioned by CuriousMarc in the video. The tube in his unit was probably replaced at least twice, at a cost of >80k.

Sad story. Recently I was browsing a web store that sells decommissioned U.S. military equipment. I was there to search for some cheap RF power meters, and I was surprised to see an HP 5071 in the listing. The vendor noted that he knew nothing about the equipment and information was welcome. For a moment, I was thinking about sending him a service manual from the HP archive, before I realized it was simply impossible to get it up and running again without replacing the tube :(

> The 5071 successor by HP-Agilent-Symmetricom-Microsemi

Aha, the old Hewlett-Packard was truly a unique company. Its electronic technology has at least three separate chains of succession.

Test Equipment: HP-Agilent-Keysight

Semiconductor: HP-Agilent-Avago-Broadcom (Broadcom ended production of most HP parts, so now it's mostly dead, RIP...)

Frequency Standard: HP-Agilent-Symmetricom-Microsemi

And now:

Frequency Standard: HP-Agilent-Symmetricom-Microsemi-Microchip

Could you explain how one integrates a rubidium device in their hi-fi audio setup exactly? This sounds kind of fascinating. This is the first I've heard of this.

Lots of audio devices can take an external clock for synchronization and avoiding jitter. So you might have it output a 10MHz master clock signal and distribute it with coax cables.

We begin with Pulse Code Modulation (PCM). In PCM, an analog audio signal is digitized by an ADC, which measures its instantaneous voltage at a fixed sampling frequency, such as 44.1 kHz, and converts it to an n-bit integer, such as 16-bit, that represents this voltage level. This process is reversed to play back the audio. "To properly digitize and re-construct digital audio it's relatively important the sampling intervals be as accurate as possible between the recording process and when it's played back." If the timing is not exactly one sample per 1/44.1k second, the signal is frequency-modulated by the unstable timing, which produces unwanted distortion. Like any digital system, a clock signal produced by an electronic oscillator provides the timebase, also known as the reference clock. Usually, the ADC and DAC chips need a clock frequency commonly used in digital circuits, from 5 MHz to 30 MHz, and the internal circuitry inside the chip uses this frequency as the reference clock to derive all the timings needed by the chip, including the sampling or reconstruction of PCM. The same applies to all other digital encoding formats.

Ideally, the clock frequency should be as stable as possible. Unfortunately, all clock oscillators have inherent short-term instabilities, long-term instabilities, and a non-zero temperature coefficient. Short-term instabilities, known as phase noise (in the frequency domain) or jitter (in the time domain), are a particular concern. As long as the audio system is properly optimized and characterized, jitter is not an issue. A good DAC will normally have jitter around -90 dBc or lower, and it's probably negligible. NwAvGuy (an audio engineer known for his criticism of baseless audiophile practice) has a good explanation of clock jitter: https://nwavguy.blogspot.com/2011/02/jitter-does-it-matter.h...
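To put rough numbers on this, a commonly used rule of thumb for a full-scale sine wave sampled with random clock jitter is SNR = -20*log10(2*pi*f*t_j), where t_j is the RMS jitter. Whether that model applies to a given DAC is an assumption; the jitter values below are illustrative, not measurements.

```python
import math


def jitter_limited_snr_db(signal_hz, rms_jitter_s):
    """Best-case SNR for a full-scale sine sampled with random clock jitter."""
    return -20 * math.log10(2 * math.pi * signal_hz * rms_jitter_s)


def quantization_snr_db(bits):
    """Ideal quantization SNR for an n-bit converter (full-scale sine)."""
    return 6.02 * bits + 1.76


snr_1ns = jitter_limited_snr_db(20_000, 1e-9)        # about 78 dB
snr_100ps = jitter_limited_snr_db(20_000, 100e-12)   # about 98 dB
snr_16bit = quantization_snr_db(16)                  # about 98 dB
```

Under this model, around 100 ps of RMS jitter at 20 kHz is already level with the 16-bit quantization floor, which suggests why a reasonable crystal oscillator is usually considered sufficient.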

But of course, some audiophiles want to power their Hi-Fi gear with the best oscillator available, so that it'll have the lowest jitter and the minimum absolute frequency error and temperature coefficient, even if the benefit is dubious to other people. And they realized that a rubidium frequency standard is the best clock oscillator they can find. First, it's possible to DIY. All you need to do is understand how the digital part of the Hi-Fi system works and find the crystal responsible for generating the system/ADC/DAC clock. You simply remove the crystal and inject an external clock signal from the rubidium standard into the chip. 10 MHz is a common output frequency from a standard oscillator; if the audio circuit also has a 10 MHz clock, the rubidium standard output can be used directly. If it does not, and the frequency ratio of the two is an integer, it's possible to use a frequency multiplier or a divider to generate the needed clock frequency from the rubidium standard output. If the ratio is not an integer, a PLL frequency synthesizer can be used.
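The direct / multiplier / divider / PLL decision described here can be sketched as a ratio check. The function name is hypothetical, and the 11.2896 MHz target (256 x 44.1 kHz) is just one example of an audio master clock frequency.

```python
from fractions import Fraction


def derive_clock(ref_hz, target_hz):
    """Suggest how to obtain target_hz from a reference at ref_hz."""
    ratio = Fraction(target_hz, ref_hz)
    if ratio == 1:
        return "direct"
    if ratio.denominator == 1:
        return f"multiply by {ratio.numerator}"
    if ratio.numerator == 1:
        return f"divide by {ratio.denominator}"
    return f"PLL synthesizer, ratio {ratio.numerator}/{ratio.denominator}"


same = derive_clock(10_000_000, 10_000_000)    # "direct"
double = derive_clock(10_000_000, 20_000_000)  # "multiply by 2"
half = derive_clock(10_000_000, 5_000_000)     # "divide by 2"
audio = derive_clock(10_000_000, 11_289_600)   # non-integer ratio, needs a PLL
```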

Second, in an electronic lab, much of the gear has a "10 MHz Reference Input" port from which internal timings can be derived, so that all signal generators, frequency counters, and spectrum analyzers can be locked to a master clock oscillator (which can be a crystal oven, WWVB radio, GPS, or atomic) for consistency. That way the equipment's frequencies don't drift in different directions, everything gets a good clock with constant performance, and there won't be a disagreement between instruments. Some professional audio equipment for music production also allows an external reference to be used for the same purpose. One can simply plug the 10 MHz standard oscillator, including a rubidium standard, into the BNC connector of the audio gear.

Thanks for the wonderfully detailed response. This blog you referenced has some great reading as well. Cheers.

A long time ago I worked in a finance/trading environment and we had GPS clock receivers in our datacenters that synced from satellites. While they weren't that expensive I would imagine at Google's scale they might be. Is there a reason Google wouldn't have used this same technique if it achieves the same accuracy?

iirc, one big reason is leap second handling. The closed-source GPS receivers' firmware tends to behave strangely around leap seconds, in a way that contradicts their documentation. And their behavior is untestable in advance: a global satellite network for time synchronization is the ultimate un-mockable time bomb input.

Thanks but isn't that a reason to forgo using GPS receivers? Google obviously decided to use them however.

Oh, sorry. I was distracted and misread your comment. I was trying to answer the question of "why an atomic clock rather than a GPS receiver" (and Google's "Armageddon masters" have their own atomic clocks), but I see now you were asking "why not use Cockroach's technique rather than any specialized hardware".

I think one answer is in this blog post:

> A simple statement of the contrast between Spanner and CockroachDB would be: Spanner always waits on writes for a short interval, whereas CockroachDB sometimes waits on reads for a longer interval. How long is that interval? Well it depends on how clocks on CockroachDB nodes are being synchronized. Using NTP, it’s likely to be up to 250ms. Not great, but the kind of transaction that would restart for the full interval would have to read constantly updated values across many nodes. In practice, these kinds of use cases exist but are the exception.
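A heavily simplified model of the contrast in that quote, to show where each wait comes from. These functions are my own illustration under stated assumptions, not how either system is actually implemented: I assume a clock reading t with error bound epsilon means true time lies in [t - epsilon, t + epsilon], and that a reader must retry when it sees a value stamped within max_offset ahead of its own timestamp.

```python
def spanner_commit_wait(epsilon):
    # Pick commit timestamp s = now + epsilon (latest possible true time),
    # then wait until now - epsilon (earliest possible true time) passes s.
    return 2 * epsilon


def cockroach_read_must_restart(read_ts, value_ts, max_offset):
    # A value stamped slightly "in the future" of the read may really have
    # been written before it, so the read retries at a higher timestamp.
    return read_ts < value_ts <= read_ts + max_offset


wait = spanner_commit_wait(0.004)                              # every write pays this
ambiguous = cockroach_read_must_restart(10.0, 10.1, 0.25)      # inside the window
clearly_later = cockroach_read_must_restart(10.0, 10.3, 0.25)  # outside the window
```

In this model the write-side wait is small but unconditional, while the read-side restart is bounded by the much larger max_offset but only triggers when a read actually lands on a recently written value.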

Thanks for the clarification, somehow I missed this tradeoff summary. Cheers.

I'm curious what the distribution of time offsets between servers running NTP is. Properly functioning NTP should maintain an offset well under 100ms, but what fraction of the time servers are actually "properly functioning" is something I'd like to know.

Talking with ops folks at work, they have seen system clocks in our fleet (which run ntpd) suddenly report times off by _years_ and then go back to "accurate." We have thousands of nodes for the software I work on. We sometimes get negative times when comparing events, if only by a few seconds. You can't fully trust time in a distributed system.

Sure, but it's also possible to write a word out to memory and then read something different back. There are probably many DBs that fail to meet their requirements when that happens. My sense is that time errors are much more common, but I'm interested in actual data surrounding that.

PTP (Precision Time Protocol) can get you sub microsecond synchronization on a LAN. See: https://en.wikipedia.org/wiki/Precision_Time_Protocol

Most recent spec is IEEE 1588-2019.

NTP is more than good enough to get 7ms synchronization on a LAN when working correctly, so the fact that PTP can do even better is perhaps uninteresting in light of the fact that NTP often fails to get anywhere near its theoretical performance (other comments mention clocks off by years).

This is really the key point, and the main thing that many people overlook. The purpose of TrueTime in Spanner is not gained by absolute accuracy. It's gained by knowing the clock uncertainty across all participants and affirmatively ejecting any participant with excessive uncertainty. The atomic clocks are totally beside the point.
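A sketch of that idea: treat each clock reading as an interval rather than a point, order events only when the intervals do not overlap, and drop any node whose interval is too wide. The names and the 10 ms threshold are illustrative assumptions, not Spanner's actual API or settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    earliest: float
    latest: float

    @property
    def uncertainty(self):
        return (self.latest - self.earliest) / 2


def definitely_before(a, b):
    # Only safe to order two readings when their intervals do not overlap.
    return a.latest < b.earliest


def eject_uncertain(readings, max_uncertainty):
    # Keep only nodes whose clock uncertainty is within the allowed bound.
    return {n: iv for n, iv in readings.items() if iv.uncertainty <= max_uncertainty}


readings = {
    "n1": TimeInterval(100.000, 100.004),
    "n2": TimeInterval(100.010, 100.016),
    "n3": TimeInterval(99.000, 101.000),   # badly synchronized clock
}
kept = eject_uncertain(readings, max_uncertainty=0.010)
```

With n3 included, nothing can be ordered against it without waiting out a full second of uncertainty. Ejecting it is what keeps the wait short for everyone else.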

But the atomic clock is how they keep the throughput from crashing every time a corner case is encountered.

It shouldn't be too expensive to build and deploy GPS disciplined oscillators based on commodity crystal oscillators (temperature compensated or ovenized). Only one would be needed for a whole datacenter, then local NTP would provide it to the LAN. No need for fancy telecoms grade Rubidium/Cesium stuff.

The fancy oscillators are for longer "holdover" time. It's essential that frequency not go out of spec for CDMA (etc.) networks even if the GPS signal is degraded for whatever reason, so they use oscillators with large holdover times to buy themselves an extra layer of redundancy.
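Rough holdover arithmetic, assuming a constant fractional frequency error once the GPS reference is lost. The stability figures are order-of-magnitude placeholders I picked for illustration, not datasheet values, and real oscillators also age and respond to temperature.

```python
def holdover_error_seconds(fractional_freq_error, elapsed_seconds):
    # With a constant frequency offset, time error grows linearly.
    return fractional_freq_error * elapsed_seconds


DAY = 86_400

# Assumed order-of-magnitude fractional frequency errors (illustrative only).
oscillators = {"TCXO": 1e-6, "OCXO": 1e-8, "rubidium": 1e-11}

error_after_one_day = {
    name: holdover_error_seconds(err, DAY) for name, err in oscillators.items()
}
```

Under these assumptions a day without GPS costs tens of milliseconds on the cheap crystal, under a millisecond on the ovenized one, and under a microsecond on rubidium.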

I don't know if you need that for distributed transactions or not. The logic as to whether or not the time is "good" is probably a large part of the complexity of this scheme. Better hardware makes the software simpler and vulnerable to fewer failure modes.


"It’d be a showstopper to require an external dependency on specialized hardware for clock synchronization."

The latest entry in the surprisingly crowded genre: How can we copy Spanner, but without all that pesky correctness?

The post is from 2016.

You're right, this is the one that started it all. Sorry for being so jaded; I've been through two distinct efforts of "Spanner, but without thinking" in my recent career.

Since you obviously know about this, it would be good to share some of what you know, so the rest of us can learn. Dismissive comments without information lower discussion quality, not to mention frustrate curious users.

