Clockwork raises $21M to keep server clocks in sync

madars · on March 19, 2022

I can highly recommend Jane Street's Signals & Threads episode on clock synchronization: https://signalsandthreads.com/clock-synchronization/ In fact, the entire podcast is exceptional.

Uptrenda · on March 19, 2022

I was actually reading about this recently and learned about the time problem. Some interesting stuff I remember:

* On LAN: NTP can get up to 1ms accuracy and on WAN about 10ms accuracy.

* People's 'system clocks' use NTP for synchronization but can still be completely off.

* When you call DataTime.now() in Javascript it returns the unix timestamp in UTC time.

* Since it is UTC it will have the same value anywhere in the world - however since it is set based on the the system clock it isn't guaranteed to be very accurate at all.

* Browsers now have a high-precision API for doing measurements of elapsed time called 'performance.' Don't use DateTime.now() for this.

* HTTP servers seem to return a Date field that has a unix timestamp in it. Some projects have attempted to use this to synchronize time in Javascript.

* There appears to be no good, robust Javascript libraries that can synchronize and keep time accurately. Some libraries exist that use a single server + NTP to try calculate clock drift, however. But this isn't as good as the NTP daemon.

* Researchers HAVE been able to write software to synchronize a clock with better accuracy than NTP using distributed networks -- this should be closer to what a lot of people are interested in. Here's a relevant paper I found on this: https://scholar.google.com/citations?view_op=view_citation&c...

* People have done some pretty cool bench-marking hacks to measure elapsed time in Javascript before the existence of the performance counter API. With performance.now() -- its not a clock but a counter and browsers can intentionally limit its accuracy to make 'fingerprinting' harder. https://stackoverflow.com/questions/6233927/microsecond-timi...

TheDong · on March 19, 2022

> With performance.now() -- its not a clock but a counter and browsers can intentionally limit its accuracy to make 'fingerprinting' harder.

The primary reason that performance.now()'s precision is limited is for security reasons.

Spectre showed that it was possible to perform timing attacks to leak other memory on the system, and having a browser leak memory from other processes is quite a dangerous attack. As part of spectre's mitigations, browsers began limiting performance.now()'s precision.

See docs on this: https://developer.mozilla.org/en-US/docs/Web/API/Performance...

Chromium's impl: https://chromium-review.googlesource.com/c/chromium/src/+/85...

Total aside, but it's also more accurate to say that performance.now() is a monotonic clock rather than a counter. Counters don't necessarily have relationships with elapsed time, but performance.now() does, and it's conceptually the same as 'CLOCK_MONOTONIC'.

icelancer · on March 19, 2022

Yeah, until you sync to ~1ms with 99.999% accuracy required for production-specific services, you don't really realize how big of a pain in the ass this particular problem is. And syncing to ~1ms is comparably easy, although annoying as hell.

The problem the OP company is trying to solve is quite a bit tougher.

Of course it sounds extremely stupid, which is why there are so many drive-by "NTP solved this already" comments from people who don't deal with millisecond precision at high accuracy requirements, god forbid sub-millisecond.

hansel_der · on March 19, 2022

tbf NTP did solve this already, but then came variable clockspeeds & powersaving and bam back to the stoneage and clocks regularly skew in the order of multiple seconds.

paxys · on March 19, 2022

One bit of clarification:

> it returns the unix timestamp in UTC time

Unix timestamp is ALWAYS the number of seconds since January 1st, 1970 00:00:00 UTC. It cannot be in any timezone.

Dylan16807 · on March 19, 2022

"ALWAYS" is a bit much when we have leap seconds, though.

zokier · on March 19, 2022

Specifically unix timestamp is number of days since 1970-01-01 × 86400 + seconds since midnight.

The difference arises from the fact that in utc some days are 86401 seconds long, but unix timestamp just repeats the last second of the day instead of being simple ever increasing counter.

chrismorgan · on March 19, 2022

Most software does indeed ignore leap seconds, thereby generally matching UTC for times, but being discontinuous at leap second boundaries, and not matching UTC for durations. Sometimes this matching of POSIX behaviour is spelled out, but mostly it’s not.

As an example, since an ancestor comment was talking about JavaScript:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... says Date.now() “returns the number of milliseconds elapsed since January 1, 1970 00:00:00 UTC”, which wording would suggest the inclusion of leap seconds (since they have certainly elapsed).

However, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... mentions that Date.now(), Date.parse() and Date.UTC() ignore leap seconds; what it fails to mention is that actually it’s just that ECMAScript follows POSIX for time measurement throughout, though in a milliseconds base rather than seconds, meaning that everything to do with Date ignores leap seconds. (Citation: ECMA-262 §21.4.1.1 <https://262.ecma-international.org/12.0/#sec-time-values-and...>; warning, 6.7MB HTML document, slow to load.)

necovek · on March 19, 2022

I think stating it like that is a useful bit if redundancy: it's not overly long, and doesn't require the reader to know the exact definition.

assttoasstmgr · on March 19, 2022

IEEE 1588 (Precision Time Protocol) has already solved this problem for every use case that matters and offers microsecond precision.

https://www.eecis.udel.edu/~mills/ptp.html

PTP takes advantage of special Ethernet switches and other devices which can decode or manipulate the time tags in hardware because yes your switches add latency.

This just sounds like more horseshit out of SV. I applaud them for convincing someone to give them money for this snake oil.

jeffwheeler · on March 19, 2022

Their website links [1] to a presentation at Stanford [2]. Mendel Rosenblum briefly compares their concept with PTP at about 45:54. He says:

> Those people [fintech, etc.] are interested in our stuff, so obviously [that] means that thing [PTP] is not perfect for them. Part of the problem is, you’re trying to measure individual packets that are going through and get times. It means you have some clock there that you’re reading, and when it goes into the switch, and when it comes out, like, what is that clock doing in terms of its varying in frequency, and stuff like that. And then you get one sample and try to make things of it. It sorta pales in comparison to this big data approach we’re looking about.

This seems like a mischaracterization of PTP (802.1AS), which does use multiple samples to syntonize.

[1] https://www.clockwork.io/resource/ [2] https://www.youtube.com/watch?v=Opf9CBwP5R8

assttoasstmgr · on March 19, 2022

Not just that, but PTP switches modify the frame header to take into account the delta of the propagation delay through the switch. Everything is synced to the GM clock which is typically sync'd to GPS and your GM clock can itself be designed to ensure stability (e.g. temperature controlled oscillator). Furthermore, if reading the incoming data using a specialized NIC (they do this in finance, high speed trading for example) the timestamps are decoded in hardware. All this is done at the layer 2 level.

paxys · on March 19, 2022

Applications like Spanner work with nanosecond-level time difference between datacenters. PTP works in microseconds and can even go up to a few milliseconds, making it useless for actual time sensitive use cases.

pclmulqdq · on March 19, 2022

Spanner works fine with millisecond-level timestamps. Spanner needs accuracy, not precision: it relies on knowing the bounds on the the current time, which may be wide.

welterde · on March 19, 2022

I think you are confusing PTP and NTP. NTP is in the range of low microseconds to a few milliseconds (with hardware time-stamping can actually do much better than that though), while PTP can synchronize time to better than 100ns.

lrhegeba · on March 19, 2022

A very good overview of the different solutions can be found here: https://ying-zhang.github.io/time/2021-02-revisiting-sync/ seems to be more of a pushing the envelope than snake oil to me, but what do i know

SeasonalEnnui · on March 19, 2022

From the standpoint of having used 1588, I agree. I assume the greyed out nature of your post means it isn’t popular which is a little baffling given the truth in it.

hn_throwaway_99 · on March 19, 2022

Google Cloud Spanner uses atomic clocks to be able to synchronize using timestamps across distributed DBs. CockroachDB does not require atomic clocks, but I believe there is an "atomic clock mode" available. This approach sounds like it doesn't use atomic clocks, but instead just machine learning algorithms to detect offsets.

Would like to understand if the founders consider their approach to be a viable alternative to "atomic clock mode", but without actual atomic clocks.

dan-robertson · on March 19, 2022

I thought even spanner only relied on sync in the 100us-1ms range. Maybe I’m out of date or it’s more about having clocks advancing at a very reliable rate? Usually ‘atomic clocks’ means ‘gps appliance gets time from atomic clocks in gps satellites’ but maybe not in this case.

jasonwatkinspdx · on March 19, 2022

So Spanner uses TrueTime as an integral part of it's concurrency control algorithm. To preserve external consistency, Spanner sometimes has to wait out an uncertainty interval at transaction commit. The tighter the bounds on timestamp intervals from TrueTime, the less this waiting impacts performance. I've heard through the grape vine that TrueTime currently operates much better than the numbers in the original paper, but can't confirm if that's true.

They use GPS for global sync and a local atomic clock as reference. From conversations over the last few years this seems to be the common setup at big DCs, as the cost for a couple atomic clocks is minuscule compared to the scale of everything else.

hn_throwaway_99 · on March 19, 2022

> Usually ‘atomic clocks’ means ‘gps appliance gets time from atomic clocks in gps satellites’ but maybe not in this case.

No, Spanner has atomic clocks running in each data center.

dchichkov · on March 19, 2022

Why not just synchronize to GPS time? It is pretty straightforward to get hardware timestamp on the 1PPS from the GPS. And then your time is as precise as the GPS clock, which can have a rubidium standard.

wmf · on March 19, 2022

Then you need a GPS receiver in each server and a bunch of amps and cables to distribute the signal to every receiver.

KennyBlanken · on March 19, 2022

1PPS signal distribution is not a complex or remotely difficult problem.

dan-robertson · on March 19, 2022

For some reason companies that care a lot about clock sync don’t seem to do this so perhaps there is some other reason.

dchichkov · on March 21, 2022

A local GPS-disciplined NTP server, distributing time over 1GbE or 10GbE goes a long way. I can't think of any application, besides HFT and maybe some science experiments that need better precision.

And even in HFT, it is silly to try to use your x64 system time to get nanosecond precision. Even if you are really good, using bare hardware, all your code and data is in L2, an x64 system would still be giving you 100ns level jitter. No matter what you do. What you want is to have your network card (or FPGA) timestamping the packets accurately. And then releasing these packets at a precise timestamp.

Saying that, maybe Clockwork is doing something completely different. And not in nanosecond level, but at a millisecond level. They talk about "measure true one-way delays of a packet or remote procedure call, discover network bottlenecks and "hiccups" (outages lasting a few seconds), and identify underperforming VMs arising from "noisy neighbors." Now, that could indeed be useful.

mcshicks · on March 19, 2022

I wonder the exact same thing. I worked on CDMA infrastructure in the 90s and that's exactly how we did it.

gpderetta · on March 19, 2022

This is routinely done in some industries like finance.

netizen-936824 · on March 19, 2022

How long would it take for the sync operation with communication latency? How reliable is the method (how robust)?

nimish · on March 19, 2022

Interesting. CERN has white rabbit for sub-ns timing. Needs special hardware but is now IEEE 1588 (PTP) "High Accuracy" profile.

I wonder how this deals with the numerous retimers and DSPs required for really high speed ethernet.

macroblocks · on March 19, 2022

I’ve been working on fairly large scale broadcast television back-end systems which have been using 10/25/100G SMPTE 2110 IP for video flows instead of HD-SDI, and those have been timed with PTP for a while (SMPTE 2059-2 standard, which is a PTP profile).

The GPS locked master clocks are custom hardware (ex: https://evertz.com/products/5700MSC-IP), but a lot of the edge devices like the video playout servers are standard X86 hardware with Ubuntu Linux, with PTP delivered in-band over the 10G/25G ports to the device (same link as the video/audio flows).

(… Which I realize is not as accurate as the newer White Rabbit related PTP update you refer to - but still odd that the original article referenced NTP and not even regular PTP which has been in use for a while and seems pretty close to their claims even before the more recent enhancements)

nimish · on March 19, 2022

Regular PTP is old hat as you say. ~5ns though I thought needs some real thought, and picosecond sync needs a combo of PTP + SyncE + link compensation, which is now the "high accuracy" profile.

It's all fairly simple in the end if the hw has the tools but good luck buying SyncE or 1588 without $$$.

zbrozek · on March 19, 2022

In a simple 100 mbps network where I needed good synchronization, I used the RXC pin of the MII bus as input to a PLL+VCXO and explicitly configured the PHYs in the link to be master-->slave in the desired clock distribution direction. PTP provided phase. Across a small network (a couple switch layers) I was seeing about ~20 ns of time uncertainty. Clocks were locked within about 2 ppb. Worked well enough for my needs.

In hardware that's actually designed to do this, I think it's called "synchronous ethernet" but you can totally duct tape it into pedestrian hardware the way I did.

nimish · on March 19, 2022

Yeah I think SyncE is what telecom RANs use to get within 5g reqs on timing. White Rabbit operates like 1588 + SyncE + a compensation model for the fiber link. It's pretty slick.

It does need L2-L1 coordination but the ethernet MAC usually has no idea of what's happening on the L1 (it speaks "medium independent interface" or *MII). The CERN people warn against using Base-T since the PHY needs to do pretty complex signal processing which would likely destroy the syntonization.

They then use basic fiber links, but 100g, 400g+ have a gearbox and DSP and FEC in the way.

Nextgrid · on March 19, 2022

Out of curiosity, do you have any write-up or some documentation about this? Can't imagine a scenario where I'd need this any time soon but it sounds very interesting.

hansvm · on March 19, 2022

There are strong lower bounds in the error of pure-software time synchronization techniques. Is the proposal to incorporate additional hardware, or are they just probabilistically increasing the accuracy? If the latter, what applications can benefit from maybe being better synchronized, not being able to measure how much better the synchronization is, and maybe still having the same worst-case bounds that other algorithms give?

pclmulqdq · on March 19, 2022

The lower bounds are based on the use of occasional time measurements. In PTP terms, the clockwork algorithm involves using a high volume (~10k/second) of peer delay requests, and discarding almost all of them. Only the ones that are deemed to be "pure" are used to adjust clocks.

By using a high volume of requests, they can actually average out a lot of the well-behaved jitter sources.

hansvm · on March 19, 2022

The problem is that a pure software solution can't distinguish between clock skew and forward/backward relative time differences. Consequently, no such solution can guarantee error better than RTT/2. If your RTT is 2 microseconds, it's impossible to guarantee synchronization within hundreds of nanoseconds without incorporating additional information, regardless of jitter.

Hence my initial question: are they adding extra information to actually achieve those stated goals, or are there algorithms just "probably" better, and in the latter case what are the use cases? Distributed transactions and whatnot are fundamentally broken if your "better" synchronization might still be wrong.

pclmulqdq · on March 19, 2022

You would enjoy reading the paper. They are making a few assumptions that turn out to be pretty good in a datacenter environment to simplify things. They are also using graph cycles to set clocks, which is a very different approach. My guess is that the precision of their approach comes at an accuracy cost, and the clocks are not particularly accurate.

hansvm · on March 21, 2022

You're right, I did enjoy reading the paper.

The graph cycles are neat, but they even admit in their own paper [0] that the approach is limited to half the max path asymmetry (RTT/2 if the asymmetry is totally unknown) -- pure software clocks have hard lower bounds on accuracy that can't be overcome without additional information (which digging elsewhere on their site it looks like they do actually integrate with sources like GPS antennas).

The rest of it is actually pretty interesting; in a datacenter context you might very well have low asymmetry, and everything else seems well done and likely to be much better than NTP for common scenarios.

[0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18...

mateo1 · on March 19, 2022

That was my fist thought. The second one is, who has any use for such precision. Other than companies that do hft of course.

dan-robertson · on March 19, 2022

Lots of NICs do hardware timestamping of incoming/outgoing packets these days.

bawolff · on March 19, 2022

Not trying to be dismissive but honest question: why would machine learning be good in this problem domain? Normally i think of machine learning as a good way to find patterns in high dimensional data, but clock skew doesn't seem like something that is high dimensional.

hansel_der · on March 19, 2022

> why would machine learning be good in this problem domain?

it doesn't have to be, market will decide.

bawolff · on March 19, 2022

I mean yes of course. The implicit underlying question i am asking is does their product make sense or is it a scam/snakeoil?

aejnsn · on March 19, 2022

How has Chrony not been mentioned here?

https://chrony.tuxfamily.org/

wmf · on March 19, 2022

Chrony is still NTP. Clockwork is aiming to be far more precise.

jakegold · on March 19, 2022

It's hard to get much more accurate than Chrony with NTP. Even real-world PTP implementations don't often aim for more accuracy than is possible with Chrony.

Because it turns out NTP can be much more accurate than most people realize.

From the Chrony FAQ[1]:

> When combined with local hardware timestamping, good network switches, and even shorter polling intervals, a sub-microsecond accuracy and stability of a few tens of nanoseconds might be possible

Good network switches and NICs with hardware timestamping support are commonplace now in server environments. NTP with Chrony is pretty hard to beat in terms of simplicity, reliability, and accuracy.

1. https://chrony.tuxfamily.org/faq.html

bitcharmer · on March 19, 2022

Could you provide examples of NIC models that can achieve that? So far I've only worked with gear that offers HW timestamps but that feature relies entirely on the availability of PTP signals in the first place.

mlichvar · on March 19, 2022

Some models that are known to work well with chrony are Intel I210, I350, and X550.

Those don't care about the protocol as they can provide hardware timestamps for all received packets.

Other popular NICs like the Intel X540 or XXV/XL710 are limited to timestamping of PTP event messages in order to limit the rate of timestamps which needs to be handled by the driver. For those chrony supports an NTP-over-PTP protocol which forces the hardware to trigger the timestamping by wrapping NTP messages in PTP.

Sub-microsecond accuracy is certainly possible. Here is an example with 3 network switches: https://chrony.tuxfamily.org/img/client-hwts-3switch-f323.pn...

The accuracy is limited by asymmetries in network switches.

In any case, whatever algorithms Clockwork is using with their protocol, I'm sure they could be used with NTP too. If additional information needs to be exchanged between the hosts, extension fields can be specified for that.

bitcharmer · on March 19, 2022

Thanks!

uvdn7 · on March 19, 2022

I am skeptical to say the least.

TrueTime, PTP, FB’s time card, all solve one problem that is to provide a bound where “true time” falls in. You need to be able to _guarantee_ the precision. If you say the true time lies between [T - d, T + d], it must be the case. The spanner paper provides data that the probability of TrueTime being wrong is less likely than random hardware failures (bit flip, etc.). Nothing is 100% in computer, but once you have like 20 9’s of reliability, our society has collectively accepted it as good enough (like we assume hash collision would never happen).

Now AFAIK, no machine learning model can come even close to 10 9’s of accuracy. Clock, to me, is a piece of foundational infrastructure that should provide a very solid and simple mental model so we can reason about it and build other things on top. I am skeptical that NTP + ML would work.

ram_rar · on March 19, 2022

I see this is a problem, but how many companies really need this? In my limited experience I have only seen clock sync issues at scale and simple NTP/Chrony setup solves majority of the problems, unless you're in HFT domain where PTP is needed. Nowadays, many of those atscale systems are getting SaaSified where the companies using them dont have to directly deal with such issues. What am I missing here? Is there really a big market for this?

altantiprocrast · on March 19, 2022

Sorry for the ignorance, but what is the point of getting this accurate in the datacenter (outside of scientific research and measurements I'd imagine)?

Wouldn't it be easier to just make distributed servers deal with large 'packets' or large individual tasks on their own?

paxys · on March 19, 2022

There's an entire class of problems where you need to synchronize application state between data centers. For that, if two conflicting requests show up in different data centers at the ~same time, different servers need to agree on which one came first, and for that they need to be on as close to the same time as possible. Even being off by milliseconds doesn't cut it when request volume is high.

dan-robertson · on March 19, 2022

Are there any numbers for what accuracy they are hoping to achieve?

If you don’t try hard you are likely to end up with something accurate to a few ms. AWS has some service which can get you synced to around 300 mics in normal conditions. But 300mics is pretty bad.

Doesn’t this company basically require their customers to not use cloud providers or have they figured out how to get good clock sync despite cloud provider networks? It seems limiting if they don’t work in the cloud.

I think the most common problem we have with clock sync (at least the most common problem I see) come from overloaded network cards slowing down timekeeping packets. I wonder if that’s much of a problem with this company’s solution.

daniel-cussen · on March 19, 2022

This startup genuinely sounds awesome.

I played a lot with an asynchronous chip, meaning it didn't have cores, it had computers, and they didn't have a frequency. Then I connected it to an oscillator (it's easy, just lay one wire) and was getting sub-nanosecond overall measurements with no jitter.

And it's like, why can't you hard-code that in? Like in your test suite, test the code and see if it's as fast as it's supposed to be, or so fast it's clearly not doing the work.

teleforce · on March 19, 2022

One of the very few books dealing with timing and synchronization in service providers, servers and enterprise applications.

Synchronizing 5G Mobile Networks:

https://www.ciscopress.com/store/synchronizing-5g-mobile-net...

gfd · on March 19, 2022

I remember googling how to sync time on aws instances before and was surprised that they use atomic clocks: https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-ti...

mhb · on March 19, 2022

What is the relationship between this service and the advent of cheap (<$2K), small atomic clocks?

radicality · on March 19, 2022

I did read the article and do see the work FB open sourced linked (100 microsecond accuracy), but will link directly to the FB blog here for anyone interested: https://engineering.fb.com/2021/08/11/open-source/time-appli...

I’m curious, who is even the target customer here? Places like FB and Google are working on the problems themselves.

They talked about some “latency sensei” product or something but what is it gonna tell me? How many nanoseconds it took for my API request to go from a load balancer to the web server, with 5 ns precision? Is that precision really needed?

zeckalpha · on March 19, 2022

My guess is they are aiming to exit with a cloud provider that isn’t doing this stuff themselves yet, or one that doesn’t want their competitors to acquire this team.

CoastalCoder · on March 19, 2022

I've vaguely wondered what it means for clocks to be in synch, when talking about time differences much smaller than the time it takes for light to travel between the various clocks.

dharmab · on March 19, 2022

If the clocks are programmed to know their distances from each other, you can account for that.

Dylan16807 · on March 19, 2022

You can, but is there any practical effect?

maronato · on March 19, 2022

Imagine that the same message reaches more than one servers. Which one should record it? If they all are in sync you just use the one that received the event at the lowest timestamp

Dylan16807 · on March 19, 2022

Now imagine the same scenario but one of the servers is eight feet to the left.

Does it matter that a different server picked up the message first?

Now what if the server just thinks it's eight feet to the left because the clock is out by several nanoseconds?

Can't you just ignore that, and still use the server that got the lowest timestamp?

What's the practical difference?

When the servers can communicate in x amount of time, the question was specifically about clock drift "much smaller than" x.

mprovost · on March 19, 2022

This reminds me of djb's clockspeed, but with ML. Not a bad idea.

https://cr.yp.to/clockspeed.html

MichaelRazum · on March 19, 2022

I think the idea is great. Since I had really bad experience with NPT and it was not trivial to sync the clocks. BUT just a bit surprised that they raised so much money.

jiangplus · on March 19, 2022

Is there any public papers about this topic? For high precision clock sync for the datacenter.

evol262 · on March 19, 2022

> “Currently, nobody uses time except for maybe Spanner at Google, CockroachDB or someone doing database things,” Rosenblum said. “We believe that there’s a lot more places, especially as more and more time-critical things came up. We can do time sync, since we figured out how to do that pretty well. And so we asked: is this part of a trend where we’re going to start programming these systems differently? And [researchers] got kind of excited about that possibility of us being able to pull this off.”

Who gave this guy who's apparently never heard of kerberos, ceph, SAML, or any other technology the rights to interview? Sure, their windows are larger, but "nobody uses time" is so clueless it blows the mind, and it's hard to imagine who decided that "NTP, but machine learning" is a $21M idea

paxys · on March 19, 2022

None of the services you mentioned need even close to nanosecond precision between servers to operate. Applications that do (other than the ones in the article like Spanner) are concentrated in supercomputing/scientific computing and spend a LOT of money on the problem. So yeah if you are letting your clocks drift by multiple seconds you aren't "using time" in any real sense.

If these folks can achieve what they want at a couple orders of magnitude cheaper than current prices then you are absolutely going to see a lot more regular use cases show up to take advantage of it.

bitcharmer · on March 19, 2022

> None of the services you mentioned need even close to nanosecond precision between servers to operate. Applications that do (other than the ones in the article like Spanner) are concentrated in supercomputing/scientific computing and spend a LOT of money on the problem

No, not really. I work in finance (HFT, systematic market making) and this industry heavily relies on high precision clock sync. It's even regulated by law (MiFID II); in practice everyone on the street uses PTP because if you don't know how fast you're really going or where your latency spikes you already lost the race, just don't know about it.

There's a couple of other domains where micro- and nanosecond level time sync is of paramount importance. PTP has become much cheaper over the years especially when you are in a data centre and can get it as a service instead of setting up grand-masters, slaves, GPS antennas, etc.

This statement just reads like badly researched topic on their part.

lrhegeba · on March 19, 2022

especially the Huygens-implementation delivering accuracy in nanoseconds could be some part of its secret sauce (https://en.wikipedia.org/wiki/Clock_synchronization#Huygens)

pclmulqdq · on March 19, 2022

They can't do better than a microsecond or so without hardware timestsmps. There aren't a lot of applications that seem to need anything between milliseconds and nanoseconds. MiFID II did open a nice market in finance, but many of those companies use PTP anyway.

I considered starting a competitor in 2018 when I first saw this company, but I don't have the same connections to customers that these guys do.

My conclusion was that PTP precision with software timestamps would be a good company, but not something between. I hope they can prove me wrong!

dan-robertson · on March 19, 2022

My understanding is that PTP is neither necessary nor sufficient. The main advantage of it is that some hardware will only support adding hardware timestamps to PTP messages (ie you want to use the time packets left/arrived at your network card rather than the time you processed them in the kernel/userspace). But I think the specialness of PTP can lead to dealing with a lot of bugs from bad implementations.

Gelob · on March 19, 2022

Telco workloads and 5g require precise time using technologies like ptp. The cheaper, more commodity those solutions get the better.

_3u10 · on March 19, 2022

If only these devices had a clock capable of measuring time to within 1 nanosecond.

If one has such an accurate clock they might be able to accurately plot their position on earth to within 1 ft merely by checking the timed pulses of orbiting satellites.

jazzyjackson · on March 19, 2022

I guess you're downvoted for being sarcastic or maybe people don't know 20 dollar GPS chips provide a ~10 nanosecond resolution clock

SAI_Peregrinus · on March 19, 2022

And GPS-disciplined oven controlled crystal oscillators (OCXOs) provide some of the best frequency accuracy and stability around. They range from about a hundred dollars up through a few thousand, depending on phase noise.

_3u10 · on March 19, 2022

Haha I have no idea. But yeah just call the GPS algo a Time AI and voila, $21 m in funding to go get some kids in shenzen to put a GPS chip on a USB, PCIe, NVME slot.

Hey look we got a GPU to learn the Kalman filter algo.

Many are actually better than 10 ns basically a 1 foot CEP is 1 ns. If they are fixed and always on the accuracy it can build over a day is incredible.

jazzyjackson · on March 20, 2022

granted it does need an unobstructed view of the sky, so getting it on a PCI-e board would be an innovation (:

krisoft · on March 19, 2022

> So yeah if you are letting your clocks drift by multiple seconds you aren't "using time" in any real sense.

If you arrive to the railway station 10 minutes +-2 minutes before your train leaves you are using time. In a very real sense. In robotics we sync multiple computers on the robot to about miliseconds, that is using time. In a real sense.

Maybe more accurate, cheaper sync will enable more applications. Maybe it is a good business to specialise on it. But saying that nobody excep for Google, CockroachDB or someone doing database things uses time is bulshit and can be called out for what it is.

ctvo · on March 19, 2022

> If you arrive to the railway station 10 minutes +-2 minutes before your train leaves you are using time. In a very real sense. In robotics we sync multiple computers on the robot to about miliseconds, that is using time. In a real sense.

Why are you in a very specific technical discussion correcting what is obvious? Yes, you're using time. You're not using time at the precision this company is targeting. At that precision, there are few current use cases, but if they make it cheaper, there will be more. What's there to argue here? You're offended you're not being included as users of time?

"We're lunching more accurate satellite imaged maps. The current users are largely nation states and niche industries due to its cost. This will enable more map use cases"

I use Google Maps all the time. I used it for my last trip! How dare you.

krisoft · on March 19, 2022

> You're offended you're not being included as users of time?

Please refrain from personal attacks. It is unecessary and doesn’t add to the conversation. Thank you.

> I use Google Maps all the time. I used it for my last trip! How dare you.

Do note that your example didn’t say “nobody uses maps”. If an imaginary salesperson would say “nobody uses maps” I would call bulshit on that too.

> Yes, you're using time.

Great, so the company representative should simply not say “nobody is using time”. And the commenter I was responding to shouldn’t say that applications which are fine with a lower accuracy “aren't ‘using time’ in any real sense”.

Heck, all I know maybe nanosecond accuracy is where the bees knee is, and everyone is going to be amazed by the awesome new applications it is going to enable. But you don’t need to disregard all the history of timekeeping and all the current applications to make that point.

On 28 April 1789 a mutiny broke out on HMS Bounty. The ship’s former captain, departing on a rowboat, demanded that the mutineers give him K2 the ship’s chronometer. The mutineers refused, because the chronometer was worth about as much as the whole ship, and they needed it for their onward navigation. Somebody should have told them that they are not using time in any real sense. Probably they would have laughed at that idea.

gnufx · on March 19, 2022

What are those applications? They've passed me by unless that's meant to cover large-scale physics measurements of some sort. I don't understand how it would even make sense in mainstream supercomputers. It should be good to synchronize OS scheduling in HPC nodes to minimize jitter in tightly-coupled applications, but that's not ns stuff and I haven't come across it actually being done (though Bull had some support as free software).

I don't see how something like Kerberos or TOTP, that fails without some sort of synchonization, can be defined as not "using time" in a real sense. (If cluster nodes were drifting by seconds, I'd be checking them, even if that was only going to affect something like make.)

ignoramous · on March 19, 2022

Google uses precision time (SunDial) for observability, telemetry, congestion control too: https://www.youtube.com/watch?v=GEXP68yNPPM

Saris · on March 19, 2022

Makes me curious how accurate a GPS clock is, they must be pretty good given the timing precision needed for accurate positioning.

WinterMount223 · on March 19, 2022

For GPS precision is much more important than accuracy.

Dylan16807 · on March 19, 2022

Precision vs. accuracy of what measurement?

For GPS to work at all, you need very accurate knowledge of the offset between each satellite's signal.

The only way I can think of to have precision but not accuracy is if you design a system where there's a buffer between the antenna and the signal processing and you don't know how long the buffer is. Is a design like that a practical concern?

karpierz · on March 19, 2022

They're saying that as a receiver of GPS signals, it doesn't matter if your clock is out of sync with everyone else's clocks, as long as it measures the length of a nanosecond accurately.

Dylan16807 · on March 19, 2022

But GPS gives you a clock signal.

WinterMount223 · on March 19, 2022

Of UTC. Theoretically GPS can work out of sync with respect to UTC as long as all satellites are in sync among themselves.

Dylan16807 · on March 19, 2022

Oh, you're talking about a theoretical situation where the GPS satellites are reprogrammed to use a different synchronization source? Personally I'd still call that highly accurate, but to a different time standard. But I see where you're coming from.

ahazred8ta · on March 19, 2022

With specialized GPS receivers, about +/-2ns. Closer to 40ns with commercial receivers.

nightfly · on March 19, 2022

statum-1 time server

galeaspablo · on March 19, 2022

Yup. Going back to the principles of distributed systems, I’m really wondering how “NTP but machine learning” can be used to minimize the clock skew to hundreds of nanoseconds with software only. The RTT between two servers can vary a lot — and way past the calculated minimum RTT.

The figures (eg $21M) and names dropped (eg Stanford) are an appeal to authority, which does make me curious.

I’d love to see some papers, I went through all of Balaji Prabhakar‘s publications (titles only), and didn’t see a single paper that sounded like “NTP but machine learning.

If anyone else knows more about this, I’d love to hear from you. Surely $21M doesn’t get dropped with at least someone doing due dill on the tech?

kloading · on March 19, 2022

I also haven't seen any work of his using ML, that may just be a buzzword thrown in the PR release. But if anyone's interested, I believe this is the paper alluded to in the article:

Clock synchronization: https://www.usenix.org/conference/nsdi18/presentation/geng

I think the TechCrunch article doesn't really explain their application of clock synchronization well. Here are the other relevant papers and my attempt at explaining the general idea below.

Network edge-based measurement (one-way delay?): https://www.usenix.org/system/files/nsdi19-geng.pdf

Congestion Control: https://www.usenix.org/system/files/nsdi21-liu.pdf

Each paper I've listed builds on the last. Their method to synchronize clocks made accurate and efficient measurement of one-way delay possible with commodity hardware. This measurement of one-way delay occurs at the edge, allowing them to "hold" incoming packets at the edge for extremely small periods to reduce congestion (latency) while maintaining throughput. From my understanding, traditional congestion control algorithms require rich telemetry from the entire network, which is likely not accessible in a public cloud environment. Balaji and clockwork's algorithms only need to make these measurements from the edge (which customers in public cloud have access to).

I'm curious to see how all this will scale for multi-region deployments. If the latency between VMs from region 1 and region 2 is significant, I wonder if the measurement will actually be useful in deciding to "hold" the packets.

danzheng · on March 19, 2022

I'm from the Clockwork team, thanks for listing the relevant papers.

Accurate clocks sync enables true one-way delay measurements (instead of RTT/2), this allows for edge-based network visibility. We launched Latency Sensei beta – a sensor, monitor and auditor that provides visibility into cloud deployments. The gallery has cloud fitness reports on GCP, AWS and Azure. Some interesting reports include: 1)how VM colocation impairs network bandwidth, and 2) tale of 2 cloud regions, London vs Singapore. Take a look and we'd love to get some feedback https://sensei.clockwork.io/user/gallery/

On Congestion control, an edge-based solution is coming soon. If you're interested in a private beta, email us at hello@clockwork.io

galeaspablo · on March 19, 2022

This is exactly what I was looking for. Thank you. It’s a shame fundamental publications aren’t part of PR articles. I’d also love to see better details on their website. Oh well, I cant expect everything to work like academia — c’est la vie.

shiftingleft · on March 19, 2022

> I also haven't seen any work of his using ML, that may just be a buzzword thrown in the PR release.

The first article you reference includes something about Support Vector Machines (SVMs).

insaneirish · on March 19, 2022

> The figures (eg $21M) and names dropped (eg Stanford) are an appeal to authority, which does make me curious.

They have the 'names' because they are a reputable group of people. They have been at this problem for a while, first with extensive research that required clock sync as a pre-requisite, and then to evolving into clock sync as a formidable problem unto itself.

Clockwork is a rename of the company as far as I can tell. Their original name (Tick Tock Networks) [1] was probably too close to what has become a very popular homophone.

[1]: https://opencorporates.com/companies/us_de/6947372

galeaspablo · on March 19, 2022

Sure, but ‘names’ don’t explain how they do what they claim to do. Another reply kindly provided references to publications (which I couldn’t find myself) — they made for great reading.

You seem to know their work, so if you have further publications, I’d love to get them please.

insaneirish · on March 19, 2022

Here's a talk Balaji Prabhakar gave: https://www.youtube.com/watch?v=YQSA99nrFEU. Looks like the clock sync part starts about 30 minutes in.

Haven't watched this one, but might be complementary: https://www.youtube.com/watch?v=Opf9CBwP5R8.

unmole · on March 19, 2022

> Surely $21M doesn’t get dropped with at least someone doing due dill on the tech?

Juicero raised $120M.

charcircuit · on March 19, 2022

Juicero only raised $18.5M in their series A.

unmole · on March 19, 2022

jacquesm · on March 19, 2022

> Surely $21M doesn’t get dropped with at least someone doing due dill on the tech?

Just some words:

Theranos.

Ubeam.

Moller.

And many, many others. Investors do this all the time. FOMO + 'wouldn't it be nice' are powerful ways to part fools from their money.

isatty · on March 19, 2022

> Surely $21M doesn’t get dropped with at least someone doing due dill on the tech?

You’d be surprised, I was.

ahazred8ta · on March 19, 2022

From the article, time synchronization is only a small part of what they do. Their Big Thing is traffic shaping and latency management within and between datacenters.

galeaspablo · on March 19, 2022

Those products are built on time synchronization. E.g., a latency measurement needs a _true_ difference between the timestamps of two servers.

nikanj · on March 19, 2022

To be fair, their pitch deck did say both Machine Learning and Stanford

chillfox · on March 19, 2022

Reminds me of early in my career where I encountered a home made time sync system because the creator had never heard of NTP.

stevekemp · on March 19, 2022

20 years ago I went on a ridealong with a bunch of men who were changing the times on the clocks installed in a bunch of church towers.

It was fascinating to get to watch the process. I'd always taken it for granted the big hotels, railway stations, and churches would often have big clocks you could see from a distance and that they'd be self-correcting. But of course a lot of old clocks are purely mechanical so when the time changed an hour forwards/backwards somebody would need to physically change them.

pmontra · on March 19, 2022

Even the clock in my car (2016) doesn't self correct. It goes ahead of about 1 minute every month.

jiveturkey · on March 19, 2022

How so? Clearly these guys have heard of NTP. They are talking about ns precision systems, not ms.

jrowley · on March 19, 2022

Exactly! what would Leslie lamport say? ;)

https://lamport.azurewebsites.net/pubs/time-clocks.pdf

unfocussed_mike · on March 19, 2022

> it's hard to imagine who decided that "NTP, but machine learning" is a $21M idea

Bless you. Are you younger than 40? :-)

Tech people who are over about 40 have, alas, no trouble imagining this at all!

oefrha · on March 19, 2022

Epitome of garbage HN drive-by dismissal: pick a seemingly stupid sentence from the article, quote it out of context, conclude that author/subject expert is clueless about <insert basic things>. Usually just a case study of the Dunning-Kruger effect.

evol262 · on March 19, 2022

Example of a garbage HN drive-by comment: this.

I've spent a _long time_ working in distributed systems with components which are "close to the metal". There are use cases for this, many of which are better solved by "hardware timestamping NICs" and "run a stratum 1/2 NTP server with high locality (and if you're geo-distributed, run multiple on systems dedicated to that purpose, because ensuring you're within nanoseconds of _someone else's_ infrastructure is generally an order or magnitude less important than internal coherence).

That was the best possible quote from the author/subject trying to explain the use cases. Did you even read TFA? It's completely unclear from their article what the intended uses cases are, what the problem space is, and how much working knowledge they have of existing solutions.

Naming people who invested in other companies who are investing in this one and showing what looks like a dashboard for a timekeeping solution is normal TechCrunch garbage, but that quote was a step above.

Comments like yours (going all the way back to /.) days really just tell me that you didn't read the article, and it makes it look like your primary goal is sophistry.

Nextgrid · on March 19, 2022

Given how much complete garbage has been VC-funded (or rather, overfunded, since the correct amount of funding should've been $0) it's good to remain skeptical.

oefrha · on March 19, 2022

If your way of remaining skeptical is opening the HN discussion and jumping into the first crappy hot take concluding that all people involved are clueless idiots without even skimming the article, your skepticism doesn’t mean much.

mc4ndr3 · on March 21, 2022

just make CSAC's cheaper