To me that looks like they are reinventing NTP, but not addressing all the issues of PTP.
A big problem with the PTP unicast mode is an almost infinite traffic amplification (useful for DDoS attacks). The server is basically a programmable packet generator. Never expose unicast PTP to internet. In SPTP that seems to be no longer the case (the server is stateless), but there is still the follow up message causing a 2:1 amplification. I think something like the NTP interleaved mode would be better.
It seems they didn't replace the PTP offset calculation assuming a constant delay (broadcast model). That doesn't work well when the distribution of the delay is not symmetric, e.g. errors in hardware timestamping on the NIC are sensitive to network load. They would need to measure the actual error of the clock to see that (the graphs in the article seem to show only the offset measured by SPTP itself, a common issue when improvements in time synchronization are demonstrated).
I think a better solution taking advantage of existing PTP support in hardware is to encapsulate NTP messages in PTP packets. NICs and switches/routers see PTP packets, so they provide highly accurate timestamps and corrections, but the measurements and their processing can be full-featured NTP, keeping all its advantages like resiliency and security. There is an IETF draft specifying that:
An experimental support for NTP-over-PTP is included in the latest chrony release. In my tests with switches that work as one-step transparent clocks the accuracy is same as with PTP (linuxptp).
I listened to a fascinating podcast from Jane Street on their solution.
Pasted the relevant part
https://signalsandthreads.com/clock-synchronization/
--quote
"So, we’re trying to build a proof of concept. At the end of the day, we sort of figured, “All right. We have these GPS appliances.” We talked about hardware timestamping before on the GPS appliances, and how they can’t hardware timestamp the NTP packets, so that’s problematic. We thought, “How can we move time from the GPS appliances off into the rest of the network?” And so we decided that we could use PTP to move time from the GPS appliances to a set of Linux machines, and then on those Linux machines we could leverage things, like hardware timestamping, and the NTP interleaved mode to move the time from those machines onto machines further downstream.
The NTP interleaved mode, just to give a short overview of what that means… when you send a packet if you get it hardware timestamps on transmission the way you use that hardware timestamp is you get it kind of looped back to you as an application. So I transmit a packet, I get that hardware timestamp after the packet’s already gone out the door. That’s not super useful from an NTP point of view, because really you wanted the other side to receive that hardware timestamp, and so the interleaved mode is sort of a special way in which you can run NTP and that when you transmit your next NTP packet you send that hardware timestamp that you got for the previous transmission, and then the other side, each side can use those. I don’t want to get into too much of the details of how that works, but it allows you to get more accuracy and to leverage those hardware timestamps on transmission."
-- end quote
SPTP does look a lot like NTP over PTP. I’m guessing they deployed this last year when nothing like it existed - the ietf draft (dated Jan 24, 2024) came much later after. They might even be involved with it. Anyways, it’s nice to see progress towards a simpler protocol that retains the precision of PTP.
> TIL there's a regular heartbeat in the quantum foam; there's a regular monotonic heartbeat in the quantum Rydberg wave packet interference; and that should be useful for distributed applications with and without vector clocks and an initial time synchronization service
> A big problem with the PTP unicast mode is an almost infinite traffic amplification (useful for DDoS attacks). The server is basically a programmable packet generator. Never expose unicast PTP to internet. In SPTP that seems to be no longer the case (the server is stateless), but there is still the follow up message causing a 2:1 amplification. I think something like the NTP interleaved mode would be better.
Facebook has little concern for traffic amplification that doesn't affect them. I can't find a source article for it now, but there was a time when you could take down a website hosting an image by simply posting <URL>/?<RANDOM>. I believe Facebook's (many) cache servers would individually make requests to the server until they inevitably saturated the image host's connection. I remember people complaining and it falling on deaf ears.
but this is not about facebook, this discussion is about the protocol
given how this industry protocols work, is likely that other big corporations that run data centers are also part of the real protocol discussion, some of those will be corncerned about traffic amplification
A big problem with the PTP unicast mode is an almost infinite traffic amplification (useful for DDoS attacks). The server is basically a programmable packet generator. Never expose unicast PTP to internet. In SPTP that seems to be no longer the case (the server is stateless), but there is still the follow up message causing a 2:1 amplification. I think something like the NTP interleaved mode would be better.
It seems they didn't replace the PTP offset calculation assuming a constant delay (broadcast model). That doesn't work well when the distribution of the delay is not symmetric, e.g. errors in hardware timestamping on the NIC are sensitive to network load. They would need to measure the actual error of the clock to see that (the graphs in the article seem to show only the offset measured by SPTP itself, a common issue when improvements in time synchronization are demonstrated).
I think a better solution taking advantage of existing PTP support in hardware is to encapsulate NTP messages in PTP packets. NICs and switches/routers see PTP packets, so they provide highly accurate timestamps and corrections, but the measurements and their processing can be full-featured NTP, keeping all its advantages like resiliency and security. There is an IETF draft specifying that:
https://datatracker.ietf.org/doc/draft-ietf-ntp-over-ptp/
An experimental support for NTP-over-PTP is included in the latest chrony release. In my tests with switches that work as one-step transparent clocks the accuracy is same as with PTP (linuxptp).