
Making every (leap) second count with our new public NTP servers - scommab
https://cloudplatform.googleblog.com/2016/11/making-every-leap-second-count-with-our-new-public-NTP-servers.html
======
liotier
“Leap Smearing must not be used for public-facing NTP servers” -
[https://tools.ietf.org/html/draft-ietf-ntp-bcp-02](https://tools.ietf.org/html/draft-ietf-ntp-bcp-02)

~~~
klodolph
Wow, that's a really boneheaded thing to put in a standard. I think we can all
agree that it's important to make leap smearing available for those who want
to use it, especially considering the bugs in leap second handling for common
NTP clients.

~~~
paulajohnson
I disagree. The point of NTP, and of time services in general, is that
everyone agrees about the time. If an organisation wants to use non-standard
time it can, but public-facing NTP servers should all agree and all provide
the standard time. Google, for whatever reasons, is making its NTP servers
deliberately wrong, and there is no mechanism in NTP for a server to say "I'm
using time-smearing". So they shouldn't be doing this on public-facing NTP.

~~~
klodolph
Then NTP has already failed. Most systems are already incapable of agreeing on
whether it is 23:59:59 or 23:59:60 on days with leap seconds. There is simply
not an API that will let you distinguish the two.
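
To make that concrete, here is a minimal sketch (timegm() is a nonstandard
GNU/BSD extension) showing that the 2016-12-31 leap second and the midnight
after it collapse to the same time_t:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct tm leap = {0}, next = {0};

        /* 2016-12-31 23:59:60 UTC -- the leap second itself */
        leap.tm_year = 2016 - 1900; leap.tm_mon = 11; leap.tm_mday = 31;
        leap.tm_hour = 23; leap.tm_min = 59; leap.tm_sec = 60;

        /* 2017-01-01 00:00:00 UTC -- the second after it */
        next.tm_year = 2017 - 1900; next.tm_mon = 0; next.tm_mday = 1;

        /* Both normalize to the same epoch value; the distinction is lost. */
        printf("23:59:60 -> %lld\n00:00:00 -> %lld\n",
               (long long)timegm(&leap), (long long)timegm(&next));
        return 0;
    }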

It is better to be deliberately wrong in a controlled fashion than to be
accidentally wrong because you never expected your clock to be non-monotonic.
You seem to be arguing for the status quo; are you aware of just how deeply
broken it is?

~~~
creshal
What is your definition of "most systems"? Because we've had very few (if
somewhat high-profile) leap second bugs since their introduction in 1972.

~~~
klodolph
Unix, for example. That's a pretty big example. Look at gettimeofday.
Completely incapable of handling leap seconds in any reasonable way, unless
you use smoothing.

Windows, for example. That's another pretty big example. Just ignores the leap
second bit and goes backwards at the next synchronization.

I'm not even talking about bugs here—these are straight up design flaws.

~~~
JdeBP
There are actually two reasonable ways of handling leap seconds with
gettimeofday(). The first, which is in actual use by a range of people, is to
define that the kernel time is actually a TAI-10 count not a UTC count. Arthur
David Olson's "right" timezone system does this. The second is to allow the
microseconds count to go up to 2,000,000.

* [http://www.madore.org/~david/computers/unix-leap-seconds.html#tai-minus-10](http://www.madore.org/~david/computers/unix-leap-seconds.html#tai-minus-10)
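
As a minimal sketch of the first approach, assuming a tzdata build that ships
the leap-second-aware "right/" zones (the epoch value below bakes in the 26
leap seconds that preceded the 2016 one):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        setenv("TZ", "right/UTC", 1);   /* needs the "right" leap tables */
        tzset();

        /* In the TAI-10 style count, the 2016-12-31 leap second is the
           usual 1483228800 plus the 26 leap seconds that came before it. */
        time_t t = 1483228826;
        struct tm tm;
        char buf[32];
        localtime_r(&t, &tm);
        strftime(buf, sizeof buf, "%H:%M:%S", &tm);
        printf("%s (tm_sec = %d)\n", buf, tm.tm_sec);  /* 23:59:60 */
        return 0;
    }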

~~~
klodolph
I wonder how many clients handle that correctly. How many log files will have
timestamps at "23:59:59.1500" instead of "23:59:60.500"? If you are going to
break APIs, you might as well make a new one instead.
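
To illustrate the log-file point: a naive formatter that prints tv_usec with
the usual "%06ld" produces exactly that kind of timestamp once tv_usec can
run up to 2,000,000 (toy values below):

    #include <stdio.h>

    int main(void) {
        /* Toy values: half a second into the leap second under the
           extended-microseconds scheme, i.e. tv_usec = 1,500,000. */
        long sec_of_minute = 59, tv_usec = 1500000;
        printf("23:59:%02ld.%06ld\n", sec_of_minute, tv_usec);
        /* prints "23:59:59.1500000" rather than "23:59:60.500000" */
        return 0;
    }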

And if you replace a simple API with one that requires distributing leap-
second tables…

~~~
creshal
> And if you replace a simple API with one that requires distributing leap-
> second tables…

Not much worse than the distributed time zone tables we already need to update
thrice a year. At least leap seconds aren't decided on by politicians.

------
brandmeyer
> Instead of adding a single extra second to the end of the day, we'll run the
> clocks 0.0014% slower across the ten hours before and ten hours after the
> leap second, and “smear” the extra second across these twenty hours.

Holy leaping second, batman! Unilaterally being off by up to a half second
from the rest of the world's clocks is a pretty aggressive step. I think I
would have preferred to see a resolution made by an independent body on
something this drastic.
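
For reference, a rough sketch (not Google's actual code) of the linear smear
the quote describes; at the leap second itself the smeared clock sits half a
second behind, which is where that worst case comes from:

    #include <stdio.h>

    /* Seconds that smeared time lags a no-leap timeline, given seconds
       elapsed since the start of the 20-hour smear window. */
    static double smear_offset(double since_window_start) {
        const double window = 20.0 * 3600.0;   /* 72,000 s */
        if (since_window_start <= 0.0) return 0.0;
        if (since_window_start >= window) return 1.0;
        return since_window_start / window;    /* grows linearly to 1 s */
    }

    int main(void) {
        /* Ten hours in, i.e. the moment of the leap second: */
        printf("%.3f s behind\n", smear_offset(10.0 * 3600.0));  /* 0.500 */
        return 0;
    }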

~~~
JoshTriplett
Smearing leap seconds does make sense, but it's an odd step to take
unilaterally, rather than coordinating with other NTP servers and with Linux
timekeeping (which currently handles leap seconds via a 61-second minute
instead).

~~~
ac29
> Linux timekeeping (which currently handles leap seconds via a 61-second
> minute instead)

Google doesn't think so: "No commonly used operating system is able to handle
a minute with 61 seconds"

~~~
tyldum
One of the big problems is application support. How many will break when they
see 60 as the current second, as opposed to seeing 59 twice?

~~~
m45t3r
I am sure tons of applications that use gettimeofday() to keep track of time
can break in subtle ways when seeing 59 twice. Of course, they're arguably
broken anyway, given that clock_gettime() exists; but its CLOCK_REALTIME
clock isn't monotonic either, and the raw monotonic variant
(CLOCK_MONOTONIC_RAW) is a Linux-only extension.
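
A minimal sketch of measuring an interval with CLOCK_MONOTONIC (which POSIX
does specify, if only as an option) instead of gettimeofday():

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void) {
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        sleep(1);                               /* the work being timed */
        clock_gettime(CLOCK_MONOTONIC, &b);
        double dt = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
        printf("elapsed: %.3f s\n", dt);        /* never goes negative */
        return 0;
    }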

~~~
JoshTriplett
> I am sure tons of applications that use gettimeofday() to keep track of time
> can break in subtle ways when seeing 59 twice.

gettimeofday doesn't return hour/minute/second divisions; it just returns
seconds/microseconds since the epoch. Functions like strftime and gmtime
handle the components of time. And leap seconds don't make applications see 59
twice; they make them see 60 once (58, 59, 60, 0, 1, ...).

Quoting the manpages for gmtime and strftime:

> tm_sec The number of seconds after the minute, normally in the range 0 to
> 59, but can be up to 60 to allow for leap seconds.

> %S The second as a decimal number (range 00 to 60). (The range is up to 60
> to allow for occasional leap seconds.) (Calculated from tm_sec.)
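
A small demo of that manpage behaviour -- strftime() formats tm_sec == 60
without complaint:

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct tm tm = {0};
        tm.tm_year = 2016 - 1900; tm.tm_mon = 11; tm.tm_mday = 31;
        tm.tm_hour = 23; tm.tm_min = 59;
        tm.tm_sec = 60;                     /* the leap second */

        char buf[32];
        strftime(buf, sizeof buf, "%H:%M:%S", &tm);
        printf("%s\n", buf);                /* prints 23:59:60 */
        return 0;
    }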

------
nullc
I predicted this for leap smear a while back -- we have time sync because
having systems with different times is a source of problems... logical fix:
get them onto the same time.

Smear is a workaround for those who care about phase alignment but don't care
about frequency error. ... and who don't need to exchange times with anyone
else. This last point reduces the set to no one, since it can't extend to
everyone (some parties care a lot more about frequency error than phase
error!).

This circus is enhanced by NTP's inability to tell you what timebase it's
using (or, god forbid, offsets between what it's giving you and other
timebases...)

It's going to be especially awesome when NTP daemons with both smear and non-
smear peers get both the smear frequency error AND get a leap second.

I for one welcome this great opportunity for an enhanced trash fire to help
convince the world that we need to stop issuing leap seconds. (It's absurd --
it easily causes tens of millions in disruption, and it would take 4000 years
to even drift an hour off solar time, at which point timezones could be
rotated if anyone really cared).

~~~
detaro
> _Smear is a workaround for those who care about phase alignment but don't
> care about frequency error. ... and who don't need to exchange times with
> anyone else. This last point reduces the set to no one, since it can't
> extend to everyone (some parties care a lot more about frequency error than
> phase error!)._

I don't quite understand that point. E.g. the typical web server doesn't have
much of a need to exchange precise time with others. HTTP, TLS, ... require
timestamps, timestamps are shown to users occasionally, but as long as they
are roughly right that is enough. As long as all internal systems work off the
same standard, it is fine. Which seems to be the reasoning under which Google
chose to use it, even though one might argue that with their cloud offerings
they are not as insular.

------
leephillips
A lot of interesting geophysics in the unpredictable need for leap seconds. I
mention Google's "smearing" approach here:

[http://arstechnica.com/science/2016/04/the-leap-second-because-our-clocks-are-more-accurate-than-the-earth/](http://arstechnica.com/science/2016/04/the-leap-second-because-our-clocks-are-more-accurate-than-the-earth/)

------
leni536
Why the hell aren't time servers and clients synced to TAI instead? Dealing
with leap seconds should be a client-side problem.

~~~
russdill
Yup, it seems awesome, but software needs to be written to handle it properly.
I think it'd work no problem with software already written against monotonic
clocks, but everything else would probably need some fixing.
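
As a toy sketch of what that client-side handling could look like, using the
TAI-10 convention JdeBP mentions elsewhere in the thread (the helper and
table here are hypothetical; the two offsets shown are real, but the table
is obviously incomplete and would have to be kept up to date):

    #include <stdio.h>

    /* With a TAI-10 kernel count, getting a POSIX/UTC time_t means
       subtracting (TAI-UTC) - 10, looked up from a leap-second table. */
    struct leap { long long from; int sub; };
    static const struct leap table[] = {
        /* ... earlier entries elided ... */
        { 1435708826LL, 26 },  /* 2015-07-01: TAI-UTC became 36 */
        { 1483228827LL, 27 },  /* 2017-01-01: TAI-UTC became 37 */
    };

    static long long tai10_to_posix(long long tai10) {
        int sub = 25;  /* offset in force before the entries shown */
        for (unsigned i = 0; i < sizeof table / sizeof table[0]; i++)
            if (tai10 >= table[i].from)
                sub = table[i].sub;
        return tai10 - sub;
    }

    int main(void) {
        /* 2017-01-01 00:00:00 UTC: expect the POSIX value 1483228800. */
        printf("%lld\n", tai10_to_posix(1483228827LL));
        return 0;
    }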

~~~
paulajohnson
The problem is that time_t (seconds since 1970) implicitly assumes 86400
seconds per day. You would have to redefine time_t and rewrite every piece of
code that uses it.
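
Concretely: the epoch arithmetic treats every day as 86400 seconds, even the
86401-second day that ended 2016 (timegm() is a nonstandard extension):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct tm a = {0}, b = {0};
        a.tm_year = 116; a.tm_mon = 11; a.tm_mday = 31;  /* 2016-12-31 */
        b.tm_year = 117; b.tm_mon = 0;  b.tm_mday = 1;   /* 2017-01-01 */

        /* The real day had 86401 SI seconds; time_t says otherwise. */
        printf("%.0f\n", difftime(timegm(&b), timegm(&a)));  /* 86400 */
        return 0;
    }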

~~~
kazinator
Leap seconds are what allow that assumption to work.

Leap seconds exist only in real time, not in historic recorded time.

There are in fact 86400 "calendar seconds" in a day, exactly.

Essentially, when a day is done, we _call_ it 86400, even though it's
actually 86400 ± epsilon.

Only special applications need to know the exact physical number of seconds
between two calendar times, rather than the calendar seconds.

------
antoncohen
For people talking about Google unilaterally doing this, it has been common to
smear the leap second for the last couple of years. Usually companies do it
internally by having their NTP servers skew time, either with Chrony or `ntpd
-x`. Standards bodies have not been able to react quickly enough to the need
to smear the leap second in a consistent way. I'm thankful that Google has
decided to run public NTP servers with consistently smeared leap seconds.
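
For example, with chrony the internal-smear setup is only a few chrony.conf
directives (these are real directives; the values are illustrative, not a
recommendation):

    leapsecmode slew                 # slew the local clock over the leap
    maxslewrate 1000                 # cap the correction rate, in ppm
    smoothtime 400 0.001 leaponly    # serve smoothed time to NTP clients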

Here are two Red Hat articles on how to deal with the leap second, from 2016
and 2015:

[https://access.redhat.com/articles/15145](https://access.redhat.com/articles/15145)

[http://developers.redhat.com/blog/2015/06/01/five-different-ways-handle-leap-seconds-ntp/](http://developers.redhat.com/blog/2015/06/01/five-different-ways-handle-leap-seconds-ntp/)

~~~
JdeBP
I hope that there are people on standards bodies who remember or learned what
it was like before UTC, when civil time seconds _were not_ one SI second long,
and in effect "smearing" happened all the time.

------
newman314
Does anyone know if Google has open-sourced the time-smearing algorithm?

~~~
detaro
They discuss various ways of smearing here, but I haven't seen their actual
implementation code:
[https://developers.google.com/time/smear](https://developers.google.com/time/smear)

