
How and why the leap second affected Cloudflare DNS - jgrahamc
https://blog.cloudflare.com/how-and-why-the-leap-second-affected-cloudflare-dns/
======
HappyTypist
Time smearing should be standard. It's not perfect, but when Google, Amazon,
and Microsoft have all evaluated and accepted that a unit of time may differ
by 11 ppm for 20-24 hrs every few years, maybe people can stop being so anal
about smearing and stop ending up with bugs like these.

------
NelsonMinar
Excellent postmortem. They blame their code and Golang, but another problem is
that their kernel ran time backwards to implement the leap second. That's
really not the right way to do leap seconds, but is a common kludge because
POSIX defines days as always having exactly 86,400 seconds. Details on that:
[https://en.wikipedia.org/wiki/Unix_time#Encoding_time_as_a_n...](https://en.wikipedia.org/wiki/Unix_time#Encoding_time_as_a_number)

~~~
zigzigzag
It's not the kernel's fault. The kernel provides several clocks, one of which
is defined to be monotonic. However, Go does not expose that clock.

------
esailija
I habitually do <= checks for things that should never be below 0 because I'm
paranoid like that. I don't think it's good practice, but I keep doing it.

------
zigzigzag
Alex Forster wins:

[https://news.ycombinator.com/item?id=13294746](https://news.ycombinator.com/item?id=13294746)

The Golang bug discussion is kind of lame. The Cloudflare post-mortem doesn't
really get into the root cause, which is that mission-critical infrastructure
appears to be written in a language that isn't actually suitable for it.
Isn't this the _exact_ use case Go was designed for? And yet design problems
with the standard library led to an outage on a major, important service?

The conclusion simply says they are "inspecting their service" to find other
leap second bugs. But leap second bugs are not a new or unanticipated problem.
I'd have liked to see a deeper analysis that gets to the root of why an
apparently immature language that doesn't provide basic systems programming
tools was chosen for something as vital as a DNS server.

------
LukasRos
An experience that, I assume, a lot of developers have had: massive amounts
of code and a complex systems architecture, and when things go wrong it's
because of a single line of code or even a single character (< instead of =)!

------
facorreia
Leap seconds must go before they cause real damage.

