
The trouble with timestamps - sethev
http://aphyr.com/posts/299-the-trouble-with-timestamps
======
zheng
A general thanks to aphyr for exploring all of these kinds of issues in
distributed datastores at a level where someone without a lot of database
knowledge can understand and reason through. If anyone hasn't read his Jepsen
series and is interested in these kinds of things, it is well worth a read.

Of course, I wouldn't recommend reading it if you store very important data in
any of the datastores he talks about; you might be scared to learn how your
system actually operates =).

P.S. - The Spanner link is broken; an invisible character is being added to
the end, which gets encoded to %E2%80%8E

------
dmk23
A little surprising that this article does not mention HBase which uses
timestamps as part of the persisted key, while using an entirely different
concept of a "WriteNumber" as an internal ID to resolve conflicting updates
(MVCC).

This presentation might provide a helpful overview:
[http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final](http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final)

The relevant discussion starts at Slide 16

~~~
aphyr
WriteNumbers are a logical clock provided by (last I checked) a single
coordinating HBase node, which is a totally valid and commonly used way to
order operations. For the purposes of this article, though, I'm trying to
focus on LWW data models and wall clocks. A discussion of every database's
approach to ordering would take a little longer. ;-)

------
thaumasiotes
So last time the leap second issue came up, I learned that it's an "issue"
because (as this article mentions) POSIX defines one day as exactly 86400
seconds. That's clearly incorrect; why do we still want to keep that
definition around?
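
To make the POSIX definition concrete: because every day is assumed to be exactly 86400 seconds, converting a Unix timestamp to a calendar date is pure arithmetic with no leap-second table, and an inserted leap second like 2012-06-30T23:59:60 UTC simply has no Unix timestamp of its own. A minimal sketch (illustrative, not from any standard library's internals):

```python
import datetime

# POSIX time assumes every day has exactly 86400 seconds, so
# epoch-to-calendar conversion is plain integer arithmetic.
def days_since_epoch(unix_ts: int) -> int:
    return unix_ts // 86400

# The leap second 2012-06-30T23:59:60 UTC is invisible in Unix time:
# 1341100799 maps to 23:59:59 and the very next integer, 1341100800,
# maps to 00:00:00 of the next day, so real clocks must repeat or
# smear the extra second.
before = datetime.datetime.fromtimestamp(1341100799, tz=datetime.timezone.utc)
after = datetime.datetime.fromtimestamp(1341100800, tz=datetime.timezone.utc)
print(before.isoformat())  # 2012-06-30T23:59:59+00:00
print(after.isoformat())   # 2012-07-01T00:00:00+00:00
```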

~~~
aphyr
It's a complex problem. The best overview I know of is here:

[http://www.ucolick.org/~sla/leapsecs/onlinebib.html](http://www.ucolick.org/~sla/leapsecs/onlinebib.html)

~~~
thaumasiotes
Well... that page refers several times to an apparently heated dispute over
whether "day" shall be defined solely with reference to cesium atoms, or
whether "day" shall be defined with reference to the rotation of the earth. It
raises the clash, noted in the article, between wanting to know time elapsed
vs. wanting to know the current absolute time.

I don't see that that speaks particularly to my question. That clash only
exists because of a background assumption that a day must necessarily consist
of 86400 seconds with unique names. Then, when an 86401-second day comes
along, the elapsed-time people say we should give them all unique names, and
the absolute-time people appear to believe that that would be a catastrophic
failure of the principle that there are 86400 of them. But that's not actually
a catastrophe. It's easy to determine absolute time given time elapsed from
another absolute time.

We could analogously ask whether "year" should be defined with reference to
the orbit of the earth around the sun, or solely by reference to the rotation
of the earth, and in fact this was a historical dispute the wrong end of which
persists today in the russian orthodox calendar. But we all know the russian
orthodox calendar is wrong. It wasn't difficult to say that a year is usually
365 days but sometimes 366. It's also not difficult to say that a day is
usually 86400 seconds but sometimes 86401... and all the technical arguments
for why that shouldn't be the case seem (to me) to be equally applicable to
the variable year "problem". What's the value of keeping the POSIX standard
out of touch with reality, just because it was originally written to be out of
touch with reality?

The state of the art now seems to be Google's system for complying with POSIX
by lengthening the duration of a second on days which contain leap seconds.
That also used to be standard practice; the day was twelve hours long by
definition, and the night was twelve (different) hours long by definition, and
day-hours and night-hours varied in length with the seasons. Should we go back
to that system? Why is it a good idea for seconds? The entire point of
defining seconds in terms of cesium was to stop defining them as 1/86400 of a
day. But if you stop defining them as 1/86400 of a day, how does it come as a
surprise that a day might contain other than 86400 of them?
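
The smearing idea can be sketched in a few lines. This is a hedged illustration of a linear smear, not Google's actual implementation (their original smear used a cosine window, and the exact window bounds here are assumptions): the extra second is spread over a 24-hour window, so each smeared second is slightly longer than an SI second and the label 23:59:60 never appears.

```python
# Illustrative linear leap smear around the 2012-06-30 leap second.
# Window start/length are assumptions for the sketch, not Google's values.
SMEAR_START = 1341057600   # 2012-06-30T12:00:00 UTC (assumed window start)
SMEAR_LEN = 86400          # spread one extra second over 24 hours

def smeared_fraction(true_elapsed: float) -> float:
    """Fraction of the leap second absorbed so far (0.0 -> 1.0)."""
    return max(0.0, min(1.0, (true_elapsed - SMEAR_START) / SMEAR_LEN))

def smeared_clock(true_elapsed: float) -> float:
    # The smeared clock gradually falls up to one second behind the true
    # count of elapsed SI seconds; once the window ends it agrees with
    # post-leap UTC labels again, and no second was ever skipped or repeated.
    return true_elapsed - smeared_fraction(true_elapsed)
```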

~~~
drewhk
Why does everyone refer to this as "Google's system"? The idea of slowing down
clocks instead of turning them back is very, very old. In fact, ten years ago
one of the first things we were taught at university in the Embedded Systems
course is that you never turn clocks back. Am I missing something here?
(Honest question, no irony or sarcasm intended.)

~~~
Someone
This is not slowing down a clock for a while because it runs fast, but slowing
it down to make it run too slow, and then skipping a leap second.

The first is good; it corrects an error without introducing the catastrophe of
a clock running backwards. The second, according to some, is bad because it
introduces a temporary error without any benefit. According to others, it does
introduce a benefit: you won't have to deal with leap seconds.

~~~
drewhk
> This is not slowing down a clock for a while because it runs fast, but
> slowing it down to make it run too slow, and then skipping a leap second.

Ok, I see the difference. The issue here is not preserving monotonicity but
keeping the semantics of the "wall clock".
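
The never-turn-clocks-back rule from the embedded-systems course can be sketched as a toy slewing clock (this is an illustration of the general technique, not any real NTP or kernel implementation; the class name and slew rate are made up for the example): when the local clock is found to be ahead of the reference, it keeps ticking forward, just slightly slower, until the error is absorbed.

```python
# A toy clock that corrects errors by slewing, never by stepping backwards.
class SlewedClock:
    def __init__(self, start: float):
        self.now = start
        self.error = 0.0  # how far ahead of the reference we are

    def correct(self, reference: float) -> None:
        # Record the offset instead of jumping straight to the reference.
        self.error = self.now - reference

    def tick(self, dt: float, max_slew: float = 0.0005) -> float:
        # Absorb at most max_slew seconds of error per second of real time,
        # so time always moves forward (monotonic), just slightly slower.
        # (A negative error, i.e. a clock that is behind, is safely stepped
        # forward in one go here.)
        adjust = min(self.error, max_slew * dt)
        self.error -= adjust
        self.now += dt - adjust
        return self.now
```

This preserves monotonicity, which is exactly what drewhk's distinction is about: it says nothing about keeping the clock's labels aligned with civil "wall clock" time, which is the separate problem leap-second smearing addresses.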

