
I would also love to see an explanation of “why do we need this much accuracy?” that actually goes through the derivation of how much accuracy you need.

Some of the justification for Google’s TrueTime is found in the Spanner docs:

https://cloud.google.com/spanner/docs/true-time-external-con...

Basically, you want to be able to do a "snapshot read" of the database rather than acquiring a lock (so reads never block writes, and vice versa). The snapshot read is based on a timestamp from a monotonic clock. You can get much better performance if all of your machines have very accurately synchronized clocks: when you write to the database, you attach a timestamp to the operation, but you may have to introduce a delay to account for the worst-case error in the clock that generated the timestamp.
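
To make that delay concrete, here is a toy Python sketch of TrueTime-style commit wait. The names and the EPSILON figure are illustrative, not Spanner's actual API:

    import time

    EPSILON = 0.0005  # assumed worst-case clock error, in seconds

    def now_interval():
        # TrueTime-style TT.now(): the true time is guaranteed to lie
        # somewhere in [earliest, latest]
        t = time.time()
        return (t - EPSILON, t + EPSILON)

    def commit():
        # Pick the latest possible current time as the commit timestamp,
        # then stall until every correctly synchronized clock must agree
        # the timestamp is in the past. The stall is ~2 * EPSILON, so
        # halving the clock error halves the delay per write.
        _, commit_ts = now_interval()
        while now_interval()[0] < commit_ts:
            time.sleep(EPSILON / 10)
        return commit_ts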

More accurate timestamps -> less delay. From my understanding, less delay -> servers have more capacity -> buy fewer servers -> save millions of dollars -> use savings to pay for salaries of people who figured out how to make super precise timestamps and still come out ahead.

This kind of engineering effort makes sense at companies like Google and Meta because they already spend such a large amount of money on compute resources.




Meta uses some variations on Hybrid Logical Clocks, which are similar in spirit to TrueTime, so yes, this does apply. Besides performance, they very much want to avoid consistency issues that could amount to a security breach: e.g., if I block Alan and then post "Alan is a dookie head", you don't want some node seeing the second event before the first. Really, the bigger concern is that someone spots this as a potential vulnerability and scripts an exploit around it.
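
For reference, a minimal hybrid logical clock sketch in Python, following the Kulkarni et al. HLC paper rather than anything Meta-specific. The point is that the "post" event's timestamp always compares greater than the "block" event's, even if the posting node's wall clock is behind:

    import time

    class HLC:
        # physical time plus a logical counter to break ties, so
        # causality never runs backwards even when wall clocks drift
        def __init__(self):
            self.l = 0  # highest physical time seen so far
            self.c = 0  # logical counter within that physical tick

        def _pt(self):
            return int(time.time() * 1_000_000)  # wall clock, microseconds

        def send(self):
            # timestamp a local event or outgoing message
            pt = self._pt()
            if pt > self.l:
                self.l, self.c = pt, 0
            else:
                self.c += 1
            return (self.l, self.c)

        def recv(self, l_msg, c_msg):
            # merge a received timestamp; the result dominates both clocks
            pt = self._pt()
            new_l = max(self.l, l_msg, pt)
            if new_l == self.l == l_msg:
                self.c = max(self.c, c_msg) + 1
            elif new_l == self.l:
                self.c += 1
            elif new_l == l_msg:
                self.c = c_msg + 1
            else:
                self.c = 0
            self.l = new_l
            return (self.l, self.c)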


This is something like my third attempt to read the Spanner paper. I get how it helps with ordering transactions, but I am confused about whether it is used to make transactions atomic across machines?


> I get how it helps with ordering transactions, but I am confused about whether it is used to make transactions atomic across machines?

AIUI, you cannot quite think of it like a regular database, where a particular row has a single current value, which would necessitate only one writer doing (atomic) updates at a time.

Rather, it is an MVCC-like database, a bit like an append-only log: as many writers as needed can write, and there are multiple values for each row. The "actual" value of the row is the one with the highest transaction ID / timestamp, so updates can happen without (atomic) locking by just adding to the values that already exist.

When reading, applications generally get served the value with the highest timestamp, and since time is synchronized to within such a tiny interval, it is a reasonably safe bet that the highest-timestamped value is the most recent transaction.
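
As a toy illustration of that append-only, highest-timestamp idea (my own Python sketch, not Spanner's storage format):

    class MVCCStore:
        # every write appends a (timestamp, value) version; a read at
        # time t returns the newest version with ts <= t, no locks needed
        def __init__(self):
            self.versions = {}  # key -> sorted list of (ts, value)

        def write(self, key, ts, value):
            self.versions.setdefault(key, []).append((ts, value))
            self.versions[key].sort()

        def read(self, key, ts=float("inf")):
            visible = [(t, v) for t, v in self.versions.get(key, []) if t <= ts]
            return max(visible)[1] if visible else None

    db = MVCCStore()
    db.write("row1", ts=100, value="old")
    db.write("row1", ts=250, value="new")
    db.read("row1")          # -> "new" (latest committed version)
    db.read("row1", ts=200)  # -> "old" (consistent snapshot in the past)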

This is similar in concept to a vector clock (see also Lamport timestamps):

* https://en.wikipedia.org/wiki/Vector_clock

But instead of logical clocks running on 'imaginary time', 'real time' is used, down to the sub-microsecond level.
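
To show the contrast, here is a bare-bones vector clock (the textbook construction, sketched in Python): ordering comes purely from per-node counters, with no reference to wall-clock time at all.

    class VectorClock:
        def __init__(self, node_id, n_nodes):
            self.node_id = node_id
            self.v = [0] * n_nodes  # one counter per node

        def tick(self):
            # local event or message send
            self.v[self.node_id] += 1
            return list(self.v)

        def merge(self, other):
            # message receive: elementwise max, then tick
            self.v = [max(a, b) for a, b in zip(self.v, other)]
            return self.tick()

    def happened_before(a, b):
        # a causally precedes b iff a <= b in every slot and a != b;
        # if neither precedes the other, the events are concurrent
        return all(x <= y for x, y in zip(a, b)) and a != b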



