
Ntpd won't save you from one particular rogue bit - dantiberian
http://rachelbythebay.com/w/2017/09/27/2153/
======
wruza
>Time is hard.

I learned that when I once tried to implement a "universal" datetime library
for fun and education. There is wall clock time, which leaps for political
decisions; universal time coordinated which leaps on schedule to ease astronomical
differences; atomic time of a few sorts; numerous computer clocks; time zones
which can go back and forth for north and south, land and sea; non-Gregorian
calendars and the fact that there is no 0 AD and no 0 BC; that dates were
shifted by many days a few times in Gregorian history; special and general
relativity errors; and of course integer overflow issues.

I may have missed a few points, but still, time is hard.

~~~
Dylan16807
> universal time coordinated which leaps on schedule to ease astronomical
> differences

"Schedule" is a bit generous there.

Oh and there's TAI for just counting seconds but you're discouraged from using
it that way because of some slightly suspicious reasoning about retrospective
calibration.

~~~
deepsun
That's interesting, could you elaborate, or give a hyperlink on why TAI is
discouraged?

~~~
scottlamb
I think it boils down to compatibility - POSIX and SUS specify that
CLOCK_REALTIME is UTC, and platform libc time conversion routines expect this.
So everyone uses UTC.

On Linux, there's now a CLOCK_TAI, but I'm not sure how usable it is. Does it
have the correct offset on boot? when ntpd starts? and you have to convert
back and forth to interact with the rest of the world, and there's no call to
get the time and the conversion factor atomically, which is what you'd
probably want if you're using the same timestamp internally and externally. (I
think there's some non-privileged call to get the conversion; you could call
that, get the time, and then call it again; retry if it's changed. but that's
annoying.) And of course this is non-portable.

edit: additionally, if you have a stored, bare TAI value, what can you do with
it other than see how many seconds older it is than the current time? nothing
on a standard system stores when previous leap seconds happened iirc so you
can't convert to UTC (and thus can't convert to civil time) unless you roll
your own table (and conversion routines) or always store the offset with every
TAI (increasing storage overhead).
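To make the "roll your own table" option concrete, here is a minimal Python sketch. The table holds only the three most recent TAI-UTC offsets (35 s, 36 s, 37 s), the names `LEAP_TABLE` and `tai_to_utc` are made up for illustration, and the leap second itself (23:59:60) is not represented, so treat it as an assumption-laden sketch and not a complete conversion routine.

```python
from datetime import datetime, timedelta

# Hand-maintained table (hypothetical name): the UTC instant from which each
# TAI-UTC offset applies. Only the last three entries are listed here; a real
# table would go back to 1972 and must be updated when a leap second is announced.
LEAP_TABLE = [
    (datetime(2012, 7, 1), 35),
    (datetime(2015, 7, 1), 36),
    (datetime(2017, 1, 1), 37),
]

def tai_to_utc(tai: datetime) -> datetime:
    """Convert a naive datetime read on the TAI scale to a naive UTC datetime.

    The inserted leap second itself (23:59:60) cannot be expressed in a
    datetime, so an instant inside it maps to the following 00:00:00.
    """
    # Try the newest offset first; accept the first one whose resulting UTC
    # time is not earlier than the moment that offset came into force.
    for effective_utc, offset in reversed(LEAP_TABLE):
        candidate = tai - timedelta(seconds=offset)
        if candidate >= effective_utc:
            return candidate
    raise ValueError("TAI value predates the table")
```

The other option from the comment, storing the offset next to every TAI value, avoids the table lookup at the cost of the extra field per timestamp.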

------
userbinator
_This means you can't actually persist the bad time to your RTC._

I wonder where that limit comes from, since the RTC in a standard PC stores
both the year(00-99) and century(also 00-99) in BCD, so a date in 2153 should
be representable.
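For reference, a small Python sketch of that encoding, assuming one packed-BCD byte each for century and year as described above (the helper names are made up for illustration):

```python
def to_bcd(value: int) -> int:
    """Pack a number 0-99 into one BCD byte (tens in the high nibble)."""
    if not 0 <= value <= 99:
        raise ValueError("BCD byte holds 00-99 only")
    return ((value // 10) << 4) | (value % 10)

def from_bcd(byte: int) -> int:
    """Unpack one BCD byte back into a number 0-99."""
    return (byte >> 4) * 10 + (byte & 0x0F)

def encode_year(year: int) -> tuple:
    """Split a four-digit year into (century byte, year byte), both BCD."""
    century, two_digit_year = divmod(year, 100)
    return to_bcd(century), to_bcd(two_digit_year)

century_byte, year_byte = encode_year(2153)
print(hex(century_byte), hex(year_byte))  # 0x21 0x53
```

So the two bytes by themselves can hold 2153; whatever limit applies must come from somewhere else.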

This reminds me, one interesting thing I've noticed over the years is that
early PC's RTCs were pretty accurate, but the most recent ones, as in the past
few years, are horrible (as much as +/- several seconds per day), and the
ones in smartphones even worse. Maybe because they're assuming NTP, so have
cut costs by using less accurate crystals? I've had systems in storage for
many years, and the RTC was within a few seconds of the current time when
turned on again.

~~~
wruza
>and the ones in smartphones even worse

Oh, I bet my tooth that I can hear my phone playing music slightly faster than
usual _sometimes_. Never experienced that on PC. Maybe it is not connected to
the RTC and is a purely biological condition (never cared enough to test two devices
side by side), but if someone experienced that too and has an explanation, it
would be great to know.

~~~
wyldfire
The RTC is not a factor for problems in music playback on phones or elsewhere.
It's only for "wall clock" time, and generally only for when the system boots.
The OS timer tick that drives the scheduler is often generated by the CPU and
some operating systems don't use it anymore -- Linux has offered a tickless
config for a while that I think many Android configs capitalize on for the
sake of conserving battery. The scheduler could be a culprit for playback
discontinuity (stalls) but not problems with the tempo.

I'm wondering -- "What might cause music playback to be faster than normal?" I
don't think I know enough to tell. Perhaps the clocks on the audio playback
device are skewed? Most SoCs use a dedicated DSP which doesn't seem terribly
different from PCs. Not sure, but we can say for certain that it's definitely
immune to changes in wall clock time (whether in the RTC or its representation
in the OS).

~~~
tonyarkles
I had a USB sound card that, on Windows, would play back at 48kHz instead of
44.1kHz. Slight pitch shift upwards, playback about 9% faster. Most apps just let
the audio driver provide backpressure and don't actively try to send data at
the rate the soundcard is expecting, they just feed bytes when asked.
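A quick way to put numbers on that mismatch, as a Python sketch. It assumes the samples are simply clocked out at the device rate with no resampling, and the function name is made up for illustration:

```python
import math

def playback_shift(source_hz: float, device_hz: float) -> tuple:
    """Return (speed ratio, pitch shift in semitones) when audio sampled at
    source_hz is clocked out at device_hz without resampling."""
    ratio = device_hz / source_hz
    semitones = 12 * math.log2(ratio)  # 12 equal-tempered semitones per octave
    return ratio, semitones

ratio, semitones = playback_shift(44_100, 48_000)
print(f"{(ratio - 1) * 100:.1f}% faster, {semitones:.2f} semitones sharp")
```

That works out to a bit under 9% faster and roughly a semitone and a half sharp, which is enough to feel wrong without being obvious.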

Even more fun was that it was a dual-boot machine, and the audio worked
perfectly in Linux but shifted up in Windows. I honestly thought I was losing
my mind for a while. I'd listen to the same song in Spotify on both OSes and
just get this "something is wrong here..." feeling in my gut.

------
gerdesj
Well that was fun! I tried out the timeshift program and my PC duly went a bit
mad. ntpq -p showed all being well.

Chromium threw a fit because news.ycombinator.com's SSL cert had expired and
offered to reset the clock which it could not do, given that I'm not in the
habit of running apps as root. My Kerberos tickets all expired so Evolution
lost contact and so did quite a few other things. MariaDB dumped core.
systemd's timers went berserk so things like my LetsEncrypt cert tried to renew
itself.

I'm still hunting through some odd looking log files but overall things seem
to be back to normal when I reran timeshift and returned to current time.

~~~
winkywooster
Similarly with my Mac, within moments it became unusable. Even when I tried to
reboot from the command line, I got a message that it was waiting to
acquire a lock on /. Hard reset and everything is back to normal.

~~~
netsharc
iOS devices had a bug that hard-locks them if you set the time to 1/1/1970 (in
US time zones, that sets it to negative territory). I can't believe they don't
do any tests for their OSes to survive this...

~~~
ReverseCold
Is there a problem with locking current time to the release date of the
product?

------
whouweling
Interesting edge case! This is why it may be sensible to do some extra checks
in things like product expiration batch jobs, to check if the previous run was
not too far in the future or past and refuse to run in that case.

Scary to think what might happen if some database purge process is running
after this bit flip!

~~~
viraptor
This is going to give me actual nightmares. Not only database cleanups, but
there are quite a few automated backup cycling scripts which would happily
throw out terabytes of data this way...

~~~
nolok
As parent said, it's usually a very good idea to check the elapsed time and
ensure it "makes sense" before doing anything destructive. It doesn't cost much,
can save a lot, and adding a command switch to bypass it lets you skip the
"haven't booted in a couple of weeks and now everything refuses to run" problem.

A much more common case where such a check helps is "last run time is in the
future", which you face every time there is a clock issue. Some scripts...
don't react too well to that.
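A minimal Python sketch of that kind of guard. The names, the 7-day default, and the `force` flag standing in for the bypass switch are all assumptions for illustration, not anyone's actual script:

```python
from datetime import datetime, timedelta

class ClockSanityError(RuntimeError):
    """Raised when the gap since the last run does not look plausible."""

def check_clock_sane(last_run: datetime, now: datetime,
                     max_gap: timedelta = timedelta(days=7),
                     force: bool = False) -> None:
    """Refuse to continue if the last recorded run is in the future or
    implausibly long ago. `force` is the manual override for the
    "machine was off for weeks" case."""
    if force:
        return
    if last_run > now:
        raise ClockSanityError(f"last run {last_run} is in the future of {now}")
    if now - last_run > max_gap:
        raise ClockSanityError(f"{now - last_run} since last run exceeds {max_gap}")
```

Call it before the purge or backup-rotation step, and only persist the new "last run" timestamp after the check has passed, so one bad clock reading does not poison the next run.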

------
wyldfire
One source of RTC weirdness is the fact that inb/outb to/from the RTC and
other legacy parts of the ISA aren't protected by a mutex on Linux. So if you're
unlucky enough to collide with some other program doing something innocuous
with the RTC you can accidentally set bits that will never be set by the RTC.
Many RTCs decide not to tick anymore when that happens.

If all your programs use /dev/rtc and ioctl()s you are probably a little safer
because there will be coarse locks around the RTC itself, those will serialize
the activity. But IIRC the inb/outb stuff can be done from user space (as
superuser) and even if you're only reading you have to write to the address
register which could break a write-in-progress by sending its output to the
wrong RTC field.

~~~
exikyut
> _Many RTCs decide not to tick anymore when that happens._

Of course I laughed when I read that, but then I realized I was interpreting
that line to mean "it declares shenanigans and stops reporting time so you
realize something broke."

Just wanted to clarify - do you mean the above, or " _of course software
cannot kill hardware!!1_ " "won't tick _anymore_ "?

~~~
wyldfire
Yeah, IIRC it really stopped ticking (until you write a valid value at which
point it would resume).

Note that it doesn't stop reporting the time in this condition it will just
give you back all the garbage fields that you wrote before. So an interesting
thing happens when the system tries to transform that into a UTC wall clock
basis and it usually ends up with a really wild interpretation of the date
(decades/centuries off, similar to the problem described in TFA).

~~~
exikyut
> _Yeah, IIRC it really stopped ticking (until you write a valid value at
> which point it would resume)._

Ah, okay then. Good to know I can't accidentally kill thousands of dedicated
servers :P (ie, via NTP MITM, writing to hw RTC...)

> _Note that it doesn't stop reporting the time in this condition it will
> just give you back all the garbage fields that you wrote before._

I don't know why I didn't remember this last night: Linux uses the RTC as a
poor-man's NVRAM that will persist across a reboot. Provides <24 bits of data
to work with (yay! ...not).
[https://wiki.ubuntu.com/DebuggingKernelSuspend](https://wiki.ubuntu.com/DebuggingKernelSuspend),
useful info in
[https://github.com/torvalds/linux/blob/e34bac726d27056081d02...](https://github.com/torvalds/linux/blob/e34bac726d27056081d0250c0e173e4b155aa340/drivers/base/power/trace.c)

> _So an interesting thing happens when the system tries to transform that
> into a UTC wall clock basis and it usually ends up with a really wild
> interpretation of the date (decades/centuries off, similar to the problem
> described in TFA)._

Right.

------
poizan42
> If the resulting time that ntpd sees is more than a few milliseconds off,
> it'll step the clock, and that will clear out the future time.

It shouldn't do that, otherwise it's going to break in 2036.

(The "future time" it's going to clear off will set the time back to 1900 once
we are past February 7, 2036)

Edit: I don't know if I didn't explain myself well enough. Because the
protocol only has 32 bits of seconds it cannot tell apart September 27, 2017
and November 4, 2153. This means that ntpd absolutely must trust that it is in
the correct 68-year span. But according to the author ntpd "clears out the
future time" if it is more than a few milliseconds off. This violates the spec
and is also inconsistent as it happily keeps up with the future date as long
as it doesn't have to step the clock.
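The wraparound is easy to check in Python. This sketch only looks at the 32-bit seconds field (the fraction is ignored), and the 21:53 time of day is an arbitrary choice for the example:

```python
from datetime import datetime, timedelta

NTP_EPOCH = datetime(1900, 1, 1)  # start of NTP era 0
ERA_SECONDS = 2 ** 32             # span covered by the 32-bit seconds field

def ntp_seconds_field(utc: datetime) -> int:
    """Seconds field as it appears on the wire: the era number is lost."""
    return ((utc - NTP_EPOCH) // timedelta(seconds=1)) % ERA_SECONDS

now = datetime(2017, 9, 27, 21, 53, 0)        # time of day is an assumption
later = now + timedelta(seconds=ERA_SECONDS)  # one full era ahead

print(later)                                              # 2153-11-04 04:21:16
print(ntp_seconds_field(now) == ntp_seconds_field(later)) # True
print(NTP_EPOCH + timedelta(seconds=ERA_SECONDS))         # 2036-02-07 06:28:16
```

The last line is where the 2036 date comes from: that is the instant the seconds field rolls over to zero.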

~~~
nolok
I often see this type of comment on many subjects and I find it super weird in
a way. I mean, if it took you less than a minute after reading it to spot the
issue despite not being familiar with the details before, it's usually safe to
assume whoever made the specs and spent days on it also did.

~~~
poizan42
Where did I disagree with the spec?

~~~
nolok
I didn't say you did? You pointed out an obvious error. I'm saying, yes, I'm
pretty sure we can assume it's been handled already given how easy it is to
spot (and the other comment to your message seems to imply it is indeed)

~~~
poizan42
You made it sound like I did? Of course it is handled by the spec, the issue here is
in ntpd not following the spec (assuming what the author says is true, I
haven't checked).

Or actually there is this little tidbit from RFC 5905:

> Eras cannot be produced by NTP directly, nor is there need to do so. When
> necessary, they can be derived from external means, such as the filesystem
> or dedicated hardware.

So it _could_ be that ntpd looks at some file or the rtc to determine the era
rather than assuming the current system time is in the correct era, and it
would be allowed by the spec. But it's quite inconsistent if it only does it
if the system time is more than a few milliseconds off (presumably beyond the
limit of when it corrects the time by time stretching). I'm going to go with
it just being a bug in ntpd.

~~~
nolok
Sorry if you understood me that way. I was reflecting on some things that I
see often on HN, eg a common case is when Google announces a change in crawling
and people go "but it can be gamed by...".

Your message merely reminded me of that and so I put my message here as I saw
similarity. Having not read the ntpd spec in question I can't answer you.

------
jlgaddis
It won't save you from this particular case but you're safe so long as your
clock is off by no more than ~68 years.

~~~
cgsmith
The other use case is manufactured hardware that ships with its time set to the
Unix epoch and depends on it for updating.

------
petecooper
I get numerous requests to my (macOS) computer for ntpd to connect to shady
subnets when I'm connected to a particular commercial VPN:

[https://twitter.com/petecooper/status/911946604759977984](https://twitter.com/petecooper/status/911946604759977984)

[https://pbs.twimg.com/media/DKficrvW4AA1Hxm.jpg:large](https://pbs.twimg.com/media/DKficrvW4AA1Hxm.jpg:large)

Numerous hosts across numerous networks, perhaps two or three an hour.

I've wondered what exactly would be gained by resetting a clock to a different
time – this is a useful article.

~~~
geofft
That sounds like

1. your VPN provider is giving you an actual public IP address (??)

2. people are scanning your computer for NTP vulnerabilities or something
(this happens if you have a public IP, regardless of network)

3. NTP is using UDP and so connectionless, and so Little Snitch can't
distinguish "ntpd wants to reply to someone who contacted it" from "ntpd wants
to connect to someone"

An alternative explanation for 1/2 is that your VPN provider is not isolating
you from other VPN users (less surprising than giving you your own public IP)
and someone else on the VPN is trying to conduct NTP amplification attacks
using you:
[https://blog.cloudflare.com/understanding-and-mitigating-ntp...](https://blog.cloudflare.com/understanding-and-mitigating-ntp-based-ddos-attacks/)

In either case, the solution is basically to make your ntpd not listen for
requests from other machines and only handle time from your local computer +
initiating requests to time.apple.com or whatever your chosen NTP server is.
It shouldn't be trying to reply at all to unexpected packets, even to send a
refusal message (again, because UDP is connectionless, it's easy for an
attacker on your LAN to send spoofed packets and convince you to send replies
to some random computer on the internet, and I guess on this VPN, other
customers are your LAN). I'm surprised that macOS's default NTP server isn't
configured this way out-of-the-box, though.

~~~
jandrese
It seems strange that the firewall would block the outbound packet after
letting the udp packet from some completely random host in.

~~~
geofft
Little Snitch is not so much a firewall in the usual sense as a phone-home
prevention device. It's primarily interested in blocking outbound traffic
(exfiltration), not inbound traffic.

------
mnarayan01
Unless you're running it with additional configuration (e.g. the -g option), I
don't think ntpd will save you from any bit flips other than the 10 least
significant: You'll be outside the 1000s panic threshold.
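The arithmetic behind "the 10 least significant", as a Python sketch. It assumes the flipped bit sits in an integer seconds counter, so flipping bit n moves the clock by 2**n seconds, and it takes the 1000 s figure from above:

```python
PANIC_THRESHOLD_S = 1000  # panic threshold quoted above, in seconds

def recoverable_bits(threshold_s: int = PANIC_THRESHOLD_S, width: int = 32) -> list:
    """Bit positions whose flip shifts the clock by no more than threshold_s.

    Assumes bit n of an integer seconds counter is worth 2**n seconds.
    """
    return [bit for bit in range(width) if (1 << bit) <= threshold_s]

print(recoverable_bits())  # bits 0 through 9: 2**9 = 512 s fits, 2**10 = 1024 s does not
```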

~~~
Fnoord
It's not recommended to use ntpd with the -g argument in production. An attacker
can MITM the NTP protocol. The 1000s threshold severely limits this attack. The
attack can be used e.g. in defeating TOTP.

I'm not sure if this rogue bit can be used to attack TOTP. Can anyone clarify?
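This does not answer the question, but for context on why the clock matters to TOTP at all: the code is an HMAC over the current 30-second step number, so whoever controls the clock controls which code gets generated. A minimal Python sketch of RFC 6238 with HMAC-SHA1 (the function name and defaults are chosen for illustration):

```python
import hashlib
import hmac
import struct

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP using HMAC-SHA1, with the dynamic truncation of RFC 4226."""
    counter = unix_time // step  # the only place the clock enters
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F      # low nibble of the last byte picks the window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Test vector from RFC 6238 Appendix B (SHA-1, 8 digits, T = 59 s)
print(totp(b"12345678901234567890", 59, digits=8))  # 94287082
```

A clock that is off by more than the verifier's allowed drift simply produces codes that do not match; whether the 2**32-second jump from the article can be exploited beyond that is a separate question.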

------
tonyg
We will continue to suffer problems like this so long as we continue to use
languages which offer machine words in place of general integers.

------
simcop2387
I wonder if any of the other ntp implementations (chrony et al) suffer from
this same issue?

~~~
jlgaddis
Yeah, it's an issue with NTP the "protocol" not NTP the "program".

~~~
grogers
To expand on this, NTP uses timestamps with 32 bit seconds (plus fractions of
a second), so if you manually step your clock some multiple of ~136 years, the
protocol inputs will be the exact same so you'd have no way of knowing you
were off from the server.

However, you could imagine an NTP implementation which hardcodes the
approximate starting time to get the right era. You'd only have to recompile
every lifetime or so to keep it up to date.

~~~
jlgaddis
> _However, you could imagine an NTP implementation which hardcodes the
> approximate starting time to get the right era. You'd only have to
> recompile every lifetime or so to keep it up to date._

Yep, see "NTP pivot dates" [0]:

> When ntpd(8) receives a unresolved timestamp from an upstream server that
> timestamp could be based in any era ... To resolve this ambiguity, NTP also
> uses an internal pivot date ... An ntpd(8) instance’s pivot date will be the
> date it was compiled and built.

[0]:
[https://docs.ntpsec.org/latest/rollover.html#ntp_pivots](https://docs.ntpsec.org/latest/rollover.html#ntp_pivots)
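A rough Python sketch of how a pivot can disambiguate the 32-bit seconds field. The window used here, one full era starting at the pivot, is an assumption for illustration and not necessarily the exact rule ntpd applies; the names are made up too.

```python
from datetime import datetime, timedelta

NTP_EPOCH = datetime(1900, 1, 1)
ERA_SECONDS = 2 ** 32

def resolve_with_pivot(seconds_field: int, pivot: datetime) -> datetime:
    """Return the one instant in [pivot, pivot + 2**32 s) whose low 32 bits
    of seconds-since-1900 equal seconds_field.

    Assumption: the valid window starts exactly at the pivot (e.g. build date).
    """
    pivot_secs = (pivot - NTP_EPOCH) // timedelta(seconds=1)
    # Distance forward from the pivot to the next matching 32-bit value.
    delta = (seconds_field - pivot_secs) % ERA_SECONDS
    return NTP_EPOCH + timedelta(seconds=pivot_secs + delta)

pivot = datetime(2017, 1, 1)  # hypothetical build date
field = ((datetime(2017, 9, 27, 21, 53) - NTP_EPOCH) // timedelta(seconds=1)) % ERA_SECONDS
print(resolve_with_pivot(field, pivot))  # 2017-09-27 21:53:00
```

With a pivot like this, the server's timestamp always resolves to 2017 and never to 2153, regardless of what the local clock currently claims.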

------
ifoundthetao
Hm, I wasn't able to replicate it. I'm using a Kali Linux VM. I'll try some
other OSes and see if it works on there.

Was anyone else successful with the PoC code?

~~~
sp332
Is your VM host resetting the time in the guest?

~~~
ifoundthetao
That's possible. I'll try it on a physical system in a bit. See what I can do.

------
gumby
This is a nice bug! And not really a clock issue -- many programs/protocols
could have such a bug.

