
Yes, this is why we (Netflix) default to tsc over the xen clocksource. I found that the xen clocksource had become a problem a few years ago, quantified it using flame graphs, and investigated it with my own microbenchmark.

Summarized details here:

https://www.slideshare.net/brendangregg/performance-tuning-e...




This reminds me: I should give an updated version of that talk for 2017...


I've been in a couple of positions recently where they mention your name, and I look at your work and think to myself: here is a sysadmin with modest skills who (by exposure) has become notably vocal and somewhat adept at scale computing. In general, if a company mentions Netflix or Brendan Gregg, I flinch. Just an FYI.


Sorry to make you flinch! I'm curious what of my work you were looking at; on this thread I had mentioned this:

https://www.slideshare.net/brendangregg/performance-tuning-e...

I think it's a pretty good summary, and includes work from my team and some original work of my own.

Is there something I could change in it that would make it more helpful for you?


Please do, I would be very interested in this!


Honestly everyone should be defaulting to the TSC on modern x86. Timekeeping on a single OS image over the short term[1] is a hardware feature available at the ISA level now. It's not something to which the OS can add value, and as we see in circumstances like this it tends to muck things up trying to abstract it.

[1] Long term issues like inter-clock drift and global synchronization are a rather different problem area, and the OS has tools to help there.


The gettimeofday vDSO does use the TSC. The purpose of the vDSO is making visible the continuously updated values necessary for userland to adjust and correct TSC-based calculations. Many of those values are still necessary even when the TSC is shared and constant-rate.

A pure TSC implementation will sacrifice accuracy (because it's not being trained by the HPET or corrected by NTP), performance (because it'll need to do a full syscall occasionally), or both.

If you're sophisticated like Netflix you can probably assure yourself it's no big deal. But it's a bad idea for others to blindly do the same thing. Look at the issue with Go's timeouts: Go used gettimeofday rather than CLOCK_MONOTONIC because the authors assumed the behavior of the clock-skewing setup on Google's own systems. That assumption broke spectacularly for the many people not running on Google's servers.


Can you share if you needed to do anything to deal with time drift issues when using tsc? For my own systems, incorrect timestamps would cause a lot of issues.


Well, it's been a few years and we haven't switched it back. :)

We have had a number of clock issues, and one of the first things I try is taking an instance and switching it back to xen for a few days, but those issues have never turned out to be the clocksource. Usually it's NTP.

AWS can comment more about the state (safety/risk) of these clocksources (given they have access to all the SW/HW internals).




