Two frequently used system calls are ~77% slower on AWS EC2 (packagecloud.io)
415 points by jcapote on March 7, 2017 | hide | past | favorite | 96 comments

Yes, this is why we (Netflix) default to tsc over the xen clocksource. I found the xen clocksource had become a problem a few years ago, quantified using flame graphs, and investigated using my own microbenchmark.

Summarized details here:


This reminds me: I should give an updated version of that talk for 2017...

I've been in a couple of positions recently where they mention your name, and I look at your work and think to myself: here is a sysadmin with modest skills who (by exposure) has become notably vocal and somewhat adept at scale computing. In general, if a company mentions Netflix or Brendan Gregg, I flinch. Just an FYI.

Sorry to make you flinch! I'm curious what of my work you were looking at; on this thread I had mentioned this:


I think it's a pretty good summary, and includes work from my team and some original work of my own.

Is there something I could change in it that would make it more helpful for you?

Please do, I would be very interested in this!

Honestly everyone should be defaulting to the TSC on modern x86. Timekeeping on a single OS image over the short term[1] is a hardware feature available at the ISA level now. It's not something to which the OS can add value, and as we see in circumstances like this it tends to muck things up trying to abstract it.

[1] Long term issues like inter-clock drift and global synchronization are a rather different problem area, and the OS has tools to help there.

The gettimeofday vDSO does use the TSC. The purpose of the vDSO is making visible the continuously updated values necessary for userland to adjust and correct TSC-based calculations. Many of those values are still necessary even when the TSC is shared and constant-rate.

A pure TSC implementation will sacrifice accuracy (because it's not being trained by the HPET or corrected by NTP), performance (because it'll need to do a full syscall occasionally), or both.

If you're sophisticated like Netflix you can probably assure yourself it's no big deal. But it's a bad idea for others to blindly do the same thing. Look at the issue with Go's timeouts: Go used gettimeofday rather than CLOCK_MONOTONIC because the authors assumed the behavior of Google's clock-skewing algorithm. That assumption broke spectacularly for many people not using Google's servers.

Can you share if you needed to do anything to deal with time drift issues when using tsc? For my own systems, incorrect timestamps would cause a lot of issues.

Well, it's been a few years and we haven't switched it back. :)

We have had a number of clock issues, and one of the first things I try is taking an instance and switching it back to xen for a few days, but those issues have not turned out to be the clocksource. Usually NTP.

AWS can comment more about the state (safety/risk) of these clocksources (given they have access to all the SW/HW internals).

Another option is to reduce usage of gettimeofday() when possible. It is not always free.

Roughly 10 years ago, when I was the driver author for one of the first full-speed 10GbE NICs, we'd get complaints from customers who were sure our NIC could not do 10Gb/s, as iperf showed it was limited to 3Gb/s or less. I would ask them to re-try with netperf, and they'd see full bandwidth. I eventually figured out that the complaints were coming from customers running distros without the vDSO stuff, and/or running other OSes which (at the time) didn't support it (Mac OS, FreeBSD). It turns out that the difference was that iperf would call gettimeofday() around every socket write to measure bandwidth, while netperf would just issue gettimeofday calls at the start and the end of the benchmark - so iperf was effectively gettimeofday-bound. Ugh.

> Another option is to reduce usage of gettimeofday() when possible. It is not always free.

haha, it's amazing how much software is written that basically does something ridiculous like "while (gettimeofday()) clock_gettime();".

I found the articles on your blog about that topic quite interesting.

Syscalls in general far too often get treated as if they're as cheap as function calls, with people often never profiling to see just how much they can affect throughput.

Apart from gettimeofday(), other "favourites" of mine that people are often blind to include apps that do lots of unnecessary stat-ing of files, as well as tiny read()/write() calls instead of buffering in userspace.

Maybe you could set up some caching.

Caching is basically what the vDSO things do. In my recollection, they grab a good time from the kernel occasionally, and then use userspace-accessible things like rdtsc() to offset from that authoritative timestamp. So it turns millions of syscalls into one.

For time?

Yes, for time too. For one, if you don't need sub-second precision, then why have some of your servers ask for the current time thousands of times per second? There are ways to get a soft expiration that don't involve asking for the time.

In case someone is interested in a concrete example, I first learned about caching time by discovering this package in my dependencies: https://hackage.haskell.org/package/auto-update

Its README basically says instead of having every web request result in a call to get current time, it instead creates a green thread that runs every second, updating a mutable pointer that stores the current time.

Yeah, with a TTL, and on each round you just check the time to see if it's expired.

yeah just call gettimeofday() to see if it expired yet.

Obviously you don't use clock-TTL.

Sorry, I couldn't resist.

The title is misleading. 77% slower sounds like the system calls take 1.77x the time on EC2. In fact, the results indicate that the normal calls are 77% faster - in other words, gettimeofday and clock_gettime calls take nearly 4.5x longer to run on EC2 than they do on ordinary systems.

This is a big speed hit. Some programs can use gettimeofday extremely frequently - for example, many programs call timing functions when logging, performing sleeps, or even constantly during computations (e.g. to implement a poor-man's computation timeout).

The article suggests changing the time source to tsc as a workaround, but also warns that it could cause unwanted backwards time warps - making it dangerous to use in production. I'd be curious to hear from those who are using it in production how they avoided the "time warp" issue.

77% faster is not correct either. "Speed" would probably be ops/s.

4.5x longer = 350% slower.

Even this is confusing as hell.

Just say the native calls take 22% of the time they do on EC2. Or that the EC2 calls take 450% of the time of their native counterparts.

"Faster" and "slower" when going with percentages are ripe with confusion. Please don't use them.

I can't agree. Speed is usually units/time, and everyone knows that 100 mph is 2x as fast as 50 mph, or 100% faster.

This is what decibels are for.

> Some programs can use gettimeofday extremely frequently

This is what's usually considered the "root cause" of this problem, though. It's easy enough, if it's your own program, to wrap the OS time APIs to cache the evaluated timestamp for one event-loop (or for a given length of realtime by checking with the TSC.) Most modern interpreters/VM runtimes also do this.

Why cache it when the vdso has already solved the problem? Seems best to not stack another mitigation on top of it.

Because you're writing portable code and not every (or even most) systems do said caching.

Yeah, like the PHP Xdebug extension. In 5.3 at least, even with it just loaded and nothing enabled, it called gettimeofday thousands of times and would add seconds to web app render times for me (also on Xen with slow gettimeofday).

I prefer the way Solaris solved this problem:

1) first, by eliminating the need for a context switch for libc calls such as gettimeofday(), gethrtime(), etc. (there is no public/supported interface on Solaris for syscalls, so libc would be used)

2) by providing additional, specific interfaces with certain guarantees:


This was accomplished by creating a shared page during system startup, in which the kernel updates the time. At process exec time that page is mapped into every process address space.

Solaris' libc was of course updated to simply read directly from this memory page. Of course, this is more practical on Solaris because libc and the kernel are tightly integrated, and because system calls are not public interfaces, but this seems greatly preferable to the VDSO mechanism.

This is precisely what the vDSO does. The clocksources mentioned explicitly list themselves as not supporting this action, hence the fallback to a regular system call.

Not quite; vdso is a general syscall-wrapper mechanism. The Solaris solution is specifically just for the gettimeofday(), gethrtime() interfaces, etc.

The difference is that on Solaris, since there is no public system call interface, there's also no need for a fallback. Every program is just faster, no matter how Solaris is virtualized, since every program is using libc.

There's also no need for an administrative interface to control clocksource; the best one is always used.

Not quite. The vDSO provides a general syscall-wrapper mechanism for certain types of system call interfaces. It also provides implementations of gettimeofday, clock_gettime, and two other system calls completely in userland, and acts precisely as you've described.

Please see this[1] for a detailed explanation. For a shorter explanation, please see the vDSO man page[2]. Thanks for reading my blog post!

[1]: https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g... [2]: http://man7.org/linux/man-pages/man7/vdso.7.html

I'm aware of the high-level VDSO implementation, but I would still say that the Solaris implementation is more narrowly focused and, as a result, does not have the subtle issues / tradeoffs that VDSO does.

Also, I personally find VDSO disagreeable as do others although perhaps not in as dramatic terms as some:


I think Ian Lance Taylor's summary is the most balanced and thoughtful:

Basically you want the kernel to provide a mapping for a small number of magic symbols to addresses that can be called at runtime. In other words, you want to map a small number of indexes to addresses. I can think of many different ways to handle that in the kernel. I don't think the first mechanism I would reach for would be for the kernel to create an in-memory shared library. It's kind of a baroque mechanism for implementing a simple table.

It's true that dynamically linked programs can use the ELF loader. But the ELF loader needed special changes to support VDSOs. And so did gdb. And this approach doesn't help statically linked programs much. And glibc functions needed to be changed anyhow to be aware of the VDSO symbols. So as far as I can tell, all of this complexity really didn't get anything for free. It just wound up being complex.

All just my opinion, of course.


> Not quite; vdso is a general syscall-wrapper mechanism.

It's not. On 32-bit x86, it sort of is, but that's just because the 32-bit x86 fast syscall mechanism isn't really compatible with inline syscalls. Linux (and presumably most other kernels) provides a wrapper function that means "do a syscall". It's only accelerated insofar as it uses a faster hardware mechanism. It has nothing to do with fast timing.

On x86_64, there is no such mechanism.

> It's true that dynamically linked programs can use the ELF loader. But the ELF loader needed special changes to support VDSOs. And so did gdb. And this approach doesn't help statically linked programs much.

That's because the glibc ELF loader is a piece of, ahem... is baroque and overcomplicated. And there's no reason whatsoever that vDSO usage needs to be integrated with the dynamic linker at all.

I wrote a CC0-licensed standalone vDSO parser here:


It's 269 lines of code, including lots of comments, and it works in static binaries just fine. Go's runtime (which is static!) uses a vDSO loader based on it. I agree that a static table would be slightly simpler, but the tooling for debugging the vDSO is a heck of a lot simpler with the ELF approach.

This all seems predicated on the fact that Solaris doesn't support direct system calls and the fact that they ship their kernel and libc as one unified whole (like BSDs). Solaris is free to update the layout of their shared data structures whenever they want[1].

Because Linux kernel interfaces are distinct and separate from libc, and given Linus' policy on backwards compatibility, Linux had two choices for an _interface_: 1) export a data structure to userland that could never change, or 2) export a code linking mechanism to userland that could never change. In that light the latter choice seems far more reasonable.

[1] The shared data structures for this particular feature. There are other kernel data structures that leak through the libc interface and for which Solaris is bound to maintain compatibility.

The fallback isn't there because there's a public system call interface: the fallback is there because some of the kernel-side implementations of gettimeofday() (in particular, the Xen one) currently require the process to do a proper syscall.

This is separate from the fact that the gettimeofday() system call still exists too, which is a backwards-compatibility issue. The overwhelming majority of Linux applications do their system calls through libc too, so this doesn't affect them.

For those actually curious about the implementation on solaris/illumos, heres a quick rundown (from looking at current illumos source):

- comm_page (usr/src/uts/i86pc/ml/comm_page.s) is literally a page in kernel memory with specific variables that is mapped (usr/src/uts/intel/ia32/os/comm_page_util.c) as user|read-only (to be passed to userspace, kernel mapping is normal data, AFAICT)

- the mapped comm_page is inserted into the aux vector at AT_SUN_COMMPAGE (usr/src/uts/common/exec/elf/elf.c)

- libc scans auxv for this entry, and stashes the pointer it contains (usr/src/lib/libc/port/threads/thr.c)

- When clock_gettime is called, it looks at the values in the COMMPAGE (structure is in usr/src/uts/i86pc/sys/comm_page.h, probing in usr/src/lib/commpage/common/cp_main.c) to determine if TSC can be used.

- If TSC is usable, libc uses the information there (a bunch of values) to use tsc to read time (monotonic or realtime)

Variables within comm_page are treated like normal variables and used/updated within the kernel's internal timekeeping.

Essentially, rather than having the kernel provide an entry point & have the kernel know what the (in the linux case) internal data structures look like, here libc provides the code and reads the exported data structure from the kernel.

So it isn't reading the time from this memory page, it's using TSC. In the case of CLOCK_REALTIME, corrections that are applied to TSC are read from this memory page (comm_page).

> So it isn't reading the time from this memory page, it's using TSC. In the case of CLOCK_REALTIME, corrections that are applied to TSC are read from this memory page (comm_page).

This summary only applies to Illumos. The Solaris implementation diverged significantly around build 167 (2011) long after the last OpenSolaris build Illumos was based on (build 147). It changed again significantly in 2015.

I believe Circonus contributed an alternate implementation that does some of the same things as Solaris in 2016:


With that said, you are correct that whether or not it will read from a memory page instead depends on which interfaces you are using (e.g. get_hrusec()) and other subtle details.

So the only things I'm seeing in the linked circonus code that differ from illumos:

1. No use of a kernel-supplied page; determines skew/etc. itself in userspace.

2. Stores information at a per-CPU level, and tries to execute cpuid on the same CPU as rdtsc.

I'm presuming you're talking about #2 (and #1 is just due to the linked item being a library without kernel integrations)? Perhaps with some more kernel support so that the actual cpu rdtsc ran on can be reliably determined?

This still doesn't clarify the part about the "shared page in which the time is updated" and read from. This statement appears to imply TSC is not (necessarily) used (otherwise I'd categorize it under "uses values from memory page to fix up TSC", like Illumos' current implementation). I'm still not sure how that can be done reasonably.

Is there just a 1-microsecond timer running whenever a user task is being executed that is bumping the value? Wouldn't that be quite a bit of overhead? Or some HW trick? I mean, you could generate a fault on every read and have the kernel populate the current data, but that seems just as bad as a syscall.

You just described the old method Linux used, which was vulnerable to info leaks IIRC, and why it is now a vDSO.

The Solaris method doesn't have the problem the other implementation did.

How does solaris find the page? If it's mapped to a fixed address then it does have that problem.

The default is to map the shared page to a randomized, available address within the process space.

libc gets the address of the page by looking it up in an auxiliary vector table that belongs to the process.

Sounds like the clock resolution would be limited to the tick interrupt in this case - how does it handle high-resolution timers?

Author here, greetings. Anyone who finds this interesting may also enjoy our writeup describing every Linux system call method in detail [1].

[1]: https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g...

Nitpick - `77 percent faster` is not the inverse of `77 percent slower`. The line that says `The results of this microbenchmark show that the vDSO method is about 77% faster` should read `446% faster`.

Should that not be 346% faster? If A takes 1 second and B takes two seconds, then B is 100% faster than A. So the calculation would be (B/A - 1) * 100. Applying this here gives around 346%.

EDIT: B would, of course, take 100% longer than A, rather than be 100% faster.

How can something that takes twice as long be faster?

You're right, of course: hadn't had the morning coffee. It should have been 'takes 100% longer' in the 1 second/2 seconds example. The point I was trying to make is that you have to factor in the initial 100% which doesn't contribute to the final value.

Yes. Foot meet mouth. :)

I will def check that out. Anyone who finds that interesting may also enjoy "The Linux Programming Interface" :D

Nitpick: slower _than_ what? It's implied, but "slower" (or "greater", or anything-er) is in relation to another thing.

This is rather out of date. Everything works quite similarly, but the kernel code is very different these days.

For anyone looking at the mentions of KVM "under some circumstances" having the same issue and wondering how to avoid it with KVM: KVM appears to support fast vDSO-based time calls as long as:

- You have a stable hardware TSC (you can check this in /proc/cpuinfo on the host, but all reasonably recent hardware should support this).

- The host has the host-side bits of the KVM pvclock enabled.

As long as you meet those two conditions, KVM should support fast vDSO-based time calls.

So… it's not that the syscalls are slower, it's that the Linux-specific mechanism the Linux kernel uses to bypass having to actually perform these calls does not currently work on Xen (and thus EC2).

Depends on if you're looking at this from userspace or kernelspace. From the latter, you're spot on. From the former, the headline's spot on.

> From the former, the headline's spot on.

Only if you're using Linux guests and assuming vDSO so not really. The headline made me first go to issues with the host/virtual hardware and some syscalls being much slower than normal across the board.

This was also presented at the last AWS re:Invent in December. See AWS EC2 Deep Dive: https://de.slideshare.net/mobile/AmazonWebServices/aws-reinv...

Interesting way to find out the version of the hypervisor kernel. If the gtod call returns faster than the direct syscall for it, then you know the kernel version is prior to that of the patch fixing the issue in xen.

I expect there are many such patches that you could use to narrow down the version range of the host kernel. Once you have that information, you may be in a better position to exploit it, knowing which bugs are and are not patched.

If anybody is interested, here's a Google Compute Engine VM's result.

    blog   ~ touch test.c
    blog   ~ nano test.c
    blog   ~ gcc -o test test.c
    blog   ~ strace -ce gettimeofday ./test
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
      0.00    0.000000           0       100           gettimeofday
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000000

Previous related discussion: https://news.ycombinator.com/item?id=13697555

vDSO maintainer here.

There are patches floating around to support vDSO timing on Xen.

But isn't AWS moving away from Xen or are they just moving away from Xen PV?

Does anyone have any intuition around how this affects a variety of typical workflows? I imagine that these two syscalls are disproportionally likely to affect benchmarks more than real-world usage. How many times is this syscall happening on a system doing things like serving HTTP, or running batch jobs, or hosting a database, etc?

You can use strace and see!

Go to your staging environment, use `strace -f -c -p $PID -e trace=clock_gettime` (or don't use -p and just launch the binary directly), replay a bit of production traffic against it, and then interrupt it and check the summary.

HTTP servers typically return a date header, often internally dates are used to figure out expiration and caching, and logging almost always includes dates.

It's incredibly easy to check the numbers of syscalls with strace, so you really should be able to get an intuition fairly easily by just playing around in staging.

> hosting a database

This will very likely be calling time-related system calls, especially clock_gettime with CLOCK_MONOTONIC.

Is this just an EC2 problem, or does it affect any Xen/KVM guest?

I ran the test program on a Hyper-V VM running CentOS 7 and got the same result: 100 calls to the gettimeofday syscall. Conversely, I tested a vSphere guest (also running CentOS 7), which didn't call gettimeofday at all.

It depends on a number of things. My DigitalOcean instance has this problem. The virtual machine I spun up on Oracle's Bare Metal Cloud platform doesn't (disclaimer, I work for the team).

>Is this just an EC2 problem, or does it affect any Xen/KVM guest?

Looks like it's how the Xen hypervisor works.

It is slower because it misses an optimization where you can get the current time without having to enter the kernel. The trick is using the RDTSC instruction, which is not a privileged instruction, so you can call it from userspace. The Time Stamp Counter is a 64-bit register (an MSR, actually) that increments monotonically. You can get the current time by calibrating it against a known duration at boot, or by getting the frequency from a system table first, then doing a simple division and adding an offset. There are some caveats, though: you have to check whether the CPU has an invariant TSC using CPUID, and every core has a separate register. I think the problem with Xen is that the VM could be moved across hypervisors or CPUs, which would suddenly change the value of the counter. The latter could be mitigated by syncing the TSCs across cores (did I mention that they are writable?), and Xen supports emulating the RDTSC instruction too. I'm not sure how it's configured on AWS, so it may be perfectly safe or mostly safe.

Wasn't a workaround posted for this some time ago, that requires setting the TZ environment variable?


It seems very closely related, unless I am mistaken.

You are not mistaken in that the topics are (somewhat) related, they all have to do with time. But setting the TZ environment variable doesn't mean your programs don't execute the syscalls discussed in this article.

This is about the speed of execution of the mentioned syscalls, which will be called regardless of the TZ environment variable, and how vDSO changes that. However, by setting the TZ environment variable you can avoid an additional stat call as libc tries to determine whether /etc/localtime exists.

I wonder why the blog post claims setting clock source to 'tsc' is considered dangerous.

Because if the clock rate changes, tsc can become out of sync.


Not really. Recent CPUs (at least those from Intel, which is what EC2 runs on) implement constant_tsc, so the frequency does not affect the tsc.

A worse issue is that the counters may not be synchronized between cpus, which may be an issue when the process moves between sockets.

But I wouldn't call that "dangerous", it's simply a feature of the clock source. If that's an issue for your program, you should use CLOCK_MONOTONIC anyway and not rely on gettimeofday() doing the right thing.

how does constant_tsc interact with VMs being silently migrated from one physical machine to another?

EC2 doesn't do any kind of live machine migration. The only times a machine may start on a different host is if it is stopped and then started. Even reboots don't allow them to move.

You see this a lot when AWS lets you know about maintenance on a physical host and gives you the option to avoid the automated move by doing these steps manually at a time of your choosing before the maintenance window.

Not sure, but it can't be better than moving processes between CPUs I guess. Also, does EC2 silently move VMs like this?

Even without migration, synchronization can be an issue. In older multi-core machines, TSC synchronization was a problem among cores. Modern systems take care of this, and core CPU clock frequency changes are also handled, so a constant rate is available via the TSC. However, when hypervisors such as VMware or paravirtualization like Xen come into play, there are further issues, because the RDTSC instruction either has to be passed through to physical hardware or emulated via a trap. When emulated, a number of considerations come into play. Xen actually has PV RDTSC features that are normally not used but can be effective in paravirtual environments.

The gettimeofday() and clock_gettime() syscalls are liberally used in too many lines of existing software. Their use is very prevalent for historical reasons as well as many others. One reason is that the calls look deceptively "atomic" or "isolated" or "self-contained" in their appearance and usage, so liberal use is common. A lot of issues come about from their use, especially in time-sensitive applications (e.g. WAN optimization), and this is especially true in virtual environments. There are complex issues described elsewhere that are kind of fun to read: https://www.vmware.com/pdf/vmware_timekeeping.pdf and https://xenbits.xen.org/docs/4.3-testing/misc/tscmode.txt.

The issue becomes even more complex in distributed systems, beyond NTP. Some systems like Erlang have provisions for this, like http://erlang.org/doc/apps/erts/time_correction.html#OS_Syst.... Other systems use virtual vector clocks. And some systems, like Google's TrueTime as used in Spanner, synchronize using GPS and atomic clocks. Satellite GPS pulses are commonly used on trading floors and in HFT software. This is a very interesting area of study.

It's complex stuff, no doubt about that.

For me, it's much simpler - I come from the PostgreSQL world, so gettimeofday() is pretty much what EXPLAIN ANALYZE does to instrument queries. Good time source means small overhead, bad time source means instrumented queries may take multiples of actual run time (and be skewed in various ways). No fun.

It is complex and interesting. I am a novice database user. But I do know many databases use 'gettimeofday' quite a lot. Just strace any SELECT query. Most databases I have used, including Postgresql, also have to implement MVCC which mostly depend on timestamps. Imagine the hypervisor CPU and memory pressure induced time drift, or even drift in distributed cluster of database nodes. It hurts my head to think of the cases that will give me the wrong values or wrong estimate for getting the values. It is an interesting area.

MVCC has nothing to do with timestamps, particularly not with timestamps generated from gettimeofday(), but with XIDs, which you might imagine as a monotonic sequence of integers assigned at the start of a transaction. You might call that a timestamp, but the trouble is that what matters is commit order, and the XID has nothing to do with that. Which is why MVCC requires 'snapshots' - a list of transactions that are in progress.

We don't really know what EC2 does or precisely the type of hardware your VM will be spun up on. I've erred on the side of being cautious due to the vast amount of work being invested in timekeeping in various hypervisors. If EC2 knows that the TSC clocksource is safe on all of its hardware, perhaps modifying the Amazon Linux AMI to set TSC as the default clocksource would reassure many folks, myself included.

Advanced users that can run their own analysis or who have applications which would withstand potential time warps, are of course, free to ignore my warning at their own risk ;)

How common are get-time calls, such that they would actually be an issue?

I've worked on quite a few systems and can't think of a time when an API for getting the time would have been called so much that it affected performance.

Timestamped logs, transaction timeouts, http keepalive timeouts, cache expiration/eviction, etc.

Apache and nginx for example, both call gettimeofday() a lot.

Edit: Quick google searches indicate software like redis and memcached also call it quite often.

So does cassandra.

Any application code that includes logging of any sort is going to grab time. All of my code (quite a private set and scientific in nature) calls time() at critical points with identifiers so I can easily investigate issues.

OpenJDK has an open issue about this in their JVM: https://bugs.openjdk.java.net/browse/JDK-8165437

> All programmers deploying software to production environments should regularly strace their applications in development mode and question all output they find.

Or, instead, you could just not do that. Then you could go back to being productive, instead of wasting time tracking down unstable small tweaks for edge cases that you can barely notice after looping the same syscall 5 million times in a row.

When will people learn not to micro-optimize?

Crapulent and without merit.

Just curious to know the status on Azure;

