"A sleep(1) is put into all SSL_read() and SSL_write() calls..." (python.org)
105 points by shmeedogg 2464 days ago | 50 comments



It looks like someone forgot to remove the "speed-up loop".

http://thedailywtf.com/Articles/The-Speedup-Loop.aspx


That's hilarious and frightening all at once. Luckily, static analysis tools catch that kind of stuff very easily now...


Additionally I think you'd be hard pressed to find a modern compiler (for some loose definition of modern) which would not optimize that loop away.


At that point they would just turn on the optimizations as an "insurance".


Don't you mean off?


Not in the cloud they don't.


As the bug report mentions, the sleep(1) call appears to be wrapped in a #define that (I assume) won't be active in normal OpenSSL builds. At least in v0.9.8o (the only one I checked) the only references I see are:

ssl/s2_pkt.c:

    #ifdef PKT_DEBUG
        if (s->debug & 0x01) sleep(1);
    #endif
There are two references like that to PKT_DEBUG (read and write); the only other is:

ssl/ssl_locl.h:

    /*#define PKT_DEBUG 1   */
I suspect this is a non-issue. Interesting though.


Wow, I wonder how many people gave up on the ssl lib as too slow because of this bug?


I bet removing this causes something somewhere to break.


is sleep(1) noisy?

if so, it might be vs timing attacks


Sleep(1) is very noisy on Windows machines. The time slice given by default is ~15-20ms [1], and calling Sleep(<15) relinquishes the rest of the time slice. Windows has a multimedia API that can be used to get intervals down to 1ms, but it requires system calls that increase your system load. So you usually just need to use proper synchronization anyways, which is what the Python guys should have done :)

This crummy Sleep() implementation has some nice effects on programmers. Those who like to solve problems with lots of copy/paste code are forced to think about using proper synchronization primitives when running high resolution loops that wait for events, or their code just won't run very fast.

[1] http://social.msdn.microsoft.com/forums/en-US/clr/thread/fac...


Aha, so that's probably why JavaScript timers on Windows have around 15ms accuracy:

http://ejohn.org/blog/accuracy-of-javascript-time/


True, but Sleep() on Windows is different from sleep() on POSIX/Unix.

Sleep() on windows takes ms.

sleep() on nix takes seconds.

Windows:

    VOID WINAPI Sleep(
      __in  DWORD dwMilliseconds
    );
Sleep(1) is as fast as it can go, which turns out to be 15-20ms.

nix:

    #include <unistd.h>
    unsigned int
    sleep(unsigned int seconds);
usleep() can be used for more granular delay on nix.
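
(Side note, since the bug under discussion is in CPython's ssl wrapper: Python's own time.sleep() takes float seconds, so it covers both the sleep() and usleep() cases with one call — a quick sketch:)

```python
import time

start = time.monotonic()
time.sleep(1)        # one second, like POSIX sleep(1)
whole = time.monotonic() - start

start = time.monotonic()
time.sleep(0.001)    # ~1 ms, like usleep(1000); the actual resolution
                     # depends on the OS scheduler/timer granularity
fractional = time.monotonic() - start
```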


When I did driver programming in Windows, it was well-known that Sleep had a resolution of 10 ms; it is based on the interrupt timer (not the high frequency timer). You could change the interrupt timer's duration, but its ticks are what guide Sleep. Not counting the effect of context switching, since you are waiting for the timer ticks, your actual times vary from 10 ms to 19.9999 ms. 15 ms is a nice way to say "on average", but I would not rely on that measure.

Timers are hard to get right. Tread warily, programmers! This is one of those areas where it is good to understand some things about the computer hardware behind the software.

EDIT: I should add that the high frequency timer is not a panacea either. It will work for you most of the time, but there are two circumstances that will occasionally trip you:

(1) At least in Windows XP and 2000, there is a KB article (I do not remember its number now) that explains that for a certain small set of manufacturers, if there is a sudden spike in load on the PCI bus, the high frequency timer will be jumped ahead to account for the lost time during that laggy spike. This correction is not accurate. This means that if your initial timestamp is X, and you are waiting for it to be X+Y, wall clock time may be between X and X+Y, but Windows itself altered the timestamp to be X+Y+Z, and your software thinks the time has elapsed. I personally experienced this bug.

(2) You actually have more than one high frequency timer -- one for each CPU on your system. Once you start playing on a system with multiple CPUs, how do you guarantee that the API is providing you a timestamp from the same timer? I remember there may have been a way to choose if you dropped to assembly to make the query, but the API at the time did not support a choice. The timer starts upon power-up. If one CPU powers up after the other, you will have a timestamp skew. Some high frequency software algorithms attempt to correct for this skew. I do not know all the details of that now.


Presumably the high frequency counter is driven off the cycle counter, and not the Intel HPET (High Precision Event Timer).


http://en.wikipedia.org/wiki/Time_Stamp_Counter

"The issue has two components: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU."

The entry also mentions that hibernation can affect the counters. I wonder if power savings implementations that speed up or slow the CPU could also have an effect.


Yes. The TSC, which counts cycles, is what QueryPerformanceCounter() typically reads on Windows, and it is not the same thing as the HPET timers.


This has nothing to do with timing attacks and would do nothing to defend against them. Read 'daeken's comment below, though.


Excuse my ignorance: what do you mean by noisy?


Noisy means there is a lot of variance in the actual time the process spends sleeping. When you say sleep(1), most OSes interpret that as: sleep as short as you can. Based on the scheduler internals, that can vary a lot.


Which OS interprets sleep(1) (ie, "sleep for 1 second") as "sleep for as short as you can"?

On WinAPI, Sleep is denominated in milliseconds.

On BSD, sleep(3) is a library wrapper around nanosleep(2).

Linux's man pages make no mention of the magic number "1" as a "sleep 1 timeslice" shortcut; also, older Linux man pages warn that sleep(3) can be implemented in terms of alarm(2), which is used all over POSIX as an I/O timeout and would blow up the world if it alarmed in milliseconds.

If you want to sleep "as short as you can", sleep for 0 seconds, or call any other system call to yield your process back to the scheduler.


Windows also only guarantees that your process will sleep at least as long as you specify. Not that it will sleep exactly as long.
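
(You can watch the "at least, never exactly" behavior from Python; the numbers vary a lot by OS and scheduler, so treat this as a sketch, not a benchmark:)

```python
import time

def measure_sleep(requested, rounds=20):
    """Time a requested sleep interval repeatedly; the OS only
    promises a lower bound, so the max can be much larger."""
    samples = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        time.sleep(requested)
        samples.append(time.perf_counter() - t0)
    return min(samples), max(samples)

lo, hi = measure_sleep(0.001)
print(f"asked for 1.00ms, observed {lo * 1000:.2f}ms to {hi * 1000:.2f}ms")
```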


Thanks for the correction. I was just talking about the de facto behavior I have seen on Linux and BSD for very short sleep intervals (way shorter than 1 second), not necessarily about the behavior as specified by the system call. I should have been clearer.


I really don't think you've ever seen BSD return in milliseconds after a 1-second sleep. Respectfully, I think you're pretty much just wrong.


Well, that is not what I meant at all, I meant that I've seen BSD return in say 2 microseconds after calling usleep(1).

But that's usleep not sleep, which is the inaccuracy I was admitting to in the first place.


Ok, but (a) this article is talking about literally POSIX sleep(3), and (b) there is a ton of confusion on this thread about whether sleep is ms-denominated or seconds-denominated.

Sorry to pile on you, though.


Read it again. He's talking about near-zero sleep calls, not one second sleep calls.


POSIX sleep() doesn't take subsecond intervals. Maybe he's talking about usleep()?


Sleep is noisy.

When you do sleep, depending on the hardware, the OS, the configuration, the kernel flags, etc. the minimum you actually get is around 38.

But that varies.


Wait, what? If I call sleep(1), you're saying it's going to sleep for THIRTY EIGHT SECONDS?

People, it's right there in the man pages.

Are you maybe thinking about WinAPI's Sleep? That's ms-denominated. It would make sense that attempting to sleep for 1 millisecond wouldn't work, and would build in the time for the scheduler and the timeslices for every other process. We're talking about OpenSSL and POSIX sleep(3).


My apologies. Yes, I do mean MS - I started off doing *nix development, but now am doing Win32 programming 18 hours a day.

Please discard my above comment, everyone.


on second thought, my comment makes little sense,

if they wanted noisy sleep, it should be something like

sleep(func(rand()))


That makes no sense either, an attacker can usually average such things out. Besides, there are better (faster) ways to guard against timing attacks.


People seem to have a skewed perception of how to defeat timing attacks, generally. At the end of the day, it's more about making things constant time than trying to make the timing difficult to detect. Simple example:

You have two hashes and want to see if they're equal. The naive approach is to iterate over each byte in both hashes and compare them, then break when you find a byte that doesn't match. That approach, however, could be vulnerable to a timing attack because you could potentially measure how many times it iterates. An implementation that's resistant to timing attacks could XOR each byte of each hash and accumulate across them; if that accumulator is zero at the end of the loop, it's equal. That approach is constant time, rather than being dependent on the data you're dealing with.
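
(The XOR-accumulate comparison described above looks like this in Python — a sketch; in real code you'd reach for a vetted primitive like hmac.compare_digest, available since Python 3.3, rather than rolling your own:)

```python
def naive_equal(a: bytes, b: bytes) -> bool:
    # Leaks timing: returns at the first mismatching byte, so an
    # attacker can learn how long a matching prefix is.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    # XOR each byte pair and OR into an accumulator; the loop always
    # runs to the end, so the runtime does not depend on where (or
    # whether) the inputs differ.
    if len(a) != len(b):
        return False
    acc = 0
    for x, y in zip(a, b):
        acc |= x ^ y
    return acc == 0
```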


Besides, I see no evidence that the sleep was intended to thwart timing attacks. It looks like it's all about easing the debug process somehow.

Incidentally, the primary problem here is not the mere presence of a debug flag that governs a sleep, it's the fact that PySSL_SSLdo_handshake sets that debug flag. Right?

In other words, it's not a bug in OpenSSL itself, but rather the Python wrapper for OpenSSL. That's how I understand it.


Yes, exactly.


A second-long sleep on every read or write? If this was actually happening, it sounds like it could create unheard-of performance issues for any significant transfer.

Or was this not noticed because all the major frameworks like cherrypy and twisted are still using the pyopenssl wrapper?

Is there any evidence that this bugfix actually changes the performance?


Don't you mean a millisecond long sleep? I never saw a sleep function that interpreted its parameter as seconds. Edit: Well, now I know better.


Both the POSIX sleep(1) utility and the sleep(unsigned int seconds) function in unistd.h interpret the argument as seconds.


It does on Linux, from `man 3 sleep`:

http://paste.pocoo.org/show/229678/


This is what I was going on, I checked that man page, incredulously...


I love Ben's response in the discussion prior to the SVN check-in.


I have been working on a replay tool in python. I had to add sleeps because it was sending packets faster than the NIC could send them in order.


That's not a reliable way to do that. It should block.


it is actually more reliable to sleep than to block. by definition blocking is unreliable because you don't know exactly when it will unblock. You do know when a sleep will end though. I also want a variable delay between writes.


Uhhh. I think you need to study this topic some more.

> it is actually more reliable to sleep than to block. by definition blocking is unreliable because you don't know exactly when it will unblock.

A block will end when the nic can handle more data. You can't just wait a second and assume the nic can handle the data. That's where the "unreliable" part comes in. You assume it can handle the data, but you are not checking. And the way to check is either by polling, blocking, or receiving a signal. Waiting is not a way to check.

> You do know when a sleep will end though.

It makes no difference that you know when the sleep will end. It's irrelevant - all you care about is can the nic accept more data or not.

> I also want a variable delay between writes.

If you want variable delays then do that, but that has nothing whatsoever to do with making sure the nic doesn't lose data.


cool, I'll check the nic status before sending if I ever want to maximize throughput. But I'll probably rewrite it in C at that point too.


"Check the NIC status"? What network API are you using? Even if you're injecting with libpcap, you can still just select() on the device handle.


I am not using an API, I am writing to the driver


Just use Winpcap instead of wasting time reinventing this wheel.



