"A sleep(1) is put into all SSL_read() and SSL_write() calls..." (python.org)
105 points by shmeedogg 2464 days ago | 50 comments



It looks like someone forgot to remove the "speed-up loop".

http://thedailywtf.com/Articles/The-Speedup-Loop.aspx


That's hilarious and frightening all at once. Luckily, static analysis tools catch that kind of stuff very easily now...


Additionally I think you'd be hard pressed to find a modern compiler (for some loose definition of modern) which would not optimize that loop away.


At that point they would just turn on the optimizations as an "insurance".


Don't you mean off?


Not in the cloud they don't.


As the bug report mentions, the sleep(1) call appears to be wrapped in a #define that (I assume) won't be active in normal OpenSSL builds. At least in v0.9.8o (the only one I checked) the only references I see are:

ssl/s2_pkt.c:

    #ifdef PKT_DEBUG
        if (s->debug & 0x01) sleep(1);
    #endif
There are two references like that to PKT_DEBUG (read and write); the only other is:

ssl/ssl_locl.h:

    /*#define PKT_DEBUG 1   */
I suspect this is a non-issue. Interesting though.


Wow, I wonder how many people gave up on the ssl lib as too slow because of this bug?


I bet removing this causes something somewhere to break.


is sleep(1) noisy?

if so, it might be vs timing attacks


Sleep(1) is very noisy on Windows machines. The time slice given by default is ~15-20ms [1], and calling Sleep(<15) relinquishes the rest of the time slice. Windows has a multimedia API that can be used to get intervals down to 1ms, but it requires system calls that increase your system load. So you usually just need to use proper synchronization anyways, which is what the Python guys should have done :)

This crummy Sleep() implementation has some nice effects on programmers. Those who like to solve problems with lots of copy/paste code are forced to think about using proper synchronization primitives when running high resolution loops that wait for events, or their code just won't run very fast.

[1] http://social.msdn.microsoft.com/forums/en-US/clr/thread/fac...


Aha, so that's probably why JavaScript timers on Windows have around 15ms accuracy:

http://ejohn.org/blog/accuracy-of-javascript-time/


True, but Sleep() on Windows is different from sleep() on POSIX/Unix.

Sleep() on windows takes ms.

sleep() on nix takes seconds.

Windows:

    VOID WINAPI Sleep(
      __in  DWORD dwMilliseconds
    );
Sleep(1) is as fast as it can go, which turns out to be 15-20ms.

nix:

    #include <unistd.h>
    unsigned int
    sleep(unsigned int seconds);
usleep() can be used for more granular delay on nix.
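
(Side note, since the bug under discussion is in CPython's ssl wrapper: Python's own time.sleep() takes float seconds, so it covers both the sleep() and usleep() cases with one call — a quick sketch:)

```python
import time

start = time.monotonic()
time.sleep(1)        # one second, like POSIX sleep(1)
whole = time.monotonic() - start

start = time.monotonic()
time.sleep(0.001)    # ~1 ms, like usleep(1000); the actual resolution
                     # depends on the OS scheduler/timer granularity
fractional = time.monotonic() - start
```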


When I did driver programming in Windows, it was well-known that Sleep had a resolution of 10 ms; it is based on the interrupt timer (not the high frequency timer). You could change the interrupt timer's duration, but its ticks are what guide Sleep. Not counting the effect of context switching, since you are waiting for the timer ticks, your actual times vary from 10 ms to 19.9999 ms. 15 ms is a nice way to say "on average", but I would not rely on that measure.

Timers are hard to get right. Tread warily, programmers! This is one of those areas where it is good to understand some things about the computer hardware behind the software.

EDIT: I should add that the high frequency timer is not a panacea either. It will work for you most of the time, but there are two circumstances that will occasionally trip you:

(1) At least in Windows XP and 2000, there is a KB article (I do not remember its number now) that explains that for a certain small set of manufacturers, if there is a sudden spike in load on the PCI bus, the high frequency timer will be jumped ahead to account for the lost time during that laggy spike. This correction is not accurate. This means that if your initial timestamp is X, and you are waiting for it to be X+Y, wall clock time may be between X and X+Y, but Windows itself altered the timestamp to be X+Y+Z, and your software thinks the time has elapsed. I personally experienced this bug.

(2) You actually have more than one high frequency timer -- one for each CPU on your system. Once you start playing on a system with multiple CPUs, how do you guarantee that the API is providing you a timestamp from the same timer? I remember there may have been a way to choose if you dropped to assembly to make the query, but the API at the time did not support a choice. The timer starts upon power-up. If one CPU powers up after the other, you will have a timestamp skew. Some high frequency software algorithms attempt to correct for this skew. I do not know all the details of that now.


Presumably the high frequency counter is driven off the cycle counter, and not the Intel HPET (High Precision Event Timer).


http://en.wikipedia.org/wiki/Time_Stamp_Counter

"The issue has two components: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU."

The entry also mentions that hibernation can affect the counters. I wonder if power savings implementations that speed up or slow the CPU could also have an effect.


Yes. The TSC, which counts cycles, is what QueryPerformanceCounter() typically reads on Windows, and it is not the same thing as the HPET timers.


This has nothing to do with timing attacks and would do nothing to defend against them. Read 'daeken's comment below, though.


Excuse my ignorance: what do you mean by noisy?


Noisy means there is a lot of variance in the actual time the process spends sleeping. When you say sleep(1), most OSes interpret that as: sleep as short as you can. Based on the scheduler internals, that can vary a lot.


Which OS interprets sleep(1) (ie, "sleep for 1 second") as "sleep for as short as you can"?

On WinAPI, Sleep is denominated in milliseconds.

On BSD, sleep(3) is a library wrapper around nanosleep(2).

Linux's man pages make no mention of the magic number "1" as a "sleep 1 timeslice" shortcut; also, older Linux man pages warn that sleep(3) can be implemented in terms of alarm(2), which is used all over POSIX as an I/O timeout and would blow up the world if it alarmed in milliseconds.

If you want to sleep "as short as you can", sleep for 0 seconds, or call any other system call to yield your process back to the scheduler.


Windows also only guarantees that your process will sleep at least as long as you specify. Not that it will sleep exactly as long.
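
(You can watch the "at least, never exactly" behavior from Python; the numbers vary a lot by OS and scheduler, so treat this as a sketch, not a benchmark:)

```python
import time

def measure_sleep(requested, rounds=20):
    """Time a requested sleep interval repeatedly; the OS only
    promises a lower bound, so the max can be much larger."""
    samples = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        time.sleep(requested)
        samples.append(time.perf_counter() - t0)
    return min(samples), max(samples)

lo, hi = measure_sleep(0.001)
print(f"asked for 1.00ms, observed {lo * 1000:.2f}ms to {hi * 1000:.2f}ms")
```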


Thanks for the correction. I was just talking about the de facto behavior I have seen on Linux and BSD for very short sleep intervals (way shorter than 1 second), not necessarily about the behavior as specified by the system call. I should have been clearer.


I really don't think you've ever seen BSD return in milliseconds after a 1-second sleep. Respectfully, I think you're pretty much just wrong.


Well, that is not what I meant at all, I meant that I've seen BSD return in say 2 microseconds after calling usleep(1).

But that's usleep not sleep, which is the inaccuracy I was admitting to in the first place.


Ok, but (a) this article is talking about literally POSIX sleep(3), and (b) there is a ton of confusion on this thread about whether sleep is ms-denominated or seconds-denominated.

Sorry to pile on you, though.


Read it again. He's talking about near-zero sleep calls, not one second sleep calls.


POSIX sleep() doesn't take subsecond intervals. Maybe he's talking about usleep()?


Sleep is noisy.

When you do sleep, depending on the hardware, the OS, the configuration, the kernel flags, etc. the minimum you actually get is around 38.

But that varies.


Wait, what? If I call sleep(1), you're saying it's going to sleep for THIRTY EIGHT SECONDS?

People, it's right there in the man pages.

Are you maybe thinking about WinAPI's Sleep? That's ms-denominated. It would make sense that attempting to sleep for 1 millisecond wouldn't work, and would build in the time for the scheduler and the timeslices for every other process. We're talking about OpenSSL and POSIX sleep(3).


My apologies. Yes, I do mean MS - I started off doing *nix development, but now am doing Win32 programming 18 hours a day.

Please discard my above comment, everyone.


on second thought, my comment makes little sense,

if they wanted noisy sleep, it should be something like

sleep(func(rand()))


That makes no sense either, an attacker can usually average such things out. Besides, there are better (faster) ways to guard against timing attacks.


People seem to have a skewed perception of how to defeat timing attacks, generally. At the end of the day, it's more about making things constant time than trying to make the timing difficult to detect. Simple example:

You have two hashes and want to see if they're equal. The naive approach is to iterate over each byte in both hashes and compare them, then break when you find a byte that doesn't match. That approach, however, could be vulnerable to a timing attack because you could potentially measure how many times it iterates. An implementation that's resistant to timing attacks could XOR each byte of each hash and accumulate across them; if that accumulator is zero at the end of the loop, it's equal. That approach is constant time, rather than being dependent on the data you're dealing with.
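
(The XOR-accumulate comparison described above looks like this in Python — a sketch; in real code you'd reach for a vetted primitive like hmac.compare_digest, available since Python 3.3, rather than rolling your own:)

```python
def naive_equal(a: bytes, b: bytes) -> bool:
    # Leaks timing: returns at the first mismatching byte, so an
    # attacker can learn how long a matching prefix is.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    # XOR each byte pair and OR into an accumulator; the loop always
    # runs to the end, so the runtime does not depend on where (or
    # whether) the inputs differ.
    if len(a) != len(b):
        return False
    acc = 0
    for x, y in zip(a, b):
        acc |= x ^ y
    return acc == 0
```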


Besides, I see no evidence that the sleep was intended to thwart timing attacks. It looks like it's all about easing the debug process somehow.

Incidentally, the primary problem here is not the mere presence of a debug flag that governs a sleep, it's the fact that PySSL_SSLdo_handshake sets that debug flag. Right?

In other words, it's not a bug in OpenSSL itself, but rather the Python wrapper for OpenSSL. That's how I understand it.


Yes, exactly.


A second-long sleep on every read or write? If this was actually happening, it sounds like it could create unheard-of performance issues for any significant transfer.

Or was this not noticed because all the major frameworks like cherrypy and twisted are still using the pyopenssl wrapper?

Is there any evidence that this bugfix actually changes the performance?


Don't you mean a millisecond long sleep? I never saw a sleep function that interpreted its parameter as seconds. Edit: Well, now I know better.


Both the POSIX sleep(1) utility and the sleep(unsigned int seconds) function in unistd.h interpret the argument as seconds.


It does on Linux, from `man 3 sleep`:

http://paste.pocoo.org/show/229678/


This is what I was going on, I checked that man page, incredulously...


I love Ben's response in the discussion prior to the SVN check-in.


I have been working on a replay tool in python. I had to add sleeps because it was sending packets faster than the NIC could send them in order.


That's not a reliable way to do that. It should block.


it is actually more reliable to sleep than to block. by definition blocking is unreliable because you don't know exactly when it will unblock. You do know when a sleep will end though. I also want a variable delay between writes.


Uhhh. I think you need to study this topic some more.

> it is actually more reliable to sleep than to block. by definition blocking is unreliable because you don't know exactly when it will unblock.

A block will end when the nic can handle more data. You can't just wait a second and assume the nic can handle the data. That's where the "unreliable" part comes in. You assume it can handle the data, but you are not checking. And the way to check is either by polling, blocking, or receiving a signal. Waiting is not a way to check.

> You do know when a sleep will end though.

It makes no difference that you know when the sleep will end. It's irrelevant - all you care about is can the nic accept more data or not.

> I also want a variable delay between writes.

If you want variable delays then do that, but that has nothing whatsoever to do with making sure the nic doesn't lose data.


cool, I'll check the nic status before sending if I ever want to maximize throughput. But I'll probably rewrite it in C at that point too.


"Check the NIC status"? What network API are you using? Even if you're injecting with libpcap, you can still just select() on the device handle.


I am not using an API, I am writing to the driver


Just use Winpcap instead of wasting time reinventing this wheel.



