

"A sleep(1) is put into all SSL_read() and SSL_write() calls..." - shmeedogg
http://bugs.python.org/issue9075

======
kqr2
It looks like someone forgot to remove the "speed-up loop".

<http://thedailywtf.com/Articles/The-Speedup-Loop.aspx>

~~~
fourneau
That's both hilarious and frightening all at once. Luckily static analysis
tools catch that kind of stuff very easily now...

~~~
dschobel
Additionally, I think you'd be hard-pressed to find a modern compiler (for some
loose definition of modern) that would not optimize that loop away.
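
For illustration (my sketch, not from the article): a side-effect-free delay
loop like the one below is deleted outright by gcc and clang at -O2, since
removing it cannot change the program's observable behavior.

    /* A classic "speed-up loop": counts to 100 million for no reason. */
    void delay(void)
    {
        long i;
        for (i = 0; i < 100000000L; i++)
            ;  /* no side effects: -O2 removes the entire loop */
    }

Declaring `i` volatile (or compiling with optimizations off) is what it would
take to keep the loop around.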

~~~
someone_here
At that point they would just turn the optimizations _on_ as "insurance".

~~~
eru
Don't you mean off?

------
dredge
As the bug report mentions, the sleep(1) call appears to be wrapped in a
#define that (I assume) won't be active in normal OpenSSL builds. At least in
v0.9.8o (the only one I checked) the only references I see are:

ssl/s2_pkt.c:

    #ifdef PKT_DEBUG
        if (s->debug & 0x01) sleep(1);
    #endif

There are two references like that to PKT_DEBUG (read and write); the only
other is:

ssl/ssl_locl.h:

    /* #define PKT_DEBUG 1 */

I suspect this is a non-issue. Interesting though.

------
hartror
Wow, I wonder how many people gave up on the SSL library as too slow because
of this bug?

------
po
I bet removing this causes something somewhere to break.

~~~
typedef_void
Is sleep(1) noisy?

If so, it might be a defense against timing attacks.

~~~
jakevoytko
Sleep(1) is _very_ noisy on Windows machines. The time slice given by default
is ~15-20ms [1], and calling Sleep(<15) relinquishes the rest of the time
slice. Windows has a multimedia API that can be used to get intervals down to
1ms, but it requires system calls that increase your system load. So you
usually just need to use proper synchronization anyway, which is what the
Python guys should have done :)

This crummy Sleep() implementation has some nice effects on programmers. Those
who like to solve problems with lots of copy/paste code are forced to think
about using proper synchronization primitives when running high-resolution
loops that wait for events, or their code just won't run very fast.
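
A small sketch of the multimedia-timer trick (timeBeginPeriod/timeEndPeriod
from winmm; the exact numbers will vary by machine):

    #include <windows.h>
    #include <stdio.h>
    
    /* Link against winmm.lib (e.g. -lwinmm with MinGW). */
    int main(void)
    {
        DWORD t0, t1;
    
        /* Default timer resolution: each Sleep(1) rounds up to a
           full scheduler tick, roughly 15 ms. */
        t0 = GetTickCount();
        for (int i = 0; i < 100; i++) Sleep(1);
        t1 = GetTickCount();
        printf("default resolution: %lu ms\n", t1 - t0);
    
        /* Ask the multimedia timer for 1 ms ticks. This raises the
           timer-interrupt rate system-wide, i.e. extra load. */
        timeBeginPeriod(1);
        t0 = GetTickCount();
        for (int i = 0; i < 100; i++) Sleep(1);
        t1 = GetTickCount();
        printf("1 ms resolution:    %lu ms\n", t1 - t0);
        timeEndPeriod(1);  /* always pair with timeBeginPeriod */
    
        return 0;
    }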

[1] http://social.msdn.microsoft.com/forums/en-US/clr/thread/facc2b57-9a27-4049-bb32-ef093fbf4c29/

~~~
Xurinos
When I did driver programming on Windows, it was well known that Sleep had a
resolution of 10 ms; it is based on the interrupt timer (not the
high-frequency timer). You could change the interrupt timer's duration, but
its ticks are what drive Sleep. Not counting the effect of context switching,
since you are waiting on timer ticks, your actual times vary from 10 ms to
19.9999 ms. "15 ms" is a nice way to say "on average," but I would not rely on
that figure.

Timers are hard to get right. Tread warily, programmers! This is one of those
areas where it is good to understand some things about the computer hardware
behind the software.

EDIT: I should add that the high-frequency timer is not a panacea either. It
will work for you most of the time, but there are two circumstances that will
occasionally trip you up:

(1) At least in Windows XP and 2000, there is a KB article (I do not remember
which one) explaining that, for a certain small set of manufacturers, if there
is a sudden spike in load on the PCI bus, the high-frequency timer will be
jumped ahead to account for the time lost during that laggy spike. This
correction is not accurate. This means that if your initial timestamp is X,
and you are waiting for it to become X+Y, the wall-clock time may be between X
and X+Y, but Windows itself altered the timestamp to X+Y+Z, and your software
thinks the time has elapsed. I personally experienced this bug.

(2) You actually have more than one high-frequency timer -- one for each CPU
on your system. Once you start playing on a system with multiple CPUs, how do
you guarantee that the API is providing you a timestamp from the same timer? I
remember there may have been a way to choose if you dropped to assembly to
make the query, but the API at the time did not support a choice. The timer
starts upon power-up. If one CPU powers up after the other, you will have a
timestamp skew. Some high-frequency software algorithms attempt to correct for
this skew. I do not know all the details now.

~~~
tptacek
Presumably the high-frequency counter is driven off the cycle counter, and not
Intel's High Precision Event Timer (HPET).

~~~
Xurinos
<http://en.wikipedia.org/wiki/Time_Stamp_Counter>

"The issue has two components: rate of tick and whether all cores (processors)
have identical values in their time-keeping registers. There is no promise
that the timestamp counters of multiple CPUs on a single motherboard will be
synchronized. In such cases, programmers can only get reliable results by
locking their code to a single CPU."

The entry also mentions that hibernation can affect the counters. I wonder if
power-saving features that speed up or slow down the CPU could also have an
effect.
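
A common mitigation sketch on Windows (assuming MSVC's <intrin.h> for the
__rdtsc() intrinsic): pin the thread to a single CPU, per the "locking their
code to a single CPU" advice above, so consecutive reads come from the same
core's counter.

    #include <windows.h>
    #include <intrin.h>
    #include <stdio.h>
    
    int main(void)
    {
        /* Pin this thread to CPU 0 so consecutive __rdtsc() reads come
           from the same core's counter; otherwise a migration between
           cores with skewed TSCs can make time appear to jump. */
        SetThreadAffinityMask(GetCurrentThread(), 1);
    
        unsigned __int64 t0 = __rdtsc();
        Sleep(100);
        unsigned __int64 t1 = __rdtsc();
        printf("elapsed cycles: %llu\n", t1 - t0);
        return 0;
    }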

~~~
tptacek
Yes. The TSC, which counts cycles, is what the Win32 high-resolution counter
(QueryPerformanceCounter()) typically reads, and it is not the same thing as
the HPET timers.

------
timf
A second-long sleep on every read or write? If this was actually happening, it
sounds like it could create unheard-of performance issues for any significant
transfer.

Or was this not noticed because all the major frameworks like cherrypy and
twisted are still using the pyopenssl wrapper?

Is there any evidence that this bugfix actually changes the performance?

~~~
MrRage
Don't you mean a millisecond-long sleep? I've never seen a sleep function that
interpreted its parameter as seconds. Edit: Well, now I know better.

~~~
kingkilr
It does on Linux, from `man 3 sleep`:

<http://paste.pocoo.org/show/229678/>
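
For reference, the POSIX prototype; the argument really is whole seconds:

    #include <unistd.h>
    
    unsigned int sleep(unsigned int seconds);  /* returns seconds left unslept */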

~~~
timf
This is what I was going on; I checked that man page, incredulously...

------
Aaronontheweb
I love Ben's response in the discussion prior to the SVN check-in.

------
jamesseda
I have been working on a replay tool in Python. I had to add sleeps because it
was handing packets to the NIC faster than the NIC could send them in order.

~~~
ars
That's not a reliable way to do that. It should block.

~~~
jamesseda
It is actually more reliable to sleep than to block. By definition, blocking
is unreliable because you don't know exactly when it will unblock. You do know
when a sleep will end, though. I also want a variable delay between writes.

~~~
ars
Uhhh. I think you need to study this topic some more.

> It is actually more reliable to sleep than to block. By definition, blocking
> is unreliable because you don't know exactly when it will unblock.

A block will end when the NIC can handle more data. You can't just wait a
second and assume the NIC can handle the data. That's where the "unreliable"
part comes in. You assume it can handle the data, but you are not checking.
And the way to check is either by polling, blocking, or receiving a signal.
Waiting is not a way to check.

> You do know when a sleep will end though.

It makes no difference that you know when the sleep will end. It's irrelevant;
all you care about is whether the NIC can accept more data or not.

> I also want a variable delay between writes.

If you want variable delays then add them, but that has nothing whatsoever to
do with making sure the NIC doesn't lose data.
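
To make that concrete, a minimal sketch with plain BSD sockets (my example,
not the parent's code): a blocking send() is paced by the kernel's transmit
queue, with no guessing.

    #include <sys/types.h>
    #include <sys/socket.h>
    
    /* Write an entire buffer to a blocking socket. send() returns only
       once the kernel has accepted the bytes, so the loop is paced by
       how fast the NIC drains its queue -- no sleep() guesswork. */
    int send_all(int sock, const char *buf, size_t len)
    {
        while (len > 0) {
            ssize_t n = send(sock, buf, len, 0);  /* blocks if queue is full */
            if (n < 0)
                return -1;  /* caller inspects errno */
            buf += n;
            len -= (size_t)n;
        }
        return 0;
    }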

~~~
jamesseda
Cool, I'll check the NIC status before sending if I ever want to maximize
throughput. But I'll probably rewrite it in C at that point too.

~~~
tptacek
"Check the NIC status"? What network API are you using? Even if you're
injecting with libpcap, you can still just select() on the device handle.
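
Roughly like this (a sketch assuming a Unix libpcap handle that exposes a
selectable descriptor; pcap_get_selectable_fd() returns -1 where that isn't
supported):

    #include <pcap/pcap.h>
    #include <sys/select.h>
    
    /* Block until the capture device is ready, then inject one packet,
       instead of sleeping a fixed interval between writes. */
    int inject_when_ready(pcap_t *p, const unsigned char *pkt, size_t len)
    {
        int fd = pcap_get_selectable_fd(p);  /* -1 if not supported */
        if (fd >= 0) {
            fd_set fds;
            FD_ZERO(&fds);
            FD_SET(fd, &fds);
            select(fd + 1, NULL, &fds, NULL, NULL);  /* wait for writability */
        }
        return pcap_inject(p, pkt, len);  /* bytes written, or -1 on error */
    }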

~~~
jamesseda
I am not using an API; I am writing to the driver.

~~~
tptacek
Just use WinPcap instead of wasting time reinventing this wheel.

