

Is libcURL slower than hand creating socket requests in C? - konceptz
http://www.konceptz.com/2014/05/why-you-shouldnt-use-libcurl.html

======
twic
_I still haven't answered a very important question. Why is libcUrl so much
slower?_

I would be very hesitant to give this kind of advice until I had an answer
to that.

~~~
kelnos
Agreed. libcurl may be broken, or there may be some compiled-in default that
does something weird, like trying IPv6 DNS first. It could be that a simple
option flag when initializing the library would fix the problem, and you'd
still have the benefit of using a well-used and well-tested library rather
than your own likely-buggy code.
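
If it's something like the IPv6 guess, a single option would confirm it. A
minimal sketch to illustrate (the URL is just a placeholder; CURLOPT_IPRESOLVE
is the standard libcurl option for restricting name resolution to one address
family):

    #include <curl/curl.h>

    int main(void) {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        /* If an AAAA-lookup timeout were the culprit, forcing IPv4-only
           resolution should make the slowdown disappear on re-benchmark. */
        curl_easy_setopt(curl, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");
        curl_easy_perform(curl);

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 0;
    }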

------
_wmd
Far too little detail in this post. For example, how was the application
handling the curl callbacks? I once fixed an app that used repeated strcat()
calls in its callback, which was never correct: it was also handling binary
data.

It would be nice if the OP profiled his new code against his old code, though
I'm guessing libcurl isn't the issue even without seeing profiling results.
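
For reference, a write callback that handles binary data correctly tracks the
length explicitly instead of calling strcat(). A minimal sketch, not the OP's
actual code (the struct and names are illustrative; b->data must start out
NULL and b->len zero):

    #include <stdlib.h>
    #include <string.h>

    struct buf { char *data; size_t len; };

    /* libcurl passes the callback size*nmemb raw bytes, not a
       NUL-terminated string, so append by length, never strcat(). */
    static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userdata) {
        struct buf *b = userdata;
        size_t n = size * nmemb;
        char *p = realloc(b->data, b->len + n);
        if (!p) return 0;   /* returning less than n aborts the transfer */
        memcpy(p + b->len, ptr, n);
        b->data = p;
        b->len += n;
        return n;
    }

It gets installed with CURLOPT_WRITEFUNCTION, with CURLOPT_WRITEDATA pointing
at the struct.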

~~~
sneak
Agreed, but it's still a bit of useful data, even if more would have been
better.

------
aegiso
A curious result, but the conclusion is ridiculous. I can think of a handful
of explanations for this result, none of which would implicate libcurl:

- Incomparable harnesses

- Misuse of the library's API

- Build switches

- UA sniffing

- Not measuring or controlling for system/network load

This isn't a post about why you shouldn't use libcurl; it's a post about why
you shouldn't benchmark your way to blind conclusions.

~~~
konceptz
Author here:

You're right about the conclusion; I'll reword the title.

As for why: my calls are in the post, somewhat buried in the GitHub code
linked directly from the post.

Certainly more attention should be paid to how the testing was done; I'll
update that later tonight.

~~~
hobofan
If you do benchmarks, you should also keep in mind that EC2 micro instances
are terrible for this, because they have highly variable CPU speeds that are
greatly reduced if the CPU load stays high even for a short time.

~~~
konceptz
This is very interesting; it's the first I've heard of this, but I have
experienced some issues implementing pthreads. This could be a reason.

Do you know where I might find some documentation on this?

~~~
hobofan
As far as I know, Amazon only mentions it in a small section of the EC2
website. However, there are some blog posts about it:

[http://blogs.adobe.com/digitalmedia/2011/02/amazon-ec2-micro-instance-and-stolen-cpu/](http://blogs.adobe.com/digitalmedia/2011/02/amazon-ec2-micro-instance-and-stolen-cpu/)

[http://hughht5.blogspot.de/2012/05/amazon-ec2-micro-instance-and-stolen.html?m=1](http://hughht5.blogspot.de/2012/05/amazon-ec2-micro-instance-and-stolen.html?m=1)

------
pipeep
Given the final paragraph:

> I still haven't answered a very important question. Why is libcUrl so much
> slower?

I'd assume it's likely a result of misuse of libcUrl, or cUrl properly
implementing some part of the spec that this hand-rolled implementation
ignores.

On top of this, the author's C code isn't very well written. There's use of
`sprintf` without arithmetic bounds checks (really he should use `snprintf`),
unnecessary construction of a one-character array (that's what a dereference
is for), inconsistent whitespace, `malloc` when a stack allocation would make
more sense, use of `unsigned char` instead of `char` for strings, etc.
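
For instance, the bounds-checked formatting might look like this (a sketch
with made-up names, not the OP's actual code):

    #include <stdio.h>

    /* snprintf bounds the write and reports truncation, and the caller can
       pass a stack buffer instead of doing a needless malloc(). */
    static int build_request(char *out, size_t cap,
                             const char *path, const char *host) {
        int n = snprintf(out, cap, "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n",
                         path, host);
        return (n < 0 || (size_t)n >= cap) ? -1 : n;  /* -1 on error/truncation */
    }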

From what I can tell, the libcUrl guys are serious about performance, and I
have trouble believing such a wild allegation without any further analysis.

~~~
konceptz
> I'd assume it's likely a result of misuse of libcUrl, or cUrl properly
> implementing some part of the spec that this hand-rolled implementation
> ignores.

Author again. My code is in the post; it would be helpful for me to see what
mistakes I've made in my use of libcUrl.

~~~
pipeep
I'm not familiar with libcUrl, so I can't say what you missed, but I am
suspicious. Do, however, look at my complaints about your code. I understand
that writing good, safe C code is hard (I've written plenty of bad C code
myself), but that's one more reason you should use cUrl: any code you write
yourself is another liability.

~~~
konceptz
I will certainly apply your suggestions to the code. Thanks!

------
carloscm
Smells of a DNS cache issue. Maybe libcurl is using its own resolver? I know
at least it has its own in-process DNS cache that for example may not be
activated by default with the easy API. Or that it can be compiled with c-ares
support, etc.
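
If it is the resolver, handle reuse is the cheap thing to test: as I
understand it, the easy API's DNS cache lives inside the handle, so creating
a fresh handle per request throws it away. A minimal sketch (the URL and
request count are placeholders):

    #include <curl/curl.h>

    int main(void) {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        /* Keep resolved addresses for 5 minutes instead of the 60 s default. */
        curl_easy_setopt(curl, CURLOPT_DNS_CACHE_TIMEOUT, 300L);
        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");

        for (int i = 0; i < 50; i++)
            curl_easy_perform(curl);  /* same handle: DNS cache and
                                         keep-alive connections persist */

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 0;
    }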

~~~
konceptz
I wasn't aware of this. Thank you, I'll check it out!

------
ddebernardy
Wouldn't it be more likely that libcurl, being battle-tested, adds layers
upon layers of validation, correction, SSL support, cookie management, and
whatnot that hand-crafted requests don't?

------
notacoward
I recently tested some software that used libcurl, and it was also
horrendously slow. I don't remember libcurl being so slow when I used it for a
project a few years ago. It makes me wonder whether some more recent change
made things a lot worse, at least for some configurations. Maybe it's DNS-
related, as some have suggested. This is probably neither experimenter error
nor the last word on the subject. More likely, it's a strong signal that
there's something here worth investigating.

------
sitkack
I had to search on hn.algolia.com because this ridiculously bad science got
rightfully killed.

Did the OP even test with ruby/python/wget/curl? It could be a DNS issue
that has nothing to do with libcurl.

Why spend all that time and not run `gprof`?!

~~~
Restful
Looks like you're conflating things. While the article wasn't well done,
testing with your suggestions adds nothing more than unrelated background
noise, especially in the case of wget.

~~~
sitkack
It does, in terms of narrowing down why libcurl could be slow. Are you
presupposing it is slow for some reason? Nearly all HTTP libraries should be
running at wire speed, which is far slower than either native or interpreted
code.

Showing curl and wget from the command line would have been highly
beneficial. Would you not have tested that before writing an HTTP client in
C?

~~~
Restful
I see your point about having some baseline, but I'm opposed to the
suggestion of testing an implementation in every library under the sun.

~~~
sitkack
I said nothing about an implementation; I mean:

    wget -vv http://localhost:8080/bigbin.bin --header="Range:bytes=1000-20000" -O test.out
    curl -vvv http://localhost:8080/bigbin.bin --header "Range: bytes=1000-200000" -o test.out

Both of these got 350 MB/s+.

For an implementation, it isn't terribly difficult:

    In [2]: from requests import Request, Session
    In [3]: s = Session()
    In [8]: req = Request('GET', 'http://localhost:8080/bigbin.bin', headers={'Range': 'bytes=1000-2000000'})
    In [9]: prepped = s.prepare_request(req)
    In [10]: %timeit resp = s.send(prepped)
    1 loops, best of 3: 238 ms per loop

All of these are faster than what he outlined, so there is some _other_
issue going on. His hand-rolled HTTP request is faster, but not for the
reason he thinks. Crack out some Wireshark, not some code.

------
dsjoerg
The title should be "libCurl is slow", since the article fails to present
any alternative other than "make your own", which is infeasible in many
situations.

~~~
justinsb
Please don't draw that conclusion yet. There are some serious issues with the
blog post; I think "benchmarking is hard" might well prove to be a better
title!

------
nickodell
It seems kind of suspicious that 50 requests and 5 MB take as long as
1000000 requests and 100 MB. It makes me think that something is wrong with
your benchmark.

------
justinsb
It's suspicious that doing N requests of 100 bytes is slower than doing N
requests of 10000 bytes. That doesn't pass the sniff-test for me.

~~~
konceptz
Good point; I found this odd as well. Because the requests are going from
EC2 (NE) to S3, I believe the pipe is very large and 100 bytes vs. 10000
bytes is not a large enough difference to matter.

