
Python urllib CRLF injection vulnerability - robin0
https://coocoor.com/advisory/cve/CVE-2019-9740
======
kccqzy
This is far from uncommon. Back at DEF CON 2017 Orange Tsai gave a talk about
inconsistencies in different URL parsing libraries in different languages. The
opening example was a single URL that had a different hostname when parsed by
urllib, urllib2, and requests. He also demoed examples of using unusual
characters like spaces and newlines to talk to Redis or SMTP while pretending
to be HTTP.

Slides:
[https://media.defcon.org/DEF%20CON%2025/DEF%20CON%2025%20pre...](https://media.defcon.org/DEF%20CON%2025/DEF%20CON%2025%20presentations/DEFCON-25-Orange-Tsai-A-New-Era-of-SSRF-Exploiting-URL-Parser%20in-Trending-Programming-Languages-UPDATED.pdf)
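
To make the newline trick concrete, here is a minimal sketch of why an unvalidated CRLF in a URL is dangerous. It does not use urllib at all: `build_request` is a hypothetical helper that formats a request the way a naive client might, and the host and header names are made up for illustration.

```python
# Hypothetical helper (not a real library function): formats an HTTP/1.1
# request without validating the path, as a naive client might.
def build_request(host: str, path: str) -> bytes:
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("latin-1")


# Attacker-controlled path that ends the request line early and then
# smuggles in an extra header after a CRLF.
malicious_path = "/index.html HTTP/1.1\r\nX-Injected: yes\r\nX-Ignore: "

raw = build_request("example.com", malicious_path)

# A server splits the request on CRLF, so it sees the injected header
# as if the client had sent it on purpose.
for line in raw.split(b"\r\n"):
    print(line)
```

The same idea applies to line-based protocols such as Redis or SMTP: whoever controls the bytes after the CRLF controls the next "line" the server reads.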

~~~
jetru
Orange actually reported this bug to urllib. The ticket in the HN link is
a duplicate of Orange's original finding.

------
haikuginger
Python urllib3 maintainer here. urllib3 made a change in December to be more
RFC-compliant, which fixed this issue, but that change has not been released
yet. We are in the process of looking into that.

I have verified that Requests, which uses us, appears to have its own
handling, going back at least to Requests 2.0 (released in 2013), that prevents
this when Requests is used directly as an abstraction layer on top of urllib3.

~~~
aldoushuxley001
Interesting. I was recently debating whether to use Requests or just urllib3
directly. Figured I'd minimize dependencies by just using urllib3 but didn't
think it might actually be more secure to use Requests. Great work btw!

~~~
diminoten
What is your use case that would make minimizing dependencies to this extreme
a valuable activity?

~~~
aldoushuxley001
I was just using urllib3 to post a form on another website and get the
resulting html page, then parsed it with BeautifulSoup.

Since it was just a one off use case and ultimately very simple, I didn't see
the need for any more functionality. Why bother with the extra packages? Or do
you think it's still worthwhile to use Requests even still? Is it not just
unnecessary bloat that might slow runtime?

~~~
diminoten
There's a lot to unpack in your comment, but I'll just work with the most
easily verifiable thing for you; what was the response time of the resource
you were querying with urllib3, and do you think using requests instead of
urllib3 directly would be an order of magnitude (or two) more or less runtime?

~~~
aldoushuxley001
I admittedly didn't test the response times between the two, but just felt
adding additional dependencies was unnecessary. I don't realistically expect
the speed to be too different between the two, but the less I have to rely on
external libraries the better. If I can get the job done with urllib3 why use
Requests?

Though admittedly, after reading OP's statement, I see that Requests might
actually have some extra security that urllib3 alone might not have. But
barring security improvements or the need for extra features that Requests
has, it seems like using Requests for my use case would be adding unnecessary
complexity.

~~~
diminoten
> but just felt adding additional dependencies was unnecessary

This notion, especially in Python and HTTP client programming, is wrong and
will cost you many many more hours than it will save you.

Requests is an entire order of magnitude easier to use than urllib3, and while
we may be dealing in minutes for this specific scenario, you will make up for
any time investment you pay to learn Requests the very next time you need to
do HTTP related work in the language.

It's a matter of not overreacting to a cost, and you're paying way more than
you should to get a much smaller gain than you could, if you paid that cost
elsewhere (by learning/using Requests and how to manage dependencies in
Python, which you have to do anyway with bs4).

------
cbsks
The link should probably be changed to the actual bug:
[https://bugs.python.org/issue36276](https://bugs.python.org/issue36276)

~~~
tyingq
Which appears to be a duplicate of another bug filed in 2017:
[https://bugs.python.org/issue30458](https://bugs.python.org/issue30458)

~~~
cbsks
I just noticed that. I guess they didn't think it was actually an exploitable
bug?

Edit: this bug sat around for almost 2 years. It will be interesting to see if
it gets fixed now that it is getting attention on Hacker News.

------
jaybosamiya
Relevant (and super cool) previous work, done by Orange Tsai:
[https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-Ne...](https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf)

------
1wd
Python 3 urllib and other stdlib protocol modules also use `splitlines` which
splits on various unicode "newlines". Could that also be exploitable somehow?
[https://discuss.python.org/t/changing-str-splitlines-to-matc...](https://discuss.python.org/t/changing-str-splitlines-to-match-file-readlines/174)
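
For anyone who hasn't run into this, a quick sketch of the difference. The header-like strings are invented for illustration. Whether any stdlib module is actually exploitable this way is the open question, and this snippet only shows the splitting behaviour.

```python
# Made-up text containing U+2028 (LINE SEPARATOR) and U+0085 (NEXT LINE),
# but no "\n" or "\r" at all.
text = "Header: a\u2028Injected: b\x85Other: c"

# str.splitlines() treats both characters as line boundaries.
print(text.splitlines())

# Splitting on "\n" alone sees a single line.
print(text.split("\n"))

# bytes.splitlines() only breaks on ASCII \n, \r and \r\n, so the UTF-8
# encoded form also stays in one piece.
print(text.encode("utf-8").splitlines())
```

So code that checks a `str` for `\r` and `\n` and later calls `splitlines()` on it can disagree with itself about where lines end.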

------
peterwwillis
Key takeaway: don't expect a library to do the safe thing; always sanitize all
your input. (If your language supports taint mode, enabling it can prevent
these bugs)
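
As a rough sketch of what that sanitizing could look like in Python: `check_url` and the choice of rejected characters are my own assumptions, not something taken from urllib or the bug report, and a real application may need stricter rules.

```python
import re

# Assumption: treat any ASCII control character, space, or DEL as unsafe.
# This covers the CR and LF used for header injection.
_UNSAFE_CHARS = re.compile(r"[\x00-\x20\x7f]")


def check_url(url: str) -> str:
    """Return the URL unchanged, or raise ValueError if it looks unsafe."""
    if _UNSAFE_CHARS.search(url):
        raise ValueError(f"unsafe character in URL: {url!r}")
    return url


# Hypothetical usage: validate before handing the URL to any HTTP client.
for candidate in ("http://example.com/ok", "http://example.com/\r\nX-Evil: 1"):
    try:
        check_url(candidate)
        print("accepted:", repr(candidate))
    except ValueError as exc:
        print("rejected:", exc)
```

Rejecting outright is deliberate here. Trying to strip or escape the bad characters tends to leave room for the parser inconsistencies discussed elsewhere in the thread.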

------
anaphor
Does anyone know if this also affects the Requests library? Does it use these
under the hood, or is it all httplib? (I'm pretty sure that's the case)

~~~
hannob
It probably does. Requests is built on top of urllib3 and the bug report
mentions that urllib3 is affected as well.

~~~
fireattack
Are urllib and urllib3 the same thing?

~~~
jwandborg
No, urllib is a standard library module in Python 3. urllib3 is a 3rd-party
package. See also
[https://news.ycombinator.com/item?id=19423367](https://news.ycombinator.com/item?id=19423367)

------
vldo
seems like an ad for coocoor

actual CVE entry: [http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-9740](http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-9740)

------
hannob
Probably worth checking other implementations. The comments already mention
that urllib3 is affected as well.

~~~
wereHamster
Why does python need three different versions of urllib?

~~~
kevin_thibedeau
urllib2 introduced breaking changes relative to urllib, so a new lib was added
to preserve the functionality of the old one. urllib3 also has breaking
changes, but it purposely doesn't live in the standard library so it can be
changed more readily.

