Exploiting URL Parser in Programming Languages (2017) [pdf] (blackhat.com)
125 points by tosh 5 months ago | 30 comments



Am I correct to understand that the main exploit here is:

A) A server that makes HTTP requests. B) It does so based on a white- or blacklist. C) These white or blacklists are circumvented due to bad parsing. D) This can be used to call unintended services (and sometimes protocols).

I had an idea to build a generic Webhook service. My thought was that the actual machine that does the requests should be 100% isolated from our VPS and only have public network access. Would that largely stop this attack? Perhaps it should also have a rate limiter to prevent DDOS attacks.

Waiting for every URL parser to be fixed seems impractical. This to me is a great example of why the WHATWG URL specification is so terrible. It's so much harder to implement than RFC 3986.

Things that aren't browsers should just implement RFC3986 and reject anything invalid.


> This to me is a great example of why the WHATWG URL specification is so terrible. It's so much harder to implement than RFC 3986.

You're not the only one who thinks that:

https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/

(Be sure to read the link to the WHATWG's GitHub issue.)


Check out the TLS SNI exploit. The attack embeds a CRLF in the TLS payload: the SMTP server ignores the handshake bytes leading up to it, and the next command it sees is valid SMTP. Meaning that if you ask something to e.g. load an image from, or send a callback to, "https://smtp.targetcorp.com<space><CR><LF>...smtp instructions to send mail here...:25/", it'll actually work. As far as I understand this happens during the TLS handshake, so before any HTTP is sent at all; POST, GET, whatever, it all works.

Good moment to lock down outgoing requests, I guess. At least to port 25 :)


I've not communicated with SMTP servers manually all that much, but I remember my ISP's SMTP server would just close the connection if you sent it something it didn't understand.


That sounds like most SMTP servers - they just reject what they don't understand.

The wrinkle worth noting here is that this gives SMTP servers something they understand fully: the HTTP server has been convinced to send perfectly well-formed and valid SMTP commands.


Wait, how does "targetcorp.com<space>etc" even DNS resolve? Trying to wrap my head around this one.


That’s covered in the slides; check the glibc section on gethostbyaddr().


That sounds like a good second mitigation then. Only allow known HTTP ports.


> I had an idea to build a generic Webhook service. My thought was that the actual machine that does the requests should be 100% isolated from our VPS and only have public network access. Would that largely stop this attack? Perhaps it should also have a rate limiter to prevent DDOS attacks.

This would at least prevent it from attacking your own internal services, but wouldn't prevent it from being made to make requests to other outside services, effectively leaving you exposed as a potential proxy for other attacks, i.e. the SMTP TLS stuff, making your server send spam on someone else's behalf. A bit nuts.


> but wouldn't prevent it from being made to make requests to other outside services.

But that could be mitigated via host firewall rules to some extent.


Very true! Firewall rules with a strict filter could contain this!

Of course, that assumes you're not doing something like running in AWS and using the AWS firewall. And that your firewall never has any issues at all.

In general, defense in depth is a preferable strategy. Your entire defense should never be reliant upon a single control.


> Your entire defense should never be reliant upon a single control.

Exactly why I used “to some extent” as a suffix to the statement. One tool alone doesn’t necessarily make you safe, but it might be enough for practical purposes depending on your risk profile.


Some discussion about handling webhooks with evil URLs: http://blog.fanout.io/2014/01/27/how-to-safely-invoke-webhoo...


The IP blacklisting section seems fragile to me. What about IPv6 or multiple A records?

I would rather let the OS connect the way it prefers and once it is connected, check that the remote is a legit IP address before sending anything.
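Something like this, say in Python (the private/loopback/link-local check is just an illustrative blacklist, not an exhaustive one):

    import ipaddress
    import socket

    def connect_checked(host, port, timeout=10):
        # Let the OS pick whichever address family / record it prefers.
        sock = socket.create_connection((host, port), timeout=timeout)
        peer_ip = ipaddress.ip_address(sock.getpeername()[0])
        # Only now decide whether we're willing to talk to this peer.
        if peer_ip.is_private or peer_ip.is_loopback or peer_ip.is_link_local:
            sock.close()
            raise ValueError("refusing to send to internal address %s" % peer_ip)
        return sock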


IPv6: blacklist similar address ranges. Multiple A records: be sure to check against each record before connecting.

Connecting first and then checking the remote IP before sending anything could work I suppose, but I think you'd still need to check against a blacklist.
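Roughly, checking every resolved record before connecting might look like this (Python sketch; the blocked ranges shown are illustrative):

    import ipaddress
    import socket

    def resolve_and_check(host, port):
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        addrs = {info[4][0] for info in infos}  # covers A and AAAA records
        for addr in addrs:
            ip = ipaddress.ip_address(addr)
            if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
                raise ValueError("blacklisted address: %s" % addr)
        # Connect to one of these exact addresses afterwards, not to the
        # hostname again, so a second lookup can't return something different.
        return addrs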


The problem is that in practice things that implement RFC3986 do random things with stuff that is invalid instead of rejecting it, in an attempt to comply with Postel's law.

So, again in practice, the security consequence of implementing RFC3986 is typically much worse than that of implementing the WHATWG URL spec. At least the latter actually defines handling for all inputs, instead of leaving implementations to make things up as they go.


> C) These white or blacklists are circumvented due to bad parsing

The main problem is that the parsing is inconsistent between different subsystems, i.e. the parse function interprets the URL one way and the fetch function interprets the URL in another way. This is why I think the most practical solution is to add another layer of validation that checks that a given URL belongs to a very small subset of all valid URLs that is interpreted consistently across all your functions.
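A minimal sketch of such a layer, assuming Python; the subset accepted here (http/https only, no userinfo, default ports, plain dotted hostnames) is deliberately narrow and purely illustrative:

    import re
    from urllib.parse import urlsplit

    ALLOWED_HOST = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$")

    def validate_url(url):
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            raise ValueError("scheme not allowed")
        if parts.username is not None or parts.password is not None:
            raise ValueError("userinfo not allowed")
        if parts.port not in (None, 80, 443):
            raise ValueError("port not allowed")
        if not ALLOWED_HOST.match(parts.hostname or ""):
            raise ValueError("hostname outside the allowed subset")
        return parts

Note this only addresses parse consistency; the internal-address checks discussed elsewhere in the thread are still needed on top of it.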


Saw this talk at DEF CON last year. Left the room... pretty nervous...

This isn't it, but same title and content: https://www.youtube.com/watch?v=D1S-G8rJrEk



These slides don't seem designed to be read without the accompanying talk.


Just fuzz every parser you write. And the problem is solved.


I agree with the idea, but I don't think it is enough: A fuzzer does not necessarily find all problems.


I wrote several parsers that I fuzzed with tools like AFL or go-fuzz, and I promise you that fixed a lot of ugly corner cases.
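For anyone who hasn't written one, such a harness is tiny. A sketch with Python's atheris rather than AFL or go-fuzz, with urlsplit standing in for whatever parser you want to exercise:

    import sys
    import atheris

    with atheris.instrument_imports():
        from urllib.parse import urlsplit  # stand-in for the parser under test

    def test_one_input(data):
        try:
            urlsplit(data.decode("utf-8", "surrogateescape"))
        except ValueError:
            pass  # clean rejections are fine; crashes and hangs are what we're after

    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()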


The problem fundamentally is different parsers parsing the same string into different component values. Fuzzing won't solve that.


Wow, this is brutal! I wonder how safe Apache and nginx are?


It was posted before [1], so I know the paper isn't from 2018. Should the year be added to the title?

[1] https://news.ycombinator.com/item?id=14879488


Added now. Thanks!


It was only a year ago. Would it be substantially different if written today?


It's just an HN convention to have the year in the title when the article isn't the current year. Not only does it not imply badness, it's associated with goodness, since most things from previous years have lost their interest.


One of the purposes of adding the date is to make people aware of whether they may have already read it. It's not an indication of whether the content is out-of-date.



