Exploiting URL Parser in Programming Languages (2017) [pdf] (blackhat.com)
125 points by tosh 5 months ago | 30 comments



Am I correct to understand that the main exploit here is:

A) A server that makes HTTP requests. B) It does so based on a white- or blacklist. C) These white or blacklists are circumvented due to bad parsing. D) This can be used to call unintended services (and sometimes protocols).

I had an idea to build a generic Webhook service. My thought was that the actual machine that does the requests should be 100% isolated from our VPS and only have public network access. Would that largely stop this attack? Perhaps it should also have a rate limiter to prevent DDOS attacks.

Waiting for every URL parser to be fixed seems impractical. This to me is a great example of why the WHATWG URL specification is so terrible. It's so much harder to implement than RFC 3986.

Things that aren't browsers should just implement RFC3986 and reject anything invalid.


> This to me is a great example of why the WHATWG URL specification is so terrible. It's so much harder to implement than RFC 3986.

You're not the only one who thinks that:

https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/

(Be sure to read the link to the WHATWG's GitHub issue.)


Check out the TLS SNI exploit. The attack embeds a CRLF in the TLS payload: the SMTP server ignores the handshake bytes leading up to it, and the next command it sees is valid SMTP. Meaning that if you ask something to e.g. load an image from, or send a callback to, "https://smtp.targetcorp.com<space><CR><LF>...smtp instructions to send mail here...:25/", it'll actually work. As far as I understand this happens during the TLS handshake, so before any HTTP is sent at all; POST, GET, whatever, it all works.

Good moment to lock down outgoing requests, I guess. At least to port 25 :)


I've not communicated with SMTP servers manually all that much, but I remember my ISP's SMTP server would just close the connection if you sent it something it didn't understand.


That sounds like most SMTP servers - they just reject what they don't understand.

The wrinkle worth noting here is that this gives SMTP servers something they understand fully: the HTTP server has been convinced to send perfectly well-formed and valid SMTP commands.


Wait, how does "targetcorp.com<space>etc" even DNS resolve? Trying to wrap my head around this one.


That’s covered in the slides; check the glibc section on gethostbyaddr().


That sounds like a good second mitigation then. Only allow known HTTP ports.


> I had an idea to build a generic Webhook service. My thought was that the actual machine that does the requests should be 100% isolated from our VPS and only have public network access. Would that largely stop this attack? Perhaps it should also have a rate limiter to prevent DDOS attacks.

This would at least prevent it from attacking your own internal services, but wouldn't prevent it from being made to make requests to other outside services, effectively leaving you exposed as a potential proxy for other attacks, i.e. the SMTP TLS stuff, making your server send spam on someone else's behalf. A bit nuts.


> but wouldn't prevent it from being made to make requests to other outside services.

But that could be mitigated via host firewall rules to some extent.


Very true! Firewall rules with a strict filter could contain this!

Of course, that assumes you're not doing something like running in AWS and using the AWS firewall. And that your firewall never has any issues at all.

In general, defense in depth is a preferable strategy. Your entire defense should never be reliant upon a single control.


> Your entire defense should never be reliant upon a single control.

Exactly why I used “to some extent” as a suffix to the statement. One tool alone doesn’t necessarily make you safe, but it might be enough for practical purposes depending on your risk profile.


Some discussion about handling webhooks with evil URLs: http://blog.fanout.io/2014/01/27/how-to-safely-invoke-webhoo...


The IP blacklisting section seems fragile to me. What about IPv6 or multiple A records?

I would rather let the OS connect the way it prefers and once it is connected, check that the remote is a legit IP address before sending anything.
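Something like this, say in Python (the private/loopback/link-local check is just an illustrative blacklist, not an exhaustive one):

    import ipaddress
    import socket

    def connect_checked(host, port, timeout=10):
        # Let the OS pick whichever address family / record it prefers.
        sock = socket.create_connection((host, port), timeout=timeout)
        peer_ip = ipaddress.ip_address(sock.getpeername()[0])
        # Only now decide whether we're willing to talk to this peer.
        if peer_ip.is_private or peer_ip.is_loopback or peer_ip.is_link_local:
            sock.close()
            raise ValueError("refusing to send to internal address %s" % peer_ip)
        return sock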


IPv6: blacklist similar address ranges. Multiple A records: be sure to check against each record before connecting.

Connecting first and then checking the remote IP before sending anything could work I suppose, but I think you'd still need to check against a blacklist.
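Roughly, checking every resolved record before connecting might look like this (Python sketch; the blocked ranges shown are illustrative):

    import ipaddress
    import socket

    def resolve_and_check(host, port):
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        addrs = {info[4][0] for info in infos}  # covers A and AAAA records
        for addr in addrs:
            ip = ipaddress.ip_address(addr)
            if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
                raise ValueError("blacklisted address: %s" % addr)
        # Connect to one of these exact addresses afterwards, not to the
        # hostname again, so a second lookup can't return something different.
        return addrs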


The problem is that in practice things that implement RFC3986 do random things with stuff that is invalid instead of rejecting it, in an attempt to comply with Postel's law.

So, again in practice, the security consequence of implementing RFC3986 is typically much worse than that of implementing the WHATWG URL spec. At least the latter actually defines handling for all inputs, instead of leaving implementations to make things up as they go.


> C) These white or blacklists are circumvented due to bad parsing

The main problem is that the parsing is inconsistent between different subsystems, i.e. the parse function interprets the URL one way and the fetch function interprets the URL in another way. This is why I think the most practical solution is to add another layer of validation that checks that a given URL belongs to a very small subset of all valid URLs that is interpreted consistently across all your functions.
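A minimal sketch of such a layer, assuming Python; the subset accepted here (http/https only, no userinfo, default ports, plain dotted hostnames) is deliberately narrow and purely illustrative:

    import re
    from urllib.parse import urlsplit

    ALLOWED_HOST = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$")

    def validate_url(url):
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            raise ValueError("scheme not allowed")
        if parts.username is not None or parts.password is not None:
            raise ValueError("userinfo not allowed")
        if parts.port not in (None, 80, 443):
            raise ValueError("port not allowed")
        if not ALLOWED_HOST.match(parts.hostname or ""):
            raise ValueError("hostname outside the allowed subset")
        return parts

Note this only addresses parse consistency; the internal-address checks discussed elsewhere in the thread are still needed on top of it.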


Saw this talk at DEF CON last year. Left the room... pretty nervous...

This isn't it, but same title and content: https://www.youtube.com/watch?v=D1S-G8rJrEk



These slides don't seem designed to be read without the accompanying talk.


Just fuzz every parser you write. And the problem is solved.


I agree with the idea, but I don't think it is enough: A fuzzer does not necessarily find all problems.


I wrote several parsers that I fuzzed with tools like AFL or go-fuzz, and I promise you that fixed a lot of ugly corner cases.
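For anyone who hasn't written one, such a harness is tiny. A sketch with Python's atheris rather than AFL or go-fuzz, with urlsplit standing in for whatever parser you want to exercise:

    import sys
    import atheris

    with atheris.instrument_imports():
        from urllib.parse import urlsplit  # stand-in for the parser under test

    def test_one_input(data):
        try:
            urlsplit(data.decode("utf-8", "surrogateescape"))
        except ValueError:
            pass  # clean rejections are fine; crashes and hangs are what we're after

    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()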


The problem fundamentally is different parsers parsing the same string into different component values. Fuzzing won't solve that.


Wow, this is brutal! I wonder how safe Apache and nginx are?


It was posted before [1], so I know the paper isn't from 2018. Should the year be added to the title?

[1] https://news.ycombinator.com/item?id=14879488


Added now. Thanks!


It was only a year ago. Would it be substantially different if written today?


It's just an HN convention to have the year in the title when the article isn't the current year. Not only does it not imply badness, it's associated with goodness, since most things from previous years have lost their interest.


One of the purposes of adding the date is to make people aware of whether they may have already read it. It's not an indication of whether the content is out-of-date.



