I can add to the list of attack vectors a case where the WAF introduced a reflected cross-site scripting vulnerability. The site it was supposedly protecting was blank, i.e. it just returned a 404 error or something. But just by appending a URL parameter with JS in it, the WAF would trigger and reflect the code back unescaped. So I was able to build an Outlook Web App lookalike for phishing on a site with the company's domain.
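To make that failure mode concrete, here is a minimal sketch of a hypothetical block page handler (the function names are made up, not any vendor's code). The vulnerable version interpolates the offending input straight into HTML. The fixed one runs it through Python's `html.escape` first, so the browser renders the payload as text instead of executing it.

```python
import html

def block_page_vulnerable(query: str) -> str:
    # BUG: echoes attacker-controlled input into HTML unescaped, so a
    # <script> payload that tripped the WAF is executed by the browser.
    return f"<h1>Request blocked</h1><p>Offending input: {query}</p>"

def block_page_safe(query: str) -> str:
    # Escape &, <, >, " and ' before embedding the input in HTML.
    return f"<h1>Request blocked</h1><p>Offending input: {html.escape(query)}</p>"

payload = "<script>alert(document.domain)</script>"
print(block_page_vulnerable(payload))
print(block_page_safe(payload))
```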
There are many valid points, though it doesn't cover many things modern WAF systems do in addition to the regex rules.
The question is what the alternative is, and the suggested alternative is that everyone become a perfect security expert. That is even less likely to succeed than having professionals build security-focused software.
Consider the proportion of sites created by people who know zero about security. WordPress runs on something like 40%+ of websites (ridiculous).
WAFs are unlikely to prevent targeted attacks, but they don't have to in order to be useful. In practice, simple measures can prevent many common attacks.
I was kinda with you until you made this statement:
> No, "defense in depth" is not a valid excuse to use a WAF anyway, because it provides no real defense!
I have to disagree here. You are assuming that every developer in an org will always do the correct thing and deploy code that isn't exploitable via SQL injection, XSS, file inclusion, etc. That's just not the case. I'm all for doing the correct thing, and not just performing security theater, but WAFs do offer some protection. You need multiple layers of security covering the holes that may be left in other layers, and a WAF can be one of those layers.
> Can you believe people pay money for "web application firewalls"?
I think it's like a lot of things in computer security, in that system owners just don't want to be the slowest gazelle in the herd. If an attacker is mass-exploiting some new remote vulnerability, then maybe the WAF means that you're one of the lucky ones who doesn't get hit. And yes, that's a very big maybe there.
WAFs don't do much to prevent targeted attacks except to require the attacker to craft a WAF bypass. As you've shown.
They also have an organizational purpose. Once an attack happens, you can shift the blame onto the WAF. And the WAF provider can, if needed, claim it was a novel attack that they are now prepared for. They can even issue an emergency patch that detects and blocks the novel backslash-newline line break technique.
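A minimal sketch of why the backslash-newline trick works, assuming a naive WAF rule that looks for the literal string `cat /etc/passwd` (a made-up rule for illustration). POSIX shells treat a backslash immediately followed by a newline, outside quotes, as a line continuation and drop both characters, so the shell ends up running the original command while the regex never sees it.

```python
import re

# Hypothetical naive rule: block requests that try to read /etc/passwd with cat.
NAIVE_RULE = re.compile(r"\bcat\s+/etc/passwd")

def waf_blocks(body: str) -> bool:
    return NAIVE_RULE.search(body) is not None

def shell_line_continuation(cmd: str) -> str:
    # Approximates what a POSIX shell does outside quotes:
    # backslash-newline pairs are removed entirely.
    return cmd.replace("\\\n", "")

plain = "cat /etc/passwd"
bypass = "c\\\nat /etc/pa\\\nsswd"

print(waf_blocks(plain))                          # True
print(waf_blocks(bypass))                         # False
print(shell_line_continuation(bypass) == plain)   # True
```

Patching this particular trick just moves the attacker on to the next one, which is the point being made.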
I think that's an important difference, though. WAFs are very useful as a quick fix for a zero-day vulnerability, especially if you are running some generic software like WordPress and can subscribe to a managed WordPress rule set.
The key is that they should mitigate specific vulnerabilities and ideally once the proper fix is deployed the rules are then removed from the WAF.
WAFs have near-zero value for things like trying to detect shell-code or generic SQL injections as they just turn into fuzzy bug injectors. Any real attacker will very quickly find a way to structure their exploit to avoid the heuristics.
WAFs also have minimal value for custom software. Even if you use one to deploy a quick block for an exploit, unless you are blocking a whole endpoint, the attacker will likely find a way to work around it.
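As a sketch of what a narrowly targeted rule might look like, here is a hypothetical "virtual patch" (the path, parameter name, and date are invented for illustration, not from any real ruleset). It only fires on the one vulnerable endpoint and parameter, and it carries an expiry date as a reminder that it should be removed once the proper fix is deployed.

```python
import re
from dataclasses import dataclass
from datetime import date

@dataclass
class VirtualPatch:
    path: str            # the one vulnerable endpoint
    param: str           # the one vulnerable parameter
    pattern: re.Pattern  # what a request exploiting this particular bug looks like
    expires: date        # reminder to delete the rule once the real fix ships

    def blocks(self, path: str, params: dict, today: date) -> bool:
        # Expired rules and unrelated requests are never touched.
        if today > self.expires or path != self.path:
            return False
        return self.pattern.search(params.get(self.param, "")) is not None

# Hypothetical: an "id" parameter that should only ever be numeric.
patch = VirtualPatch(
    path="/plugins/example/ajax.php",
    param="id",
    pattern=re.compile(r"\D"),   # anything that isn't a digit
    expires=date(2024, 1, 31),
)

print(patch.blocks("/plugins/example/ajax.php", {"id": "7 OR 1=1"}, date(2024, 1, 15)))  # True
print(patch.blocks("/plugins/example/ajax.php", {"id": "7"}, date(2024, 1, 15)))         # False
print(patch.blocks("/other/page", {"id": "7 OR 1=1"}, date(2024, 1, 15)))                # False
```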
Also, never underestimate the power of FIPS999999999 or whatever compliance. If it’s a checklist item to have a bowl full of M&Ms without brown ones in your data center, your security people will make sure that box gets checked. It doesn’t matter how outdated the requirement is.
Honestly, so many of these complaints about web application firewalls fly so far past the point that I'm not surprised folk don't understand them.
Yes, they can be trivial to bypass. And yes, they don't block everything. But also, yes, they can be very useful in some situations.
My employer's security team maintains a WAF, and while it may be frustrating at times (like when anti-directory-traversal rules broke page names with '...' in them) I mostly prefer that they continue to do so for two big reasons: script kiddies and botnets.
It doesn't matter that a bypass is trivial when in practice your attacker won't mutate their attack -- if the attacker was more sophisticated, the defence could be too, but if the attack is dumb then there's no point in a sophisticated defence.
Botnets mean that purely reputation-based defence is insufficient. The best defence to a distributed attack is one that's really cheap to evaluate. If all an attacker ever tries is to hit our homepage with a fixed user agent string, then all we need to do is block that UA from hitting our home page. A simple WAF entry is sufficient to block that particular attacker.
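A minimal sketch of that kind of cheap rule, assuming a hypothetical user agent string `BadBot/1.0` (made up, not from any real attack). It's two exact string comparisons on request fields, so evaluating it costs almost nothing per request, which matters when a botnet is sending huge volumes of them.

```python
BLOCKED_UA = "BadBot/1.0"   # hypothetical UA seen in the attack traffic
PROTECTED_PATH = "/"        # the attacker only ever hits the homepage

def should_block(path: str, user_agent: str) -> bool:
    # Two string comparisons: no regex, no reputation lookup.
    return path == PROTECTED_PATH and user_agent == BLOCKED_UA

print(should_block("/", "BadBot/1.0"))        # True
print(should_block("/about", "BadBot/1.0"))   # False
print(should_block("/", "Mozilla/5.0"))       # False
```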
This precise example is indeed poorly applied, as the system is intended to receive arbitrary text of arbitrary technical complexity. But I wouldn't mind the rule being applied to my team's endpoints, as we can be confident that anyone sending shell commands has malicious intent, regardless of whether there's any chance that my services would try to execute the code (they won't).
So long as it is possible to bring down services without any effort, skiddies will keep trying to do that. And so long as we've got people trying dumb attacks against our infrastructure, dumb defences can have a worthwhile effect. And if the dumb defences start catching stuff they're not supposed to catch, like the example with '...', they're dumb enough that we can understand why they're doing it and whether we can safely turn them off.
Probably the WAF, specifically the Cloudflare Specials rules, matches a number of things. And since a lot of it is just regex matching, it doesn't precisely account for the context in which a match occurs.
Additionally, Cloudflare doesn't know what is safe for a given site, so it has to be a little conservative. Sites that can safely handle malicious input, or tech sites that expect SQL, shell commands, or paths that look like directory traversal, are in the minority.
Essentially these are false positives, which are typically viewed as more acceptable than false negatives, since those would let attacks through.
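A toy illustration of the context problem, using made-up rules rather than anything from Cloudflare's actual rulesets. A traversal regex that looks for two dots anywhere in the path also matches a page title containing an ellipsis, like the '...' page names mentioned earlier in the thread. Requiring `..` to be a whole path segment removes that false positive.

```python
import re

# Hypothetical, deliberately naive traversal rule: ".." anywhere in the path.
NAIVE_TRAVERSAL = re.compile(r"\.\.")
# Slightly more context-aware: ".." only as a whole path segment.
SEGMENT_TRAVERSAL = re.compile(r"(^|/)\.\.(/|$)")

for path in ["/static/../../etc/passwd", "/wiki/Wait..._What"]:
    print(path,
          bool(NAIVE_TRAVERSAL.search(path)),
          bool(SEGMENT_TRAVERSAL.search(path)))
# /static/../../etc/passwd True True
# /wiki/Wait..._What True False
```

Even the segment version is still naive: it knows nothing about URL-encoded forms such as `%2e%2e`, which is why pattern rules like these end up as either too loose or too strict.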
These things are configurable by the site owners, but the issue is that site owners are not shown the code of the rules, so they have to guess from the names and descriptions whether something is safe to disable, meaning everyone just leaves everything enabled. Usually, reporting this to a site owner with the Cloudflare trace ID is enough for them to disable the rule causing false positives, as the site owner can search for the trace ID in the Cloudflare dashboard.
I no longer work there (I left 3 years ago), but I wrote significant parts of the firewall and also managed the firewall, WAF, and DDoS protection teams.
Any code including netcat (for its tendency to be used in reverse shells) or SQL (for its tendency to be used in SQL injections) tends to be blocked across the entire Cloudflare network these days.
Sites like HN could have disabled the WAF. It's entirely configurable on HN's side. Let's just wait until dang wakes up and implements the required changes.
“Thanks for the heads-up! We'll probably be off Cloudflare fairly soon - I think that's more likely the better fix. But if we end up being forced to stay on it, we'll look into configuring those rules.” --reply to an email I sent
I see it. It explains why Cloudflare is involved at all. But I don't see how this relates to the comment you replied to, which says that the text filter is ridiculous.
AFAIK most of these filters are disabled by default when setting up your website on Cloudflare, so most websites using the Cloudflare network likely have this turned off.
Also, it's not one magic switch. You can switch each of those rules on/off. Ideally HN would allow most of the injection ones, otherwise we won't be able to post examples of specific SQL patterns and the like.
Found a webshop once which issued IP bans when you triggered their WAF. Coincidentally, some product permalinks (containing the product name) triggered their WAF. Great conversion rate on those, I’m sure.
YOU HAVE BEEN BLOCKED FOR MALICIOUS ACTIVITY surely has to be good for business. Not that most would know, considering the trackers won’t load when this happens.
Even if they got the link, read it, they probably didn't fully understand the concepts.
I wish this was a joke, but just last month I spent literally hours arguing with multiple people (onshore!) that that kind of query rewrite/rejection approach was never going to work properly, and that only properly parameterised queries were correct.
Nope.
Fix after fix, then fixes for the fixes, then workarounds for the glitches, and then... on and on.
It was incredible to me that in 2023, supposedly senior technical team leads would have heated arguments rejecting parameterised queries and favouring regex WAF instead.
> query rewrite/rejection approach was never going to work properly, and only properly parameterised queries were correct
...what do you mean by "rewrite/rejection"?
If rewrite means "escaping strings using the database function designed for that purpose", then that approach works just fine. It's not comparable to rejection at all.
If they were making their own version, then the underlying problem is that they were making their own version. Parameterised queries are lovely but they are not the only option.
I mean that they were doing simple things like replacing a single quote (dangerous!) with a double single quote. E.g.: ' -> ''
That means that when a user called Bob O'Neill enters their name, instead of returning an HTTP 500 error, the database stores Bob O''Neill.[1]
Then when the user goes to edit their form, they will see O''Neill. Okay, oops, that's a mistake, let's just replace all double single quotes with a single quote when outputting HTML! Now it'll say O'Neill correctly!
Of course, if you enter some bad text with double single quotes via some other mechanism such as a CSV upload, there's a decent chance that it'll be incorrectly stripped. Perhaps in some mid-tier API, which will then interpret it as a single quote, resulting in an injection vulnerability (or data corruption) again.
That can be fixed with "mere" man months of effort instead of the minutes it would have taken to just use the parameterised queries like God intended.
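For contrast, the parameterised version really is a one-liner. A minimal sketch using Python's built-in `sqlite3` module with an in-memory database (the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

name = "Bob O'Neill"
# The ? placeholder is bound by the driver, so the quote never becomes part of the SQL text.
conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

stored = conn.execute("SELECT name FROM users").fetchone()[0]
print(stored)  # Bob O'Neill
```

There is no escaping function anywhere in this code, so there is nothing to accidentally apply twice.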
Now that that nightmare is over once and for all... what to do about % symbols screwing up LIKE searches? I dunno, that's complicated, so let's just replace all...
... rinse, repeat, ad infinitum.
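For the record, the LIKE problem has a well-understood answer that isn't another blanket replacement. Parameters alone don't make `%` and `_` literal, because they're still wildcards once they land in a LIKE pattern. Instead, you escape just those characters in the search term, once, where the pattern is built, and declare the escape character with an `ESCAPE` clause. A minimal sketch with Python's built-in `sqlite3` module and made-up data:

```python
import sqlite3

def like_literal(term: str, esc: str = "\\") -> str:
    # Escape the escape character first, then the two LIKE wildcards.
    return (term.replace(esc, esc + esc)
                .replace("%", esc + "%")
                .replace("_", esc + "_"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (title TEXT)")
conn.executemany("INSERT INTO products (title) VALUES (?)",
                 [("50% off mugs",), ("500 mugs",), ("plain mug",)])

term = "50%"
# The term is still passed as a parameter; only its wildcards are escaped.
rows = conn.execute(
    "SELECT title FROM products WHERE title LIKE ? ESCAPE '\\'",
    ("%" + like_literal(term) + "%",),
).fetchall()
print(rows)  # [('50% off mugs',)]
```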
[1] Oh, oh, you assumed that the query engine would replace '' with ' and the database would store the correct text? Hah-haaa.... you assumed that this "fix" was applied only once! What's fun about band-aids is that they're so easy to accidentally layer three or four deep without even realising. More band-aids == more safe, am I right?
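The footnote's layering problem in a few lines, assuming a hypothetical `escape_quotes` helper that does the naive `'` -> `''` replacement, and deliberately splicing the result into the SQL text the way the band-aid approach does. SQLite undoes exactly one level of quote doubling when it parses the literal, so two layers of "fix" leave a corrupted value in the database.

```python
import sqlite3

def escape_quotes(s: str) -> str:
    # The naive "fix": double every single quote.
    return s.replace("'", "''")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

name = "Bob O'Neill"
escaped_twice = escape_quotes(escape_quotes(name))   # two layers of band-aid
# Anti-pattern on purpose: splicing the "escaped" value into the SQL text.
conn.execute(f"INSERT INTO users (name) VALUES ('{escaped_twice}')")

stored = conn.execute("SELECT name FROM users").fetchone()[0]
print(stored)  # Bob O''Neill
```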
A common problem with technical leads is that they occasionally forget that they can't know all the correct solutions, and that it's usually better to yield to those who do. Sometimes seniority makes this even worse.
Source: After so many years of dealing with bad technical decisions became a TL myself :)
"Needs to die" is a bit harsh but there are alternatives that are better and more secure. If you're just a regular sysadmin and only want to spend 8-9 hours a day at work you might just use a WAF instead and deal with the lost performance/added cost.
So HN uses Cloudflare? That surprises me, because I typically notice sites using Cloudflare when my phone running GNU/Linux can't get past their dreaded Turnstile. Luckily that doesn't happen for HN.
They used to run it but stopped (I want to say) around 2016 or 2017. Another poster here linked[0] to how dang confirmed it is to protect against a DDOS attack.
The main Y Combinator site might have been using Cloudflare for years, and now HN might have been added to the same account to protect against a DoS attack.
The aggressiveness of the "dreaded Turnstile" is 100% configurable.
It's very easy to disable it completely via Cloudflare settings. Using cloudflare doesn't require you to use all of its features, and almost every feature can be turned off.
There are many options to configure it. The main reason to make it always visible and blocking is that the callbacks for managing the hidden/on-demand version are wonky and can break in unexpected ways, leaving your site entirely unusable, with the only indication being some errors logged to the console.
WAF in general is security theatre. If your operation genuinely benefits from one, I dread for what's sleeping underneath. Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.
It's just doing hash math (think Bitcoin mining) to make your CPU burn enough processing time that a layer 7 DDoS isn't worthwhile. It works, because the server now spends far less processing time than the client does.
That is not true.
It does a whole bunch of checks, like fingerprinting your GPU, environment, etc.
The checks are even run in a custom VM, and are heavily protected.
The gathered data is then sent back to Cloudflare, and you either get an access cookie (cf_clearance) back, or not.
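For reference, the generic "hash math" idea from earlier in this exchange is a hashcash-style proof of work. Here is a minimal sketch using SHA-256 (the difficulty and challenge string are arbitrary choices). It only illustrates the client/server cost asymmetry being claimed; it is not Cloudflare's challenge, which the reply describes as fingerprinting checks instead.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 16  # arbitrary; each extra bit doubles the client's expected work

def leading_zero_bits(digest: bytes) -> int:
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def solve(challenge: bytes) -> int:
    # Client side: brute-force a nonce, about 2**DIFFICULTY_BITS hashes on average.
    for nonce in count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= DIFFICULTY_BITS:
            return nonce

def verify(challenge: bytes, nonce: int) -> bool:
    # Server side: a single hash checks the client's answer.
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= DIFFICULTY_BITS

challenge = b"random-server-issued-challenge"
nonce = solve(challenge)
print(verify(challenge, nonce))  # True
```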
While this is true and worth reminding the ops about, it still sucks because many people don't understand the issues they cause by turning the WAF on. Cloudflare should have a big "I understand I'll block many legit clients when I enable this" checkbox. Or, you know... fix it in general. Or at least have a "report this block as invalid" link on the page.
Cloudflare WAF doesn't block clients in general, it blocks based on the data the client sends to the server.
Unless your client sends a string which matches one of the WAF patterns the site will work fine. It only blocks individual requests.
Now, the problem here is that you probably shouldn't enable the WAF without running it in log-only mode for a while if you operate a site that lets users submit arbitrary text input. Of course it's going to match... You'll have to adjust the configuration.
Agreed, I believe the default Firewall security level is "Medium" and I think that's far too strict. First thing I do when adding a new zone is to set it to "Essentially off"
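A minimal sketch of the log-only idea, using a made-up rule and a hypothetical `mode` switch (these are not Cloudflare's configuration names). In log mode the rule records what it would have blocked and lets the request through, so you can review hits against real user traffic before switching it to block.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("waf")

# Hypothetical rule that would trip on ordinary technical comments.
SQL_RULE = re.compile(r"\bunion\s+select\b", re.IGNORECASE)

def evaluate(body: str, mode: str = "log") -> bool:
    """Return True if the request should be rejected."""
    if not SQL_RULE.search(body):
        return False
    if mode == "log":
        # Record the would-be block but let the request through.
        log.info("would block: %r", body[:80])
        return False
    return True  # mode == "block"

comment = "Tip: UNION SELECT needs the same column count on both sides."
print(evaluate(comment, mode="log"))    # False (logged only)
print(evaluate(comment, mode="block"))  # True
```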
how many people do you piss off with the opinions you post on your blog? enough to warrant being DDoS'd by an emotionally stunted high schooler with their parents' (or a stolen) credit card and the ability to Google for a botnet?
Almost nobody who uses big brother as an individual ever does. What would anyone care about a nextcloud login panel? Or a reasonably civil personal blog? And yet they enable cloudflare for yet another small corner of the internet :(
Oh, how did you find that? Some inside knowledge about HN or is that a well-known path published by Cloudflare sites?
Edit: Seems to work also for at least some other Cloudflare sites. Interestingly HN is served from Stockholm (behind a sea cable) while others are served from Helsinki (should be closer). Not enough hackers here in Finland?
Edit 2: Also works on sites where Turnstile keeps me out.
news.ycombinator.com. 3710 IN CNAME news.ycombinator.com.cdn.cloudflare.net.
news.ycombinator.com.cdn.cloudflare.net. 3710 IN A 104.22.6.236
news.ycombinator.com.cdn.cloudflare.net. 3710 IN A 172.67.5.232
news.ycombinator.com.cdn.cloudflare.net. 3710 IN A 104.22.7.236
bummer. i used to like the legend that it was all on one commodity linux pc implemented in some nice concise lisp running on sbcl.
edit: my memory is crap. it was a single machine, but the codebase was written in a custom experimental language that i think was a lisp derivative. (which would make sense!). the source was online at some time, can't find it now.
I understand the backend is still somewhat limited. After Altman was fired (IIRC), when HN got 1000s of comments in no time, dang asked everyone to log out and log in again to make life a bit easier for the poor machine. So besides limited HW resources, there are also some non-optimal implementation details ;)
It still is a single single-core server, dang references it frequently when there's unusually high traffic [0]. And the language you're referring to is Arc [1]. They do have caching for not-logged-in users, historically done through nginx [2]. From other comments in this thread, it sounds like they just temporarily put Cloudflare in front of that single server to block a DDoS.
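A rough sketch of the "cache for logged-out users" pattern, written in plain Python rather than as the actual nginx setup (the cookie name `user`, the TTL, and the `render` callback are all assumptions). Requests without a session cookie share one cached copy per path, so a spike of anonymous readers barely touches the single application server.

```python
import time
from typing import Callable

CACHE_TTL = 30.0  # seconds; arbitrary
_cache: dict = {}  # path -> (timestamp, rendered page)

def handle(path: str, cookies: dict, render: Callable[[str], str]) -> str:
    if "user" in cookies:
        # Logged-in users see personalised pages, so always hit the app.
        return render(path)
    now = time.monotonic()
    hit = _cache.get(path)
    if hit is not None and now - hit[0] < CACHE_TTL:
        return hit[1]  # anonymous reader: serve the shared cached copy
    page = render(path)
    _cache[path] = (now, page)
    return page

renders = []

def render(path: str) -> str:
    renders.append(path)  # count how often the app server does real work
    return f"<html>{path}</html>"

for _ in range(3):
    handle("/news", {}, render)                # anonymous: rendered once, then cached
handle("/news", {"user": "someone"}, render)   # logged in: rendered every time
print(len(renders))  # 2
```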
Cloudflare is just dumb. They randomly block XHR requests in a session they've already challenged, breaking websites in non-obvious ways, and they've been doing that for as long as I can remember. Trying to do anything on, for example, Montana's SOS BIZ portal takes a lot of patience. They're like the TSA of the Internet, but at least with the TSA you can pay for a fast pass.
Cloudflare blocked me from signing in to my PetFlow account to buy cat food. It was stuck in an endless verification loop. A while back it did the same with my paid Crunchyroll subscription. I don't code, and I have a very ordinary setup with a well-known browser. Apparently Cloudflare now owns our access to the internet and can block whom it pleases, when it pleases, with no recourse. The internet will soon be available only to those who fit Cloudflare's criteria, whatever those may be, as long as companies keep buying into third-party control.
I'm sure there is a huge chart on their Cloudflare dashboard showing how many attacks were blocked! This is one thing that gets me: all of the reporting Cloudflare provides treats every block as a huge success. There's nothing to help distinguish actual attacks from false positives, let alone false positives that actually had a negative effect on the application behind the WAF.
That's how one of my past employers resolved this. They basically base64-encoded every field in the JSON whenever someone reported a bug where the WAF blocked it. Not only was this done inconsistently and super tedious, it completely defeated the purpose of the WAF. (Except, of course, to check the checkbox that we had a WAF.)
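A small sketch of why that defeats the WAF, using a made-up SQL injection rule. Once a field is base64-encoded, the bytes the WAF inspects no longer contain the pattern, yet the application decodes it back to exactly the same text.

```python
import base64
import json
import re

# Hypothetical WAF rule looking for a classic SQL injection pattern.
SQLI = re.compile(r"'\s*or\s+1\s*=\s*1", re.IGNORECASE)

field = "x' OR 1=1 --"
raw_body = json.dumps({"comment": field})
encoded_body = json.dumps({"comment": base64.b64encode(field.encode()).decode()})

print(bool(SQLI.search(raw_body)))      # True: blocked
print(bool(SQLI.search(encoded_body)))  # False: sails through

# The backend decodes it and gets the original payload back.
decoded = base64.b64decode(json.loads(encoded_body)["comment"]).decode()
print(decoded == field)  # True
```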
Same for Akamai, CloudFront, Fastly, etc. Pretty much every business that wants to offload DDoS protection, caching, and some level of frontline security uses a proxying CDN.
An alternative is to keep all of your CDN assets on a CDN bucket on its own hostname, with your main secret-containing business apps on your own servers, but it costs a lot to manage this level of separation and the payoff is only protection against the theoretical attack of "NSA can't attack our users/spy on them". If the NSA ever did do this on a large enough scale or to target a particularly notable person, it's very unlikely it would be kept a secret for long, and the end-business that used Cloudflare et al. wouldn't be implicated whatsoever since every business uses one of the big CDN providers.
https is important for preventing spying by anyone else between you and the server. ISPs, coffee shop owners, schools, etc. used to spy on http traffic to see what people were doing/searching for, and ISPs like Xfinity injected code into non-https pages to show "important messages" to users, e.g. about going over your bandwidth limit[0].
The only weak link now is Cloudflare, which is still "less secure than a direct connection" (with respect to government spying, bugs[0], hackers, etc) but the threat level is drastically reduced.
Cloudflare can issue from Google Trust Services/DigiCert with ACM[0], and often does even without ACM (although maybe only for Business/Enterprise domains).
Check the whois entry for the IPs that domain resolves to. If they belong to Cloudflare, it can see the plaintext traffic. The same goes for Akamai, CloudFront and others.
To downvoters: please don't shoot the messenger. I'm not happy about the existence of Cloudflare (or their competitors who do the same thing) either.
That said, the choice is yours whether or not to use sites that utilize such untrustworthy MITM providers, like Cloudflare. There are even browser plugins that can automatically block connections to such untrustworthy entities.
This isn't an endorsement, and you should always review the source code of any browser extensions you're utilizing due to the risks extensions themselves can pose, but I personally use one called Cloud Firewall and it works great. (https://addons.mozilla.org/en-US/firefox/addon/cloud-firewal...)
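The whois check suggested earlier can also be done offline against a CDN's published address ranges. A minimal sketch with Python's `ipaddress` module: the two ranges below are from Cloudflare's published IPv4 list and cover the addresses in the dig output earlier in the thread, but this is only a partial subset, so a `False` result proves nothing. Use the full list Cloudflare publishes (which can change) for anything real.

```python
import ipaddress
import socket

# Partial subset of Cloudflare's published IPv4 ranges; NOT the full list.
CLOUDFLARE_V4_SUBSET = [
    ipaddress.ip_network("104.16.0.0/13"),
    ipaddress.ip_network("172.64.0.0/13"),
]

def is_cloudflare_ip(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CLOUDFLARE_V4_SUBSET)

def resolve_ipv4(hostname: str) -> list:
    # Needs network access; returns the IPv4 addresses the resolver gives back.
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

# The A records from the dig output for news.ycombinator.com:
for addr in ["104.22.6.236", "172.67.5.232", "104.22.7.236"]:
    print(addr, is_cloudflare_ip(addr))
```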
>There aren't obvious signs up front that a site is using cloudflare.
You're joking, right?
It takes 2 seconds to click the padlock in your browser, click through once more, and see "Verified by: Cloudflare, Inc". You don't even need to view the certificate.
If 2 seconds and 2 clicks is too much time and effort, it's obviously not actually that important to the user in question.
It’s a CDN that caches content and it’s able to inject “are you human?” verification pages, it can rewrite content on demand (e.g. serve optimized images / html / JavaScript). It seems obvious to me that they have access and ability to modify all cleartext content in-flight.
It's a TLS termination proxy that decrypts and re-encrypts your TLS traffic. Technically, Cloudflare can read anything unless you add your own encryption layer on top of TLS.
Yes, that's how Cloudflare works. The TLS certificate for basically any website using Cloudflare "ends" at Cloudflare's servers. The traffic is then either forwarded to the actual servers in cleartext or re-encrypted with an internal company certificate (maybe signed internally as well) to pass the connection on to the actual servers. It was the easy way for many companies that didn't have the expertise to do their own certificate management to move from the http world to the https world. They just handed it off to Cloudflare and kept their servers running http.
F5 Networks, my former employer, sells something similar, but it's a box (or virtual appliance) you put in your own data centers somewhere that dead-ends the connection instead.
It's entirely possible to have a proper SSL connection to a bogus hostname, that is showing the correct website and even interacts correctly.
The bogus MITM decrypts the traffic, logs it, then forwards it, encrypted again, to the destination server. Then it does the reverse for the response.
"Look for the padlock" is only useful if the actual hostname in the browser is correct.
If I hosted news.ycombnator.com using this and you didn't notice, I could be proxying just like that. It's possible Cloudflare has protections against this in place, but doesn't this apply to every website on earth?
If Cloudflare has the certificate's private key and is advertising the A record, it has access to everything you send, from emails to credit card numbers.
Can you explain what you mean a bit more? My connection to eg my bank isn't decryptable by anybody but me and my bank (and their CDN which is serving their certificate). That is, eg, Verisign has root CA keys to sign the cert, and they could give me a cert that says they're my bank and I could make a new connection that they could decrypt, but the original connection to my bank can't be decrypted by their keys.
That's often the case with HN I think from past experience when there are large threads on HN, and dang has in the past said that's due to the application server.
I hope you figure out that annoying people doesn't make you right. We can only dream of a world where it's difficult to be annoying, and requires putting in some effort to be right first.
> stealing from users
What a weird definition of stealing.
And it's as much an issue with your browser if hitting back doesn't return the text. There are extensions to improve that behavior.
But what I find really interesting is that you seem to think being mad about an issue is a reason to break a completely unrelated rule?
I didn't say to avoid jokes. Joking and being antagonistic are different things!
And I still don't think stealing is the right word for that kind of technical issue, especially when it's still half your browser's fault.
> which you didn’t manage to specify
Why would I need to specify something you brought up? "OH NOES comparing HN to Reddit violates "policy.""
> and being “mad…”
Is "aggressive griping about" better? People usually simplify that to "mad about".
> One of them has gone through and downvoted all my posts now
Almost every post you made inside that 24 hour downvote window deserves it, so depending on how literal that "all" is, they're probably helping and not a bad actor.
This is also fine, but blocked if you change the slash to a dot:
This works too:
OK, that's all for now. Can you believe people pay money for "web application firewalls"?