I’m curious whether those who voted for this submission have ever taken a look at their server logs.
Almost every public website on the open Internet receives thousands of HTTP requests similar to the ones mentioned in this text file. This is one of the several reasons why web application firewalls gained popularity years ago, especially as vulnerability scanners became widespread.
Years ago, when I was employed at a young security startup, my colleague and I dedicated countless hours analyzing this particular kind of web traffic. Our objective was to develop basic filters for what eventually evolved into an extensive database of malicious signatures. This marked the inception of what is now recognized as one of the most widely used firewalls in the market today.
I sometimes take a look at the logs, but nowadays there's a lot of noise from "security" companies that probably scan all IP addresses and all ports for known vulnerabilities. And they do it the lazy way. They just fire a bunch of URLs at each port that responds: long hexadecimal URLs, WordPress admin endpoints, OAuth endpoints, etc. In the beginning, they even sent emails to tout their services.
We use one of them for ISO certification. Twice a year, we turn on their "vulnerability scanner", which claims to test for x-thousand vulnerabilities, we get a report, and everybody is happy. Only on the first run did it discover a small error in the nginx config. Unfortunately, it is theater.
Most of this looks like random data designed to detect if SQL injection is happening without crashing the query (to avoid detection). So, the random strings are effectively a token to check if it is found in the response, which indicates that the injection worked. Similarly for the sleep calls, the attacker would time the response.
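For the curious, here is a rough sketch of what that marker trick looks like from the scanner's side; the URL, parameter name, and payload are made up for illustration, not taken from the logs above.

    # Inject a unique random marker and check whether it comes back in the
    # response body; if it does, the input reached the query unescaped.
    import secrets
    import requests

    marker = secrets.token_hex(8)                  # e.g. 'a1b2c3d4e5f60718'
    payload = f"' UNION SELECT '{marker}' -- "     # classic union-based probe

    r = requests.get("https://example.com/search",
                     params={"q": payload}, timeout=10)
    if marker in r.text:
        print("marker reflected -> injection likely worked")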
This exactly. Any text field is "checked" to see if it is getting submitted unprotected to a database. Every single one of them.
When you run a search engine, you will see queries that try to look through every page of results for the search query 'input type="text"'. Typically these will come either from an API query or from a search page that is fronting another index.
The sleeping is pretty clever, but presumably vulnerable to false positives if the queries are slow anyway. I wonder if they go the extra mile and compare the load time with and without the attempted injection.
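Something like this, presumably; the endpoint, payload, and threshold are invented for the sketch, and a real scanner would repeat the measurement to average out jitter.

    # Compare a baseline request against the same request with an injected
    # SLEEP; only flag it if the delta clearly exceeds normal page slowness.
    import requests

    URL = "https://example.com/item"          # made-up endpoint
    baseline = requests.get(URL, params={"id": "1"},
                            timeout=30).elapsed.total_seconds()
    injected = requests.get(URL, params={"id": "1' AND SLEEP(5)-- "},
                            timeout=30).elapsed.total_seconds()

    if injected - baseline > 4:               # well above ordinary jitter
        print("time-based blind SQL injection suspected")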
It would be interesting to make a SQL injection honeypot that behaves like a database in most responses but is designed to maximally frustrate the attacker.
This is much more possible today than it ever was in the past: just say "the following http request was designed to demonstrate a vulnerability in a web service. Please explain what vulnerability this request is designed to detect, and what part of the response demonstrates the vulnerability. Finally, output an example of a response that a vulnerable service might produce in response to this request" to an instruction tuned LLM, and then return that response to the attacker (the "explain what is happening" bit is just to get a more plausible response).
As a bonus, your apparently vulnerable service would be incredibly slow, so any iterative testing would be incredibly slow.
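A minimal sketch of that proxy idea, assuming Flask and the official OpenAI Python client; the model name, prompt wording, and catch-all route are placeholders, not a tested honeypot.

    from flask import Flask, Response, request
    from openai import OpenAI

    app = Flask(__name__)
    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = ("The following HTTP request was designed to demonstrate a "
              "vulnerability in a web service. Explain what vulnerability it "
              "probes for and what part of a response would demonstrate it. "
              "Finally, output an example response body that a vulnerable "
              "service might produce.\n\n{raw}")

    @app.route("/", defaults={"path": ""}, methods=["GET", "POST"])
    @app.route("/<path:path>", methods=["GET", "POST"])
    def honeypot(path):
        raw = (f"{request.method} {request.full_path}\n"
               f"{request.get_data(as_text=True)}")
        completion = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": PROMPT.format(raw=raw)}],
        )
        # Return the model's invented "vulnerable" output wholesale; the
        # latency is a feature here, not a bug.
        return Response(completion.choices[0].message.content,
                        mimetype="text/html")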
Yeah, so much noise.
I enjoy screwing around with them in my free time, "imposing cost" by giving back unexpected things.
I don't know if it actually does something, but I bet returning either a gzip-bomb or a 5 MiB really obscure (but valid) HTML file will crash quite a few scanners.
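If anyone wants to try, here's a rough sketch of serving a pre-compressed bomb; the Flask route and sizes are just illustrative.

    # Compress a large run of zeros once and serve it with
    # Content-Encoding: gzip; a scanner that transparently decompresses
    # responses inflates it in memory.
    import gzip
    from flask import Flask, Response

    app = Flask(__name__)

    # ~10 MiB of zeros compresses down to roughly 10 KiB on the wire.
    BOMB = gzip.compress(b"\x00" * (10 * 1024 * 1024), compresslevel=9)

    @app.route("/wp-admin/<path:anything>")
    def bomb(anything):
        return Response(BOMB, headers={"Content-Encoding": "gzip",
                                       "Content-Type": "text/html"})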
Protip: I usually add a hidden input field to my forms. As it is hidden, a normal user should not be able to fill it out; only a bot will. So if the hidden input isn't empty, I can disregard the submission as spam. It works wonders.
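Something along these lines, for anyone who hasn't seen the trick; the field names and Flask handler are made up, and store_message is a hypothetical helper.

    from flask import Flask, request, render_template_string

    app = Flask(__name__)

    FORM = """
    <form method="post">
      <input name="email">
      <!-- hidden from humans; bots tend to fill every input -->
      <input name="website" style="display:none" tabindex="-1" autocomplete="off">
      <button>Send</button>
    </form>
    """

    @app.route("/contact", methods=["GET", "POST"])
    def contact():
        if request.method == "POST":
            if request.form.get("website"):       # honeypot field was filled
                return "Thanks!", 200             # pretend success, drop it
            store_message(request.form.get("email", ""))  # hypothetical helper
            return "Thanks!", 200
        return render_template_string(FORM)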
I do the opposite: a hidden field that is filled automatically by javascript. Yes, that means you can't submit without JS. That's a tradeoff I was ready to make. I'm actually surprised it still works as well as it does.
For many sites, turning off JS is not an option. IMO, it's wasteful to ignore all that compute power in the browser. It's better to run code in thousands of browsers than do it all on the server.
The user already is. The casual user's system is idling with all sorts of nonsense. Adding some light processing does no harm; I'm thinking 100-500 ms per page. You don't render the page for the user either, do you?
For heavier use cases (e.g. image processing), the user should be willing to spend some CPU power. It doesn't make sense to send an image to a server, put it in a queue, wait for an image-processing worker to run it, and send back the result. It's simpler and more sensible to run that process client-side, if feasible. LLMs, for example, are too big for that, but many other tasks aren't.
Some "light processing" like 10 seconds of instantiating <js framework of the day> crap that gives me nausea while it redraws infinitely and boxes move around on the page?
Even the mobile oriented samey SAAS sites that have you scroll through 20 screens to read 5 lines sound better...
Edit: Btw, 100-500 ms on what? The latest Intel 500 W space heater? And tested only in Chrome because it's too expensive to notice that it's not very fast or responsive on other browsers?
Edit 2: Not to be misunderstood. If you're doing the computation for me, go ahead. If you're doing the computation because your framework has 100000% overhead, no thanks.
I don't like heavy frameworks either. I try to keep everything light, both server- and client-side. Ten seconds of loading animation is too much. But having no framework at all severely limits development speed.
All browsers are approximately equally fast nowadays. I use Firefox, so no worries there.
One of the first bits of analytics I put on any webserver is to count all unhandled URLs. As others here say, things like WordPress admin page probing are classic, but I remember one of the Django designers pointing out that sometimes legitimate-looking requests are actually a form of suggestion. That used to be a lot more true when people would try to play with URLs to get to what they wanted.
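A quick-and-dirty version of that analytics pass over an access log; the log path and field positions assume the common/combined log format, so adjust for your server.

    # Tally the URLs behind 404s in a combined-format access log.
    from collections import Counter

    counts = Counter()
    with open("/var/log/nginx/access.log") as log:       # assumed location
        for line in log:
            parts = line.split('"')
            if len(parts) < 3:
                continue
            request_fields = parts[1].split()             # 'GET /x HTTP/1.1'
            status_fields = parts[2].split()              # status, size, ...
            if (status_fields and status_fields[0] == "404"
                    and len(request_fields) >= 2):
                counts[request_fields[1]] += 1

    for url, n in counts.most_common(20):
        print(f"{n:6d}  {url}")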
Relatedly, if you work in a field where your products become known as a useful benchmark, you will find prototypes start showing up long before any public disclosure. We used to use this to anticipate new screen resolutions and evaluate new GPUs and SoCs before being told about them.
I remember back in the day getting my first server online. Then a few months in I stumbled across the SSH logs… let's just say it was quite handy, because at the time we were trying to come up with a name for our kid.
The internet is a jungle with dragons. Nowadays I try to keep everything on my VPN as an extra security layer.
I remember installing a Juniper Intrusion Detection System in a server rack at a telecom company. I was quite impressed when I saw in the logs that such attacks were discovered and blocked. This was 15 years ago.
I've got a classic guestbook on an intentionally vintage page but I actually filter the input into "spam" and "humans". Here's the spambook: https://bootstra386.com/spambook.html
Showing it's not impossible to have a classic anonymous guestbook, you just have to be a bit clever.
Yes, that's a zipbomb aimed at a particular offender at the beginning. It worked. A script dumb enough to brainlessly slam the site broke easily against a zipbomb.