URLhaus: A database of malicious URLs used for malware distribution (abuse.ch)
157 points by fanf2 20 days ago | hide | past | favorite | 38 comments



To me this looks like, for ~90% (eyeballed), you just need to tell your browser somehow to stay away from any port other than 80 or 443. If some script/link-rel/src attribute points to a non-http(s) port, just pop a warning to confirm you're OK with it. Is there a browser today with this feature?
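For illustration, that heuristic boils down to comparing a URL's explicit port against a small allowlist. A minimal Python sketch (function and constant names are my own):

```python
from urllib.parse import urlsplit

# Ports treated as "normal" for web traffic.
SAFE_PORTS = {80, 443}

def nonstandard_port(url: str) -> bool:
    """Return True when an http(s) URL names an explicit port
    outside the usual 80/443 pair."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # other schemes are out of scope here
    port = parts.port  # None when the URL relies on the scheme default
    return port is not None and port not in SAFE_PORTS
```

A browser could run a check like this on every subresource URL and prompt before fetching anything that trips it.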

Or go wild to neuter your browser and configure your firewall to allow only dst ports 80 and 443 for mozilla/chrome/edge/etc.

Also curious what the performance implications are of adding about 2 million IPs to your firewall, or maybe a couple hundred thousand if you're brave enough to do ranges.


This feels like the wrong conclusion in an adversarial game. Such a heuristic works only because it's not done at any reasonable rate. As soon as it gets applied nontrivially you'll see scammers adapt back to ports 80/443. And you've got a broken browser to boot.


I agree that is the likely outcome. It is interesting to me, though, that they're not concentrated on ports 80/443. There must be a reason why, and one answer, from a comment below, could be that the boxes hosting this already serve legit HTTP content on those ports. Having this kind of traffic show up in HTTP monitoring tools would make the hack obvious.


A lot of stuff is hosted on compromised sites or devices - which are often running some insecure admin interface on an odd port. Very rare in my experience that someone would spin up a new web server on a box they’ve popped.


I recently got a pop up in Firefox doing exactly that. I was messing around with my homelab and entered a URL/Port that Firefox deemed suspicious and warned me that “this port is usually not used for web browsing. Are you sure you want to visit that?”


I'm hearing more and more nice things about Firefox lately. Been on Chrome for the last 10 years or so, but I might have to switch soon if they break extensions the way they plan to.


Where do you get those 2 million IPs? The plaintext URL list [0] only contains 90k entries, and after filtering it to IPs only and deduplicating, it's just 39k.

I've just added it to my firewall, which is pushing around 160 Mbit/s right now, using an ipset, and the only CPU increase I can see is a small blip from the ipset restore. And that's just an APU2 with an AMD GX-412TC (1 GHz quad-core from 2014), not a beefy box.

[0]: https://urlhaus.abuse.ch/downloads/text/
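For what it's worth, the filter-and-load step can be sketched in a few lines of Python: parse each URL in the plaintext feed, keep only literal-IP hosts, deduplicate, and emit input for `ipset restore` (function names and the set name are my own):

```python
import ipaddress
from urllib.parse import urlsplit

def extract_ips(feed_text: str) -> set[str]:
    """Collect unique literal-IP hosts from a plaintext feed with one
    URL per line ('#' lines are comments)."""
    ips = set()
    for line in feed_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host = urlsplit(line).hostname
        if host is None:
            continue
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # a domain name, not a literal IP
        ips.add(host)
    return ips

def ipset_restore_script(ips: set[str], name: str = "urlhaus") -> str:
    """Emit input suitable for piping into `ipset restore`."""
    lines = [f"create {name} hash:ip -exist"]
    lines += [f"add {name} {ip} -exist" for ip in sorted(ips)]
    return "\n".join(lines)
```

The output would then be fed to `ipset restore` and referenced from a single iptables rule, rather than one rule per address.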


yea, it should not have any performance implications.

be aware that blocking stuff in your infrastructure will have hard-to-diagnose fallout; you're generally better off if you police content on the client (ad blocker)


On performance, it depends a bit.

If you're running a stateful firewall, those generally don't evaluate firewall rules for established states, and most of your traffic is to established states, so no big deal.

If you're not running a stateful firewall, it's not totally unreasonable to skip the firewall for tcp packets with ACK and not SYN, so again no big deal on those. But http/3 is udp, so no shortcuts there.

Afaik, most firewalls have a lookup table available, you'd want to use that, rather than 2 million rules. On FreeBSD, ipfw and pf have lookup tables, ipf calls them pools, but it looks like the same thing. A lookup table for IP addresses is pretty fast, even with 2M entries.
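On FreeBSD, loading the addresses into an ipfw lookup table might look roughly like this (untested sketch; the table name and rule number are illustrative):

```sh
# One lookup table instead of millions of individual rules.
ipfw table badhosts create type addr
ipfw table badhosts add 192.0.2.1
ipfw add 100 deny ip from any to 'table(badhosts)'
```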


> But http/3 is udp, so no shortcuts there.

Usually stateful firewalls create a "state" for UDP connections, so "shortcuts" are still possible. See, for example, pf: https://www.openbsd.org/faq/pf/filter.html#udpstate


> Also curious what the performance implications are of adding about 2 million IPs to your firewall, or maybe a couple hundred thousand if you're brave enough to do ranges.

OpenBSD's pf firewall supports tables of IP addresses which can be blacklisted or whitelisted. From their FAQ:

> A table is ideal for holding a large group of addresses as the lookup time on a table holding 50,000 addresses is only slightly more than for one holding 50 addresses.

https://www.openbsd.org/faq/pf/tables.html
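As a sketch, a pf.conf fragment using such a table might look like this (the file path and table name are my own invention):

```
# Load the blocklist from a file and keep the table across ruleset reloads.
table <urlhaus> persist file "/etc/pf.urlhaus.txt"

# Drop outbound traffic to anything in the table.
block drop out quick to <urlhaus>
```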


i have Little Snitch configured to only allow 443 and 80 for my browser and have it display an accept/decline dialog on non-default-port requests.


that's cute


I try to be!


Google has been doing a pretty good job at breaking the web on their own via "features" in Chrome, let's not give them any other ideas.


> Also curious what the performance implications are of adding about 2 million IPs to your firewall, or maybe a couple hundred thousand if you're brave enough to do ranges.

The problem here is that the "bad guys" move around, and sooner or later you ban most of, e.g., DigitalOcean's IPs, Amazon's IPs, Azure's IPs, etc., and you break connectivity for other, "good" uses.


Can anyone explain why malware distributors would prefer (or be somehow forced into using) non-default TCP ports?


It's because they typically hack web servers, and then they spawn a fresh web server on a random port.

Anything below 1024 requires root access, which can be problematic, and 80/443 may already be used.
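The low-port restriction is easy to see from userland; a quick Python sketch (as a non-root user, `can_bind(80)` will typically fail, while high and ephemeral ports work):

```python
import socket

def can_bind(port: int) -> bool:
    """Try to bind a TCP socket to `port` on localhost."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        return True
    except PermissionError:
        return False  # ports below 1024 need privileges on most systems
    finally:
        s.close()
```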


My (naive) guess is that it makes them slightly less likely to be picked up by scanners like shodan. I'm probably wrong though


This is a big part of it. There are companies that scan the internet in search of malware, and if you run on a different port it makes the search space so much bigger.


The low ports typically require root privileges to map to, and people are more likely to keep track of what runs on them.

It's not about Shodan, Shodan will find it. Probably Greynoise too.


because the original owner of the box would mind if the application on the default port stopped working


or wonder why weird routes like "/i" or "/bin.sh" are among the top accessed URLs in their monitoring dashboards


Their own waf (?) is messing with the data: https://urlhaus.abuse.ch/browse/tag/mirai/ returns "405 banned".


Surprised to see so much malware served from GitHub domains https://urlhaus.abuse.ch/browse.php?search=github


In my day-to-day work, we analyze millions of files every day, and it's a well-known and well-utilized detection-evasion technique to host and serve malware from "trusted" websites. It's so widespread that I did extensive research on that issue. There are well-known apps with $Ms in funding and revenue with a plethora of malware hosted on their servers. Some are even used as C2 servers for data exfiltration. I see an increasing number of companies proactively blocking all traffic to those notorious sites to increase overall network security.

The outcome of my research was the following:

- Disjointed content moderation and cybersecurity departments: Not many companies have content moderation teams equipped to perform malware analysis or make cybersecurity-related decisions (the only company that does an exceptional job in this regard is Meta).

- If hosting malware doesn't impact the company's revenue and reputation, the content moderation team has other priorities.

- Section 230: Companies will refer to Section 230 when asked about hosting malicious content or scanning the content for potential malware.


I see a few false positives. It appears that unsigned software is being labeled as malware, and as grayware on some pages.

Unsigned software is not malware or 'grayware'. It's not inherently malicious.

I'm also seeing coin miners being labeled as malware. They often are, but I'm sure there are misclassifications along those lines in this dataset as well.



How does it keep the records up to date? An IP nowadays is highly elastic and can be reassigned to different tenants by your cloud provider.


it was never a good idea, but it works somewhat


Who are abuse.ch? Are they well-known? I assume the hosts file could be useful to add to pihole?


Sometimes popular domains like drive.google.com get added, and at other times some domains are popular enough to reach a Tranco rank of 20,000, so I generally advise against blocking using sources like URLhaus.


i generally advise against embedding any blocklists into infrastructure; content policy should be done on clients within reach of the user


Yes, they are fairly well-known. They have been partnering with Spamhaus: https://abuse.ch/blog/abuse-ch-appoints-spamhaus-as-a-licens...


They seem to be partners of Spamhaus. From the recent feed [0] it looks like it's often just IP addresses, so you would need to add them to your firewall.

[0]: https://urlhaus-api.abuse.ch/v1/urls/recent/


abuse.ch is a non-profit, initially private, that has been working on cybersecurity issues for 15 years, mainly focused on botnets and malware. Since 2021, abuse.ch has been part of the Institute for Cybersecurity and Engineering (ICE) at Bern University of Applied Sciences. To date the project has been funded entirely by private-sector donations.

They have mainly two goals:

1. Research: research into malware and botnets

2. Open-source threat intelligence: indicators of compromise (IOCs) for the public, to prevent threats



Great set also for collecting malware.



