You cannot hide anything on the internet anymore: the full IPv4 range is scanned regularly by multiple entities. If you open a port on a public IP, it will get found.
If it's an obscure non-standard port it might take longer, but if it's on any of the standard ports it will get probed very quickly and included in tools like shodan.io.
The reason I'm repeating this is that not everyone knows it. People still (albeit less often) put up Elasticsearch and MongoDB instances with no authentication on public IPs.
The second thing which isn't well known is the Certificate Transparency logs. This is the reason why you can't (without a wildcard cert) hide any HTTPS service. When you ask Let's Encrypt (or any CA, actually) to issue a certificate for veryobscure.domain.tld, they will send it to the Certificate Transparency logs. You can find every certificate ever minted for a domain with a tool like https://crt.sh
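If you want to see what the CT logs already expose for one of your domains, a minimal sketch along these lines will dump every name that has shown up in a cert. It assumes crt.sh's unofficial JSON output and its name_value field, which aren't guaranteed to stay stable; example.com is a placeholder:

    # Sketch: list names seen in CT-logged certs for a domain via crt.sh's JSON output.
    import requests

    domain = "example.com"  # placeholder
    resp = requests.get(
        "https://crt.sh/",
        params={"q": f"%.{domain}", "output": "json"},
        timeout=30,
    )
    resp.raise_for_status()

    names = set()
    for entry in resp.json():
        # a single cert can carry several newline-separated names (SANs)
        names.update(entry.get("name_value", "").splitlines())

    for name in sorted(names):
        print(name)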
There are many tools like subdomain.center; https://hackertarget.com/find-dns-host-records/ comes to mind. The most impressive one I've seen, which found much more than expected, is Detectify (a paid service, no affiliation): they seem to combine passive data collection (like subdomain.center) with active brute forcing to find even more subdomains.
The Certificate Transparency Log is very important. I recently spun up a service with HTTPS certs by Let's Encrypt. By coincidence I was watching the logs. Within just 80 seconds of the certificate being issued I could see the first automated "attacks".
If you get a certificate, be ready for the consequences.
Were these automated "attacks" hitting you by hostname or IP? Because there's a chance you would've been getting them regardless just from people scanning the entire IPv4 space
This is really interesting. For my homelab I've been playing around with using Let's Encrypt rather than spinning up my own CA. "What's the worst that could happen?"
Getting a wildcard certificate from LE might be a better option, depending on how easy the extra bit of plumbing is with your lab setup.
You need to use DNS-based domain validation, and once you have a cert, distribute it to all your services. The former can be automated using various common tools (look at https://github.com/joohoi/acme-dns if you self-host DNS or your registrar doesn't have useful API access; self-host it unless you are only securing toys you don't really care about), or you can leave it as a manual job every ~ten weeks. The latter involves scripts to update your various services when a new certificate is available (either pushing from where you receive the certificate or picking it up from elsewhere). I have a little VM that holds the couple of wildcard certificates (renewing them via DNS-01 and acme-dns on a separate machine, so this one is impossible to see from the outside world); it pushes the new key and certificate out to the other hosts (a simple SSH to copy them over, then restart nginx/Apache/whatever).
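The push step really can be as dumb as it sounds; roughly this sort of thing (a sketch only; hostnames, paths and the reload command are placeholders, not my actual setup):

    #!/bin/sh
    # Push the renewed wildcard cert/key to each host, then reload the web server.
    # Placeholder paths/hosts; adjust for your own services.
    CERT_DIR=/etc/acme/wildcard.example.com
    HOSTS="web1.lan web2.lan proxy.lan"

    for host in $HOSTS; do
        scp "$CERT_DIR/fullchain.pem" "$CERT_DIR/privkey.pem" "root@$host:/etc/ssl/private/"
        ssh "root@$host" "systemctl reload nginx"   # or apache2, or whatever runs there
    done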
Of course you may decide that running your own CA is easier than setting all this up, as you can sign long-lived certificates for yourself. I prefer the Let's Encrypt route because I don't need to switch to something else if I decide to give friends/others access to something.
Your top level (sub)domain for the wildcard is still in the transparency logs of course, but nothing under it is.
If you're homelabbing, then you should be using private IPs to host your services anyway. Don't put them on a public IP unless you absolutely have to (e.g. port 25 for mail).
Use your internal DNS server (e.g. your router's) for DNS entries for each service. Or, if you wish, you can put them in public DNS as well. E.g. gitlab.myhome.com A 192.168.33.11
You can then access your services over an always-on VPN like WireGuard when you're away from home.
Then it doesn't matter if anyone knows what subdomains you have, they can't access them anyway.
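For the WireGuard piece, a minimal client config might look something like this sketch (keys, addresses and the endpoint are all placeholders, not values from any real setup):

    # Illustrative wg-quick client config; all values are placeholders.
    [Interface]
    PrivateKey = <client-private-key>
    Address = 10.0.0.2/32
    DNS = 192.168.33.1            # your internal DNS server

    [Peer]
    PublicKey = <server-public-key>
    Endpoint = vpn.myhome.com:51820
    AllowedIPs = 192.168.33.0/24, 10.0.0.0/24
    PersistentKeepalive = 25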
If your exposed services use authentication and you use strong passwords, you are no worse off than any small business, and you have the advantage of being a lesser target.
Tailscale actually does all of the above for you: it handles the DNS, can register an LE cert, and provides the always-on VPN to allow access when you're away from home.
I was looking for a lazy/easy way to do this manually and settled on KeyStore Explorer, which is a GUI tool that lets you work with various keystores and do everything from making your own CA, to signing and exporting certificates in various formats: https://github.com/kaikramer/keystore-explorer (to me it feels easier than working with OpenSSL directly, provided I trust the tool)
I also set up mTLS or even basic auth at the web server (reverse proxy) level for some of my sites, which seems to help that little bit more: some automated attacks might choose to ignore TLS errors, but they won't be able to provide my client certs or the username/password. In addition, I run fail2ban and mod_security, though that's more opinionated.
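For what that looks like in practice, here's a rough nginx sketch of the mTLS-plus-basic-auth idea (paths, hostname and upstream are placeholders, not my exact config):

    # Sketch: client-cert verification plus basic auth at the reverse proxy.
    server {
        listen 443 ssl;
        server_name grafana.example.com;        # placeholder

        ssl_certificate         /etc/ssl/certs/fullchain.pem;
        ssl_certificate_key     /etc/ssl/private/privkey.pem;

        # mTLS: only clients presenting a cert signed by this CA get through
        ssl_client_certificate  /etc/ssl/certs/my-client-ca.pem;
        ssl_verify_client       on;

        location / {
            # basic auth on top, for scanners that ignore TLS errors
            auth_basic           "restricted";
            auth_basic_user_file /etc/nginx/.htpasswd;
            proxy_pass           http://127.0.0.1:3000;
        }
    }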
I use a wildcard certificate for my home infrastructure. For all the talk of hiding, though, it's wise not to count on hiding behind a wildcard. Properly configure your firewalls and network policy. For the services you do have exposed, implement rate limiting and privileged access. I stuck most of my LE services behind Tailscale, so they get their certificates but aren't routable outside my Tailscale network.
Doing something similar on AWS right now, what do you mean by leaking service usage? What is ACM exposing? I assume the “fix” for this would be to host your own CA through ACM?
Recently, I opened 80 and 443 so I could use Let's Encrypt's acme-client to get a certificate (and then test it). Tightening up security a bit, I configured an HTTP relay to filter out people accessing port 80 by IP address rather than domain name. Some scanners are still trying domain and subdomain names I was using weeks ago, which goes to show how organised hackers are about attacking targets.
You can use the DNS-01 challenge [1] to get a certificate. You just need to add a temporary TXT record to your DNS. It also supports wildcard certificates.
Most popular DNS providers (like Cloudflare) have APIs, so it can be easily automated.
I'm using it in my local network: I have a publicly reachable domain for it (intranet.domain.com) and I don't want to expose my local services to the world just to issue a certificate trusted by the root CAs already on all my devices. So this method allows me to issue a valid Let's Encrypt wildcard cert (*.intranet.domain.com) for all my internal services without opening any ports to the world.
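With Cloudflare specifically, certbot's dns-cloudflare plugin can handle the TXT record for you; roughly like this sketch (assuming the plugin is installed; domain and paths are placeholders):

    # Sketch: DNS-01 wildcard issuance via certbot's dns-cloudflare plugin.
    # /root/.secrets/cloudflare.ini holds a scoped API token, e.g.
    #   dns_cloudflare_api_token = <token>
    certbot certonly \
      --dns-cloudflare \
      --dns-cloudflare-credentials /root/.secrets/cloudflare.ini \
      -d 'intranet.domain.com' -d '*.intranet.domain.com'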
Once you expose something long enough to get scanned, it's going to continue to get scanned pretty much forever.
I self-host a couple of web services, but none are open; you need strong authentication to get in.
It's not ideal. Ideally I'd close off the HTTPS web traffic and use some form of VPN to get in, but sadly that's just not feasible in my use case. So strong auth it is.
Not to underestimate the power of Shodan, and oh god don't spin up a default Mongo with no auth, but port knocking would seem to counteract this to enough of a degree, not to mention having a service only accessible via Tor.
Yes, you can hide with a little bit of effort. Port knocking or Tor will stop almost anything (but don't rely on it as the sole protection, just as another layer).
I like to prefix anything "I don't want scraped" with a random string, like domain.com/kwo4sx_grafana/, and nobody will find it (as long as you don't link to it anywhere). I still have auth enabled, but at least I don't have to worry about automated attacks exploiting it before I have time to patch.
Something as simple as moving SSH to a non-standard port reduces the noise from most automated scanners by 99% (made-up number, but a lot).
> You cannot hide anything on the internet anymore: the full IPv4 range is scanned regularly by multiple entities. If you open a port on a public IP, it will get found.
Sure, but you might still host multiple virtual hosts (e.g. subdomains) on the same web server. Unless an attacker knows their exact hostnames, they won't be able to access them.
First, you can simply try brute-forcing subdomains; second, if you are using HTTPS, you can pull the cert and look at the aliases listed there. Two ways off the top of my head.
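The second one is quick to check yourself (example.com is a placeholder; the -ext flag needs a reasonably recent OpenSSL):

    # Pull the live cert and print the subjectAltName entries it advertises.
    openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null \
      | openssl x509 -noout -ext subjectAltName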
> This is the reason why you can't (without a wildcard cert)
Guess being security conscious pays off: testing those on some domains I have, they only managed to show what I want to show, since the wildcard just masks everything else.
That being said, I don't think anyone should consider a subdomain a hidden thing. It's an address, after all, and should not be treated as hidden: assume it's accessible, or put it behind a firewall or VPN and have proper authentication. Security by obscurity never works.
> the full IPv4 range is scanned regularly by multiple entities
Single packet authorization. The server just drops any and all packets unless you send a cryptographically signed packet to it first. To all these observers, it's like the server isn't even there.
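Conceptually, the client side is just one authenticated datagram. A toy sketch, not a real implementation (tools like fwknop do this properly; the key, host and port here are placeholders):

    # Toy sketch of a single-packet-authorization "knock": an HMAC-signed UDP datagram.
    # A daemon on the server sniffs this port, verifies the MAC and timestamp,
    # and only then opens the requested port for the sender's IP.
    import hashlib, hmac, json, socket, time

    KEY = b"shared-secret"                      # pre-shared key (placeholder)
    SERVER = ("server.example.com", 62201)      # placeholder host/port

    payload = json.dumps({"ts": int(time.time()), "open": "tcp/22"}).encode()
    tag = hmac.new(KEY, payload, hashlib.sha256).digest()

    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(payload + tag, SERVER)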
At my company we got bit by this several months ago. Luckily the database was either empty or only had testing data, but like you said the port was exposed and someone found it.
IPv6 won't get found by brute force, but there are a few projects which try to gather IPv6 addresses through various means and scan them as they are found.
Shodan did (and maybe still does) provide NTP servers to some NTP pools and scanned anyone who sent requests to them.
As others have said, certificate transparency seems to be doing some heavy lifting here. It reports subdomains for me that have never had a public CNAME or A record, but have had Let's Encrypt certs issued for internal use.
It's also missing some that have not had certs issued, but that are in public DNS.
If the subdomains aren't supposed to be public, the public also doesn't need to trust the TLS certs. Sign them with your own CA and trust it on the devices that should be able to access the domains.
Where I work, having internal services be accessed by employees’ own, unmanaged devices would be a no-go anyway. It would be considered a huge security loophole.
You can scope CAs with name constraints. However, I believe many implementations ignore constraints on root CAs. I'm not sure if there is a practical way around that with cross-signing (giving users the choice between trusting your CA or creating their own and cross-signing your CA with it).
I looked before I started using Let's Encrypt for some internal stuff, and there really isn't a way to use name constraints practically with modern web browsers at this point. If you're not using a browser, things get a lot easier, but for browsers you sort of have to accept that you can't really avoid the "big" internet.
There is a way: I recently generated my own CA with a domain name constraint, trusted it, and used it to cross-sign my company's self-signed CA. It works like a charm.
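For reference, the constraint itself is just an extension section in the OpenSSL config, something like this sketch (illustrative only; the domain is a placeholder, and verifier support varies as discussed below):

    # openssl.cnf extension section for a name-constrained CA (sketch).
    [ v3_constrained_ca ]
    basicConstraints = critical, CA:TRUE
    keyUsage         = critical, keyCertSign, cRLSign
    nameConstraints  = critical, permitted;DNS:.internal.example.com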
As a sibling poster already wrote, technically you can scope a CA to a set of subdomains only, or at least try. The spec entry is "nameConstraints", but for a number of reasons it may not be well supported.
Some of those reasons are absolutely hilarious. I needed to set up an internal CA back in 2015, and wanted to limit the blast radius in case the private key was leaked. (Usually a "when", not "if" scenario.) I learned about the nameConstraints field and tried to use it. OpenSSL would ignore the key in a CSR input file. Okay, fine, the spec has an OID for the field so I reached for the nearest ASN.1 library to construct a modified CSR with the field in place.
OpenSSL broke trying to parse the file. Go's implementation blew up with a magnificent trace. I gave up and the internal CA was generated with a global validity scope.
I later learned that apparently Microsoft's PKI libraries had support for scope limits, but the feature was not used in real life. Likely because if such a thing came into contact with anything else in the wild, the underlying libraries would just implode.
If you had a self-signed client cert with a nameConstraints in the supplied CA chain, you could probably still crash a non-trivial fraction of web servers.
HSTS is about remembering to do an http:// -> https:// redirect. It's not about remembering a cert.
The downside of TOFU in browsers is that it trains users to always click through cert warnings. Train them to do it once, and they'll click through again when there's a real attack. The warning is the same the first time you visit a site and on a later visit when the cert has changed.
The TOFU UX in SSH is better, because it displays a different warning when SSHing to a host for the first time vs. SSHing to a host again after its key has changed.
Many of our clients send automated updates for our systems, for data managed in other services, via SFTP. It surprises me that few seem to bother verifying the host fingerprints, just blindly accepting them on first connection, given how paranoid they otherwise are (quite rightly, since the data contains staff and customer information).
Every single company does it, all three of them: asking employees to install a CA, using it for ".internal" resources, then asking employees to use a web proxy that MITMs their connections. And optionally, leaking the CA's private key and getting pwned. It's the standard operating procedure of any well-run business.
Non-public usage doesn't necessarily mean that only devices under your direct control need access. Slack needs access to some of my organization's systems, for example, to support the way we collaborate on our projects -- but the general public doesn't and would likely just be confused if they stumbled into one of our infrastructure subdomains instead of visiting our public website.
Yeah. In that case, it's just easier to get a really cheap wildcard cert signed by a low-cost reseller for <50 bucks. The only reason to care about big-name certs is compatibility with all the devices out there, but if you don't need compatibility, get the cheapest thing you can.
> and trust it on the devices that should be able to access the domains
Sometimes it's not an option. I spent too many hours trying to figure out why some Android apps didn't want to talk with a service I self-hosted. They just ignored my Root CA cert installed on the phone.
I have a single wildcard certificate for my internal domain name and ~10 CNAMEs for various service subdomains in the network (plex.server.com, grafana.server.com, etc). This tool found zero subdomains for my internal domain.
I have a similar setup (*.home.domain.com DNS auth with LE -> service1.home.domain.com etc.) for my personal, but externally reachable domain, and I get the same results. I went the wildcard route just due to a bit of paranoia, nice to see that it actually worked out in this case.
As this (I expect) heavily uses cert transparency in the background, I want to point out another use case for that service. You can search the CT logs with wildcards to find your domain's "neighbors" on other TLDs: https://crt.sh/?Identity=google.%25&match=ILIKE This usually gives you somewhat more active websites than just checking whether you can register the domain, and somewhat weeds out squatted domains. That's how I found that, for our company, one TLD contained an NSFW games store.
At work we have a wildcard certificate for most services we host on our own infrastructure. Most public websites have been detected, and some internal ones which have probably been referenced in public GitHub issues and so on.
They've done simple reverse DNS lookups on our public IP range and indexed all those hostnames.
Certificate transparency logs have found names used for externally hosted websites.
There are some pretty old hostnames which haven't been used for 5 years or more, and were probably found with reverse DNS at the time.
It knows many of the wildcard-served customer subdomains of one of my former employers. (They're probably just scraped from search or something, but a wildcard is not sufficient to prevent discovery.)
I would be keen to know what techniques are used. Usually subdomain discovery is done with a DNS AXFR (zone transfer) request, which leaks the entire DNS zone (but this only works on ancient and unpatched nameservers), or with dictionary attacks. There are some other techniques you can check if you look at the source code of amass (an open source Golang reconnaissance/security tool), or the CT logs. DNS Dumpster is one of the tools I used, alongside pentest tools (commercial) and amass (OSS).
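The AXFR check at least is a one-liner if you want to test your own zone (domain and nameserver are placeholders):

    # Ask the authoritative server for a full zone transfer; a properly configured
    # nameserver should refuse this for random clients.
    dig axfr example.com @ns1.example.com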
* Apache Nutch - So they're crawling either some part of the root itself or some other websites to find subdomains. Honestly it might help to query Common Crawl too.
* Calidog's Certstream - As you said, you can look at the CT logs
* OpenAI Embeddings - So I guess it also uses an LLM to try to generate candidate names to test too.
* Proprietary Tools - your guess is as good as mine
Probably a common list of subdomains to test against too.
Seems like multiple techniques to try to squeeze out as much info as possible.
Could that later standard be NSEC3? It’s like the easily walkable NSEC, but with hashed names and special flags for opting out of delegation security features. The 3 appears to stand for the number of people that fully understand how it works…
How can one avoid their browsing ending up in the passive DNS logs? For example, is using 1.1.1.1, 8.8.8.8, or 9.9.9.9 (CF, Google, and Quad9, respectively) good or bad in this regard?
For example, where does Spamhaus get their passive DNS data? They write [1] that it comes from "trusted third parties, including hosting companies, enterprises, and ISPs." But that's rather vague. Are CF, Google, and Quad9 some of those "hosting companies, enterprises, and ISPs"?
I am totally fine with my ISP seeing my DNS traffic (it is bound by GDPR & more; I trust it more than CF or Google). I want to ensure the DNS traffic info does not leave my ISP (other than to other DNS resolvers recursively).
And as per Spamhaus, the DNS traffic in a datacenter may still end up in the Spamhaus passive DNS DB.
Interesting. Our domain has some subdomains with a numeric suffix; and the API response here has entries in that pattern for not only the particular subdomains that exist or ever existed, but also for subdomains of the same pattern that go beyond any suffix number we've ever actually used.
You'd think they'd at least be filtering their response by checking which subdomains actually have an A/AAAA/CNAME record on them...
For my personal domain: it got the ones I have on the SSL cert alternative subject names, made up three, returned one I deleted more than a year ago, and didn't find two. Very curious.
Those SAN and CN names will appear in publicly visible Certificate Transparency lists ( https://en.wikipedia.org/wiki/Certificate_Transparency ), so if you ever get a TLS certificate for a super-seeekret internal sub-sub-sub-domain-name from a major CA, it won't be secret for long. The only way to keep a publicly-resolvable DNS subdomain confidential is to either get a wildcard cert for the parent domain, find a dodgy (yet somehow widely-trusted) CA that doesn't participate in CT, or use a self-signed cert.
This subdomain.center database returned one of my "private" sub-sub-domains (which just points to my NAS) for which I did get a cert from LetsEncrypt, but it doesn't have any of my other sub-sub domains listed (despite resolving to the same A IPv4 address as the listed subdomain) because those subdomains have only ever been secured by a wildcard cert.
Interesting. It only found less than a quarter of the subdomains of the site I work on, and everything it did find is public facing. I wonder if that's maybe something to do with how we set up certificates for public vs internal subdomains? It even missed "staging.", which should be nearly identical in configuration to www.
Note: if you looked up a domain and got no results, you should check back again after a few minutes. I looked my domain up and had zero results, which was weird as it should at least find some in the CT logs, but a few minutes later it showed some subdomains.
It took about 5 minutes for me. It found my apex domain and a subdomain that must have belonged to the previous renter of my domain name. [1] So I was curious, and it turns out the previous renter's pages were in the Wayback Machine. [2] That page renders as mostly little boxes for me. Funny, I had never bothered to check that. I should check whether any of my other domains have snapshots from before I rented them.
Web archive can also somewhat act as a subdomain finder (not really in this case, only the www subdomain, but still interesting):
https://web.archive.org/web/*/ohblog.net*
I would assume so. I tested on one of my private domains that generally isn't linked to anywhere, and it just returned the few domains that I generate Let's Encrypt certs for, plus my nameservers.
Interestingly, I did not receive any DNS queries on my authoritative nameservers during the query, so they don't seem to be doing any active DNS probes.
It may utilize a few techniques, as there are subdomains I am aware of that have never been published anywhere other than in the zone config at my registrar, yet they are returned from the API query.
I use Siteground and it has a staging server that AFAIK hasn't been used for at least 6 years ...
Nothing at the host has any details of that, archive.org doesn't have it in their site URLs, it's not in DNS records, not in .well-known, it was a transient test years ago ... really curious, must be historic data from somewhere?
I use Cloudflare for DNS and the only ones it found had LE certs. It's not doing a simple brute-force on common names, I don't think. Otherwise it probably would have found a lot more. Curious about how it works.
If this were able to determine which wildcard subdomains were active for a given domain, you could use it to figure out a lot of B2B companies’ client/customer list.
I'm a somewhat old coot and do remember those days, but I think the term still makes sense, though only in a LAN environment.
Machines still have hostnames, and home routers will often trust your DHCP clients' machine names.
So I can still look up steamdeck.lan and find the IP of my Steam Deck, and in that context calling it a machine name is perfectly apt and, I think, still well understood.
It gave me empty results for some of my domains that have multiple subdomains with TLS certificates associated with them, so those must appear in the Certificate Transparency log.
I guess it should be "discover some subdomains for some domains".
What kind of security considerations are there to having multi-tenant user applications on subdomains and then having them exposed like this?
I'm building a SaaS right now, and I guess one thing is that a given username can then be discovered as a valid login for the system...but obviously that's only part of the login credential.
Maintaining a list of mappings to opaque subdomains seems to reduce targeting and conceal those partial login credentials, but doesn't seem to offer much besides.
It doesn’t seem to detect subdomains set up with Kubernetes ingresses, based on results for one of my domains, so that might be a place to start research.
One thing I noticed looking at my logs is that there is almost no unsolicited traffic (i.e. failed authentication attempts, exploits of various WordPress bugs, etc.) over IPv6. I think it's a function of 1) that traffic coming from networks (compromised home devices, etc.) that don't support v6, and 2) the v6 address space being too large to scan (it's the size of an encryption key), so good security by obscurity. This would nullify 2).
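Back-of-the-envelope on how large "too large to scan" is, for a single /64 at a very optimistic probe rate:

    # One /64 holds 2**64 addresses; even at 10 million probes per second
    # a full sweep of just that one subnet takes tens of thousands of years.
    probes_per_second = 10_000_000
    seconds = 2**64 / probes_per_second
    print(seconds / (3600 * 24 * 365))   # ~58,000 years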
I got back an empty list for my domain on Cloudflare with several subdomains (non-wildcard).
Edit: I retried on my computer (I was on my phone earlier) and now it returns all of our subdomains, even picking up our test R2 bucket. I'm guessing I was rate limited because I accidentally loaded the example file a few times.
Sublist3r [1] does a similar job, as long as you have the authorisation to use it on a particular domain, as it uses more aggressive discovery techniques.
Only that one actually works. I get hundreds of entries for my domain there, including entries from before Let's Encrypt was a thing, while the subdomain checker returns an empty array.
But you can probably get 95% of the way there by using CT and a brute-force tool like https://github.com/aboul3la/Sublist3r
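Its basic usage is a one-liner (example.com is a placeholder; the -b flag enables the brute-force module, per the project's README):

    python sublist3r.py -d example.com -b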