Whois: Fragile, Unparseable, Obsolete (netmeister.org)
128 points by ementally on Sept 24, 2022 | 51 comments



Once worked on a whois scraping project and ran into a bunch of issues.

One particularly fun story is how we might have broken a whois server. It was the country TLD server for some West African nation, I think Senegal but I'm not sure. We hit their server with like a hundred queries in rapid succession (to test what rate limiting approach they used) and requests started hanging. We switched IP addresses ... and still requests were hanging. We tried multiple IP addresses in totally different networks, all of them hung or timed out, even for a single request. A day later we retried and all of a sudden it started working again! From that point on we made sure to never do more than a couple requests a second to that particular domain.

Also, any queries to one cc TLD (either Egypt or Ukraine, can't remember which) just returned "we don't provide information in whois requests" or something to that effect.

GoDaddy didn't do traditional rate limiting. If you exceeded whatever their limit was they didn't just return an error message, they would blacklist your IP and for any query say "visit our website for information", and their website gated things behind a captcha.


A lot of Linux-based command-line whois clients still use "whois-servers.net" to look up the appropriate whois server for a TLD. That zone is long dead (it still responds but is no longer maintained).

Many years ago I built a replacement whoisservers.org, and tried contacting a few maintainers, but nobody seemed to really care.

If you want to make use of it, you can run "whois -h com.whoisservers.org example.com" (or substitute -h with the appropriate flag for your client to specify a server).
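A minimal sketch of what that lookup amounts to, assuming the zone works as described above (one DNS name per TLD that resolves to the right WHOIS server). The function names are hypothetical. The query itself is the plain port-43 exchange: send the name followed by CRLF, then read until the server closes the connection.

```python
import socket


def tld_lookup_host(domain: str, zone: str = "whoisservers.org") -> str:
    """Build the per-TLD hostname, e.g. example.com -> com.whoisservers.org."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return f"{tld}.{zone}"


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query over TCP port 43 and return the raw text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Usage (needs network access), roughly equivalent to
# "whois -h com.whoisservers.org example.com":
#   print(whois_query(tld_lookup_host("example.com"), "example.com"))
```

Reply encodings vary between servers, so the decode step here replaces undecodable bytes rather than failing.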


Similar to whois.geektools.com, then?


Geektools actually does the whois for you.

Mine points your client to the appropriate whois server you need to talk to for a specific TLD. Like I said the same functionality is already built into a few whois clients, they are just using a DNS zone that is no longer updated.


I ran into the same issue but worked around it slightly differently: have my code use RDAP, and then have an RDAP->WHOIS proxy [1]. There are usually rate-limits on WHOIS, so public instances won't survive long, but it works for me and you can run locally.

I also hunted (s/whois/rdap/g) around for undocumented RDAP servers and found a few. There are still a lot of TLDs without RDAP though [2].

[1] https://rdap.redirect2.me/ (source at https://github.com/redirect2me/rdap-proxy)

[2] https://resolve.rs/domains/rdap-missing.html
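For anyone who has not used RDAP: a rough sketch of how a client might find the RDAP server for a domain, assuming the IANA bootstrap file at https://data.iana.org/rdap/dns.json, whose "services" array pairs lists of TLDs with lists of base URLs. The function names are hypothetical, and the matching is simplified to the last label only.

```python
import json
import urllib.request
from typing import Optional

IANA_BOOTSTRAP_URL = "https://data.iana.org/rdap/dns.json"


def rdap_url(domain: str, bootstrap: dict) -> Optional[str]:
    """Return the RDAP lookup URL for a domain, or None if its TLD is not listed."""
    name = domain.rstrip(".").lower()
    tld = name.rsplit(".", 1)[-1]
    # Each service entry is a pair: [list of TLDs, list of base URLs].
    for tlds, base_urls in bootstrap.get("services", []):
        if tld in tlds and base_urls:
            return base_urls[0].rstrip("/") + "/domain/" + name
    return None


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode a JSON document (network access required)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs network access):
#   bootstrap = fetch_json(IANA_BOOTSTRAP_URL)
#   url = rdap_url("example.com", bootstrap)
#   record = fetch_json(url) if url else None
```

A None result is the "TLD without RDAP" case, or at least without a published one, where you would fall back to WHOIS.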


Here’s a random thing I made for RDAP a long long time ago. It has lots of bugs but has come in handy from time to time: https://rdap-explorer.chris-wells.net/



Lots of IPs break it haha. Maybe now I’ll take some time to look into that…


Fixed it!


Interesting, was not aware of RDAP, thank you.


Does anyone here run their own whois for their own domains using SRV records? If so, how many hits per day do you get? I'm curious because I have never seen anyone request the SRV record for _nicname._tcp from my nameservers.


> Does anyone here run their own whois for their own domains using srv records?

I don't think that's possible. WHOIS, by design, is controlled by the domain registry, which may delegate it to registrars -- the owner of the domain may have some limited control over the contents (like the registrant information), but they don't get to control it fully, and I've certainly never seen a registrar delegate WHOIS to the domain owner.


Makes sense. I've only ever seen it delegated when I would swip out a cidr block to a b2b customer and even then the people I interacted with never asked to run their own whois, only custom PTR delegation.

I cannot find any whois clients that support this expired IETF draft [1], so I assume it was abandoned.

[1] - https://datatracker.ietf.org/doc/html/draft-sanz-whois-srv-0...


Why do domains have WHOIS records anyway? I get why IP blocks have it because machines actually do things from behind IP addresses, but the only thing I'm doing from a domain name is stopping other people from using it.

Someone is hosting copyrighted content? Look up that machine's IP-WHOIS.

Someone is trying to DDOS me? Look up that machine's IP-WHOIS.

Someone is holding a domain I want? If their answer is going to be anything other than a straight "no", they'll happily provide a way to be contacted.

Please tell me how I'm wrong.


A single IP can host many domains, each of which may have separate technical and administrative contacts. Conversely, different subdomains (and MX for email) can live on different IPs. If I use dyndns, there isn’t any fixed relation between IP and domain at all. I happen to own several domains, but I don’t own the IPs where they are hosted.


But why would I, a complete stranger to you, need to look up that contact information? We both hold a number of domains, but that alone isn't a reason for needing to contact each other.


For example, if I published something on my web page that you have privacy or copyright claims to. Or if my mail server errors out in the middle of trying to send email to your mail server. Or if the domain name is violating someone else's rights. Or if some public service or functionality is broken on the domain and you want to inform those who are in charge of the domain. There are any number of reasons why you'd want to contact a domain registrant or the relevant technical or administrative contact.


It probably made more sense in the pre-web Internet, when not all domains were necessarily serving web traffic. Or had any obvious or standardized way of serving a "contact us" page.


The DNS SOA record has an RNAME field that is available to convey this information.
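As a small illustration of how that field is read: RNAME encodes a mailbox as a domain name, where the first unescaped dot stands in for the "@" and a literal dot in the local part is written as "\.". The helper below is a hypothetical sketch of that decoding, not part of any DNS library.

```python
def rname_to_email(rname: str) -> str:
    """Convert an SOA RNAME such as 'hostmaster.example.com.' to an email address."""
    name = rname.rstrip(".")  # drop the trailing root dot, if present
    local = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\" and i + 1 < len(name):
            # Escaped character (usually a dot) belongs to the local part.
            local.append(name[i + 1])
            i += 2
        elif ch == ".":
            # First unescaped dot separates local part from the mail domain.
            return "".join(local) + "@" + name[i + 1:]
        else:
            local.append(ch)
            i += 1
    raise ValueError(f"no domain part in RNAME: {rname!r}")
```

For example, "hostmaster.example.com." decodes to hostmaster@example.com. Whether anyone reads that mailbox is a separate question.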


See also the RP (Responsible Person) record; RFC 1183: https://www.rfc-editor.org/rfc/rfc1183.html


These days they are somewhat useless as every domain name provider redacts everything about the owner.


IPs are shared by many machines these days. CGNAT in particular bundles together many machines in potentially widely dispersed locations. You can't identify the source of a DDoS by whois in that case.


You can, it's just the answer will be "Someone using (ISP)". Even then, that ISP will be the one to contact.


I used to have a very reliable but simple Python script that asked the whois server for the TLD, got the next whois server from the reply, and asked that server for the whois of the apex. If it got back another whois server, it asked that one too, and so on recursively. It was more reliable than any service I found, because a lot of TLDs that aren't mainstream or are run by random countries have their own finicky whois with some weird custom web UI, but whois on port 43 is always there and contains a lot more info than RIR whois, which is what most services tend to show.
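A minimal sketch of that kind of referral-chasing loop, not the original script. It assumes the chain starts at whois.iana.org and that replies name the next server in a "refer:", "whois:", "Registrar WHOIS Server:" or "Whois Server:" field; that field list is an assumption and will not cover every registry. The query function is passed in, so any port-43 client can be plugged in.

```python
import re
from typing import Callable, List, Optional, Tuple

# Fields that commonly point at the next server (assumed, not exhaustive).
REFERRAL_RE = re.compile(
    r"^[ \t]*(?:refer|whois|Registrar WHOIS Server|Whois Server)[ \t]*:[ \t]*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def next_server(reply: str) -> Optional[str]:
    """Extract the first referral hostname from a WHOIS reply, if any."""
    match = REFERRAL_RE.search(reply)
    if not match:
        return None
    host = match.group(1).lower()
    # Some servers return a URL-ish value; keep only the hostname part.
    return re.sub(r"^[a-z]+://", "", host).rstrip("/") or None


def follow_referrals(
    domain: str,
    query: Callable[[str, str], str],
    start: str = "whois.iana.org",
    max_hops: int = 5,
) -> List[Tuple[str, str]]:
    """Query each server in the referral chain; return (server, reply) per hop."""
    hops: List[Tuple[str, str]] = []
    seen = set()
    server: Optional[str] = start
    while server and server not in seen and len(hops) < max_hops:
        seen.add(server)
        reply = query(server, domain)
        hops.append((server, reply))
        server = next_server(reply)
    return hops
```

The `seen` set and `max_hops` cap are there because some servers refer to themselves or to each other in a loop.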

Some parameters are reliably there, and in a way it is very easy to parse since it is key-value pairs separated by a colon (cut -d ':' -f 1,2), but there is no "schema" you can follow: sometimes I saw unique extra additions by some servers and missing critical fields by others. "Your domain is compromised, bad guys are doing bad stuff with it": how do I reliably find the right contact for that, for example? That last bit was always a manual exercise.
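To make the "easy to parse, but no schema" point concrete, here is a hedged sketch of the colon-splitting approach in Python. It only collects whatever "key: value" lines a server happens to send. The comment-prefix characters and the function name are assumptions, and mapping keys such as "Registrant Email" or "e-mail" onto one contact field is exactly the part left unsolved.

```python
from collections import defaultdict
from typing import Dict, List


def parse_whois(reply: str) -> Dict[str, List[str]]:
    """Collect 'key: value' lines into a dict of lists (keys lowercased)."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for raw in reply.splitlines():
        line = raw.strip()
        # Skip blanks and lines that look like comments or footers.
        if not line or line[0] in "%#>":
            continue
        # Split on the first colon only, so values like timestamps stay intact.
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue
        fields[key.strip().lower()].append(value)
    return dict(fields)
```

Values are kept as lists because fields such as name servers routinely repeat.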


> I used to have a very reliable but simple python script that asks the whois server for the TLD then gets the whois server from it and asks that server the whois of the apex and if it gets another whois it asks that server and so on recursively.

Don’t modern whois clients all do this? (I.e. not the one available in, say, macOS.)


Possibly but I was stuck using windows at the time.


Yes, 100%. I’m trying to use registration information for cybersecurity stuff, and it’s a mess. Some TLDs just don’t provide that information, or provide it only to registered accounts or only inside their country. Parsing is a mess. Many have rate limits: .au allows 20 requests/day, .cz allows 100/day but with a delay of 3 minutes between requests, …
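One way a client might cope with limits like these is a small per-TLD throttle. This is only a sketch: the class is hypothetical, the figures are the ones quoted above rather than verified registry policy, and the daily quota is modelled as a sliding 24-hour window, which may not match how a given server counts.

```python
import time
from typing import Callable, Dict, List, Tuple

# tld -> (max requests per 24 h, minimum seconds between requests); assumed values.
LIMITS: Dict[str, Tuple[int, float]] = {"au": (20, 0.0), "cz": (100, 180.0)}


class Throttle:
    def __init__(
        self,
        limits: Dict[str, Tuple[int, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limits = limits
        self.clock = clock
        self.history: Dict[str, List[float]] = {}  # tld -> request timestamps

    def wait_time(self, tld: str) -> float:
        """Seconds to wait before the next request to this TLD (0.0 if allowed now)."""
        if tld not in self.limits:
            return 0.0
        per_day, min_gap = self.limits[tld]
        now = self.clock()
        # Keep only requests inside the sliding 24-hour window.
        recent = [t for t in self.history.get(tld, []) if now - t < 86400]
        self.history[tld] = recent
        wait = 0.0
        if recent:
            wait = max(wait, recent[-1] + min_gap - now)
        if len(recent) >= per_day:
            wait = max(wait, recent[0] + 86400 - now)
        return wait

    def record(self, tld: str) -> None:
        """Note that a request to this TLD was just sent."""
        self.history.setdefault(tld, []).append(self.clock())


# Usage: sleep for throttle.wait_time("cz"), send the query, then call throttle.record("cz").
```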


Whois was killed off by the European Union passing the GDPR. It really changed how I use the internet for the worse. In the old days I could always at least send an email to a domain hosting a service. Now there's no default contact information and everything is behind walled gardens.

Email was the great communicator. Removing it from WHOIS made the web more fragile and broken. But whois doesn't have to be that way and the problems are not intrinsic to whois. They are stemming from political interference done with good intentions but bad outcomes.


To be fair, when I recently found all my historical addresses (past 15 years or so) connected to my name on some random website there was only one source they could’ve gotten that information from.

I’m inclined to believe that most people looking up WHOIS details have good intentions, but clearly there are people that use them for their own purposes.


Mentioned in TFA FWIW:

> the ICANN [contact disclosure] requirement now does indeed conflict with modern privacy laws, such as the EU's GDPR, meaning all domains registered by European registries are in violation of either GDPR or ICANN's requirement.


That's only for individuals. For businesses, the reverse is true. It's illegal to run an anonymous business on the Internet in the EU.[1] That's in the Directive on Electronic Commerce.

General information to be provided

1. In addition to other information requirements established by Community law, Member States shall ensure that the service provider shall render easily, directly and permanently accessible to the recipients of the service and competent authorities, at least the following information:

(a) the name of the service provider;

(b) the geographic address at which the service provider is established;

(c) the details of the service provider, including his electronic mail address, which allow him to be contacted rapidly and communicated with in a direct and effective manner;

(d) where the service provider is registered in a trade or similar public register, the trade register in which the service provider is entered and his registration number, or equivalent means of identification in that register;

(e) where the activity is subject to an authorisation scheme, the particulars of the relevant supervisory authority;

The EU wants to make cross-border commerce work. So, they want customers to be able to find sellers should there be a problem.

[1] https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX...


I don't follow. How does Whois now having privacy - which had nothing to do with GDPR - make the internet worse?


In the old days (up to ~2018) you could always contact people on the internet. Now you cannot contact people. That's bad. It's like the switch from knowing and being able to talk to your neighbors to not knowing and not talking to your neighbors. Like the switch from a small town to a megacity.

And you're completely wrong about GDPR. It is the primary, if not exclusive, reason most registrars in most regions have removed WHOIS information. ref: https://www.icann.org/resources/pages/gtld-registration-data... https://circleid.com/posts/20210119-whois-record-redaction-a...


WHOIS having privacy guards was a thing at least as far back as 15 years ago, as memory serves.


I'm aware. Look at the plots of WHOIS privacy vs open once ICANN was forced to allow it by the GDPR. They're in my second link. You cannot deny that correlation so I assume you just didn't bother looking.


So basically people who were unaware of the fact that their private data was being served publicly are now protected by default thanks to the GDPR.


In a similar way, FTP clients have to guess what the filename is when they parse the output of the "dir" command.


FTP solved this in 2007 with RFC 3659, which includes the MLST command.
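For contrast with guessing at "dir" output, a sketch of parsing one machine-readable listing entry of the kind RFC 3659 defines: semicolon-terminated "name=value" facts, one space, then the pathname. The function name is hypothetical. This handles the data-line form used by MLSD; for an MLST reply on the control connection, the caller would first drop the line's leading space.

```python
from typing import Dict, Tuple


def parse_mlsd_entry(line: str) -> Tuple[Dict[str, str], str]:
    """Split 'fact=value;fact=value; pathname' into (facts, pathname)."""
    line = line.rstrip("\r\n")
    # Facts contain no spaces, so the first space ends them; everything
    # after it is the pathname, spaces included.
    facts_part, sep, pathname = line.partition(" ")
    if not sep:
        raise ValueError(f"not a listing entry: {line!r}")
    facts: Dict[str, str] = {}
    for fact in facts_part.split(";"):
        if fact:
            name, _, value = fact.partition("=")
            facts[name.lower()] = value  # fact names are case-insensitive
    return facts, pathname
```

The filename needs no guessing here: it is simply everything after the first space.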


One of many reasons FTP is moribund.


I can't get the page to load for some reason, but I don't think whois is obsolete. I used it via command line to search for available domains when I was creating my blog. It was simple and effective for that purpose.


I've been using https://github.com/likexian/whois This is a Go WHOIS client that is able to emit JSON.

I would like to use RDAP instead but RDAP coverage is even spottier than WHOIS.

BTW, if anyone knows of a WHOIS server able to handle the .de TLD, please let me know as all I get right now is "The DENIC whois service on port 43 doesn't disclose any information"


It's sad that we can't improve on this and build modern APIs that can support load and querying. That is exactly why companies exist whose main business function is scraping services like WHOIS, social media, or sites behind Cloudflare.


As the article observes, we can and are doing that: that's what RDAP is.


And the linked article, written in January, is out of date on that front. The timeline has finally been set and Whois is transitioning to RDAP for all new gTLDs next year.


A few years back I tried building my own Whois parser and you’re right, it’s a mess. Before you even factor in all the fun rate limiting.


What's needed is for Facebook and Google to gin up a replacement!


… or use RDAP:

> since 2019, ICANN requires registrars and registries to implement an RDAP service.

At least, as much as one can. It'd be nice if ICANN required registries to implement an RDAP service.


once upon a time i wrote a whois server


Is the source available to play with? I wanted to fool around with hosting a whois server but couldn't find any software out there. The server would be completely useless for anyone else but would be fun to use some time learning about this old protocol/tech/etc.


Sorry, no, it belongs to a TLD operator.


Speaking of fragile perhaps..



