
Ask HN: How do you check a url is valid? - lookingfj
It is trivial to check that a URL is syntactically valid using a regex, but suppose you wanted to take this one step further and make sure the domain name is registered and actually in use. How would you do this? I have a few ideas for how to achieve this with a microservice, but I feel like others may have solved this problem before and there may be better solutions out there.
======
pwg
First, you need to be precise in what you mean by "valid".

"Valid" can encompass at least these four possibilities:

1) the url follows the correct syntax for urls;

2) the url is valid as per #1 and further the "host" portion of the url (when
it contains a name) can be resolved to an IP address;

3) the url is valid as per #2 and further there is a server located at the
host (and optional port) value encoded in the URL that responds to requests;

4) the url is valid as per #3 and further the path and/or query and fragment
parts define a valid path on the server running at the host:port encoded in
the url.

#1 you can do yourself, as it is just a check that the syntax is correct.
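
A minimal sketch of check #1 in Python, using the standard library's
`urllib.parse` rather than a regex. The function name `has_valid_syntax` and
the restriction to http/https are illustrative assumptions, not something
stated in this thread:

```python
from urllib.parse import urlsplit

def has_valid_syntax(url: str) -> bool:
    """Check #1 only: the URL parses and has a scheme and host. No network access."""
    try:
        parts = urlsplit(url)
        parts.port  # accessing .port raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    # Assumption: only web URLs are of interest here.
    return parts.scheme in ("http", "https") and bool(parts.hostname)

print(has_valid_syntax("https://example.com/path?q=1"))
print(has_valid_syntax("not a url"))
```

This says nothing about whether the host exists; it only rejects strings that
cannot be a web URL at all.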

Numbers 2-4 all require some form of 'lookup' against some other system
in order to verify 'validity'.
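
For check #2, the lookup is a DNS query. A rough Python sketch, assuming the
system resolver is reachable (the name `resolves` is made up for this
example):

```python
import socket

def resolves(host: str) -> bool:
    """Check #2: does the host name map to at least one IP address?"""
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the resolver could not find the name (or DNS is unreachable).
        # UnicodeError: the name cannot be encoded as a valid host name.
        return False

print(resolves("127.0.0.1"))  # numeric addresses need no DNS lookup
```

A failed lookup cannot distinguish "name does not exist" from "DNS is
currently unavailable", so a False result is only as trustworthy as the
network it was obtained on.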

------
lookingfj
So I think in this instance I would deem valid to be:

1) the url is in the correct format;

2) the url resolves to an IP address;

3) the domain is registered and is in use. By this I mean it's not one of the
"this domain name is for sale" pages.

Number 3 is the novel and challenging piece of this.

------
icebraining
Check Whois, DNS and make an HTTP request?
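
The Whois part could be sketched in Python roughly like this. WHOIS (RFC 3912)
is a plain-text protocol on TCP port 43: send the query followed by CRLF and
read until the server closes the connection. The function name `whois_query`
and the choice of `whois.iana.org` as the default server are assumptions for
illustration; the right server depends on the TLD.

```python
import socket

def whois_query(domain: str, server: str = "whois.iana.org",
                port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

The response format is not standardised and differs per registry, so deciding
"registered or not" from the text needs registry-specific handling that this
sketch does not attempt.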

This feels like an XY Problem, though. What are you trying to achieve by
checking if the URL is valid?

~~~
lookingfj
That the url is valid

------
ultrablue
curllib will tell you whether there's something there, presuming the network
is available.

In fact, a simple HEAD request will suffice for that.

That would also prove that the domain is registered, presuming DNS is working.
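
A HEAD request along those lines might look like this in Python, using the
standard library's `http.client` as a stand-in for curl. The name
`head_status` is hypothetical, and the sketch does not follow redirects:

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit

def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of a HEAD request, or None if nothing answered."""
    try:
        parts = urlsplit(url)
        port = parts.port
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    conn = conn_cls(parts.hostname, port, timeout=timeout)
    try:
        conn.request("HEAD", target)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, TLS error, malformed reply.
        return None
    finally:
        conn.close()
```

Any status, even a 404 or 500, shows that a server is there; None means the
request never got an HTTP answer.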

~~~
icebraining
That's not enough; some registrars keep a domain parked and responding after
the registration has expired.
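
One crude way to catch such parked pages is to fetch the body and look for
tell-tale phrases. This Python sketch is only a heuristic: the phrase list is
invented for illustration, real parking pages vary widely, and both false
positives and false negatives are to be expected.

```python
import re

# Hypothetical marker phrases; not an exhaustive or authoritative list.
PARKING_MARKERS = (
    "domain is for sale",
    "domain name is for sale",
    "buy this domain",
    "this domain is parked",
    "domain has expired",
)

def looks_parked(html: str) -> bool:
    """Guess whether a page body is a registrar parking page."""
    text = re.sub(r"<[^>]+>", " ", html)      # crude tag stripping
    text = re.sub(r"\s+", " ", text).lower()  # normalise whitespace and case
    return any(marker in text for marker in PARKING_MARKERS)

print(looks_parked("<h1>This domain name\n is for sale</h1>"))
print(looks_parked("<p>Welcome to our bakery</p>"))
```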

~~~
lookingfj
And this is exactly the problem I want to solve; it is the key challenge.

