
Show HN: The Internet Portal – Warp to a Random Website, Using ICANN CZDS Data - sberman
https://theinternetportal.net/
======
1vuio0pswjnm7
Not every domain registry's zone files are available through CZDS,
unfortunately.

Not every domain listed in a zone file represents a "website".

Choosing a random domain from a zone file, prefixing it with "http://", and
having PHP send a GET request certainly does not have a 100% chance of
returning a web page.

(Might be interesting to calculate the probability.)

Seems like the author is not even filtering out A records corresponding to the
NS entries in the zone files, e.g., something like a.ns.domain.tld

Sending a GET request to such subdomains is obviously not going to return a
web page.
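
As a rough illustration of the filtering described above, here is a minimal Python sketch that keeps only names with an NS record directly under the TLD, which skips glue A records such as a.ns.domain.tld. It assumes each zone file line has the layout "owner TTL class type rdata"; the function name `delegated_domains` is hypothetical, and nothing here is known to match the site's actual script.

```python
def delegated_domains(zone_lines, tld):
    """Return sorted second-level domains that own an NS record in the zone."""
    suffix = "." + tld.lower().strip(".") + "."
    domains = set()
    for line in zone_lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        fields = line.split()
        # Assumed layout: owner, TTL, class, type, rdata
        if len(fields) < 5:
            continue
        owner, rtype = fields[0].lower(), fields[3].upper()
        if rtype != "NS":
            continue  # drops glue A/AAAA records for name server hosts
        if not owner.endswith(suffix):
            continue  # drops the TLD apex itself and out-of-zone names
        label = owner[: -len(suffix)]
        if label and "." not in label:
            domains.add(label + suffix.rstrip("."))
    return sorted(domains)


if __name__ == "__main__":
    sample = [
        "example.com.\t172800\tin\tns\tns1.example.com.",
        "ns1.example.com.\t172800\tin\ta\t192.0.2.1",
    ]
    print(delegated_domains(sample, "com"))
```

A name surviving this filter is still only a delegation, not proof that a web server answers on it.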

As for clicking the button over 200 million times (assuming the total domains
listed in the zone files is 200 million), that might violate the ICANN Zone
File Access Agreement. Unless the terms have changed, one of the restrictions
used to be against redistributing the data. This project would not be
redistribution of the IP address data but if the user logs the names there's
an argument it could be redistribution of the name data.

To "click the button" once from the command line

    
    
       curl -s https://theinternetportal.net/php/button.php
       echo

~~~
sberman
It's true that this doesn't list all the websites that are registered, nor do
all the domains lead to a working website. However, I think that most of the
invalid websites are not caused by NS entries. As for the Zone File Access
Agreement, it prohibits uses that allow the access of a significant portion of
the data. An immense amount of time would have to be spent scraping data to
get any portion that could be considered significant.

~~~
1vuio0pswjnm7
Also, there are alternative, publicly accessible ways to get most of this
public zone file data now, so I am not sure that restriction in the access
agreement is anything more than an historical artifact at this point.

You could use publicly available scan data for ports 80 and 443 to pare down
the list of "websites".

The goal of exposing the non-popular web is worthwhile.

~~~
2fast4you
You could port scan the entire IPv4 address space (minus all reserved
addresses), send a GET request to everyone that responds, filter for valid
HTML. It would take no more than 5 hours on a shitty PC, a lot less if you get
a small AWS instance.

~~~
luckylion
Most non-major sites are on shared hosting. Without a host name, you won't get
anything useful unfortunately.

~~~
1vuio0pswjnm7
Most major sites are on shared hosting. (Sadly)

------
sberman
I started this project after doing some research about how DNS works and
learning about the CZDS, where any interested individual can request access to
DNS zone files. I realized that I could turn this into a website, especially
since I couldn't find anything similar on the internet. I used their Python
API to download all the zone files, then wrote a Python script to scrape them
into one file with only the domain names. I then stored these in a MySQL
database on my web server, and used AJAX + PHP to retrieve and redirect to the
domain. One thing I think is cool about this is that it gives you a sense of
the websites that constitute most of the internet, not just the most popular
ones. And unless you've clicked the button over 200 million times, you are
almost certainly going to get a website you've never seen before.
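
The storage and lookup step described above could look something like the following sketch. It is only an illustration: Python's built-in `sqlite3` stands in for the MySQL database, the table and function names are made up, and picking a random id is an assumption about how a random row might be chosen, not a description of what the site's PHP does.

```python
import random
import sqlite3


def build_table(conn, domains):
    """Create a hypothetical `domains` table and load the names into it."""
    conn.execute(
        "CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO domains (name) VALUES (?)", [(d,) for d in domains]
    )
    conn.commit()


def random_domain(conn, rng=random):
    """Pick one stored domain at random and return it as an http:// URL."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM domains").fetchone()
    if max_id is None:
        return None  # empty table
    target = rng.randint(1, max_id)
    # ">=" tolerates gaps in the id sequence if rows were ever deleted
    row = conn.execute(
        "SELECT name FROM domains WHERE id >= ? ORDER BY id LIMIT 1",
        (target,),
    ).fetchone()
    return "http://" + row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_table(conn, ["a.example", "b.example", "c.example"])
    print(random_domain(conn))
```

Looking up by a random id avoids sorting the whole table on every click, which matters at a scale of roughly 200 million rows.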

~~~
1MachineElf
Thank you for this.

~~~
sberman
I'm glad you like it!

------
square_usual
I wonder if it was just chance or due to the growth of the internet, but over
half the sites I got were in Chinese. I found that really interesting, since
the Chinese side of the internet is usually so far removed from the English-
speaking world, and we have no idea how the internet is growing or being used
over there.

------
anigbrowl
Was unconvinced of the value of this until I found a site where my bliss was
100% guaranteed.

[https://brothersetup.online/](https://brothersetup.online/)

------
systematical
Most didn't work and when it did well wtf is this
[http://ngazzokah.org/](http://ngazzokah.org/)

~~~
Agentlien
My first reaction was "hey, this is Dutch, I understand this".

Well, the landing page is a very long list of statements about ngazz.
Apparently ngazz is, was, remains, excites, grabs by the throat, stirs,
laughs, and a lot of other things. One of these "ngazz" is a link to the
actual page. It's some small blog/ promo page for ngazz, a band playing a
fusion of rock and jazz.

------
stolenmerch
I tried literally 20 times in a row with no actual website as a result. Did I
get super unlucky or is the web really this rotted?

~~~
fourthark
I got 12 sites in 40 clicks, and some of those I still wasn’t sure. I think it
really is!
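
For a rough sense of how unlucky 20 misses in a row would be, here is a small Python calculation. It assumes clicks are independent and takes the 12-in-40 sample above as the hit rate, both of which are assumptions; a stricter idea of what counts as an "actual website" would lower the rate and make a streak like that more likely.

```python
def prob_all_misses(hit_rate, clicks):
    """Probability that every one of `clicks` independent tries is a miss."""
    return (1.0 - hit_rate) ** clicks


if __name__ == "__main__":
    hit_rate = 12 / 40  # 12 sites in 40 clicks, from the sample above
    print(f"{prob_all_misses(hit_rate, 20):.4%}")  # prints 0.0798%
```

At a 30% hit rate that works out to roughly 1 chance in 1,250.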

------
ralston3
I did "Jump In"s for about 10 minutes. Thought I'd provide some feedback.

I <3 the idea. I've personally wanted to see something like this for a while
as I continuously visit the same 10 sites every day (as do most people).

Some feedback for the next iteration:

- Maybe ping sites first to see if they're down before jumping? I hit a few
  500 and 404s during my jumps.
- Possibly show "content" sites? Said another way - I jumped into a few
  business pages XD (lawyers, doctors, and such) and those aren't the most
  interesting.
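
The "ping first" suggestion might be sketched like this in Python (the site itself uses PHP, so this is only an illustration). The helper names `fetch_status` and `first_live` are hypothetical, and treating any 2xx or 3xx status as "alive" is an assumption; parked pages would still pass this check.

```python
import http.client
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, bad URL


def first_live(candidates, status_of=fetch_status):
    """Return the first candidate URL whose status is 2xx/3xx, else None."""
    for url in candidates:
        status = status_of(url)
        if status is not None and 200 <= status < 400:
            return url
    return None
```

Calling `first_live` with a handful of random picks would skip dead domains as well as 404 and 500 responses. Some servers reject HEAD requests, so a real check might need to fall back to GET.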

UX and speed all seem pretty decent. Thanks for sharing

~~~
sberman
Thanks for the feedback! I'm thinking about ways to prevent loading broken
websites. I'm not sure it's possible to filter for only a certain type of
website though, I think there are way too many sites for that.

~~~
indigodaddy
I would say 75%+ of all the working sites were parked or expired pages. I
would suggest to remove or re-redirect any sites that resolve to known
registrar parking page IPs (perhaps only assuming if these IPs are distinct
from their webhosting cluster IPs, where actual webhosting customer websites
might live). That might be a good start to at least prune a lot of the parked
sites.
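
A minimal Python sketch of that pruning idea follows. The networks listed are placeholder documentation ranges (RFC 5737), not real registrar parking IPs, and the function names are hypothetical; a real list would have to be collected separately.

```python
import ipaddress
import socket

# Placeholder ranges for illustration only; NOT real parking page networks.
PARKING_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_parking_ip(address, networks=PARKING_NETWORKS):
    """True if the IP address string falls inside any listed network."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)


def resolves_to_parking(domain, resolve=socket.gethostbyname,
                        networks=PARKING_NETWORKS):
    """True if the domain resolves to an address in a listed parking network."""
    try:
        address = resolve(domain)
    except OSError:
        return False  # unresolvable names are a separate problem
    return is_parking_ip(address, networks)
```

As the comment above notes, this only helps where a registrar serves parked pages from IPs distinct from its customer hosting.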

------
Agentlien
I love this type of site. I tried another some years ago.

After four dead ends and one "under construction" I was brought to
[http://chimneysweepnews.com/](http://chimneysweepnews.com/)

This is exactly the type of thing I had no idea existed, have no interest in,
but seeing it out there still makes me smile.

------
anonytrary
Many of the sites understandably time out or can't be resolved, or are just
"under construction" or parking pages. However, I did come across this
Japanese site: [http://mottainou.com/](http://mottainou.com/)

For some reason the topic, design, and color scheme make me very nostalgic.

~~~
square_usual
For those who can't read Japanese, the site is about a small-scale organic
farm in Daisen Town, Tottori prefecture.

------
new_guy
Very nice, but out of 5 I tried 4 were dead sites, and 1 asked for a login
straight away.

Maybe ping the sites and only return active ones?

~~~
sberman
I'm currently working on a way around this similar to what you suggested.

------
iudqnolq
I ended up on 8sectnformats.online, which redirects to
[https://www.google.com/#spf=1600155592494](https://www.google.com/#spf=1600155592494).
I couldn't find anything about them online. Does anyone know what they were?

------
prox
This is fun, my first jump was to a fruit I didn’t know existed, and
apparently is very healthy. It suspiciously does look 100% like black currant
;)

[https://www.lifebrook.com/tonyd/](https://www.lifebrook.com/tonyd/)

~~~
Symbiote
They're not related, but the common name for Aronia is chokeberry, which
should give you some idea of the unprocessed taste.

~~~
prox
That doesn’t sound really enticing! Chokeberries. Is it weird if I am getting
curious about the taste now?

------
falseprofit
I really got my hopes up while
[http://alieninterviews.com/](http://alieninterviews.com/) was loading, but
alas it was just a parked domain.

------
1MachineElf
Half of the sites were basically offline. Amusingly, I found that "Ma'
Business Adviser Services DOT COM!" is now available, although it might
actually be intended for Massachusetts.

------
Karawebnetwork
I'm surprised at how many Asian gambling websites I've stumbled on. I'd have
imagined more pornography.

I guess we need to change the song. The internet is for gambling~

------
fromaj
I seem to have ended up at a Chinese sports-betting site
[https://1958abcd.com/](https://1958abcd.com/)

------
eezurr
How come when I go to the new site, uMatrix lists theinternetportal.net as
part of the loading (with an "Other" object)?

Edit: This only happened on the first site.

------
uxamanda
The ultimate rabbit hole. :-) For me it ended up being a good reminder of web
design through the years. Even found a real life <marquee> tag.

------
japanoise
My first result was a Chinese porn website. Thank goodness I'm at home and not
at work right now. You may wish to add a disclaimer about that.

~~~
encom
There already is.

~~~
japanoise
Oh, right, I see it now. It wasn't visible straight away on my phone.

------
Shared404
Cool idea, I did have a little trouble with dead domains, but nothing too
drastic.

Also, I love the visual design on your landing page.

~~~
sberman
Thanks! I don't have very much experience with web design, but I did my best.

------
pcarolan
Such a great idea! Reminds me of the internet before bubbles a la physically
printed indexes and yahoo directories.

------
encom
I just see a spinning purple saw blade of death, and nothing further happens.
Left it for about a minute.

~~~
sberman
Some links won't work, and that's unavoidable. I'm working on a way to
redirect to another site if the first one is broken. In the meantime, just
press the button again for a new link.

~~~
encom
Nothing appears to be loading. It's not like my browser is trying to connect
to something that doesn't exist. Just tried again a bunch of times.

~~~
sberman
You must click the button each time to request a website. The loading page is
just a placeholder, so reloading that page will not bring you anywhere. I'm
not sure if that's what you were doing, but hopefully that helps.

------
brylie
I rolled the dice three times and got two parked domains and a domain
squatter. How was your luck?

~~~
troyjfarrell
I found a site that made me think of Microsoft FrontPage:
[http://www.bichonfrise.ws/](http://www.bichonfrise.ws/)

HTML 4.01 Transitional and table layouts FTW!

------
sam_lynx
Did it a few times, all I got were ads and dead links. Maybe implement a
filtering system?

~~~
sberman
Yes, I'm thinking about a way to implement that. It's too many domains to
filter in advance, but it might be possible to redirect the user to a new site
if the current one is dead.

~~~
sam_lynx
Cool :)

------
blauditore
Here are my results for 10 tries:

Unreachable/dead site: 6

Domain squatter: 2

Partially broken, but actual website: 1

Working website: 1

------
TomJansen
On my third try I got google.com. What is the chance of that!

------
anfilt
The internet needs more stuff like this!

------
theqult
clicked ten times, not a single porn happened. I'm shocked.

