
Ask HN: How do I compile a list of censored websites in my country - amingilani
Hello

I live in Pakistan, and my government censors websites that contain "immoral" content. Sometimes unrelated websites get caught by the censors, e.g. pastebin.com.

I want to build a portal that keeps track of which websites are blocked, but I'm unsure how to go about it:

1. I could download the zone files for all the popular TLDs and test them, but this would exclude unpopular TLDs.

2. I could crawl the web, but this would turn into a more expensive side project than I have the funds for.

3. ...any suggestions?
======
loopbacker
Most DNS servers don't allow zone transfers, so I don't believe you'll be able
to enumerate domains like this.

In general enumerating domains is not really possible (you can't just dump all
North Korean domains for example, and people have tried).

I'd start by figuring out how they are blocking sites. Probing to see which
IPs are blocked would be much easier. There are various tools to look up which
sites are hosted on a given IP address (using crawled data).

One way to do this would be to try connecting from outside Pakistan (e.g. an
EC2 server) and from inside, then comparing the results to see whether an IP
is blocked.
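
A minimal sketch of that inside/outside comparison, in Python. The helper
names (`can_connect`, `probe`, `likely_blocked`) and the choice of port 443
are assumptions for illustration. A failed connection can also mean the host
is simply down, so the output is a list of candidates to re-check, not proof
of blocking.

```python
import socket


def can_connect(ip: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, resets, and unreachable hosts
        return False


def probe(ips, port: int = 443) -> dict:
    """Run this on each vantage point; returns {ip: reachable}."""
    return {ip: can_connect(ip, port) for ip in ips}


def likely_blocked(inside: dict, outside: dict) -> list:
    """IPs probed from both sides that answer outside but not inside."""
    return sorted(
        ip
        for ip, reachable in outside.items()
        if reachable and ip in inside and not inside[ip]
    )
```

Run `probe` on both machines against the same IP list, copy one result set
over, and pass both to `likely_blocked`.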

Scanning all 4 billion IPv4 addresses surprisingly doesn't take too long on a
modern internet connection.
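
A rough back-of-the-envelope check of that claim, assuming one probe packet
per IPv4 address and a hypothetical sustained send rate. The rates below are
illustrative, not measurements of any particular connection or tool.

```python
IPV4_ADDRESSES = 2 ** 32  # about 4.29 billion addresses


def scan_hours(packets_per_second: float) -> float:
    """Hours needed to send one probe to every IPv4 address at a given rate."""
    return IPV4_ADDRESSES / packets_per_second / 3600


# Print the estimate for a few assumed send rates.
for rate in (10_000, 100_000, 1_000_000):
    print(f"{rate:>9,} pps: {scan_hours(rate):6.1f} hours")
```

At one million packets per second this works out to a little over an hour. At
ten thousand it is closer to five days.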

------
Raed667
My two cents: don't do it on your own. Contact NGOs that work on censorship
issues (EFF, AccessNow, FLD, etc.). They have been working on these issues
for a while and can probably help you with their resources.

~~~
amingilani
I might after I build the site :)

------
jjoe
I think you should crowdsource it. Build a simple two-page site: one page
where people can submit censored websites and another where you display them.
But your very own website could end up on the list.

~~~
amingilani
That is a wonderful idea!

------
AdamTheAnlayst
I own [https://www.DomainDetect.io](https://www.DomainDetect.io). We pull new
domains from zone files and third-party sources daily, roughly 200,000-400,000
domains per day, to alert businesses to new phishing sites. So we do something
similar. I would start with something like
[https://czds.icann.org/en](https://czds.icann.org/en), attempting to resolve
the domains asynchronously from within your country, then work your way
through the new daily additions. Use an ElasticSearch/Cassandra backend so you
can query that volume of information quickly.
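
A minimal sketch of the asynchronous resolution step, in Python with
`asyncio`. The function names, the concurrency limit, and the injectable
`resolver` parameter are assumptions for illustration, not how DomainDetect
works. A failed lookup on its own does not prove censorship, so compare the
results with a run from outside the country.

```python
import asyncio
import socket


async def resolve(domain: str) -> list:
    """Resolve a domain to a sorted list of IPs; empty list on failure."""
    loop = asyncio.get_running_loop()
    try:
        # Uses the system resolver, run in the loop's default thread pool.
        infos = await loop.getaddrinfo(domain, None, type=socket.SOCK_STREAM)
    except (OSError, UnicodeError):  # gaierror is a subclass of OSError
        return []
    return sorted({info[4][0] for info in infos})


async def resolve_all(domains, resolver=resolve, concurrency: int = 100) -> dict:
    """Resolve many domains with at most `concurrency` lookups in flight."""
    limit = asyncio.Semaphore(concurrency)

    async def one(domain):
        async with limit:
            return domain, await resolver(domain)

    return dict(await asyncio.gather(*(one(d) for d in domains)))


def unresolved(results: dict) -> list:
    """Domains that returned no addresses; candidates for a closer look."""
    return sorted(d for d, ips in results.items() if not ips)
```

Usage would look like `results = asyncio.run(resolve_all(domains))`, with
`domains` read from a zone file. Passing a different `resolver` makes it easy
to swap in another lookup method or a stub for testing.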

Your problem will come from the countries/TLDs that don't play ball with their
zone files. These are trickier to obtain.

------
miguelrochefort
Yours will soon be on that list.

~~~
amingilani
I'd like to see that happen. I don't take kindly to being censored arbitrarily
and will fight tooth and nail to the full extent of the law :)

