
Show HN: DNS over Wikipedia - aaronjanse
https://github.com/aaronjanse/dns-over-wikipedia
======
aaronjanse
Hey HN,

I saw a thread a while ago (linked in README) discussing how Wikipedia does a
good job keeping track of the domains of websites like Sci-Hub or The Pirate
Bay. Someone mentioned checking Wikipedia to find links to these sites, so I
thought this would be a fun thing to automate!

To try it out, install an extension or modify your hosts file, then type in
the name of a website with the TLD `.idk`.

For example: scihub.idk -> sci-hub.tw
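
The gist of the lookup, as a rough TypeScript sketch (not the real extension/server code; the helper name is invented):

```typescript
// Rough sketch of the idea: find the Wikipedia article for a name,
// then pull the official URL out of its infobox wikitext.
async function resolveViaWikipedia(name: string): Promise<string | null> {
  // 1. Let Wikipedia's search API pick the best-matching article title.
  const search = new URL("https://en.wikipedia.org/w/api.php");
  search.search = new URLSearchParams({
    action: "opensearch", search: name, limit: "1", format: "json",
  }).toString();
  const [, titles] = await (await fetch(search)).json();
  if (!titles.length) return null;

  // 2. Fetch that article's wikitext.
  const parse = new URL("https://en.wikipedia.org/w/api.php");
  parse.search = new URLSearchParams({
    action: "parse", page: titles[0], prop: "wikitext",
    formatversion: "2", format: "json",
  }).toString();
  const wikitext: string = (await (await fetch(parse)).json()).parse.wikitext;

  // 3. Official sites usually appear in the infobox as {{URL|example.com}}.
  const match = wikitext.match(/\{\{URL\|([^}|]+)/i);
  return match ? match[1].trim() : null;
}

resolveViaWikipedia("scihub").then(console.log); // e.g. "sci-hub.tw"
```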

Cheers!

~~~
Polylactic_acid
It's incredible how insane this seems from the title but how practical it
sounds from the readme.

~~~
basch
Right? Basically a modern "I'm Feeling Lucky" meets meta-DNS.

------
HugoDaniel
DNS translates a name into an IP address. This is not DNS per se; it is just a
search plugin for the URL bar.

If an analogy with a network service is needed, this is perhaps more like a
proxy redirector than DNS.

Keep in mind: with this you will still be misdirected if your DNS/hosts file
points the name at a different IP than it should.

~~~
capableweb
Indeed. Even the GitHub repository's description has this error.

> Resolve DNS queries using the official link found on a topic's Wikipedia
> page

@aaronjanse: you probably want to correct this. "Resolving DNS records" carries
a specific meaning: you have a DNS record and you "resolve" it to a value.
Which, actually, you're kind of doing, in a way, I suppose.

I was convinced when I started writing this comment that calling this
"resolving DNS queries" is wrong. But thinking about it, DNS resolving is not
necessarily resolving a name into an IP address, as @HugoDaniel says in the
comment I'm replying to (think CNAME records and all the other types that
don't hold IP addresses). It's just taking something and making it into
something else, traditionally over DNS servers. But I guess you could argue
that this is resolving a name into a different name, which then gets resolved
into an IP address. So it's like an overlay over DNS resolving.

Meh, in the end I'm torn. Anyone else wanna give it a shot?

~~~
penagwin
I mean, once we account for all the different types of DNS records, regardless
of its original intent, isn't DNS essentially just a networked, hierarchical
key store? For example, the TXT record type is still "DNS", even though its
values aren't addresses at all.

This project is still doing key -> value. It just fetches the value from
Wikipedia first, much like your normal DNS servers have to fetch non-cached
keys from their sources (normally other DNS servers)?
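
To illustrate with Node's resolver (a quick TypeScript sketch, nothing specific to this project):

```typescript
// DNS as a networked key/value store: the key is (name, record type),
// and the shape of the value depends on the record type you ask for.
import { resolve4, resolveCname, resolveTxt } from "node:dns/promises";

async function main() {
  console.log(await resolve4("example.com"));   // A   -> IP addresses
  console.log(await resolveTxt("example.com")); // TXT -> arbitrary strings
  // CNAME -> another name; throws ENODATA if the name has no CNAME record.
  console.log(await resolveCname("www.example.com"));
}

main().catch(console.error);
```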

~~~
sfoley
Just because it's doing key-value pairs does not mean it's DNS. If I can't
`dig` it, it's not DNS. This is simply doing HTTP redirects and works with no
other protocols.

~~~
capableweb
Hm, there are plenty of DNS records (not to mention all the custom ones all
around the world) that you won't be able to `dig` but most people would still
call DNS.

~~~
icedchai
Can you provide an example of one that you can't 'dig'? I have my doubts.

~~~
capableweb
One example: I can't seem to get dig to work with URI records (but I might be
missing some flag). Doing `dig URI _kerberos.hasvickygoneonholiday.com`
returns "no servers could be reached" while doing `kdig URI
_kerberos.hasvickygoneonholiday.com` returns the proper records.

So seems to be a proper DNS record, but I can't make it work with dig.

~~~
icedchai
Plain old "dig" works for me! I suspect it may be an older version of dig
you're using? This is DiG 9.11.5-P4-5.1ubuntu2.1-Ubuntu on Ubuntu 19.10 ...

~~~
capableweb
Strange, but thanks for letting me know! I'm on my Ubuntu laptop now, with `DiG
9.11.3-1ubuntu1.11-Ubuntu`, and it works here too! But I initially tried on my
Arch Linux desktop, where it didn't work, and I would expect my desktop to run
a more recent version than my laptop. Very strange; I will take another look.

------
jakear
> If you Google "Piratebay", the first search result is a fake "thepirate-
> bay.org" (with a dash) but the Wikipedia article lists the right one. — shpx

How interesting. Bing doesn't do this, which leads me to believe it's not a
matter of legality. Is Google simply electing to self-censor results that it'd
prefer its users not to know about? Strange move, especially given that the
alternative Google does index is almost certainly more nefarious.

~~~
tomcooks
Google does list proper pirating sites!

At the bottom of the page, click on the DMCA complaint and you'll find all the
URLs you shouldn't ever, never ever, click on~

~~~
philips4350
What's funny and ironic is that this actually makes finding pirated content
much easier, since the only sites listed in the DMCA complaint are the ones
that actually contain pirated content.

~~~
tomcooks
Yes, I wonder if these URLs have to be made public by law in DMCA notices.

I assume that, if they legally could, they wouldn't show you anything.

~~~
kevin_thibedeau
The notices don't have to be disclosed to anyone but the alleged infringer.
The URLs don't have to be hyperlinked either. This is one part of Google
giving the trolls a middle finger.

~~~
StillBored
Google should index them all on a separate page. For science, of course.

More than once I've done a search for something pedestrian (no intent for
piracy/etc) only to notice the "some results removed" link. Out of curiosity
I've clicked it, just to see what crazy things have been removed, and been
quite amused/interested in the results.

------
frei
Pretty neat! Similarly, I often use Wikipedia to find translations for
specific technical terms that aren't in bilingual dictionaries or Google
Translate. If you go to a wiki page about a term, the sidebar usually has many
links to versions in other languages, typically titled with the canonical term
in that language.

~~~
nitrogen
Out of curiosity, how well does Wiktionary fare in this regard?

~~~
greenpresident
I use it primarily for cooking ingredients. The names of some unconventional
grains and vegetables are easy to translate using this method and are not
always available in conventional dictionaries.

It would also be useful for identifying cuts of meat, as US cuts and, for
example, Italian cuts differ not only in name but in how they are made.
Compare the images in this article for an example of what I mean:

[https://en.wikipedia.org/wiki/Cut_of_beef](https://en.wikipedia.org/wiki/Cut_of_beef)

~~~
carlob
I own an illustrated encyclopedia of Italian food: there are 9 pages of
regional cuts of beef!

That's what you get in a country that unified 160 years ago...

------
segfaultbuserr
There's a risk of phishing by editing Wikipedia articles if the plugin gets
popular. Perhaps it's useful to crosscheck the current URL against the
versions of the same article from 24 and 48 hours earlier. Crosscheck back in
time, not back in revisions, since one can spam the history by making a lot of
edits.
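
Something like this could do it (untested TypeScript sketch against the MediaWiki API; the function names are made up):

```typescript
// Fetch an article's wikitext as it stood N hours ago, by timestamp
// (rvstart), not by revision count, so spamming edits doesn't help.
async function wikitextAsOf(title: string, hoursAgo: number): Promise<string> {
  const ts = new Date(Date.now() - hoursAgo * 3_600_000).toISOString();
  const api = new URL("https://en.wikipedia.org/w/api.php");
  api.search = new URLSearchParams({
    action: "query", prop: "revisions", titles: title,
    rvlimit: "1", rvstart: ts, rvslots: "main", rvprop: "content",
    formatversion: "2", format: "json",
  }).toString();
  const data = await (await fetch(api)).json();
  return data.query.pages[0].revisions[0].slots.main.content;
}

// Only trust a URL that was already in the article 24 and 48 hours ago.
async function crossCheck(title: string, url: string): Promise<boolean> {
  const [h24, h48] = await Promise.all([
    wikitextAsOf(title, 24),
    wikitextAsOf(title, 48),
  ]);
  return h24.includes(url) && h48.includes(url);
}
```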

~~~
Asmod4n
I believe the German version of Wikipedia had(has?) a feature where you only
get verified versions of a page when you browse it anonymously.

~~~
hk__2
> I believe the German version of Wikipedia had(has?) a feature where you only
> get verified versions of a page when you browse it anonymously.

What’s a “verified version”? Who verifies?

~~~
fragmede
[https://en.wikipedia.org/wiki/Wikipedia:Verifiability](https://en.wikipedia.org/wiki/Wikipedia:Verifiability)

~~~
hk__2
Verifiability is about the ability to check information against reliable
sources; it has nothing to do with having “verified versions” of pages.

~~~
hn_101
[https://en.wikipedia.org/wiki/Wikipedia:Pending_changes](https://en.wikipedia.org/wiki/Wikipedia:Pending_changes)

~~~
hk__2
Thanks, that’s more like it.

------
hk__2
> Wikipedia keeps track of official URLs for popular websites

This should be Wikidata. Wikipedia does that, but it is more and more being
moved into Wikidata. This is a good thing, because Wikidata is much easier to
query, and the official website of an entity is stored in a single place,
which is then reused by all articles about that entity in all languages.
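
For example (a small TypeScript sketch against the Wikidata API; Q21980377 is the Sci-Hub item, and this ignores edge cases like novalue snaks):

```typescript
// The "official website" is property P856, so one wbgetclaims call is enough.
async function officialWebsite(entity: string): Promise<string[]> {
  const api = new URL("https://www.wikidata.org/w/api.php");
  api.search = new URLSearchParams({
    action: "wbgetclaims", entity, property: "P856", format: "json",
  }).toString();
  const data = await (await fetch(api)).json();
  // Each claim's main snak holds the URL as a plain string value.
  return (data.claims.P856 ?? []).map(
    (claim: any) => claim.mainsnak.datavalue.value as string,
  );
}

officialWebsite("Q21980377").then(console.log); // Sci-Hub's official URL(s)
```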

------
snek
The extension has nothing to do with DNS; a more accurate name would be
"autocorrect over Wikipedia".

The Rust server set up with dnsmasq is a legit DNS server, though.

~~~
MatthewWilkes
It isn't autocorrect either. It's domain name resolution.

------
abiogenesis
Nitpicking: Technically it's not DNS as it doesn't resolve names to addresses.
Maybe CNAME over Wikipedia?

~~~
usmannk
Nitpicking nitpicking: "Technically" CNAME is DNS insofar as DNS is
"technically" defined at all.

~~~
datalist
It is not even a CNAME. It is a JavaScript redirect based on the response of
an HTTP request to Wikipedia.

------
LinuxBender
This may be a little off topic, but has anyone ever considered a web standard
that includes a cryptographic signed file in a standard "well known" location
that would contain content such as

\- Domains used by the site (first party)

\- Domains used by the site (third party)

\- Methods allowed per domain

\- CDNs used by the site

\- A records and their current IP addresses

\- Reporting URL for errors

Then include the public keys for that payload in DNS and at the apex of the
domain? Perhaps a browser add-on could verify the content and report errors
back to a standard reporting URL with some technical data showing which ISP is
potentially tampering? Does something like this already exist beyond DANE?
Similar to HSTS, maybe the browser could cache some of this info and show
diffs in the report? Maybe the crypto keys learned for a domain could also be
cached, warning the user if something has changed (show a diff and an option
to report)? Maybe more complex would be a system that allows a consensus
aggregation of data to be ingested by users, so they may start off in a
hostile network with some trusted domains populated by the browser in advance,
also similar to HSTS?
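
To make the payload concrete, it might look something like this (purely hypothetical TypeScript; none of these field names are standardized anywhere):

```typescript
// Invented shape for a signed file served at, say,
// https://example.com/.well-known/site-manifest.json
interface SiteManifest {
  firstPartyDomains: string[];              // domains used by the site (first party)
  thirdPartyDomains: string[];              // domains used by the site (third party)
  allowedMethods: Record<string, string[]>; // methods allowed per domain
  cdns: string[];                           // CDNs used by the site
  aRecords: Record<string, string[]>;       // A records and their current IPs
  reportUrl: string;                        // where add-ons report suspected tampering
  signature: string;                        // detached signature over the payload,
                                            // verified against keys in DNS / at the apex
}
```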

~~~
andrekorol
That's a good use case for a blockchain, with regard to the "consensus
aggregation of data" that you mentioned.

~~~
Spivak
Why would you need a blockchain for this? This would just be a text document
sitting at $domain/.well-known/$blah and verifiable by virtue of being signed
with a cert that's valid for $domain.

------
renewiltord
This is hecka cool. What a clever concept! I like the idea of piggy-backing on
top of a mechanism that is sort of kept in the right state by consensus.

------
blattimwind
Wouldn't this be an excellent use case for Wikidata?

For example looking up "sci hub" on Wikidata leads to
[https://www.wikidata.org/wiki/Q21980377](https://www.wikidata.org/wiki/Q21980377)
which has an "official website" field.

------
oefrha
Pretty cool, although legally gray content distribution sites like Libgen,
TPB, KAT, etc. are often better thought of as a collection of mirrors, where
any mirror (including the main site, if there is one) could be unavailable at
any given time.

------
gbear605
One concern is that you can’t always trust the Wikipedia link. For example, in
this edit [1] to the Equifax page, a spammer changed the link to a spam site.
They’re usually fixed quickly, but it’s not guaranteed. So it’s a really neat
project, but be careful about actually using it, especially for sensitive
websites.

[1]:
[https://en.wikipedia.org/w/index.php?title=Equifax&diff=9455...](https://en.wikipedia.org/w/index.php?title=Equifax&diff=945519521&diffmode=source)

~~~
edjrage
True, seems pretty risky. Maybe the extension could take advantage of the edit
history and warn the user about recent changes?

Edit: Unrelated to this issue, but I have a more general idea for the kinds of
inputs this extension could accept. It could be an omnibox command [0] that
takes the input text, passes it through some search engine with
"site:wikipedia.org", visits the first result, and finally grabs the URL (see
the sketch below). So you don't have to know any part of the URL - you can
just type the name of the thing.

[0]:
[https://developer.chrome.com/extensions/omnibox](https://developer.chrome.com/extensions/omnibox)
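
The omnibox part might look roughly like this (hypothetical TypeScript, assuming @types/chrome and an "omnibox" keyword in manifest.json; Wikipedia's own search stands in here for the "search engine with site:wikipedia.org" step):

```typescript
// Stand-in for a "type a name, get the official site" lookup,
// like the Wikipedia-search sketch at the top of the thread.
declare function resolveViaWikipedia(name: string): Promise<string | null>;

chrome.omnibox.onInputEntered.addListener(async (text) => {
  // e.g. typing "<keyword> pirate bay" in the address bar
  const site = await resolveViaWikipedia(text);
  if (site) {
    chrome.tabs.update({ url: `https://${site}` });
  }
});
```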

------
29athrowaway
Many Wikipedia articles can be edited by anyone. This is not secure.

------
jrockway
Why does Google censor results, but not Wikipedia? It seems like you can DMCA
Wikipedia just as easily as Google.

Overall this is a nifty hack and I like it a lot. Wikipedia has an edit
history, and a DNS changelog is something that is very interesting to have.
People can change things and phish users of this service, of course, but with
the edit log you can see when and potentially why. That kind of transparency
is pretty scary to someone who wants to do something malicious or nefarious.

~~~
jhasse
Google also sells copyrighted content, Wikipedia doesn't.

------
leoh
Nice work! Sometimes I seem to be directed to a wikipedia page as opposed to a
URL. For example, with `aaronsw.idk` or `google.idk`. I wonder why that's the
case?

~~~
O_H_E
I think it redirects to the correct link when the link is labeled `URL` on the
wiki. In the other cases the link is labeled `Website`.

~~~
aaronjanse
This was exactly the issue! I just pushed fixes for this problem.

~~~
cooper12
I've written a userscript[0] before regarding official websites and I feel
this is the hierarchy you should be using:

1\. Try getting the Wikidata "official website" property

2\. Then any link inside of a {{url}} template or |website= in an infobox

3\. And if you really want to try to get something to resolve to, the first
site wrapped in {{official website}}

If you need code to reference:
[https://en.wikipedia.org/wiki/User:Opencooper/domainRedirect...](https://en.wikipedia.org/wiki/User:Opencooper/domainRedirect.js)

[0]:
[https://en.wikipedia.org/wiki/User:Opencooper/domainRedirect](https://en.wikipedia.org/wiki/User:Opencooper/domainRedirect)
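
Compressed into a sketch, that hierarchy would be something like this (hypothetical TypeScript; the two declared helpers are stand-ins for the API calls shown elsewhere in the thread, and the real userscript linked above does much more careful parsing):

```typescript
declare function wikidataOfficialWebsite(title: string): Promise<string | null>;
declare function fetchWikitext(title: string): Promise<string>;

async function officialUrl(title: string): Promise<string | null> {
  // 1. Prefer the Wikidata "official website" property (P856).
  const fromWikidata = await wikidataOfficialWebsite(title);
  if (fromWikidata) return fromWikidata;

  // 2. Then a {{URL|...}} template or an infobox |website= parameter.
  const wikitext = await fetchWikitext(title);
  const infobox =
    wikitext.match(/\{\{URL\|([^}|]+)/i) ??
    wikitext.match(/\|\s*website\s*=\s*([^\n|{}]+)/i);
  if (infobox) return infobox[1].trim();

  // 3. Finally, the first link wrapped in {{official website}}.
  return wikitext.match(/\{\{official website\|([^}|]+)/i)?.[1] ?? null;
}
```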

------
erikig
Interesting idea but:

\- How do you handle ambiguity? E.g. what happens when sci-hub.idk and
scihub.idk differ?

\- Aren’t you concerned by the fact that Wikipedia is open to editing by the
public?

~~~
captn3m0
Maybe use Wikidata? The slower rate of updates might work in your favour to
avoid vandalism.

~~~
skissane
In my personal experience, Wikidata is often worse at detecting vandalism than
Wikipedia. Wikipedia has more editors and so vandalism on Wikipedia tends to
be noticed sooner. Wikidata gets less attention so vandalism can endure for
much longer.

With the increasing trend to pull data from Wikidata into Wikipedia, this is I
think becoming less of an issue – even if nobody is watching the Wikidata
item, if some vandalised property is exposed in a Wikipedia infobox, that
increases the odds that someone will notice the vandalism. However, there are
always going to be more obscure items which lack Wikipedia articles, and more
obscure properties which don't get displayed in any infobox, and for them the
risk of vandalism is greater. (Plus, it is possible for a Wikipedia article to
override the data in Wikidata with its own values; this is done for the
English Wikipedia Sci-Hub article, for example – Wikidata is including all the
historical web addresses, Wikipedia only wants to display the current ones – I
don't think it is technically possible yet to filter out just the current
ones, so instead Wikipedia is manually overriding the addresses from
Wikidata.)

~~~
mmarx
> I don't think it is technically possible yet to filter out just the current
> ones, so instead Wikipedia is manually overriding the addresses from
> Wikidata.

Note that the historical ones are of “normal” rank, whereas the current ones
have “preferred” rank. You can filter that when using the API, and when using
the SPARQL endpoint, if you go for the “truthy triples” representation
`wdt:P856` of the “official website” property, you will only get best-rank
statements – in this case the preferred ones. If you want to be absolutely
sure, you can go for the “reified triples” representation and query for
statements that don't have any “end time” qualifiers.
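
For reference, the truthy-triples version is a one-line SPARQL query (a TypeScript sketch against the public query service; Q21980377 is the Sci-Hub item):

```typescript
// wdt:P856 only yields best-rank statements, so for Sci-Hub this returns
// just the "preferred" (current) official websites, not the historical ones.
const query = `SELECT ?website WHERE { wd:Q21980377 wdt:P856 ?website. }`;

async function run() {
  const url = new URL("https://query.wikidata.org/sparql");
  url.search = new URLSearchParams({ query, format: "json" }).toString();
  const data = await (await fetch(url)).json();
  for (const row of data.results.bindings) {
    console.log(row.website.value);
  }
}

run().catch(console.error);
```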

------
jneplokh
Awesome idea! It could be applied to a lot of different websites, even ones
where I'm too lazy to type out the whole URL :p

Regardless, a system that resolves names based on some trusted website could
definitely be expanded beyond Wikipedia. Great work!

------
snorrah
Does this comply with the terms of service? I know this won’t be a popular
reply and that’s fine, but I just want to know whether your admittedly
intriguing concept isn’t taking the piss :)

------
upgoat
Woah this is hecka cool!! Nice work to the authors.

------
jaimex2
Alternatively just don't use Google.

~~~
Sabinus
What search engines don't censor?

~~~
jaimex2
Yandex and DuckDuckGo are good.

------
sm4rk0
Nice hack, but you can do this much more easily with DuckDuckGo's "I'm Feeling
Ducky" feature, which you use by prefixing the search with a backslash:

[https://lmddgtfy.net/?q=%5Chacker%20news](https://lmddgtfy.net/?q=%5Chacker%20news)

That's especially useful if DDG is default search engine in your browser.

(I'm not affiliated with DDG)

~~~
kelnos
That just takes you to the first DDG result, no?

The purpose of this seems to be to treat Wikipedia as a trusted, reliable
source of truth about the canonical URL for websites (debatable, of course).
The idea is that you don't trust the search engines, perhaps because you live
in a country where your government has required search engines to censor
results in some way, but (for some reason?) lets you go to Wikipedia.

------
BubRoss
Wouldn't DNS over GitHub make more sense than this?

------
rootsudo
This is cool!

