
Dorking: the use of search engines to find very specific data - abarrettwilsdon
https://www.alec.fyi/dorking-how-to-find-anything-on-the-internet.html
======
chris_f
A few corrections:

The + (formerly used to force a term to be present in the result) and ~ (also
find synonyms) operators have been deprecated.

Google now advises to wrap the word in quotes instead of using the +. Google
will also automatically look for synonyms without the use of ~.

I have seen 'AROUND(n)' mentioned in many other places working as a proximity
operator in Google, but I don't believe that is true and haven't found it to
work in any logical way.

Also the use of parentheses to nest queries is not necessary in Google. It is
actually required for Bing on complicated queries though.

~~~
GordonS
Worth mentioning that even if you put a term in double quotes, Google _still_
tries to be too clever - you are not guaranteed to get results that contain
your quoted search term :/

~~~
solarist
As a workaround and under search tools one can enable the “verbatim” option.

~~~
GordonS
AFAICT, the verbatim option gives the same results as if I'd quoted my search
term?

~~~
solarist
In my experience it depends on the number of results, and the results are more
accurate with verbatim.

~~~
aspenmayer
It’s like being able to touch your nose in the dark, or even having been blind
since birth.

Verbatim searches are for those easy tasks when you know _literally_,
_exactly_, what you’re looking for.

Something tells me that a lot of folks on HN are being let down by Google in
this area. I don’t mean it in a bad way, I want to identify these issues to
help everyone. I wish the UI of search wasn’t so textually-bound.

Is there such a thing as an augmented reality search engine? You search like
you do at the library, or when you lost your keys. But for data and virtual
objects, they lose out - their evanescence creates a perception gap. There is
no correspondence with physical objects, etc. We are getting better with
haptics, however.

 _Rainbows End_ [0] is a future society I’d choose to live in, especially
right about now.

Anyone have any good book recommendations?

[0] Vernor Vinge

[https://en.wikipedia.org/wiki/Rainbows_End](https://en.wikipedia.org/wiki/Rainbows_End)

------
sawaruna
Might be my librarian career bias but I'm always surprised at how few people
know about query operators. Ironically as Google search seems to be ignoring
vital parts of people's queries, they are becoming more needed now, whereas
years ago I would have assumed a constantly improving Google search would get
better at determining what I was looking for.

~~~
kebman
You don't even wanna know how many times specialized searches have saved my
ass, after multiple years at uni, and working as a writer, journalist,
programmer, and even a musician! You can safely say that my entire life
revolves around being good at doing various forms of searches.

~~~
Wistar
I have long believed that the art of precision search should be taught at the
primary level. It is a necessary skill.

~~~
otr
That idea is present in _Rainbows End_ by Vernor Vinge. Basically search
engines and the use of discussion forums to find information become a subject
in schools etc. Though this is not the main point of the novel at all.

[https://en.wikipedia.org/wiki/Rainbows_End](https://en.wikipedia.org/wiki/Rainbows_End)

------
uniqueid
Last week I blocked every *.google.* domain on my network except
"youtube-ui.l.google.com".

Google Search: (1) ask a natural language question (since actual search is
hobbled) (2) get unrelated garbage and ads back (3) blame yourself for "not
being technical enough" to understand why the results aren't actually garbage.

Google Search has deteriorated to the point that so far I haven't missed it
_at all_.

~~~
darepublic
Google still good for coding related searches

~~~
uniqueid
Ever spent three minutes opening useless links from Google's Search results,
only to realize they dropped the keyword you searched? That seems quite common
now, especially with programming keywords, which are often obscure.

Remember Google Code Search, and Google (Usenet) Groups? Back then, Google
cared about this stuff. Now they seem only to want to show you furniture ads,
or get you to use their Zoom knockoff, etc.

These days Google substitutes the heck out of searches. Perhaps it's better if
you've logged in, but I'd rather hack my leg off with a rusty saw than
voluntarily log in to an account just to search the web.

~~~
jjeaff
This is also very common with DDG. And the strange thing is that many times,
DDG will even ignore that you are using parentheses. The parentheses seem to add
weight but I will still get tons of results that don't contain the required
words at all.

~~~
roochkeez
Never seen this happen with DDG. Required words should be in quotes, not
parentheses.

------
neilduncan
I live two towns over from Dorking.

[https://en.wikipedia.org/wiki/Dorking](https://en.wikipedia.org/wiki/Dorking)

~~~
tomalpha
I grew up in Dorking, but this is the first time (that I can remember...) that
I actually read its wikipedia article.

TIL: No one knows why 'Dorking' is called 'Dorking', but there's an English
Place Names Society which since the 1920s has researched the origins of town
names in England, and is considered [0] to be "the established national body
on the subject".

[0] [https://epns.nottingham.ac.uk/](https://epns.nottingham.ac.uk/)

~~~
kolektiv
The -ing part of a place name is Saxon, from memory, something like "ingas"
meaning "people of". So, People of Dork, literally, and whatever became Dork
(a person/place/event) is probably lost to historical memory. We have an awful
lot of places with "ing" in them around here, probably due to a lot of C7
Saxon settlement.

------
weisbaum
This is a pretty common practice among SEOs for a variety of different
reasons. They are also known as advanced search operators.

Ahrefs has a pretty comprehensive list here:
[https://ahrefs.com/blog/google-advanced-search-operators/](https://ahrefs.com/blog/google-advanced-search-operators/)

------
harha
I think it would be useful to be able to explicitly search around knowledge
graph entities or site topics, e.g. a programming language, a city, a season,
without having that single/specific term.

So a search including all sites related to an entity, say Munich or python
along with the terms the user is searching because a page might then not
specifically include the entity in its keywords or the text on the site or
have a different language or use a synonym.

I’m sure search engines consider this somewhat, but explicitly activating such
a feature would be a great improvement for the user.

Stackexchange has this feature with tags (using []), with user curated tags.
Would be nice to have in DDG or google.

~~~
mitchdoogle
I would just like to create my own groups. As another user said, tagging would
probably be gamed by SEO companies, but if people could use their own
groupings, that problem wouldn't occur. There could even be curated lists out
there of specific sites that fall within a general category. At the least, I'd
like to be able to block sites from ever appearing in my results. I've used
add-ons for that which work pretty well, but it should be built-in in my
opinion

------
Shared404
Syntax for doing things like this with DDG:

[https://help.duckduckgo.com/duckduckgo-help-pages/results/sy...](https://help.duckduckgo.com/duckduckgo-help-pages/results/syntax/)

~~~
kps
I'd switch to DDG in a half second if they supported the full query syntax of
altavista.digital.com (see
[http://jkorpela.fi/altavista/](http://jkorpela.fi/altavista/) if you've
forgotten). Disclaimer: I work for, um, Google.

~~~
Shared404
I do wish they supported a larger search syntax. My current workaround is I
have a massive bookmark folder of alternative search engines that I try if I'm
not having luck narrowing things down enough.

~~~
lsiebert
That might make an interesting blog post

~~~
Shared404
I'm still mid-setting up a blog, but I'll keep that in mind once I've got it
up and running.

I'm afraid it probably wouldn't be that interesting to HN'ers though, because
this is where I found most of them.

~~~
jidiculous
I'm pretty new to HN so I'd really find that post interesting – having them
listed in one place would be useful.

~~~
Shared404
If you want, you can shoot me an email and I'll send a link to a page where I
have them listed already. Not a blog post, and not much info about them, but I
have a list.

I just don't have a domain yet so I don't really want to go super public.

~~~
jidiculous
Sure thing, I've just emailed you.

~~~
Shared404
Update: there is now a domain, the page can be found at

[http://a-shared-404.com/other-stuff](http://a-shared-404.com/other-stuff)

------
surround
Exploit database with more dorks

[https://www.exploit-db.com/google-hacking-database](https://www.exploit-db.com/google-hacking-database)

------
1vuio0pswjnm7
I have a question for anyone reading this thread:

Do you believe you can get consistent results with _any_ search?

For example, if we pick some _uncommon_ search terms will we get the same
results on the first search, the second search, the third, etc. Or will the
results change?

I did a search with some terms from one of the comments in this thread, in
quotes. The first search returned only one result: this thread.

As I searched the same quoted terms repeatedly along with additional terms,
more results were returned that contained the exact string of original terms.
Surprised by this, I tried a search with only the original terms, in quotes,
once again. This time the search returned more than just the one result.

~~~
abarrettwilsdon
If it's specific enough, the SERP should stay the same until someone else
publishes the same thing

e.g. the search of another article "set up Google Sheets APIs (and treat
Sheets like a database)"

turns up my site and a couple Twitter threads talking about it (plus a
phishing site which has scraped and republished it). I presume that will stay
the same b/c it's such a specific title phrase (but not because searches are
necessarily deterministic)

~~~
1vuio0pswjnm7
I am skeptical that quotes really work like the plus operator used to work.

For example, try searching the following string in quotes: "the SERP should
stay the same".

[https://www.google.com/search?q=%22the%20SERP%20should%20sta...](https://www.google.com/search?q=%22the%20SERP%20should%20stay%20the%20same%22)

Now, the logical presumption, assuming Google works as people say it does, is
that each result will contain that exact string. If no results contain the
string, then you should receive no results.

However, for me, results are returned. Did each of the results from that
search contain this exact string? For me, they did not.

~~~
abarrettwilsdon
When I run that query, it returns

`No results found for "the SERP should stay the same".`

Then defaults to providing the SERP for the fallback query:

`Results for the SERP should stay the same (without quotes):`

That SERP should change when this HN thread is indexed though

~~~
1vuio0pswjnm7
I must have missed the line "No results found" and the disclosure that Google
has, by default, gone ahead and performed a "fallback query" without the
quotes. Perhaps that is the goal. That I, the user, will not notice. If I
wanted the fallback query's results then I would not have used quotes. This
appears to be another example of Google second-guessing the user.
Perhaps they assume that the user who searches for an exact string with quotes
would, in most cases, try the search again without the quotes if there are no
results found.
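
A rough way to check this by hand, sketched in Python: given the text of a result page (fetching it is left out on purpose), does it actually contain the quoted phrase? The function name `contains_exact_phrase` and the matching rules (ignore case and runs of whitespace) are assumptions for illustration, not anything Google documents about how quotes are matched.

```python
import re


def contains_exact_phrase(page_text, phrase):
    """Return True if `phrase` occurs in `page_text`.

    Assumption: case and runs of whitespace (including line breaks)
    are ignored, so a phrase wrapped across lines still counts.
    """
    def normalize(s):
        return re.sub(r"\s+", " ", s).strip().casefold()

    return normalize(phrase) in normalize(page_text)


page = "If it's specific enough, the SERP should\nstay the same until someone else"
print(contains_exact_phrase(page, "the SERP should stay the same"))
```

If this returns False for the pages on a results list, the list probably came from the unquoted fallback query rather than from the quoted one.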

------
yuvadam
Dorking is not that easy to do. Google is very quick to assume you are being
malicious on certain queries; try one too many and you'll hit their dreaded
captcha that is impossible to pass.

~~~
userbinator
That really angers me, and I've tripped it more times than I can count,
usually by searching for very specific things. Coworkers have also run into it
multiple times (before everyone started working from home, we would exclaim
"Fuck you, Google!" and raise a middle finger to the screen, which was a cue
to everyone else to help).

The fact that they think you're "not human" when you use a search engine for
its intended purpose and show how much you know how to use it is both
disturbing and saddening. I wonder if Google's own employees run into it
and/or the continuing degradation of results, or if they're somehow given
immunity and a much better set of results...

~~~
IggleSniggle
I’m curious about this. Can you give an example of the kind of query you are
talking about where Google assumes you are a bot and not a human?

~~~
bloaf
I get them _all the dang time_

Suppose I want to search for a textbook in an open directory. This query will
get flagged almost immediately:

    
    
        textbook -inurl:htm -inurl:html -inurl:asp intitle:"index of" +(epub|mobi|pdf|txt)
    

Suppose someone says to me "no one in this forum believes _(dumb conspiracy
theory from Obama-times)_ " or "the lamestream media never reported on X."
Naturally I will do a few quick iterations of

    
    
        conspiracy related "terms" site:someforum.com before:2017-01-01
    

which will trigger a humanity check if you iterate too quickly.

I got hit recently when the news about Tucker Carlson's racist writer came
out. I set out to find the full threads in which the offending comments were
made. Iterating through combinations of text, usernames, and urls like

    
    
        "full text of offensive post" username before:date-news-broke -inurl:news
    

got me checked every 3rd search or so.
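
For reference, the queries above are just operators joined by spaces, which can be sketched as a small Python helper. `build_query` and its parameter names are made up for illustration; it only assembles the string from operators already mentioned in this thread (`site:`, `before:`, `-inurl:`, `intitle:`, quoted phrases) and does not send anything to a search engine.

```python
def build_query(terms="", exact=(), site=None, before=None,
                exclude_inurl=(), intitle=None):
    """Assemble a search string from a few operators (hypothetical helper)."""
    parts = []
    if terms:
        parts.append(terms)
    # Quoted phrases are meant to be matched exactly.
    parts.extend(f'"{phrase}"' for phrase in exact)
    # A leading minus excludes results matching the operator.
    parts.extend(f"-inurl:{fragment}" for fragment in exclude_inurl)
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    if site:
        parts.append(f"site:{site}")
    if before:
        parts.append(f"before:{before}")
    return " ".join(parts)


print(build_query("conspiracy related", exact=["terms"],
                  site="someforum.com", before="2017-01-01"))
```

Whether a given engine honors each operator is a separate question, as the rest of this thread shows.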

------
kace91
Back when I was a teenager, I had a book titled "Hacking with Google" by Johnny
Long that was basically all specific searching tips and terms (oriented to
find open vulnerabilities and the like, but still very useful in general
despite the tacky name).

I wonder how much of it is still valid after all this time.

~~~
mcswell
Back when I was a teenager, I had a slide rule. I can guarantee that a slide
rule is still valid, so long as you're not interested in more than two or
three significant digits, and you don't want to add or subtract.

------
voldacar
Why doesn't google.com have a comprehensive list of these? I'm constantly
seeing new ones that I didn't know about, but google never teaches you about
them so you have to find them in obscure blog posts

~~~
vezycash
Google randomly ignores "search term in quotes".

Related:examplesite.com used to work well. Now, it's better to use sites like
alternativeto.net.

~phrase is unnecessary because Google searches for synonyms by default

phrase1 + phrase2 - Google randomly ignores it. I use it this way
+compulsoryTerm

Although rare, there are things I simply can't find using Google. But Bing
would. If Google keeps it up, other search engines would benefit.

~~~
ewired
Doing some "related:" queries returns some interesting results that look
human-curated and out-of-date. related:google.com shows results for Yahoo,
Bing, AOL Search, and HotBot (which used to be a search engine, but the brand
is now for a VPN provider).

~~~
chris_f
That is great! It's like a search engine blast from the past.

Also interesting, related:bing.com gives me no results.

------
ricardo81
Worth pointing out if you do some of these crafted operator searches quite
quickly, you'll end up getting blocked or having to complete a captcha. I
haven't done so in a while so I'm not sure what their current behaviour is.

Main reason being there's plenty of data mining, e.g. looking for "powered by
wordpress" and vulnerable versions, and generally all kinds of data mining
that involve very specific requests for information, likely queries that
aren't creating revenue, either.

------
w0mbat
The - prefix operator is very useful and still works.

Google should reinstate the + prefix operator. It was only taken out because
it screwed up the search results for Google+, which is dead now.

~~~
kilroy123
I find myself having to use the "-" prefix a lot these days.

------
marcrosoft
I love the “inject JS into the page to find stuff” hack. The author mentions
local “site you are on” but this can be applied with headless chrome to crawl
many sites.

~~~
flywheel
That's web scraping 101

------
yourad_io
Fun fact: googling for -273.15 without double quotes produces no results.

You need to quote negative arithmetic values when searching, even if there are
no other query parameters. It made me wonder if I was misremembering absolute
zero.

~~~
yjftsjthsd-h
Oh, probably because it interprets it as a logical negation; not "negative X",
but "remove X from results".
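
If you build queries in a script, a minimal Python sketch of that workaround might look like this. The helper name `quote_leading_minus` is hypothetical, and it assumes only bare numeric tokens such as `-273.15` should be quoted, so that deliberate exclusions like `-kelvin` are left alone.

```python
import re


def quote_leading_minus(query):
    """Wrap negative-number tokens in double quotes.

    Assumption: a token that is only a minus sign followed by digits
    (optionally with a decimal part) is meant as a number, not as an
    exclusion operator.
    """
    fixed = []
    for token in query.split():
        if re.fullmatch(r"-\d+(\.\d+)?", token):
            fixed.append(f'"{token}"')
        else:
            fixed.append(token)
    return " ".join(fixed)


print(quote_leading_minus("absolute zero -273.15 -kelvin"))
```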

------
jrochkind1
Why is this called "dorking"? "Dorking" is a word that just means using search
engines to find very specific data? This seems bizarre to me. Why does this
need a special word?

Or it actually means using search operators beyond natural language entry?
That's what this page seems to be about? I don't know why that would be called
"dorking" either?

~~~
p410n3
It all started with a def con talk if I remember correctly.

[https://youtu.be/N3dzVl40lQA](https://youtu.be/N3dzVl40lQA)

------
indit
A very comprehensive and frequently updated list is here:
[https://www.exploit-db.com/google-hacking-database](https://www.exploit-db.com/google-hacking-database)

------
the_jeremy
All I want is the ability to search for symbols. Symbolhound.com is the only
site I've heard that will support that, but it leaves a lot to be desired.

~~~
Brakenshire
It’s strange to me that more domain-specific search engines haven’t been
created. There must be value in a programmer-specific search engine for
instance. Or why aren’t there search engines that specialise in news, social
media, Q&A websites or events, to give a few examples.

------
aaron695
Learn to use time. It's a drop down.

The web is slowly atrophying. Going back in time for originals makes a big
difference.

Reverse is also true.

After a blow up the mass media will repeat the same thing en masse and swamp
results.

Often an article in the last hour might have what you want, like the database
link they are all talking about.

------
huffmsa
Don't you just love it when your carefully crafted search finally displays
the words or phrases you want in the snippet on the results page but then when
you actually open the link and CTRL+F for it, it's nowhere to be found? Not
even in the raw HTML?

I sure do.

------
Tepix
There's a related thing you can do. If you have web pages somewhere, create a
bunch of blank web pages with just one random word on them (something like
"ristordshest") and then create an index page that links to them all.

Then link to that index page somewhere where no one except web crawlers will
notice it. Then wait a few weeks.

Now when you

a) sell something on eBay where you are not allowed to link to the product
support page or some other stupid restriction like that

b) want to promote something on Instagram where you can't link to it

Ask people to google for the search term. There will be only one result:
Yours.
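
A minimal Python sketch of the page-generation step, assuming random lowercase strings are good enough as marker words. `make_marker_pages` is a hypothetical helper: it returns the HTML as strings and writes nothing to disk. Whether a marker word really ends up with a single search result depends on the crawler and on nobody else using the same string, so that part is not guaranteed.

```python
import random
import string


def make_marker_pages(count, length=12, seed=None):
    """Return ({filename: html}, index_html) for `count` one-word pages."""
    rng = random.Random(seed)
    pages = {}
    # The loop repeats on the (unlikely) event of a duplicate word.
    while len(pages) < count:
        word = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        pages[f"{word}.html"] = f"<!doctype html><title>{word}</title><p>{word}</p>"
    # The index page links to every marker page so a crawler can reach them.
    links = "\n".join(f'<a href="{name}">{name[:-5]}</a>' for name in pages)
    index = f"<!doctype html><title>index</title>\n{links}"
    return pages, index


pages, index = make_marker_pages(3, seed=1)
for name in pages:
    print(name)
```

It is worth searching for each generated word before using it, in case it already exists somewhere.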

------
bmay
the "link:" operator doesn't work for me--it just seems to include the URL's
tokens in the search

~~~
snowwrestler
Pretty sure that one is deprecated. It was very useful for SEO research, which
is probably why it doesn’t work anymore.

------
peter_d_sherman
A few thoughts:

1) Great information!

2) It seems like the world could use a book like Joe Celko's "SQL For
Smarties", but for search engines. Yes, there are such books already, most
notably O'Reilly's "Google Hacks" by Rael Dornfest, Paul Bausch, Tara
Calishain -- but I think the world could still use a book covering more search
engines and search techniques. The above web page would be a great starting
point to an endeavor like that.

3) "Dorking" (love that term!) -- is going into my 2020 vocabulary lexicon!
<g>

------
kobieyc
Anyone here remember Fravia?
[https://web.archive.org/web/20191201105758/http://search.lor...](https://web.archive.org/web/20191201105758/http://search.lores.eu/indexo.htm)

~~~
bmn__
I do. Alec's blog entry is outright pathetic in comparison. It does not even
scratch the surface of fravia+'s treasure chest.

------
harimau777
Is there any way to search the actual page text? I find that often I remember
some unique turn of phrase from the page that I'm looking for and it would be
extremely helpful to be able to simply search for that.

~~~
abarrettwilsdon
`intext:phrase` and `allintext:multi part phrase`

generally "phrase" works well too

~~~
harimau777
Thank you!

------
jhbadger
Does filetype: still work? I'm getting zero hits for example filetype:epub

~~~
choo-t
It still works, but some file types never return anything. I have the same
problem with epub; pretty sure it's some Google shenanigans about book
piracy.

[https://support.google.com/webmasters/answer/35287?hl=en](https://support.google.com/webmasters/answer/35287?hl=en)

~~~
achairapart
Maybe Google doesn’t index epub at all? I think I never saw one in search
results.

~~~
choo-t
Well, I may have gone crazy, but I have a vivid memory of using it in the past,
and some websites even refer to this specific query (
[https://ebookfriendly.com/google-search-tips-books/](https://ebookfriendly.com/google-search-tips-books/) )

------
chc
I'm kind of surprised to see Google brought back the + operator. I remember
they prominently changed its meaning when they made it the @ of Google+, and I
never bothered to check again after it died.

------
buffin
As a teenager, I used to search for "Index Of <movie name>" for movies. 2/3
times, I was able to find and download the movie I wanted to watch.

------
zhacker
I think I should rename filechef.com to dorkchef now

------
iandanforth
The email specific queries don't appear to work. The "@" is ignored by google
so you just get results for the domain string.

~~~
abarrettwilsdon
The first two appear to still work, but the third does not.

The permutation searches are tricky because you don't know if a lack of
results means the email does not exist, or just hasn't been posted anywhere
indexed

Will update and credit

------
Daub
Effective Google-fu is one of the first things I teach my first year
students. Few greater life skills exist.

------
j45
This reminds me of an article I once read about the neat tricks that used to
exist in altavista.com search engine

------
malwarebytess
NLP and to a lesser extent SEO have vastly diminished the value of this type of
searching.

------
somerandomboi
It would be useful to use “Dorking”, even for non-programmers. Good article!

------
lizardmancan
[https://www.google.nl/search?q=site%3A+news.ycombinator.com+...](https://www.google.nl/search?q=site%3A+news.ycombinator.com+lizardmancan)

I used to use these a lot but now it's just useless

~~~
mikequinlan
You need to remove the blank after the colon.

[https://www.google.nl/search?q=site%3Anews.ycombinator.com+l...](https://www.google.nl/search?q=site%3Anews.ycombinator.com+lizardmancan)
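
For anyone building these URLs in a script, here is a minimal Python sketch of the same fix. The helper name `site_search_url` is made up for illustration; it only builds the URL string and fetches nothing. The base URL is copied from the link above.

```python
from urllib.parse import urlencode


def site_search_url(domain, terms, base="https://www.google.nl/search"):
    """Build a search URL restricted to one site (hypothetical helper)."""
    # No blank between "site:" and the domain, otherwise the operator
    # is not applied to the domain.
    query = f"site:{domain.strip()} {terms.strip()}"
    return f"{base}?{urlencode({'q': query})}"


print(site_search_url("news.ycombinator.com", "lizardmancan"))
# https://www.google.nl/search?q=site%3Anews.ycombinator.com+lizardmancan
```

Stripping the domain means a stray leading space from user input cannot reintroduce the blank after the colon.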

------
flywheel
Prediction: Using the methods of "dorking", this is the only page on the
internet among 10 million+ results that is calling this "dorking".

~~~
montjoy
I hope it doesn’t catch on since it makes me die a little inside. It’s a very
Reddit-type word though. I can easily imagine it being used by non-technical
folk and tech journalists.

