
Show HN: Imagine a search engine that removed top million sites from its index - taxonomyman
http://millionshort.com/about.html
======
swalsh
The results are surprisingly good. I did some searches for recipes, and
frankly without the top 1000 you really start getting some fresh hits. Entries
by real people, rather than sites raking up recipes for a hit.

~~~
hkmurakami
Wow, I didn't know until today that Google doesn't serve more than the top
1000 hits. (A result of me thinking, "can't I just ask Google to show me the
million-first through the million-tenth hits?")

------
nostromo
It's interesting to see popularity used as an inverse proxy for quality.
Imagine a TV that skipped the most popular programming (goodbye American
Idol), or a radio station that only plays non-hits.

Of course, there are great websites out there that are very popular
(Wikipedia, NYTimes/WSJ, StackOverflow). I'd love to see a search engine with
a better signal for quality than non-popularity (this search engine), or SEO
(Google), but it's a fun start. :)

~~~
afhof
You have to be careful with that idea since it can still lead to poor quality.
My campus's radio station (WREK Atlanta) seems to play just weird music for
the sake of being weird. They never play popular music, but they rarely play
any good music either.

~~~
waterlesscloud
WREK is just about the best radio there is. It's not weird for the sake of
being weird. It's intentionally eclectic and far-reaching, sure, but the point
is to broaden horizons. As much for the DJs themselves to broaden their
horizons as the audience.

Disclaimer- former WREK DJ. :-)

------
zeratul
It means that our ranking algorithms have good recall but very poor precision.
We value a web page's connectivity more than its content. We don't know how to
teach machines to evaluate a web page on its merits, so we hope that a large
number of Tweeting, Liking and Plusing non-experts will approximate a single
expert. _Millionshort_ shows that this model isn't good enough.

~~~
RegEx
I was able to rank for some moderately competitive terms just by paying
someone minimum wage to go leave nice comments on dofollow blogs. The whole
"Content is king" meme is a joke, really.

~~~
rudiger
Interesting. How do you find blogs that don't implement nofollow?

~~~
RegEx
Plenty of blogs promote themselves as DoFollow Blogs. It's a bit sad, really.
They basically trade their "link juice" for your insincere, contrived
comments.

~~~
robryan
Which in turn can help them rank for more terms.

------
PaulHoule
This is the first off-brand search engine I've seen that's, in some sense,
cooler than Google.

For one thing, the huge miasma of spam websites that dominates the SERPs just
isn't there -- I hope this lights a fire under Google's butt and people see
another world is possible.

~~~
jaems33
It reminds me of why I first moved to Google from
Yahoo/Webcrawler/Altavista/etc in the first place.

~~~
erichocean
Yup. Since this is trivial for Google to copy, it's unlikely that it'll
actually disrupt Google in any way, but still...very cool.

~~~
noahc
The technology is trivial, but the reasons behind it and philosophy that
drives it aren't.

Google isn't going to come along and say, "Oh, guys, you're right... Wave,
Buzz, Google+ all a mistake. Social Graph... mistake! We're sorry. We are
rolling everything back to 2004."

Google has become increasingly frustrating for power users over the last
couple of years. People are looking for alternatives. DDG is one, this is
another. Power users matter, and not just because they spend more time inside
an application, but because they recommend it to others.

A perfect example of this is Firefox. I used Firefox a lot, and then switched
to Chrome. I've since switched my mother, grandpa, brother, girlfriend, sister
and too many friends to count to Chrome. I now use DDG for all my searching
needs via the bang syntax (!, !g, !w, etc). It's only a matter of time before
the rest of my family follows along.

~~~
hello_asdf
Ah, I was unaware of the !g bang syntax; that's a great one. I'm also fond of
!archwiki and all the language bangs. DuckDuckGo is the only search engine
that I know of that could take over a significant portion of search traffic.
By significant I don't mean tens of millions; I mean a dedicated userbase
who use it in lieu of Google.

~~~
dsrguru
Just out of curiosity, what does DuckDuckGo do that conventional keyword
searches in Firefox or Chrome don't do?

~~~
noahc
I have DDG set as my default search engine, but can you do the following:

! fuji Ames => takes me directly to Fuji Steak House in Ames, Iowa

!w Sushi => takes me directly to the Wikipedia article on sushi

!m 801 grand ave Des Moines => takes me directly to Google Maps at that
address.

I still probably do about 30% of my searches using !g, which kicks me straight
to Google for search results.

~~~
dsrguru
But how is that different from regular keyword searches? If I want to pull up
the Wikipedia article on sushi in Firefox, I enter "w sushi" in the URL bar.
If I want to read the Arch Linux wiki article on pacman, I type "arch pacman".
If I want to watch Friday by Rebecca Black on Youtube, I type "y Friday", etc.
I feel like there's more than a 50% chance that I'm missing something. What
does DDG do that browsers don't?

~~~
noahc
I typed "w sushi" into Chrome and got
[https://www.google.com/search?aq=f&ix=ucb&sourceid=c...](https://www.google.com/search?aq=f&ix=ucb&sourceid=chrome&ie=UTF-8&q=w+sushi)
with Google set as my default. I expected to get
<http://en.wikipedia.org/wiki/Sushi>.

Do I have to set up all these keywords myself for them to work? If so, that's
the difference. DDG is essentially a pre-configured command line with
literally hundreds (thousands?) of predefined search shortcuts.
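
For readers comparing the two approaches, here is a minimal sketch of what a keyword or bang shortcut boils down to: look up the first word in a table of URL templates, substitute the rest of the query, and fall back to an ordinary search otherwise. The `SHORTCUTS` table, the `expand` function and the exact URL templates are illustrative assumptions, not DuckDuckGo's or Firefox's actual configuration.

```python
from urllib.parse import quote_plus

# Illustrative shortcut table: keyword -> URL template with a {q} placeholder.
SHORTCUTS = {
    "w": "https://en.wikipedia.org/wiki/Special:Search?search={q}",
    "arch": "https://wiki.archlinux.org/index.php?search={q}",
    "y": "https://www.youtube.com/results?search_query={q}",
}
# Where anything that is not a known shortcut goes.
DEFAULT = "https://duckduckgo.com/?q={q}"


def expand(query: str) -> str:
    """Turn 'w sushi' (keyword style) or '!w sushi' (bang style) into a URL."""
    text = query.strip()
    keyword, _, rest = text.partition(" ")
    keyword = keyword.lstrip("!")  # accept both "w" and "!w"
    if rest and keyword in SHORTCUTS:
        return SHORTCUTS[keyword].format(q=quote_plus(rest))
    # Unknown keyword (or no search terms): fall back to a plain search.
    return DEFAULT.format(q=quote_plus(text))
```

The mechanism is the same either way; the difference dsrguru and noahc land on is who maintains the table: with browser keywords you fill it in yourself, while DDG ships it pre-populated.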

~~~
dsrguru
Yes, I set those keyword searches myself. So the difference seems to be that
with DDG, you just have to learn which shortcuts exist rather than customizing
them yourself. That doesn't really sound like it saves much time, but it
sounds like the real advantage of DDG is that the act of reading through the
DDG list of keywords would probably give me ideas of useful time-saving
shortcuts that I'd never think of making on my own.

------
Cushman
I'd be fascinated to see the kind of SEO that would go on if this took off.

"Bad news-- we're a top 100 hit for several of our main keywords. We'll have
to change our URL scheme again."

~~~
Bud
We can call it Search Engine Pessimization!

~~~
hoprocker
Bravo!

------
libraryatnight
This reminds me of searching the internet in the '90s; I'm finding results from
pages I haven't visited or heard of before now.

This is really refreshing.

------
eliasmacpherson
This is a breath of fresh air - I'm loving the unpredictability of the top
results! It's like flicking through a new set of 1000 tv channels in a
different country.

~~~
taxonomyman
The positive feedback is awesome. Thx

------
pbhjpbhj
Would you share some implementation details?

What's your source for the top million sites; where do you get your site list
from for the other results?

------
goodgracious
Who needs to imagine it? It's here.

Cat is out of the bag.

What is the Alexa list good for? Answer: Filtering out the boring, money-
grubbing commercial sites. A truly GREAT idea.

A return to the good 'ole days. The non-commercial web.

Many young people who love today's www never got to experience it as it was
before it became overrun with Google-ization and auto-generated garbage.

Take the ball and run with it. We can reclaim the web. This is only the
beginning.

------
waivej
Wow... It felt like using Google 10 years ago. I think you are onto something.

~~~
taxonomyman
Wow... awesome compliment.

~~~
mburshteyn
Usually don't search unless I'm looking for something in particular, but just
played with this for a good 15 minutes running random queries. The results are
really good and at the same time I'm discovering sites I'd never otherwise see
with Google.

------
bicknergseng
Turns out removing the top million results from a search for Google... still
returns Google. Or google.com.au, to be precise.

It's a cool idea, but I'm not sure it's working. I tried "american history"
but it wouldn't return anything at all if I changed the "Remove the Top"
dropdown.

~~~
taxonomyman
Good feedback. Right now the filtering is not 100% working for non .com, .net,
.org, .co.uk domains. Still working on it.

------
bstar77
There's one pretty big flaw with this approach... For certain searches that do
not have "millions of results", you get completely unrelated results.

If I search my name then the results are for names similar to mine, but not
actually my name. This makes it completely useless for searching my name. I
would think that there are many searches with this problem.

I think there needs to be some kind of weighting system used that dynamically
decides the cutoff point. One million is a huge over-generalization for all
search terms.

~~~
taxonomyman
The number of results aren't a factor. We are removing results from sites that
are in the list of top 1 million most popular web sites. Hope that helps.
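
A minimal sketch of the distinction taxonomyman describes, assuming a hypothetical `site_rank` mapping from domain to popularity rank: a result is dropped because of where its site ranks, not because of how many results the query returned. Stripping a leading "www." mirrors what taxonomyman says the site does; the function names and example domains are made up.

```python
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Lower-cased host name with a leading 'www.' removed (deliberately naive)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def remove_top(results: list[str], site_rank: dict[str, int], n: int) -> list[str]:
    """Keep only results whose site is NOT among the n most popular sites.

    Unranked sites count as infinitely unpopular, so they are always kept.
    """
    return [url for url in results if site_rank.get(site_of(url), float("inf")) > n]
```

Neither the query nor the length of the result list enters into the decision, only the cutoff `n` and each site's rank.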

~~~
bstar77
Ah, this seems to be the case. It looks like you are dropping my name down to
"thousand from results" despite it still saying "million from results".

Very nice!

------
RegEx
I like this idea a lot. I came across a nice, concise explanation of a buffer
overflow:

[http://www.apolis.org/index.php?option=com_content&view=...](http://www.apolis.org/index.php?option=com_content&view=article&id=81:what-is-a-buffer-overflow&catid=44:faq&Itemid=62)

~~~
pash
This is a copy of the Wikipedia article on buffer overflows.

~~~
hoprocker
Looks like this idea isn't totally in the free-and-clear from content farms.

~~~
robryan
Yeah except now you get the content farms that don't know how to rank. A
competitive spammy term like "pay day loans" still shows plenty of low quality
sites because of the crazy number of sites looking to cash in on the term.

------
bane
It's a similar measure that's often used in NLP. Sentences, documents etc. are
usually stripped of common or popular terms first and the remaining ones tend
to have higher information value.
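
A toy version of the NLP step bane mentions, under the assumption that "common" simply means the k most frequent whitespace-separated words across a small corpus (real systems usually use curated stopword lists or IDF weighting instead). `strip_most_common` is a hypothetical helper.

```python
from collections import Counter


def strip_most_common(docs: list[str], k: int) -> list[list[str]]:
    """Drop the k most frequent words (counted over all docs) from every doc."""
    tokenized = [doc.lower().split() for doc in docs]
    counts = Counter(word for words in tokenized for word in words)
    common = {word for word, _ in counts.most_common(k)}
    return [[w for w in words if w not in common] for words in tokenized]
```

Swapping "words" for "sites" gives the analogy bane draws; robrenaud's objection below is that, unlike "the", a popular site still carries meaning.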

It's not entirely a surprise that it works for meta-language constructs like
the web and site popularity.

~~~
robrenaud
Uhh,

1) I am basically certain that it doesn't work. Imagine something this simple
actually did work in general. Do you think Google, Bing, etc, wouldn't have
implemented it?

2) I think the analogy is extremely flawed. nytimes.com is by no means a web
page equivalent to "the" or "a" in language. Articles, pronouns, etc, don't
really carry meaning. Despite being popular, nytimes certainly does.

~~~
bane
I don't know. I haven't really spent a lot of time looking at this to get a
good feel of the quality. But consider this, NYT tends to be a secondary
source: e.g. book reviews (books are the primary source), science news
(scientific papers are primary), business news (market movements and press
releases are primary), etc.

Now consider, if I'm interested in some particular thing in science, is it
better to get the NYT science reporting or just get the professor's
publication and research page on their university website? Filtering out the
top-n sites is more likely to turn up the professor's site near the top of the
pile rather than after all of the popular sites' second and third level
reporting.

Is this better? Depends on the audience. A "popular science" goal would argue
the former is better, as science news simplifies, abstracts and popularizes
complex science (with varying degrees of quality), while a scientist would
prefer the latter.

------
serbrech
The results actually feel fresh. It's removing the consumerism layer of
bullshit that google serves us everyday. Also wondering if that has something
to do with the "bubble" that google creates around us based on our search
history and social network information.

Thanks for that.

------
hafabnew
Searching for 'python global interpreter lock' yields some interesting blog
articles describing the problems, and also some related articles about
approaches to the C10k problem with Python (preforking, worker processes, etc.)

A++ would search again.

------
heydonovan
I'm really liking this. Instead of being bombarded with content that's just
blasted with keywords, I get relevant well-written articles. Not only that,
but no more W3Schools in my SERPs. The chance to read an article that's
written with humans in mind, instead of Google, is more than enough reason to
spend some more time using this.

------
halle_lu_jah
You think that top results in Google and other commercial search engines are
always ranked based on "popularity"?

It would be harsh to call this naive, but it shows a serious lack of SEM and
SEO knowledge. Ever heard of "paid placement"?

Many years ago when Digital's AltaVista was our main search engine, it was
becoming loaded down with paid placement.

The results were polluted.

Google eventually became the "clean" solution.

But now it's Google that is loaded down with all sorts of commercial crud,
much of it pointing to Google acquisitions.

And paid placement, among numerous other strategies, new and old, still
exists.

The simplicity of millionshort is brilliant.

Filter out the crap.

------
unimpressive
Add a way for me to put this as my search engine in my Firefox search bar.

Please.

EDIT: In trying to accomplish this task I found an add-on that lets you do
this for anything.

([https://addons.mozilla.org/en-US/firefox/addon/add-to-search...](https://addons.mozilla.org/en-US/firefox/addon/add-to-search-bar/))

~~~
taxonomyman
Awesome.

------
halle_lu_jah
Here's what Dropbox thinks about power users:

WSJ: What's next?

Mr. Ferdowsi: We continue to focus on actually solving problems that real
people have and not being distracted by what power users want.

Google has made clear what they think about power users:

No + operators in search.

No web-based code search.

No Google Labs for the public.

etc.

Plenty of wood behind the Google arrows, but all the cool ones have been cast
out of the quiver.

Just what kind of targets is Google aiming at nowadays?

Millionshort I give you +999,999.

I would give you +1M if you took out the AdSense and PlusOne javascript.

This has been a long time coming.

Alas, DDG and other alternatives are all about _money_.

Search is about _discovery_.

------
__alexs
Well that's one way to break out of the filter-bubble/echo-chamber I suppose.
If only our best search technology was based on something better than a
popularity contest :(

~~~
taxonomyman
Popularity, I think, is important. But not at the expense of relevance. It's
not an easy nut to crack.

~~~
romaniv
Why? If I search, say, for a game review, I don't care whether it comes from a
popular website or a blog no one reads. In fact, the topmost websites are more
likely to be biased, since they try to appease everyone and they also have
strong relationships with publishers. The blog no one reads is nearly
guaranteed to be honest (if not well-written).

This holds true for most topics I can think of. Moreover, if I ever need to
read Wikipedia and such, I _already know_ about those websites, and I can go
there directly - no need to search. Shouldn't web search engines act like
discovery tools?

~~~
__alexs
> Shouldn't web search engines act like discovery tools?

If your business model is based on advertising that depends on masses of page
views to generate value, then no. You want to be as generally useful as
possible so that e.g. people use you as an (extremely inefficient) DNS
service.

(Google Labs has a single optional feature available for search. Perhaps their
arch doesn't make 20% or Labs projects a good fit for plugging in extra fancy
search features?)

------
tnorthcutt
And of course, W3Schools still manages to show up, thanks to their multiple
crazy subdomains: <http://cl.ly/GFup>

~~~
taxonomyman
You're right. We strip www from the domains. We need a better function to rip
out the domain from a URL. Just haven't had time to cook one up yet.

~~~
mattj
Try out <http://publicsuffix.org/>; that plus a custom suffix list of
overrides works wonders!
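
A sketch of the fix mattj suggests: reduce a host name to its registrable domain (public suffix plus one label) so that arbitrary subdomains and suffixes like .gov.au collapse to the same key. It hard-codes a tiny hand-picked subset of the Public Suffix List and ignores the list's wildcard and exception rules, so treat it as an illustration rather than a drop-in replacement for the real list; `registrable_domain` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Tiny hand-picked subset of the Public Suffix List, for illustration only.
# A real filter would load the full list from publicsuffix.org (plus any
# custom overrides) and honour its wildcard and exception rules.
PUBLIC_SUFFIXES = {"com", "net", "org", "au", "com.au", "gov.au", "uk", "co.uk"}


def registrable_domain(url: str) -> str:
    """Public suffix plus one label, e.g. http://a.b.w3schools.com/ -> w3schools.com."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Scan from the longest candidate suffix to the shortest; the first hit is
    # therefore the longest matching public suffix.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # Keep the suffix plus the label immediately to its left.
            return ".".join(labels[max(i - 1, 0):])
    return host  # no known suffix: fall back to the full host
```

A naive "keep the last two labels" rule would turn australia.gov.au into gov.au, which is exactly the kind of non-.com/.net/.org/.co.uk case taxonomyman says isn't filtered yet.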

------
yayadarsh
After a few test searches, this is surprisingly effective for things I had
written off as "un-findable" because of poor Google results. This is most
apparent on non-technical things, in this case specific jazz chord fingerings
for a guitar class I am taking.

I am very interested as to what comes of this, or rather what is influenced by
its implications.

~~~
cjlars
Guitar chords / tabs / lessons are a terrible SEO spam offender... Look at
that! I finally found an accurate transcription of "Bohemian Rhapsody"

------
erichocean
Man, I love this thing. I've already found a bunch of interesting links on
path tracing. Bookmark'd.

~~~
taxonomyman
Sweet. Glad to have helped.

------
wonderwhy
What is the ranking used for the top million sites? A search result for
"Australia" returns as the top result <http://australia.gov.au>, which Alexa
ranks as 20,615 globally. Actually, a lot of the queries I tried returned
Australian sites.

[http://millionshort.com/search.php?q=australia&remove=10...](http://millionshort.com/search.php?q=australia&remove=1000k)

[http://millionshort.com/search.php?q=somalia&remove=1000...](http://millionshort.com/search.php?q=somalia&remove=1000k)
(another Australian site)

~~~
taxonomyman
Right now the filtering is not 100% working for non .com, .net, .org, .co.uk
domains. So, .gov.au isn't being properly filtered yet. Soon.

~~~
slyall
The Public Suffix list <http://publicsuffix.org/> maintains a list of domains
you need to filter on.

------
bo1024
I'd really like to see randomization instead. Return results picked randomly
from within the top 10 million or something.

~~~
shazow
I'm not sure this is a great idea. Predictability is a staple of a good user
experience. Getting different results for the same query between users or
sessions is bound to lead to broken expectations and frustration.
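
One way to reconcile bo1024's randomization idea with shazow's concern, sketched under the assumption that a pool of candidate results has already been fetched: sample at random, but seed the generator from the query text, so the same query returns the same picks while different queries still surface different corners. `sample_results` is a hypothetical helper, and seeding per query is an editorial choice, not something the site is known to do.

```python
import hashlib
import random


def sample_results(query: str, candidates: list, k: int) -> list:
    """Pick up to k candidates at random, seeded by the query text."""
    # Derive a stable integer seed from the query (unlike hash(), sha256 does not
    # change between Python processes), so the same query gives the same picks.
    digest = hashlib.sha256(query.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.sample(candidates, min(k, len(candidates)))
```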

~~~
taxonomyman
Think of this as more of a discovery engine. And predictability takes the fun
out of discovery.

~~~
MichaelApproved
Great point. You might have a hard time competing with google on relevant
results, but you might be able to beat them on discovery, like a StumbleUpon
search engine.

------
scoot
It's amusing to see all the SEO "experts" that don't make it into the top
million:

[http://millionshort.com/search.php?q=seo&remove=1000k](http://millionshort.com/search.php?q=seo&remove=1000k)

~~~
Alexandervn
It removes the top million in general, not per query.

------
wazoox
A real serendipity engine. Absolutely great, thank you. I'm finding tons of
interesting products and ideas by searching the most banal things :)

------
pfarrell
I would prefer if my previous search was populated in the search box after
completing a search (since I might want to try the search with a different
filter).

It appears you have an "off by one issue" in the sidebar. There's always a
blank entry in the list of ignored sites.

Filtering does not seem to be working (or I don't understand it). Searching on
"chicken" produced the same results with 1million or 100k removed.

~~~
taxonomyman
Will add that feature. Thx

~~~
pfarrell
I neglected to mention... Awesome idea!

------
twelvechairs
Thank you! I can see this being something I regularly use.

It may be a simple idea, but it's something nobody else has done before, and I
think the creators deserve a lot of credit for coming up with and implementing
it. I hope they manage to get something from it. I can see that if the site
becomes popular it will just get copied by other search sites.

------
pessimizer
This is pretty amazing. I didn't know that the old internet was still there!
This may become my new favorite search engine.

------
halle_lu_jah
"Quality" is subjective.

More relevant is _accuracy_, i.e., you get what you specify via search
operators, and results are not influenced by all of Google's silly "factors".
You know what you're looking for and how to frame the query. But Google
assumes you're dumb and thinks it should decide for you.

Alexa Top 1M is a nice filter because the data comes from the Alexa Toolbar
which only the most braindead web users would have installed. So you are in
effect avoiding sites that the web's most braindead users would often visit.

Ranking sites based on "popularity" is great until you reach the point where
the majority of users are not very intelligent. (Compare search engine users in
2004 with search engine users today.) When you reach that point, you get
results where "quality" is determined by idiots (and SEO hats), not a group of
intelligent peers.

------
charlieok
It's like a hipster search engine. It's only interested in things before those
things are cool.

------
martinaglv
Is it safe to assume that this is how Google's search results would look if
nobody did SEO?

~~~
SquareWheel
No, not at all. Google is driven by many signals that SEOs try to optimize on,
but the methodology is still the same. Top-ranking sites have high-quality
sites linking to them, good content, and are supposed to not look spammy
(though that's debatable).

I don't know how this engine ranks but I assume it's a similar system, they're
just chopping a good portion of the results out. For what you said to be true,
the top 10K/100K/1M rankings would have to be there entirely due to
intervention from an SEO, and that's just not the case. The Wikipedias of the
world have enough going for them that they don't need SEO, so they'll always
show up in Google, and never in this engine.

------
pjin
Funny, on my first query I found an obscure HN scraper:

<http://tazod.com/>

~~~
dools
Wow I think this is the exact idea I had for "HN Time Machine" - basically a
thing to extend the life of the newest page and speed up the movement of stuff
through the home page (i.e., I think the HN new page moves too quickly and the
home page not quickly enough)

------
vidoss
Exactly what I was thinking in 2001 =>
<http://www.halfbakery.com/idea/The_20Other_20search_20engine>

Glad to see someone did it now...

------
g_lined
This is a great idea and I see myself coming back to this. It's a shame that a
little blog on tumblr or blogspot gets taken out because it's under a big name
domain - but this has spam related benefits too.

Great work!

~~~
taxonomyman
Maybe we can set up a Webmaster tools sort of submission process for
inclusion.

------
jtchang
Wow, some of the content there is great. Forget about searching the deep web.
For me, "deep" means the real gems buried under the first 100 or so results,
where stuff actually gets interesting!

------
chrislomax
Only a little thing: it could do with maintaining query strings between pages. It
lost my query string and returned no results when I changed the drop down
without me noticing.
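
A minimal sketch of carrying the query along when the "Remove the top" value changes. The `search.php?q=...&remove=1000k` shape is taken from the millionshort.com links quoted in this thread; the other dropdown values and the `filter_links` helper are assumptions.

```python
from urllib.parse import urlencode

# Assumed dropdown values; only "1000k" appears in the thread's links.
REMOVE_OPTIONS = ["1k", "10k", "100k", "1000k"]


def filter_links(query: str) -> dict[str, str]:
    """One link per 'Remove the top' option, each carrying the current query."""
    return {
        option: "search.php?" + urlencode({"q": query, "remove": option})
        for option in REMOVE_OPTIONS
    }
```

The same idea covers the requests from pfarrell and garraeth to keep the search text in the box: render the form with the current `q` pre-filled.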

~~~
taxonomyman
Good catch. Will fix. Thx

------
Suncho
I have been wanting something like this for a while. It's even on my todo
list. Thanks for saving me the work. I'll be using it all the time!

------
felixchan
How did you build this? Are you indexing the entire web yourself? Or are you
using Google's index/removing the top 1 million based on domain?

~~~
wildmXranat
I think search APIs like Yahoo BOSS allow you to pass arguments that contain a
blacklist of domains. I think it's the 'sites' argument that may be used like
this: &sites=-google.com

~~~
taxonomyman
You are right, but they won't allow the list to be 1 million sites long. You're
talking about 15 megs of plain-text data per request.
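
Back-of-the-envelope check of the 15 MB figure, assuming (purely for illustration) that the one million excluded domains average 14 characters and are sent as one comma-separated string. The domain names are synthetic placeholders, not real list entries, and `exclusion_list_bytes` is a hypothetical helper.

```python
def exclusion_list_bytes(domains, sep=","):
    """Size in bytes of all domains joined into one plain-text exclusion list."""
    return len(sep.join(domains).encode("utf-8"))


# Synthetic stand-ins: "site000000.com" ... "site999999.com", 14 characters each.
fake_domains = [f"site{i:06d}.com" for i in range(1_000_000)]
print(exclusion_list_bytes(fake_domains))  # 14999999 bytes, i.e. roughly 15 MB
```

Any per-domain syntax on top of that (such as a leading minus sign on each entry) only makes the request larger.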

But I like the idea of letting users, via a setting perhaps, add their own
list of deny/include sites.

Thx for the comment.

~~~
shurane
Isn't this under Blekko's domain of ideas with the slashtags letting you
include particular sites?

------
mswen
I just tried it with a search for some competitive intelligence. I used the
100K removal option. I found a competitor in another country that had not made
the top 2 pages on Google. It confirms that others are launching something
similar to what I am building... but also the fact that it doesn't bubble to
the top on Google means that the market space is not dominated yet.

------
garraeth
I love this! And am totally going to use it. Removing the top "thousand sites"
removes pretty much all the sites I WISH I could have filtered from my Google
results anyhow (ehow, w3schools, etc).

One request: please keep the search text in the form field after clicking
"search". Just so users can search the same thing multiple times with
different values in the "Remove the top" drop-down.

------
cnbeuiwx
Thank you - very refreshing! DuckDuckGo should implement something like this
just for the spirit of it.

The web just got more interesting. :)

------
fibbery
Doesn't make a dent in the travel site spam, unfortunately, though I might use
this just to permanently remove About.com...

~~~
taxonomyman
A few of the suggestions included the ability to set include/exclude sites
which I think we'll add.

------
83457
Just re-found a site I was looking for but couldn't find with google the other
day. This could definitely be helpful.

------
selectout
Not sure if I found an anomaly or what, but a simple search of "Privacy"
returns results from thesaurus.com, merriam-webster.com, truste.com,
kelloggcompany.com, and many more that are all in the top few thousand
according to QuantCast and Compete.

Great idea though, will definitely try this out some more.

~~~
taxonomyman
merriam-webster.com redirects to m-w.com; looking into the others. Sometimes
if a site has header redirects it gets lost in the filters. Thx for pointing
this out.
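
A sketch of the redirect problem taxonomyman describes, assuming a hypothetical `redirects` map (domain to the domain it redirects to) has already been built, e.g. by recording HTTP Location headers offline. Checking every domain along the chain, not just the one in the result URL, means a site is filtered whichever of its names made the list. Both helpers are made up for illustration; merriam-webster.com and m-w.com come from the comment above.

```python
def redirect_chain(domain: str, redirects: dict[str, str], max_hops: int = 5) -> list[str]:
    """Follow known redirects from domain, stopping at a dead end, a loop, or max_hops."""
    chain = [domain]
    while len(chain) <= max_hops:
        nxt = redirects.get(chain[-1])
        if nxt is None or nxt in chain:
            break
        chain.append(nxt)
    return chain


def should_filter(domain: str, top_sites: set[str], redirects: dict[str, str]) -> bool:
    """Filter a result if ANY domain along its redirect chain is a top site."""
    return any(d in top_sites for d in redirect_chain(domain, redirects))
```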

~~~
selectout
ahh that explains it. I'm actually working on a personal project right now and
this could help out quite a bit, so I am excited to see where this goes. Best
of luck!

------
dclowd9901
Is it just me, or are the results fairly congruent with standard results from
a search engine?

~~~
taxonomyman
They are to a degree. We just remove the top million most popular sites. So,
searching for "social network" would normally yield Facebook.com, but with
MillionShort you discover a social network that didn't make it past the
noise.

------
duck
Great idea, although I think if you could explain it a bit better you could
avoid the confusion like several of these comments are showing. I like how my
Hacker Newsletter project shows up #2 when searching for Hacker News. :)

~~~
zenpaul
And how do you determine the "top" million sites?

~~~
dahumpty
Check out:

<http://www.alexa.com/topsites>

------
tocomment
Remove results with my search term in the domain name and I think this would
be perfect!

For example, I searched for how to start a garden and I can guarantee that
startagarden.com is junk. But I did see some useful advice from small blogs,
etc.

~~~
taxonomyman
How would you like to see this work?

~~~
tocomment
Well, for example, I searched for "grow a vegetable garden" and I still see a
lot of results from URLs like:

<http://howtogrowavegetablegarden.net/>
<http://www.grow-your-own-vegetable-garden.com/>

And I just know any domain name optimized for a certain search term is going
to be garbage.

I guess a simple version of this feature is you'd have a setting: "Exclude
domains that contain my search term".

When the user clicks that you'd compare the domain name (removing all special
characters) with my search term (also removing all special characters and
white space). Maybe compare via edit distance and exclude if it passes a
threshold?

Although edit distance might not work too well, perhaps looking at the longest
common substring and if it's > say 90% of the length of my query exclude it?

I guess it would take some playing around. But there should be a good
algorithm to exclude domain names very similar to my query.
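
A sketch of tocomment's heuristic, using the longest-common-substring variant: normalize the domain name (dropping "www." and the final label such as .com) and the query to letters and digits, then flag the domain if their longest common substring covers at least 90% of the query. `domain_matches_query` and the 0.9 default are assumptions taken from the comment, not a tested rule; `difflib.SequenceMatcher.find_longest_match` is from Python's standard library.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlsplit


def normalize(text: str) -> str:
    """Lower-case and keep only letters and digits (drops spaces, hyphens, dots)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def domain_matches_query(url: str, query: str, threshold: float = 0.9) -> bool:
    """True if the domain's longest overlap with the query covers >= threshold of the query."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    name = normalize(host.rsplit(".", 1)[0])  # drop the last label (.com, .net, ...)
    q = normalize(query)
    if not q:
        return False
    # Longest common substring of the two normalized strings.
    match = SequenceMatcher(None, name, q, autojunk=False).find_longest_match(
        0, len(name), 0, len(q)
    )
    return match.size >= threshold * len(q)
```

With the two URLs above, howtogrowavegetablegarden.net is caught (the whole 20-character normalized query appears in the domain), but grow-your-own-vegetable-garden.com is not: its longest overlap with the query is "vegetablegarden", 15 of 20 characters, so it only trips at a threshold of 0.75. That is the kind of tuning tocomment anticipates.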

------
doktrin
I'm getting odd results with the following query :

Search String : Ruby

Remove From Top : 1000 & 10000

In both instances, the top hit is <http://www.ruby-lang.org>, which is also
the top hit from both Google and DDG.

Am I missing something?

edit: formatting

~~~
prawn
I think that removes the top 1,000 or 10,000 across all searches, not just
"ruby".

------
hybrid11
This is a cool search engine for discovery, but it defeats the purpose when
you are looking for a location. Do a search for "facebook", you will not get
any result that links you to facebook.com.

~~~
kevinrpope
I've seen people do this a few times, and don't understand why they do it
(Google 'facebook' or 'twitter' and then click the link to get to the site
instead of typing 'facebook.com' or 'twitter.com' into the address bar). Can
you explain why? I'm truly curious.

~~~
djeikyb
Couple reasons I've encountered..

1. User doesn't understand the WWW. They believe Google is THE gateway to
everything internet.

2. User doesn't know (or has trouble typing) the exact address. Google tends
to have the authoritative result first for popular sites like facebook,
amazon, twitter..

~~~
sanxiyn
I also often don't know the exact address, but browsers autocomplete URL for
me. I think it's a fair bet those who search for Facebook already visited
Facebook. Why search then?

~~~
djeikyb
I prefer to go straight to a site, rather than detour through a search engine.
That said, I use chromium, and use the address bar like a search engine.
Sometimes the auto-complete half-way through typing is the search rather than
the site. If I don't notice..

------
brudgers
I would like to see the search engine adhere more strictly to quoted search
terms. It seems that they are partially ignored, which gives it some of the
same problems that the major engines have.

------
radley
Please add a favicon so I can see it in my (icon-only) Bookmarks Toolbar.

~~~
taxonomyman
favicon coming right up. Minutes away.

------
ChristianMarks
Good idea. It's about time that search engines route around the power-law
distributions of popular sites, popular bloggers and personalities to find the
gems otherwise buried in the noise.

------
zecho
> Imagine a search engine that simply removed the top 1 million most popular
> web sites from its index. What would you discover?

A lot of my competitors who are still on the first page of Google results.

------
mserdarsanli
Wow that is pretty awesome. I reached some results I want that I could not
find via popular search engines with hours of searching. Believe it or not,
this engine is changing my life.

~~~
taxonomyman
Glad that code is being put to good use.

------
thorin_2
Very interesting. I was pleased with the results and have already added this
site to my Chrome bookmark bar, right between my Google search and Hacker News
icons.

------
Tossrock
Well, in a rather meta turn of events, searching for my username on this
returned a link to hackerbra.in, which appears to be some kind of HN mirror.

------
carlosaguayo
If you search for "google" and remove the top million results, you still get
google main page (in this case, the one for australia and india...)

------
grampajoe
The site's way too wide on my netbook, 1024x600. Also, the list of domains
removed from the results covers up part of the results themselves.

------
dude123122
Another cool feature could be to exclude sites that use AdWords or Paid Search
from the list too. Then it would really just be legit sites.

------
rogerbinns
It doesn't appear to work. I did a search for aspirin and the top match
returned by this is #5 doing the same search with Google.

~~~
anigbrowl
Top million sites, not top million results.

~~~
taxonomyman
Correct. Top million sites.

------
mathetic
If this becomes popular, at some point results would disappear since unpopular
sites will be pushed into the first million.

~~~
taxonomyman
I guess the thought would be that they've graduated to some level of critical
mass. Sort of like a kickstart program for sites.

------
taxonomyman
8 hours later, we just launched our first re-design. Thanks for all the great
feedback and support. More to come.

------
it
I really like this. Right away I found some new sites that I hadn't seen
before with interesting content.

------
hsparikh
As someone learning web development, I'd love to get some insights into how
one could build this.

~~~
lubujackson
Not knowing anything about what they do, the hack-ish way you could do it is
to use Google CSE (custom search engines) to add a list of negative domains.
Where to get the list of top 1 million domains? Probably from Quantcast here:
<http://www.quantcast.com/top-sites-1>
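
If the top-sites list comes as a CSV of `rank,domain` rows (the format is an assumption; check whichever list you download), turning it into an exclusion set for a chosen cutoff is short. `load_top_sites` is a hypothetical helper; how the resulting set is then handed to a search backend (a CSE, an API, or post-filtering results yourself) is left open here.

```python
import csv


def load_top_sites(lines, limit: int) -> set[str]:
    """Parse 'rank,domain' rows and return the domains whose rank is <= limit."""
    top = set()
    for row in csv.reader(lines):
        # Skip blank lines, header rows and anything without a numeric rank.
        if len(row) < 2 or not row[0].strip().isdigit():
            continue
        if int(row[0]) <= limit:
            top.add(row[1].strip().lower())
    return top
```

`lines` can be an open file (opened with `newline=""`) or any iterable of strings.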

------
pnathan
Thanks.

Google has really killed the discoverability of the internet for me. I will be
experimenting with this.

Best of luck.

------
wyck
This is an incredible breath of fresh air, what an odd thing to say.

------
pazimzadeh
Removing Wikipedia might be a mistake. Otherwise, it's great.

~~~
romaniv
Why anyone would want Wikipedia in search results is beyond me. If I want to
read a Wikipedia article, I can just search Wikipedia itself. I know the kind
of information it has, so there is no point in "ranking" it against other
websites.

~~~
tom9729
I'm not sure if this is still the case, but in the past Wikipedia's search
engine was terrible and it was actually easier to google "X wikipedia" or "X
wiki".

------
Ben_Burke
I searched for my site: on Google I'm on the first page, but here I found
nothing. So for me this = no good. I understand the basis, but I don't
understand the result.

------
tomelders
Should I be depressed that I'm the top hit for my own name?

------
stcredzero
I wonder if there's an analogous hack for social news?

------
qwertyz
Now if I could add it to Firefox's search bar...

~~~
johnpowell
I came here to make the same request.

And I am loving this.

------
cleverjake
very interesting hack. thanks for doing it

------
thar2012
non-popular websites will start seeing some good traffic suddenly. It would be
confusing for them :)

------
tinyjoe
my website ranked 1st? guess i need to work harder T_T

------
tsunamifury
Searched my name... Got my website

:(

------
taskstrike
you should somehow incorporate hipster into the search site's name.

------
digitallimit
I searched for "Hero Academy" and the first result was Google's 5th result, a
site called Hero Academy with the url "hero-academy.com". That's not very
"million short", IMHO.

~~~
abrichr
The site removes the top million sites by popularity, not the first million
results returned by a Google search. "hero-academy.com" is likely not in the
top million most popular sites.

