
Million Short allows users to remove up to the top 1M sites from a search set - cpeterso
https://www.forbes.com/sites/julianmitchell/2017/12/31/this-search-engine-startup-helps-you-find-what-google-is-missing/
======
condiment
I use million short to search for howtos written by real craftsmen in their
field.

One example is an end-grain cutting board that I made recently. For most
things like that, the top 1000 are dominated by made-for-Pinterest blogs or
major sites that aggregate low quality content that's good enough to get hits
but not good for much more than that.
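
To make "remove the top N sites" concrete, here is a minimal sketch. It assumes you already have a popularity ranking of domains (most popular first) and a list of result URLs. The ranking, the `.example` domain and the function names are hypothetical. Million Short has not said how it actually does this.

```python
from urllib.parse import urlsplit


def _host(url):
    """Lower-cased host of a URL, with a leading "www." stripped."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def remove_top_sites(results, ranked_domains, n):
    """Keep only results whose host is NOT among the first n ranked domains.

    ranked_domains is a hypothetical popularity list, most popular first.
    """
    top = set(ranked_domains[:n])
    return [url for url in results if _host(url) not in top]


# Hypothetical data: a tiny "popularity ranking" and some result URLs.
ranking = ["pinterest.com", "wikihow.com", "ehow.com", "woodshop.example"]
results = [
    "https://www.pinterest.com/pin/123",
    "https://www.wikihow.com/Make-a-Cutting-Board",
    "https://woodshop.example/end-grain-board",
]
print(remove_top_sites(results, ranking, 3))
# ['https://woodshop.example/end-grain-board']
```

This sketch only removes exact host matches (ignoring "www."). A subdomain such as m.pinterest.com would slip through unless parent domains are checked too.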

~~~
jjeaff
What would we do without the garbage content that is eHow and wikiHow?

~~~
frandroid
I had a friend who had a whole list of dozens of banned sites configured for
his google searches, which made Google way more useful. And then one day,
Google decided to cut that feature out, as part of their drive to remove power
user features across their products.

~~~
jjeaff
I have always wanted that feature. I never knew it existed previously.

------
mrfusion
I’ve always wanted to see a search engine that takes the top SEO tips and
penalizes sites that use them.

For example, searching for how to grow a garden: I never want to see
howtogrowagarden.com. I’d prefer to find more genuine, non-SEO-juiced
advice.

~~~
__s
Problem is that SEO is tied to appearing relevant, so to penalize SEO is to
penalize relevancy. Splitting SEO from relevancy is often a matter of making
better search.

Searching for least-relevant can be pretty random: it's easy to point at the
center of a circle, but the edge of the circle is not a point.

~~~
ryandrake
A perfect system would penalize SEO (appearing relevant) and reward actual
relevance. Of course, this is easier said than done.

~~~
wmeredith
This is exactly what the search engines have been attempting for the last 10
years.

------
b1daly
One simple tweak I wish I could do in google would be to eliminate search
results based on broad criteria. For example, no sites that have a product for
sale, no sites that repackage contents from other sites, no sites that include
a given word, no sites that serve ads.

~~~
pricetag
You can instruct Google not to include sites with a given word by prefixing
the excluded words with a dash. For ex:

> buy hoes -sex -porn
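
Here is a small sketch of building such a query programmatically. `exclude_words` is a hypothetical helper. It handles only single words, because that is the only form shown here.

```python
def exclude_words(query, excluded):
    """Return `query` with each excluded word appended with a leading dash."""
    parts = [query.strip()]
    for word in excluded:
        word = word.strip()
        # This sketch only handles single words, as in the example above.
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"expected a single word, got {word!r}")
        parts.append("-" + word)
    return " ".join(parts)


print(exclude_words("end grain cutting board", ["pinterest", "etsy"]))
# end grain cutting board -pinterest -etsy
```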

~~~
solarkraft
Yes, but the rest is not solved by that. Google has really let the search game
slide for a while.

~~~
SerLava
They've even removed a lot of the useful search operators. Someone needs to
bring advanced search back in a competing search engine.

------
simula67
Do people really think search is a solved problem?

I still have difficulty finding information I need for work on company
intranets.

I still have difficulty finding really local information.

I still have trouble finding news that is objective and not slanted or
clickbait.

I still have difficulty finding recommendations for good books to read.

It seems to me like Google has really dropped the ball on search since they
acquired "lock-in" through Gmail, Chrome sync, Android, etc.

~~~
rurban
For corporate intranets Google is not an option. There, xapian,
elasticsearch or lucene are the best, with xapian dominating the backend and
the Java stuff dominating the frontends.

For the others, SEO optimization is a real problem, yes. You can only try
alternatives, like searx.to, Bing, or asking around.

~~~
Chriky
> xapian, elasticsearch or lucene are the best, with xapian dominating the
> backend and the Java stuff the frontends

Elasticsearch uses Lucene under the hood. In my experience Lucene dominates
the actual indexing and searching, although I'm not familiar with xapian.

By number of users, I reckon SharePoint search (FQL) is probably the biggest,
although it is way behind Lucene in features.

Google actually used to sell bright yellow branded racks that they would come
and install on corporate networks to provide a "private Google" but I'm not
sure if they still do.

~~~
rurban
With "xapian dominating" I meant the technical side. Of course lucene has more
market share, because of the better elasticsearch frontend and browser
support, i.e. highlighting and jumping to the results in Word or PDF docs.

I wouldn't trust Google locally either, and it's expensive.

SharePoint search is unfortunately used too often, yes.

~~~
Chriky
What aspects does Xapian beat Lucene at?

~~~
rurban
Everything backend related. Much faster, much less memory, more backend
features, huge indices - Google scale. (Gmane was its most prominent public
user.) Lots of language bindings like PHP, Perl, Python.

~~~
Chriky
Can you link to evidence of it being faster, and expand on what you mean by
"more backend features"?

I believe you, I just can't find evidence online.

My application is not that big, around 10 million text files, but I would be
interested in anything faster (or allowing more complex queries) than Lucene,
which is what I use at the moment.

------
jdavis703
This is a cute idea, but there are some genuinely useful sites this winds up
skipping over. Wikipedia is a great resource for quickly understanding a topic
enough to drill down into more precise research.

Where I see this being useful is searching for current events. For example a
search for a local double-shooting I've been following returned some
information I hadn't seen before. It would probably be good for them to focus
on more news-oriented searching as that's where there's a serious echo chamber
among the top websites.

------
serveboy
I use [https://addons.mozilla.org/en-US/firefox/addon/g-search-filt...](https://addons.mozilla.org/en-US/firefox/addon/g-search-filter/) for
filtering Google. Can't live without it. It would be interesting to have a
GitHub repo with some precompiled filters based on certain business domains.

But for me at least, a couple of rules are enough to solve 99% of the
problems.
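
Rules like that can be approximated with a hostname blocklist. This is a hedged sketch, not how g-search-filter works internally. The rule set `BLOCKED` and both functions are hypothetical. A rule for `pinterest.com` also covers its subdomains, because every parent domain of the result's host is checked.

```python
from urllib.parse import urlsplit

# Hypothetical rule list: each entry blocks that domain and all its subdomains.
BLOCKED = {"pinterest.com", "boredpanda.com"}


def is_blocked(url, blocked=BLOCKED):
    """True if the URL's host, or any parent domain of it, is in `blocked`."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # e.g. uk.pinterest.com -> checks "uk.pinterest.com", "pinterest.com", "com"
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def filter_results(urls, blocked=BLOCKED):
    """Drop every result whose host matches a blocking rule."""
    return [url for url in urls if not is_blocked(url, blocked)]


urls = [
    "https://uk.pinterest.com/pin/42",
    "https://notpinterest.com/page",
    "https://woodshop.example/board",
]
print(filter_results(urls))
# ['https://notpinterest.com/page', 'https://woodshop.example/board']
```

Matching on whole domain labels avoids blocking look-alikes such as notpinterest.com. Country domains such as pinterest.co.uk still need their own rule.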

~~~
vanderZwan
I'm using DuckDuckGo and I just realised I should install something similar,
thanks!

[https://addons.mozilla.org/en-US/firefox/addon/ddg-hide-unwa...](https://addons.mozilla.org/en-US/firefox/addon/ddg-hide-unwanted-results/)

EDIT: You can actually achieve something similar with bookmarks, using
keywords and '%s'.

The idea is: take a link to a search query string of a search engine, and
replace the query part with '%s'. For example, take the following search query
on DuckDuckGo:

        cute hedgehog -site:www.pinterest.com -site:boredpanda.com -site:amazon.com -site:etsy.com

This results in:

        https://duckduckgo.com/?q=cute+hedgehog+-site%3Awww.pinterest.com+-site%3Aboredpanda.com+-site%3Aamazon.com+-site%3Aetsy.com&t=ffab&ia=web

Bookmark that, and replace the "cute+hedgehog" part with "%s", then edit the
bookmark and add (for example) "ddg" to the "keyword" section, then typing in:

        ddg cute hedgehog

... will send you to:

        https://duckduckgo.com/?q=cute+hedgehog+-site%3Awww.pinterest.com+-site%3Aboredpanda.com+-site%3Aamazon.com+-site%3Aetsy.com&t=ffab&ia=web
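
The same URL and bookmark template can be generated instead of edited by hand. This is a minimal sketch. Python's `urllib.parse.urlencode` reproduces the DuckDuckGo URL shown above exactly. The `t=ffab` parameter is copied from that example as-is. `expand_keyword` only approximates the browser's `%s` substitution, and the exact escaping (for example `+` vs `%20` for spaces) may differ between browsers. All function names are hypothetical.

```python
from urllib.parse import quote_plus, urlencode

# Sites excluded in the example query above.
EXCLUDED = ["www.pinterest.com", "boredpanda.com", "amazon.com", "etsy.com"]


def search_url(query, excluded=EXCLUDED):
    """DuckDuckGo URL for `query` with a -site: filter per excluded site."""
    q = " ".join([query] + ["-site:" + site for site in excluded])
    # urlencode uses quote_plus: spaces -> "+", ":" -> "%3A".
    return "https://duckduckgo.com/?" + urlencode({"q": q, "t": "ffab", "ia": "web"})


def bookmark_template(excluded=EXCLUDED):
    """Same URL with the search terms replaced by the bookmark's %s marker."""
    placeholder = "QUERYPLACEHOLDER"  # any token that survives encoding unchanged
    return search_url(placeholder, excluded).replace(placeholder, "%s", 1)


def expand_keyword(template, terms):
    """Approximates typing "ddg <terms>": fill %s with the encoded terms."""
    return template.replace("%s", quote_plus(terms))


template = bookmark_template()
print(template)
print(expand_keyword(template, "cute hedgehog") == search_url("cute hedgehog"))
# True
```

The printed template is what goes in the bookmark's location field, with "ddg" as its keyword.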

~~~
serveboy
Nifty trick! Is there a limit to the number of site: filters?

Look for an extension that highlights the good results. I find that more
valuable than filtering the bad results.

~~~
vanderZwan
I just tried, DDG seems to be a bit unreliable, especially when youtube is
involved:

[https://twitter.com/JobvdZwan/status/950331823229472769](https://twitter.com/JobvdZwan/status/950331823229472769)

As for Google, the limit to a query is 32 words, apparently:
[https://imgur.com/a/XW1Qa](https://imgur.com/a/XW1Qa)

... however, it also supports _inurl:<query>_, so you can easily filter out
sites with many domains and subdomains (say, pinterest.com, pinterest.co.uk, etcetera),
just by using _-inurl:pinterest_
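
If the 32-word figure is right, a long exclusion list has to fit within that budget. This is a hedged sketch. `MAX_TERMS` takes the number from the screenshot at face value. It also assumes that every space-separated token counts as one word, which is an assumption and not documented behaviour. `build_query` is a hypothetical helper.

```python
# Limit reported in the comment above (from a screenshot), not an official figure.
MAX_TERMS = 32


def build_query(query, exclude_sites=(), exclude_url_words=()):
    """Build a query with -site: and -inurl: exclusions, within MAX_TERMS words.

    Assumes every space-separated token counts as one word toward the limit.
    """
    terms = query.split()
    terms += ["-site:" + site for site in exclude_sites]
    # One -inurl: term catches pinterest.com, pinterest.co.uk, de.pinterest.com,
    # and also any other URL that merely contains the word.
    terms += ["-inurl:" + word for word in exclude_url_words]
    if len(terms) > MAX_TERMS:
        raise ValueError(f"{len(terms)} terms exceeds the {MAX_TERMS}-word limit")
    return " ".join(terms)


print(build_query("cute hedgehog", ["etsy.com", "amazon.com"], ["pinterest"]))
# cute hedgehog -site:etsy.com -site:amazon.com -inurl:pinterest
```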

------
cobbzilla
I love the idea but if this gets popular I guarantee the content-farm/SEO
assholes will figure out how to be on the first results page -- registering a
boatload of domains is the obvious countermeasure, there are probably many
others.

~~~
dawnerd
Having worked for Demand Media, I can say that SEO people will always find a
way. It’s always going to be a cat-and-mouse game, sadly.

~~~
thisisit
So is there a way around this app?

~~~
dplgk
Be the 1,000,001st result?

~~~
dawnerd
What’s sad is that, considering how Panda hit them, they may benefit.

------
taxonomyman
We're actively working to add lots of features on Million Short in 2018. I'd
be happy to add any feature requests to our roadmap planning.

~~~
vanderZwan
If I block ads on principle, but still want to support your site because I
like the service, what are my options?

Also, adding a Dark Theme to the settings would be nice! (I'm trying to
minimise the amount of light I'm exposed to at night to reduce eye-strain)

------
jwilk
Archived copy, which can be viewed with JS disabled:

[https://archive.is/c7AWg](https://archive.is/c7AWg)

------
sytelus
I tried out a few popular queries where it's hard to find great content that
is not extensively SEOed, and this is working great! Sure, the results are a
bit sparse, but this is a great tool to find good content that would otherwise
be buried beyond 10 pages. Now that I think about it, there are a lot of URLs
shared by reputable authors on Twitter, HN, Reddit etc., many of which would be
examples of "dark content", i.e. not easily found unless the right keywords
are entered. For example, search for neural network from scratch and you are
unlikely to find a great-quality implementation like Layered [1] in any of the
search engines. Instead you will only find what was extensively linked by
others.

[1] [https://github.com/danijar/layered](https://github.com/danijar/layered)

------
themodelplumber
The actual site (millionshort.com) seems hugged to death. I was excited to try
it.

If it takes off, I wonder if the phenomenon will give a boost to affiliate
sites. If you can't be there in the top results, align with those who can.

------
guelo
I would have liked to read about how their technology works. I imagine they're
not building their own index using their own crawlers.

------
frandroid
"Nice startup you have there. We have found that users like what you do, so
we've added your algorithm as an option on our search engine. Your business
case just vapourized." --Google

~~~
radmarshallb
This would skip a lot of the sites that buy ads from Google in the first
place. They'd probably rather just purchase it and shut it down.

------
ikeboy
How large is their search index and how much funding does it take to build an
index that can compete with Google?

------
ronnier
Clickable: [https://millionshort.com](https://millionshort.com)

------
djanogo
Oh the irony, Forbes itself is a low quality content aggregator that needs to
be blocked.

------
jxramos
The contrarian search engine, I like the way they think. I'll take it for a
spin later.

------
jwilk
Please use the original title.

~~~
dang
"... unless it is misleading or linkbait":
[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

The article title is both, so we replaced it with representative language from
the text.

------
sjg007
Great idea.

