

A new search engine - orangethirty
http://theopenstartup.blogspot.com/2012/09/a-new-search-engine.html

======
bad_user

        - I will respect privacy. 
        - It will not feature information bubbles.
        - There will be an API.
        - There will be social context in search.
        - It will be for hackers only until it is good 
        enough to be released to the general population.
    

If that's the value proposition, then it won't beat Google's search in any
meaningful way and it won't capture the mind-share of the "hackers" you want.

Google was successful because the relevancy of their search results was at
least 10 times better than those of the competition. And besides their near-
monopoly, they survive because their results are still better, not 10 times
better, but better nonetheless.

There are people here using Duck Duck Go. I'm not one of those people because
DDG does not have better search results for me. Google does a good job
currently at customizing search results. That's why they felt the need to
compete with Facebook in the first place, because people want search results
recommended by their acquaintances. They do have enough data to know at least
some of your interests but they lacked a good social graph. To see how they
can customize search results based on your Google+ graph, check out [1].

And of course, the quality of Google can definitely improve. Personally I feel
like Google isn't doing enough to combat black SEO techniques and content
farms. This may be because they are trying to not piss off their users or
because those content farms bring them too much revenue. And it's also my
feeling that Google is no longer neutral - the placement of results from
Google Places and Maps whenever you search for places has hurt websites like
Yelp and TripAdvisor.

However an alternative search engine will barely be a glitch on anyone's radar
if your value proposition is stuff like "respect for privacy" or "an API". Not
to mention you can't provide both privacy and results based on "social
context" - to customize the search results in a social context, by definition
you have to track the user's social context. I did notice that the text says
"respect [for] privacy", but who's to say that Google doesn't respect your
privacy? That's not the same as giving privacy to users.

[1] <https://news.ycombinator.com/item?id=3452912>

~~~
Matt_Cutts
"Personally I feel like Google isn't doing enough to combat black SEO
techniques and content farms. This may be because they are trying to not piss
off their users or because those content farms bring them too much revenue."

If it makes you feel better, Google continues to roll out iterations of both
Penguin and Panda, algorithms which are targeted at black hat spam sites and
low-quality sites.

In fact, this past Friday we rolled out a change to reduce exact-match domains
(EMDs), which are domains like "buycheapviagraonline.info" that put a lot of
keywords in the domain name in an attempt to benefit in search rankings.

~~~
d0de
Huh. I thought it was an update to reduce specifically "low-quality" EMDs
rather than EMDs with "a lot of keywords in the domain name". Or are these
two ways of saying the same thing?

~~~
Matt_Cutts
The new algorithm is designed to target low-quality EMDs. HN readers are less
likely to know the term "EMD," so I went primarily for an explanation and
chose the example to help convey the connotation of low-quality.

~~~
d0de
Thanks!

------
lacker
The reason it's hard to compete in search engines is that an MVP is pretty
tough. If you can't type in [espn] and get to ESPN, then your search engine
hasn't hit minimum viable product yet. But it's not that easy to get to the
point where these "navigational" searches work. You probably can't do it with
just a few people and a few months. It requires millions of up-front
investment, like Cuil or Blekko.

So, if you want to build a new search engine, you need a more radical vision
for what you are leaving out. Either you are leaving out the vast majority of
the internet, or you are leaving out the vast majority of queries. Focusing on
the areas where the press has dinged Google like privacy or an API won't get
you there.

~~~
orangethirty
For the first months (years?), it will be exclusive for hackers. Thus the curl
approach to invites. Non-hackers won't get what curl is.

~~~
ElliotH
And those of us who cba to leave our browsers will quickly spoof our user
agent and get an invite anyway.

~~~
orangethirty
I know _that_. It is merely a filter to get the right people, not a wall.
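
A minimal sketch of the kind of filter being discussed, assuming the server looks only at the `User-Agent` header (the thread does not say how the invite script actually decides). The function name `looks_like_curl` is hypothetical.

```python
def looks_like_curl(user_agent: str) -> bool:
    """Heuristic check: by default curl sends a User-Agent beginning 'curl/'."""
    return user_agent.strip().lower().startswith("curl/")


# A browser that spoofs its User-Agent passes the same check,
# which is why this is a filter and not a wall.
print(looks_like_curl("curl/7.26.0"))
print(looks_like_curl("Mozilla/5.0 (X11; Linux x86_64)"))
```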

------
danboarder
My first thought while reading this was how <http://duckduckgo.com/> is
already working on this problem, and is perhaps a lot farther along.

~~~
calebmpeterson
For what it's worth, you weren't the only one with that initial reaction.

I remained woefully ignorant of DDG until Spring of this year (no clue how, I
just missed it), but once discovering it, my definition of a good search tool
has been forever changed by a single character.

The ! (bang).

!walmart, !netflix, !.net, !clojure, !java, ...

!weatherspark (huh, it's not there, I'll submit that, now anyone can
!weatherspark)

The ! means DDG is my single-point search engine for almost any site
imaginable. And it uses that site's search feature instead of a naive textual
scrape (a la Google).
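
A rough sketch of how a bang could be resolved: look at the first token, and if it names a known site, hand the rest of the query to that site's own search URL. The table `BANGS`, the URL templates, and the function `resolve` are illustrative assumptions, not DuckDuckGo's actual configuration or code.

```python
from urllib.parse import quote_plus

# Hypothetical bang table; the templates are assumptions for illustration.
BANGS = {
    "g": "https://www.google.com/search?q={query}",
    "w": "https://en.wikipedia.org/wiki/Special:Search?search={query}",
}
DEFAULT = "https://duckduckgo.com/?q={query}"


def resolve(search: str) -> str:
    """Return the URL a query should be sent to, honouring a leading !bang."""
    search = search.strip()
    first, _, rest = search.partition(" ")
    name = first[1:].lower()
    if first.startswith("!") and name in BANGS:
        # Known bang: forward the remaining words to that site's search.
        return BANGS[name].format(query=quote_plus(rest))
    # No bang, or an unknown one: fall back to the default engine.
    return DEFAULT.format(query=quote_plus(search))
```

An unknown bang simply falls through to the default engine, which matches the "it's not there, I'll submit that" experience described above.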

~~~
yk
You forgot !g (google search). That is at least the shortcut I think is most
important. With it, DDG is always at least as good as google, while it
provides a better interface.

~~~
calebmpeterson
I didn't even know that one; thank you!

------
rachelbythebay
Is every search engine focused on building the same system? You have a bunch
of crawlers, and then you build up some way to store all of this stuff and
index it, and then make some way to search it and serve an interface to it. Am
I right so far?

This is how web search has worked for a long time: make a copy of as much of
the web as you can, and then search _that_. This means a lot of missed
content, inconsistent results, and so much duplication it's not funny. How
many disk farms are out there solely to try to hold copies of the entire web?
How many RAM farms for the "hot" n%?

I came up with an idea for inverting web search. Instead of searching the
copies, search the actual sites with the content. But... instead of having to
find all of them to send them your searches, have them find you. It's like a
stock exchange for searching. I register a query, they pull from the firehose,
and they can provide their best match for it. Then it finds its way back to
me. It would probably cache old results to make response times reasonable, and
so that the sources wouldn't have to consume the full firehose.

That's the basic idea, and it goes from there.
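
To make the "stock exchange" idea concrete, here is a toy sketch under heavy assumptions: sources register with an exchange, each returns its best bid for a query as a `(score, url)` pair, and the exchange caches the winner. The class `QueryExchange` is hypothetical, and it calls sources synchronously, whereas the description above has sources pulling from a firehose.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A source answers a query with (score, url), or None if it has no match.
Source = Callable[[str], Optional[Tuple[float, str]]]


@dataclass
class QueryExchange:
    sources: List[Source] = field(default_factory=list)
    cache: Dict[str, str] = field(default_factory=dict)

    def register(self, source: Source) -> None:
        """A site opts in by registering a callable that bids on queries."""
        self.sources.append(source)

    def search(self, query: str) -> Optional[str]:
        # Serve cached answers first so sources need not see every query.
        if query in self.cache:
            return self.cache[query]
        bids = []
        for source in self.sources:
            bid = source(query)
            if bid is not None:
                bids.append(bid)
        if not bids:
            return None
        _, url = max(bids)  # highest score wins
        self.cache[query] = url
        return url
```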

I wrote about this in April: <http://rachelbythebay.com/w/2012/04/30/search/>

~~~
orangethirty
That is somewhat the approach I'm aiming for.

------
esrauch
The oft-maligned information bubble seems to have very real value that I don't
see mentioned that often.

The example that people always bring up are politically-aligned issues that
will prevent you from seeing the opposite side, which is an issue, but it
seems that the far more common case is that I'm searching for something like
"go construct" and I want to see something like golang and not
<http://www.goconstruction.net/>, the "bubble" makes it so that the terms will
disambiguate the way that I want them rather than a totally different meaning.

Good luck on this frighteningly ambitious idea though.

~~~
whatusername
Not to mention that when I search for something like "football" I obviously
mean AFL (Australian Rules Football). Google shows the correct result on the
top. Duck Duck Go shows a Wiki disambiguation link and then at least 5 pages
of either Round Ball or NFL.

Sometimes it needs to be: please.bubble.us

~~~
orangethirty
My point is that you should have the _choice_ to be trapped inside a bubble.
Not forced.

~~~
Matt_Cutts
Just a quick note to say that you can turn personalization off with Google.
For example, you can choose the geolocation of the search results on the left-
hand side of the screen. Searching with an incognito browser window is another
option. You can also add "&pws=0" to turn personalized web search off.
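
As a small illustration of the last option, this sketch appends `pws=0` to a search URL using only the standard library. The parameter name comes from the comment above; the helper `without_personalization` is hypothetical, and the sketch does not check whether Google honours the parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def without_personalization(url: str) -> str:
    """Return `url` with pws=0 set in its query string."""
    parts = urlsplit(url)
    # Drop any existing pws value, then add pws=0.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "pws"
    ]
    params.append(("pws", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```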

In fact, if we personalize our web results, we mention that at the bottom of
the web page. You can click on that notice to see what kind of personalization
we applied, and we offer a link on that page to re-run the search without
personalization.

Google used to offer the link to turn off personalization above the search
results, but we eventually moved it below the search results, because
practically no one ever clicked that link.

We don't want anyone to be trapped in an information bubble either, which is
why we provide a wide variety of tools to help you slice and dice what you
see.

~~~
orangethirty
Here is the query string of a search I just did:

<https://www.google.com.pr/#hl=en&output=search&sclient=psy-ab&q=Just+a+test&oq=Just+a+test&gs_l=hp.3..0l4.3947.9372.0.11775.30.20.9.0.0.1.1484.7932.2-11j4j1j3j0j1.20.0.les%3B..0.0...1c.1.Yu3MNcK3ca4&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&fp=f3d97c742b6cf26a&biw=1280&bih=622>

You make it absurdly hard for a regular person to get out of the bubble. But
that's your business, and I respect that. But here is a question: How many
people can and will do all the things you suggested up there in order to run a
simple search? Nobody.

 _We don't want anyone to be trapped in an information bubble either, which is
why we provide a wide variety of tools to help you slice and dice what you
see._

Then why can't I search directly from Google.com, and not Google.com.pr or its
variants?

Why do I have to use a proxy to do that (and it's not perfect, either)?

Why can't I erase my past history?

Why do you force me to mix in my profile in G+ in every other service you
provide?

Please address these questions. Because, if you do provide a clear cut black &
white answer, you will save me months of work.

------
alid
All the best orangethirty! My two cents…the obvious benchmark / status quo in
search is Google's no-fuss solution - a kind of 'wham bam thank you ma'am'
(before you urban dictionary it, I mean that phrase in the context of quick &
to-the-point). So, what if you differentiated by being the antithesis of
Google? It's so crazy it just might work. Be visually-rich and inject the
principles of emotional design into search. Create 'clusters' or 'hubs' of
results around the search term - a visual representation that deliberately
includes a smorgasbord of websites, images, e-commerce, blogs and key social
media pages. (If you went with value proposition like this, hackers wouldn't
be the best target as early adopters; I know a visually-rich environment would
resonate with the female, right-brained and design demographics, and there are
certain markets in the world where this concept would have a particularly
popular acceptance - South Korea, Taiwan, Indonesia and Japan come to mind).

------
jrussbowman
I'm still of the opinion that if you're going to go for competing with Google,
Bing, Blekko or even Duckduckgo you're going to have to beat them on quality
of results. Reason being that most people are going to go to the site that
best answers the query they are entering. I'm not sure privacy, API or even
adding social context is going to provide a huge boost.

However, sometimes people need different ways to search. That's why I built
unscatter.com as it provides web and social results in a chronological order.
It's more useful for topics you're trying to keep up with rather than new
searches. For example there's a lot of technologies (and my favorite NFL team)
I like to keep up on what's new about. I use this search at least every 3 or 4
days to keep caught up. <http://unsctr.me/OyY534>

If you're going to go after the search market I think you need to come up with
a new way of looking at it entirely. I've reached the point where I'm probably
better off building my own crawler to continue further; the risk of using free
APIs is a bit much to build a business on. For example, I had to drop Twitter
a few weeks ago because of their policy changes.

I don't think just changing policies around search is enough, duckduckgo
already did it.

------
pilooch
"Note: if the server gets hammered and goes offline, please send an email to
my address (check my Hacker News profile), and I will make sure to include
you."

OK, this is a joke. Interesting how this one can go to the top of HN that
fast... We're really hungry for a new engine, huh.

~~~
peteforde
This is not a helpful comment.

Regardless of how entitled to cynicism you feel, try to keep your feedback
pragmatically optimistic. I remember when the founders of Heroku told me that
they were going to build a web enabled Ruby on Rails editor, I thought that
was pretty dumb, too.

------
waterlesscloud
Best of luck to you. It _is_ ambitious to the point of crazy to take this on,
but it's not impossible. Someone is going to revolutionize the space, it could
be you.

As PG says, do you think in 100 years we'll still use something like the
current Google for search? Or something quite different?

So someone is going to make it happen.

It's an area that exerts a strong pull for me as well, because I'm so
dissatisfied with what we have now. There's gigantic room for improvement.

Don't aim at incremental, it's harder and even if you succeed at incremental
improvement, you'll lose due to inertia. Aim for massive improvement.

Start with the fundamentals- What is search for? Why do we search at all?
What's a better way to fill those needs?

------
khmel
Google follows a 'Scientologist' principle: what's true is what's true for
you. Google shifted from a relevancy conception to a 'value' conception.
Probably they have some data showing that this makes users more satisfied on
average.

I agree that you could beat Google on niche markets, some people will prefer
relevancy to value, like search of pdf documents for instance, or structured
search, etc. Although you should differentiate yourself from Blekko and
similar guys.

I was working on interest search and data clustering for Facebook a year ago,
a kind of mix of social network and search engine. It took a lot of resources
that I did not have, but it was very exciting.

------
wickedchicken
There's an interesting paper that came out of Stanford's WebBase project that
might be helpful: <http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf>

------
dotborg
Just like DDG, this one didn't mention the real challenge of making a search
engine: reaching languages other than English and countries other than
America.

Google is amazing about that.

~~~
orangethirty
Good point. I'm bilingual, so that hits home. I will focus on making it good
in English and Spanish. Spanish-speaking countries don't have a very good
option (Google sucks with Spanish).

------
rwallace
Thinking about what I would want that Google doesn't currently provide and
that would be feasible with today's technology, as a practical matter my wish
list has one item: when I search for a scientific paper, or terms for which
such a paper is the best match, I'm looking for a downloadable PDF, or to be
told if no such thing exists, _not_ an abstract with the actual content hidden
behind a paywall. A search engine that provided that, I would happily use.

~~~
orangethirty
Noted. Thanks for the feedback. Please request an invite. You have some great
input and I would hate to miss it.

------
psyklic
Their invite link seems to be broken. By trial and error, the correct one
seems to be:

curl <http://orangethirty.webfactional.com/invite.php?email=your_email_address@yourhost.com>

~~~
orangethirty
Sorry, it was 1:00 am when I uploaded the files. Let me fix that.

------
netvarun
IMO, a search engine for hackers would be a 'Code Search' (which Google killed
off a few months back) replica/clone.

It was great and awesome and I still miss it to this day.

~~~
orangethirty
That is the first iteration of what will be built. I actually want something
like it.

------
dutchbrit
Regular expression searching should be on that list too. :)

~~~
nisa
Google had this for Codesearch: <http://swtch.com/~rsc/regexp/regexp4.html>
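
A simplified sketch of the trigram idea: index every three-character substring, use the index to narrow the candidate documents, and only then run the real regex on what is left. Here the caller supplies a literal that every match must contain; deriving that from the regex automatically is the hard part and is not attempted. `TrigramIndex` is a hypothetical name, not Code Search's implementation.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """All three-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # trigram -> doc ids
        self.docs: Dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, literal: str, pattern: str) -> List[str]:
        """Docs matching regex `pattern`; `literal` must occur in every match."""
        candidates = set(self.docs)
        # Literals shorter than three characters yield no trigrams,
        # so the search falls back to scanning every document.
        for gram in trigrams(literal):
            candidates &= self.postings.get(gram, set())
        return sorted(d for d in candidates if re.search(pattern, self.docs[d]))
```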

~~~
dutchbrit
Already knew this (checked before commenting to make sure they didn't have it
and found that out). Using regex might make it a lot more server intensive,
though, even if the majority of people besides "nerds" like us wouldn't use
it.

~~~
nisa
I think it would be difficult to calculate a good ranking when using trigrams
for regex search (as in the link). As far as I know, besides PageRank you have
to rely on a term-based ranking function, e.g. BM25
<http://en.wikipedia.org/wiki/Okapi_BM25>.

Not sure if this is easily doable with trigrams.
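
For reference, a compact sketch of BM25 over whole-word tokens, to show what a term-based ranking function needs: term frequencies, document frequencies, and document lengths. It uses the non-negative IDF variant, log((N - df + 0.5) / (df + 0.5) + 1), and the common defaults k1 = 1.5 and b = 0.75; those choices and the function name `bm25_scores` are assumptions. It expects a non-empty list of tokenised documents.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenised document in `docs` against the tokenised `query`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is length-normalised via b.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores
```

Swapping words for trigrams as the "terms" is mechanically possible with this function, but whether the resulting ranking would be any good is exactly the open question raised above.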

------
hayksaakian
> equated Google+ to Android in terms of catch up

Stopped reading here.

------
cleverjake
The invite link is broken.

~~~
MichaelApproved
"Note: if the server gets hammered and goes offline, please send an email to
my address (check my Hacker News profile), and I will make sure to include
you."

A reference to his profile, but no link, along with the author posting this
article himself leads me to believe this was written directly to the HN
community.

That being the case, why has he not addressed this broken invite issue (which
was raised shortly after the submission) or some of the other comments here? Why
post the article if you don't have time to participate in the discussion
you're trying to encourage?

------
redbad

        ...invite.php?...
    

There goes your credibility.

~~~
ebzlo
"thefacebook.com/index.php"

