
A new search engine - trishume
http://www.samuru.com/
======
drakaal
I didn't submit this, and I didn't know anyone was submitting it, or I would
have written a post about why it is different from other search engines.
Whether it is better is still up for grabs because it is new.

Samuru doesn't use link authority, it analyzes pages and matches what you
queried to the types of pages and picks the best matches.

Let me give you an example.

You search for "How to Make cupcakes". Google says, give me the pages that
have the most inbound links (an oversimplification) that contain all those
words. The winner is Brandon's Cupcakes (not really, but play along for a minute)
because it says, "We know how to make the best cupcakes, because we have been
doing it for 25 years"

That is not a useful result. Samuru on the other hand says "how to make
cupcakes is a search for instructions" and it looks for pages that match the
words, and are written as instructions.

We weigh other factors, such as whether there is an author associated with the
article and whether they routinely write about the topic.

We do this for reviews, products and other things as well.

To be a full replacement for Google we need driving directions, image search,
and a lot of other things. But in order to do all the other things we are
doing (related content, analysis, speed testing, building a corpus of words),
we needed a search engine.

Responses get better if you search something someone else has searched or do a
second search 30 seconds later. This is because we haven't deep indexed the
entire Internet yet, and so we don't have all the deep data.

~~~
danso
Re: your portrayal of how Google works...An "over simplification"? It's just
plain wrong. Google has, for quite a while, not depended on sites containing
all the words of a query...and natural language processing plays a huge part
in analyzing intent of a query.

I applaud this ambitious project but I'm skeptical you'll achieve what you aim
for if you're way off the mark in understanding how Google is so
successful...I mean, to even talk of replacing Google at this stage -- and
saying it's just a matter of providing rich snippets and other ancillary
features as if that was your engine's main deficiency compared to Google -- is
quite bold and a little cart before horse, IMO

-

Edit: an example...I did a search for my own name, something I do habitually
because I'm locked in an eternal struggle with a younger, better looking, more
talented namesake for the top Google result. However, your search engine
returns neither me nor my singing rival as the top result...instead you return
the domain that is my first and last name with a hyphen, which is exactly the
superficial result that Google was designed to avoid.

~~~
drakaal
So you think Google knows if a piece of content is instructional?

Or know that it is a Review? (not just has a rating)

And you have to have all the words or a synonym of the words. But more
importantly Google doesn't know what kind of content something is. Or what
questions the content answers. Our system knows "this document answers what do
aardvarks eat"

~~~
gibybo
>So you think Google knows if a piece of content is instructional?

Yes.

>Or know that it is a Review? (not just has a rating)

Yes.

>But more importantly Google doesn't know what kind of content something is.

You don't think Google knows how to classify content?

~~~
drakaal
Then you think wrong.

Google can't tell that something is instructions. That's not something they
do. It may know that ehow has a lot of pages with How To on them, but it
doesn't know if those are pages with step by steps for how to do something,
nor does it know that Rotten Tomatoes has pages with Opinions and Points, and
conclusions that make up a review.

~~~
joedevon
Well their move to semantic technologies e.g. Schema.org is a step in the
direction of understanding these things better. Like if you mark up the page
with <http://schema.org/Review> wouldn't you agree that they know it's a
review?

Are you using semantic web btw?

~~~
drakaal
No. They know you tagged it a review.

I could tag it review and have it be a sales copy page for a product.

Our stuff knows that a review expresses an opinion, backs that up with facts,
and makes a conclusion.

~~~
joedevon
And are you using Semantics? Going to Semtechbiz in June? If so, I'll buy you
a beer ;)

------
fmoralesc
OK. My first try was to search for "plato dialogue concerning friendship".
Google gave me the result I expected (a reference to the Lysis dialogue)
through wikipedia and a bunch of articles about it (the most helpful being a
link to the Stanford Encyclopedia of Philosophy, ranked third). It didn't link
the text though in the first page (it only appears in the third page of
results, with a copy at the MIT Classics archive). Samuru gives me a bunch of
general articles on Plato first (oddly enough, the first results are articles
from the SEP, but not the article on "Plato on Friendship and Eros".), some
noise and then information on the Lysis. The text itself appeared at 25th
place.

Something I find interesting is that one of the snippets samuru gave me (on
the 5th result) has a pretty good description of the lysis as the item most
likely to be the "plato dialogue concerning friendship": "the dramatically
later Lysis presents Plato's more developed understanding of love and
friendship than the dramatically earlier Symposium and Phaedrus". From this
description of the Lysis one could gather that the text of the Lysis itself
should be a very relevant result to the query; at the very least, that
information about it should be weighted as more relevant to the query than
info on the Symposium or the Phaedrus, and then info on those over all else.
From this, I think, one could build a better representation of a good answer
to the query than in google or samuru.

I think natural language analysis is very promising here. I hope work on this
area yields good results, but it seems like a hard problem.

~~~
Dn_Ab
Counterpoint. I was surprised by it. A couple weeks ago I decided to start
recording any search phrases which I felt were tricky or required good
language modelling.

" _baby features kept in adulthood_ " is the only one I've thought worth
recording so far. You can compare the results in Google, Bing, DDG. Only
Samuru and Google have it on page 1. Samuru has it as the first result. But
this is just one example so I can't draw any conclusions. Curious to see how
well it performs in general.

~~~
fmoralesc
Just for the record: I didn't want to sound dismissive of samuru. I actually
think the results I had were not bad at all (the fact that it gave me a link
to the lysis in the front page was great UX-wise, even though it is worth
pointing out that samuru yields more results per-page than google). The case I
pointed at was simply a case where I thought information could be gathered
from the dataset that would lead to better results than those presented by
samuru, or google (I also tested bing, but the results imho were poorer than
google's).

"baby features kept in adulthood" is a weird one too. You are right samuru
yields the best result in first place if one meant to get info on neoteny, but
then, on first sight, it is the only relevant result in the first page. And
the same thing happens with google.

------
samirahmed
About 1 month ago I switched from Google to Bing. There are different queries
that I use to measure 'better'.

For simple queries 'strncmp', 'giraffe', 'sound transit schedule' ... Google,
Bing and Samuru perform pretty well. But Samuru is extremely slow.

For more complex queries like 'seattle dumpling restaurant that is famous in
singapore' or 'how to zip a list in ruby', I find that Google always comes out
on top. Bing lacks the previous search history to personalize my searches and
often thinks I mean zip as in zipfile. But Samuru gave me relevant results
for all three, which is rather surprising.

Another type is one for people/social related searches... Bing's
facebook/twitter/linkedin/yelp integration actually makes it better than
google because the 'snapshot' bar it has is super helpful. However Samuru
results are on par with Google and Bing results here (minus the snapshot bar).

Overall I was skeptical, but other than it being unbearably slow (Google spoilt
us with speed), Samuru does have very good search results for what I assume is
not a multibillion dollar product.

~~~
jggonz
If it isn't too much trouble, try your queries on blekko and let us know what
you think.

We actually have a /programming slashtag that is very useful for these kinds
of queries.

<http://blekko.com/ws/?q=how+to+zip+a+list+in+ruby+%2Fruby>

[http://blekko.com/ws/?q=seattle+dumpling+restaurant+that+is+...](http://blekko.com/ws/?q=seattle+dumpling+restaurant+that+is+famous+in+singapore)

Just for fun... results in tablet-friendly format:

[http://izik.com/?q=seattle%20dumpling%20restaurant%20that%20...](http://izik.com/?q=seattle%20dumpling%20restaurant%20that%20is%20famous%20in%20singapore)

~~~
drakaal
I like slash tags and Booleans, but the truth is search should work without
the need for those things. We support - to make things go away. Later we will
expose some of the cooler things we do behind the scenes like "reviews" or
"instructional" or "oped" searches, or "Simple English" but we want to do that
in a way that doesn't require syntax.

"If it requires syntax it isn't user friendly" is our internal battle cry.
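
As a rough illustration of the one piece of syntax mentioned above, the `-` prefix that makes things go away, here is a minimal sketch. The function names `parse_query` and `matches` are hypothetical and the whole-word matching is an assumption made for the example; this is not Samuru's actual code.

```python
def parse_query(query):
    """Split a query into terms to match and terms to exclude.

    Assumption for illustration: a token starting with "-" is an exclusion.
    """
    include, exclude = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:])
        else:
            include.append(token)
    return include, exclude


def matches(text, include, exclude):
    """True if text has every included word and none of the excluded ones."""
    words = set(text.lower().split())
    has_all = all(term in words for term in include)
    has_excluded = any(term in words for term in exclude)
    return has_all and not has_excluded


include, exclude = parse_query("jaguar -car")
```

With that query, a page about the animal would pass `matches` while a page containing the word "car" would be dropped.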

~~~
greglindahl
Yeah, it's definitely the case that users don't want to learn or type syntax
in a search engine -- Daniel Russell of Google says that a majority of
searchers think they are advanced users of search engines, but a majority of
them don't know about or use "" or -.

That's why blekko and izik both invoke that syntax "under the hood",
automatically -- starting in November 2011.

------
ok_craig
I don't understand. Is this spam? There is no context or accompanying article
for the claim. I searched my name and the results weren't nearly as good. One
data point, sure, but first impression is everything.

Edit for context: original title read: "This search engine is better than
google."

~~~
furyofantares
I think it's related to this post about Liquid Helium:
<https://news.ycombinator.com/item?id=5579336>

------
valtron
Doesn't work: <http://www.samuru.com/?q=porn>

~~~
drakaal
Nope it doesn't. We decided that it was hard enough getting advertising
without having "adult" search. We focus on text analysis so we aren't very
good at porn searches.

~~~
danso
Just curious, but why can't you simply disable ads on those kinds of searches?
Also, having no results for this seems inexplicable:

<http://www.samuru.com/?q=Sex>

What is a casual user supposed to think when a new search engine claims that
there are no results whatsoever for "Sex" on the Internet, period?

~~~
drakaal
I need to look into why you didn't get a message saying we don't do those
kinds of searches.

Disabling ads is actually pretty hard. We had AdSense running until we got
kicked for having results on "Jail Bait"; those two words alone are not dirty.
But I didn't focus on building long lists of dirty topics so we were returning
results on that.

------
DanBC
I wish you luck with this!

Google is excellent. Bing is also excellent (with minor differences). DDG and
Blekko are adding interesting and useful features.

But they all feel a bit like they're a mono-culture, and thus vulnerable to
gaming. Black-hat seo seems to be something that Google is pretty good[1] at
dealing with. White hat SEO and ads have changed the web drastically from what
I remember.

So it's really nice to have an alternative method of search that searches in a
different way. Your post (<https://news.ycombinator.com/item?id=5580321>)
highlights a few things I find frustrating in search at the moment.

[1] It's odd that all the work they do isn't noticed.

------
nilkn
I'm willing to have an open mind about this, but I think some sort of
explanation on what samuru is hoping to achieve in distinction from other
search engines would be helpful.

~~~
monsterix
Exactly. Doing an MVP of a search engine is hard, so it is _okay_ to be lacking
in quality of results when you first launch, on HN at least. Even DDG is
still only trying to catch up.

But to keep the engine running, and keep the hacker interested you should tell
what distinction samuru is trying to achieve with its search engine.

And perhaps this query <http://www.samuru.com/?q=porn> should not be blocked
by default, rather provide tools for safe search. Heard of the porn cookie
guy? Just copy his footsteps, I'd say.

------
D9u
I got fairly good results using decidedly esoteric queries, and although I'm
on a very s-l-o-w connection I didn't notice a great speed discrepancy.

------
orangethirty
Aside from the different processing on the back-end, what else does samuru do?
I'm curious.

Disclaimer: I'm the guy behind Nuuton (a search engine).

~~~
drakaal
Summaries instead of Snippets. Document Type to Query Type Matching (looks
like you are looking for a review we favor reviews, looks like you are looking
for instructions we favor how to's)
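
A minimal sketch of what query type to document type matching could look like, under loud assumptions: the regex rules, the `bonus` multiplier, and the pre-labeled `type` field on each document are all invented for illustration and are not how Samuru actually classifies anything.

```python
import re


def query_type(query):
    """Guess what kind of page the query is asking for (toy rules)."""
    q = query.lower()
    if re.search(r"\bhow (to|do i)\b", q):
        return "instructions"
    if re.search(r"\breviews?\b", q):
        return "review"
    return "general"


def rank(query, docs, bonus=2.0):
    """Order docs by word overlap, boosting docs whose type fits the query."""
    wanted = query_type(query)
    terms = set(re.findall(r"\w+", query.lower()))

    def score(doc):
        words = set(re.findall(r"\w+", doc["text"].lower()))
        overlap = len(terms & words)
        return overlap * (bonus if doc["type"] == wanted else 1.0)

    return sorted(docs, key=score, reverse=True)


# Hypothetical documents; the "type" labels are assumed to exist already.
docs = [
    {"type": "sales", "text": "We know how to make the best cupcakes"},
    {"type": "instructions", "text": "How to make cupcakes step by step"},
]
best = rank("How to make cupcakes", docs)[0]
```

Both pages contain every query word, so plain word matching ties them; only the type boost puts the step-by-step page first.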

~~~
orangethirty
I wonder why you decided to follow the same old search formula. There is so
much to innovate in this area. For example, Nuuton uses #hashtags for trending
results. Say you go and make a search. All related terms would appear as
#hashtags somewhere in the page. These are created by the users and by the
system. It also uses the / and the ! to filter results in different ways. Say:
/Honda !modified, gets you pages of modified Hondas. Click on a #hashtag, say
#turbocharged, and you would get turbocharged Hondas. Why so many tools that
appear to do the same thing? They are close in functionality, but affect
different factors in the back end.
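
To make the three prefixes concrete, here is a small sketch that only splits a query into its `/`, `!` and `#` parts. The function name `parse_filters` and the returned dictionary layout are made up for this example, and it says nothing about what each filter does in the back end.

```python
def parse_filters(query):
    """Group query tokens by their prefix character.

    Tokens without a recognized prefix (or with nothing after the prefix)
    are kept as plain search terms.
    """
    parsed = {"/": [], "!": [], "#": [], "terms": []}
    for token in query.split():
        prefix, rest = token[0], token[1:]
        if prefix in "/!#" and rest:
            parsed[prefix].append(rest)
        else:
            parsed["terms"].append(token)
    return parsed


example = parse_filters("/Honda !modified")
```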

~~~
drakaal
How do I teach 2nd graders, their 62 year old teacher, and my mom to do those
things?

We are focused on making interfaces that are Zero Learning Curve. Our goal is
to allow you to ask for what you want and get it without having to know how
to ask.

~~~
orangethirty
How do you aim to provide correct answers to incorrect or incomplete
questions?

------
saejox
I find it funny that a Google competitor search engine is using Google
Analytics and AdSense.

~~~
drakaal
Selling your own ads is hard. Especially at low volumes getting started. So
your choices are basically Google and Microsoft. (Chitika doesn't pay
anything)

------
p1mrx
You should enable IPv6 on your naked domain, in addition to www. The DNS
records are listed here:

[http://support.google.com/a/bin/answer.py?hl=en&answer=2...](http://support.google.com/a/bin/answer.py?hl=en&answer=2579995)

~~~
drakaal
Google AppEngine issue. I'm not sure we can make that happen, but I will look
into it.

------
xaviel
Their SEO engine is easy to game

~~~
drakaal
In what way? Writing something that meets our qualifications for "what is a
review" is much harder to game than Link spamming. You can game the system
only by writing content that is useful to the user.

The only easy to game part is that we give brands a pretty big bonus for
themselves. Sony.com/playstation will always be the top hit for Sony
PlayStation, even if we should favor a .gov result that says they are recalled
for bursting into flames. But as that rarely becomes an issue we are ok with
that being number 2.

~~~
regal
There seems to be a strong emphasis in search results on domain name match,
similar to Google several years back. e.g., search "dog training" and examine
the results - there's a much higher mix of spammier content mixed in with
helpful content than you'll see in the other big name search engines' results.

Anyway, keep cracking at it; I'm sure you'll get it sharper as you go.

~~~
drakaal
There is, but the bonus goes down as we get deep indexing. So if you check
back in a minute, most of the time the spam will have moved down.
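
One way to picture a bonus that shrinks as indexing deepens is the sketch below. The linear formula, the `max_bonus` value, and the idea of an `index_depth` between 0 and 1 are assumptions made purely for illustration, not the real scoring.

```python
def combined_score(content_score, domain_match, index_depth, max_bonus=5.0):
    """Content score plus a domain-name bonus that fades with index depth.

    index_depth: 0.0 means nothing deep-indexed yet, 1.0 means fully indexed.
    """
    depth = min(max(index_depth, 0.0), 1.0)  # clamp to [0, 1]
    bonus = max_bonus * (1.0 - depth) if domain_match else 0.0
    return content_score + bonus


# Hypothetical pages: a thin page on a keyword domain vs. a helpful page.
spam_early = combined_score(1.0, True, 0.0)
spam_later = combined_score(1.0, True, 1.0)
helpful = combined_score(3.0, False, 0.0)
```

Before deep indexing the keyword-domain page outranks the helpful one; once the page is fully indexed the bonus is gone and the order flips.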

------
arcatek
"Why Samuru" => Spelling suggestion : "Why Samurai"

It's interesting, results are not so far from what I want. I'll give it a look
for my next searches.

~~~
drakaal
Samuru was a samurai. Several Japanese anime characters are named Samuru and
are samurai. It is also Turkish for otter.

~~~
hcho
"Su samuru"(literal translation: water sable) is the Turkish for otter.
Turkish is an agglutinative language, that last u is actually a possessive
affix and doesn't make sense when the word is by itself.

------
tokenadult
Ghostery reminds me that this site runs Google Analytics, so the site founders
apparently do trust Google for some services.

~~~
drakaal
We run on Google AppEngine. So we use Google for a lot of things. Building all
of the pieces that make up Google is more than 10 people can do in a year. We
have 5 developers. And most of those only came in the last 6 months. We may
build analytics, but it will be a while.

------
prawn
Doesn't seem to tailor results to your location so might not be as useful for
people outside of the US? Or did I just try a stupid search? I performed a
vanity search and it was listing different names before there was anything
about me. Same search in Australia on Google has me in four of the top six
spots.

------
kludu
I searched for "pussy", "sex" and "porn".

No results.

WTF is this shit?

------
nu2ycombinator
Better in what sense? It's not better with respect to the speed of returning
the results.

~~~
drakaal
Google has 100 people searching everything that can be searched. We have to
do the work when you do the search. We get faster the more people use us.
Exponentially.

------
saintx
How can they trademark the words "Liquid Helium"? The first search I did on
Samuru was for Liquid Helium and it brought back about a half million results,
all of which I assume are violating its purported trademark.

~~~
drakaal
We can trademark it in the context of software. We don't sell frigid gas.

------
kephra
two suggestions:

\- you need a favicon, so it's possible to pull your site into an icon bar for
bookmarking.

\- you need a search engine registration, so it's possible to use it from the
search engine tab in the browser

~~~
aw3c2
At least in Opera and Chromium you can simply right-click the search form to
add it.

------
bekman
Congrats, I searched my name and am satisfied with the results.

------
Glyptodon
I got decent results for all the searches I tried. For a couple searches I'd
say I even got clearly better results than Google.

------
swah
It's nitpicking, and a jerk thing to say, but... the "powered by Liquid Helium"
tagline is quite corny...

------
Sami_Lehtinen
This search engine is miserably slow, there might be room for some
optimization.

------
lez
It works with TOR.

------
raulonkar
How does your search engine index sites? How can I submit mine to it?

~~~
drakaal
You don't have to submit; we will find you. We have a bot. I apologize, I don't
recall the user agent at the moment... It comes from a Google IP address since
we are running on Google AppEngine. So we have less control over the bot's
user agent than I would like.

