

Interesting New Search Engine - jcr
http://www.stremor.com/samuru.html

======
TYPE_FASTER
From the PDF, it sounds like they are exploring advanced information retrieval
techniques like local context analysis, sentiment analysis, and
person/company/location recognizers.

These ideas have been around for a while, and some of them are starting to
become more prevalent (phone number recognizers in phones, location
recognizers to link to a map, etc.).

Here's an example paper that talks about local context analysis:

[http://www.cs.uml.edu/~haim/teaching/iws/tirsaa/sources/ACM_...](http://www.cs.uml.edu/~haim/teaching/iws/tirsaa/sources/ACM_Transactions_on_Information_Systems/improving_effectiveness_of_IR.pdf)
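For a rough feel of the local context analysis idea, here is a simplified Python sketch. It treats the top-ranked passages for a query as local context, scores each candidate term by how strongly it co-occurs with every query term, and returns the best candidates as expansion terms. This is a co-occurrence-only approximation, not the exact scoring function from the linked paper, which also applies collection-wide idf weights. The function name, the `delta` constant's use here, and the toy passages are hypothetical.

```python
import math
from collections import Counter


def expand_query(query_terms, top_passages, k=3, delta=0.1):
    """Hypothetical helper: pick k expansion terms for a query.

    Simplified local-context-style scoring: a candidate term scores highly
    when it co-occurs with *every* query term in the top-ranked passages.
    (The paper's formula also applies collection-wide idf weights; omitted here.)
    """
    query = set(query_terms)
    counts = [Counter(p) for p in top_passages]  # term counts per passage
    candidates = {t for c in counts for t in c} - query

    scores = {}
    for term in candidates:
        score = 1.0
        for q in query:
            # Co-occurrence strength: sum over passages of count(q) * count(term).
            co = sum(c[q] * c[term] for c in counts)
            # delta keeps one missing co-occurrence from zeroing the whole product.
            score *= delta + math.log(1 + co)
        scores[term] = score
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


passages = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "dealer"],
    ["jaguar", "car", "price"],
    ["jaguar", "cat"],
]
print(expand_query(["jaguar"], passages, k=2))  # ['car', 'cat']
```

Without idf weighting, very common words would also score well, which is one reason the real method weights terms by their rarity in the whole collection.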

~~~
jcr
Thanks. I'll give it a read.

In addition to the samuru.com search engine, they've also built another demo
of their "Liquid Helium" tech at:

<http://www.unpartial.com/>

There are some YouTube videos about it as well:

[http://www.youtube.com/watch?v=1DCLvBK_sEM&list=PLZ_j4Zk...](http://www.youtube.com/watch?v=1DCLvBK_sEM&list=PLZ_j4Zk4X876Tb9OqeZhOKDPMKh0xQrXb)

~~~
drakaal
Unpartial runs an older version of LHe. We have been meaning to upgrade it,
but it was optimized for politics with a focus on the election. To make it
fast, it doesn't have the full set of tools in it, so its summaries are not
quite as good.

------
jcr
There isn't much detail on how their "Liquid Helium" tech actually works, but
so far I've managed to find the following PDF and video channel.

<http://www.stremor.com/lhe.pdf>

<http://www.youtube.com/stremortv>

~~~
drakaal
Liquid Helium works through heuristics. LHe is the symbol for liquid helium
in science, and it is also the initials of Language Heuristics Engine. (We
know, we are so clever.)

<http://www.youtube.com/watch?v=9pb_F1mwLCo> gives an oversimplified
explanation.

Basically, we create a very efficient set of classifiers using sieves.

Imagine you want to find a "How To".

What makes a great How To? It should be instructional. It should have steps.
It shouldn't contain a lot of opinion. It should have good descriptions. It
should be written as commands addressed to "you".

These are all language constructs we can identify. We then score a document to
see whether it meets the minimum requirements to be a "how to".
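The sieve idea can be sketched as a chain of cheap checks, each a crude stand-in for one "how to" trait from the comment above (has steps, written as commands to "you", little opinion). A document passes only if it clears every sieve's minimum, and it stops at the first one it falls through. This is an illustrative guess, not LHe's actual method, which isn't published: the cue-word lists, scorer functions, and thresholds are all hypothetical.

```python
import re

# Hypothetical cue-word lists; a real engine would need proper parsing, not word lookups.
IMPERATIVE_VERBS = {"add", "click", "cut", "install", "open", "place", "remove", "run", "turn", "use"}
OPINION_WORDS = {"awesome", "believe", "best", "hate", "love", "terrible", "think", "worst"}


def sentences(text):
    # Split on sentence punctuation and newlines; drop fragments with no letters (e.g. "1").
    return [s.strip() for s in re.split(r"[.!?\n]+", text) if re.search(r"[A-Za-z]", s)]


def words(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def step_score(text):
    # 1.0 if the text contains numbered lines ("1.", "2)") or "Step N" markers.
    return 1.0 if re.search(r"(^|\n)\s*(\d+[.)]|step \d+)", text, re.IGNORECASE) else 0.0


def imperative_score(text):
    # Fraction of sentences that open with a command verb or address the reader as "you".
    sents = sentences(text)
    if not sents:
        return 0.0
    hits = sum(1 for s in sents
               if words(s)[0] in IMPERATIVE_VERBS or {"you", "your"} & set(words(s)))
    return hits / len(sents)


def low_opinion_score(text):
    # 1.0 minus the fraction of sentences containing an opinion cue word.
    sents = sentences(text)
    if not sents:
        return 1.0
    opinionated = sum(1 for s in sents if OPINION_WORDS & set(words(s)))
    return 1.0 - opinionated / len(sents)


# Ordered sieves: (name, scorer, minimum). Thresholds are illustrative guesses.
HOW_TO_SIEVES = [
    ("steps", step_score, 1.0),
    ("imperative", imperative_score, 0.5),
    ("low_opinion", low_opinion_score, 0.7),
]


def passes_sieves(text, sieves=HOW_TO_SIEVES):
    """Run text through each sieve in order; stop at the first one it falls through."""
    scores = {}
    for name, scorer, minimum in sieves:
        scores[name] = scorer(text)
        if scores[name] < minimum:
            return False, scores
    return True, scores


how_to = "1. Open the case.\n2. Remove the old drive.\n3. Install the new drive and turn it on."
review = "I think this drive is the best. I love it. The price is terrible though."
print(passes_sieves(how_to))  # (True, {'steps': 1.0, 'imperative': 1.0, 'low_opinion': 1.0})
print(passes_sieves(review))  # (False, {'steps': 0.0})
```

Putting the cheapest, most selective sieve first means most non-how-to documents are rejected before the more expensive checks run.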

When you do a search, we filter and rank results based on what kind of search
we think you are performing. We determine the type of search based on what
other people have searched for and what kinds of documents come back, and we
build an "ideal result profile" which we attempt to match with the documents
in our result set.
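One plausible reading of an "ideal result profile" is a target feature vector per search type. Each candidate document gets its own feature vector (for example, per-document classifier scores), documents too far from the profile are filtered out, and the rest are ranked by cosine similarity. This is an assumption about how such a profile could work, not a description of Stremor's system. The profile names, feature names, cutoff, and example URLs are hypothetical.

```python
import math

# Hypothetical profiles: the mix of traits an ideal result has for each search type.
PROFILES = {
    "how_to": {"instructional": 1.0, "steps": 1.0, "opinion": 0.0},
    "review": {"instructional": 0.2, "steps": 0.0, "opinion": 1.0},
}


def cosine(a, b):
    # Cosine similarity between two sparse feature dicts.
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_against_profile(docs, search_type, cutoff=0.5):
    """Filter out docs far from the ideal profile, then rank the rest by similarity."""
    profile = PROFILES[search_type]
    scored = [(cosine(features, profile), url) for url, features in docs.items()]
    return [url for score, url in sorted(scored, reverse=True) if score >= cutoff]


docs = {
    "a.example/fix-drive": {"instructional": 0.9, "steps": 1.0, "opinion": 0.1},
    "b.example/rant": {"instructional": 0.1, "steps": 0.0, "opinion": 0.9},
    "c.example/guide": {"instructional": 0.6, "steps": 0.5, "opinion": 0.3},
}
print(rank_against_profile(docs, "how_to"))  # ['a.example/fix-drive', 'c.example/guide']
```

The same candidate set produces a different filtered ranking once the search type changes, which is the point of choosing the profile per query.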

If you are looking for a review, we try to find a balance of positive and
negative ones.
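A minimal sketch of "a balance of positive and negative", assuming each result already carries a relevance score and a sentiment score in [-1, 1] from some upstream sentiment model (hypothetical here). The results are split into positive and negative pools, each pool is ordered by relevance, and picks alternate between the pools. The function name and tuple layout are illustrative.

```python
from itertools import zip_longest


def balanced_reviews(results, n):
    """results: list of (url, relevance, sentiment). Return up to n urls that
    alternate positive and negative sentiment, each pool ordered by relevance.
    If one pool runs out, the other fills the remaining slots."""
    positive = sorted((r for r in results if r[2] >= 0), key=lambda r: -r[1])
    negative = sorted((r for r in results if r[2] < 0), key=lambda r: -r[1])
    picked = []
    for pos, neg in zip_longest(positive, negative):
        for r in (pos, neg):
            if r is not None and len(picked) < n:
                picked.append(r[0])
    return picked


results = [
    ("r1", 0.9, 0.8),
    ("r2", 0.8, 0.7),
    ("r3", 0.7, -0.6),
    ("r4", 0.5, 0.9),
    ("r5", 0.4, -0.2),
]
print(balanced_reviews(results, 4))  # ['r1', 'r3', 'r2', 'r5']
```

Ranking by relevance alone would return r1, r2, r3, r4, three positive reviews out of four; alternating the pools gives two of each.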

If you are looking for a song we try to find lyrics and videos of that song.

We also summarize each result.

Some of these things happen only after someone has done a search for the first
time. So if you are the first person to try a search you may have to wait 30
seconds and try again.
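The "first searcher may have to wait" behavior is consistent with a lazy, cache-on-first-request design: the first request for a query schedules the expensive analysis and returns immediately, and later requests are served from the cache. A minimal sketch of that pattern follows, assuming a hypothetical `analyze` function and using an explicit queue drained by hand in place of a real background worker. It illustrates the pattern only and is not a description of Stremor's architecture.

```python
from collections import deque


class LazyQueryCache:
    """Serve cached analysis if present; otherwise queue the query for background work."""

    def __init__(self, analyze):
        self.analyze = analyze   # hypothetical expensive per-query analysis function
        self.cache = {}
        self.pending = deque()

    def search(self, query):
        if query in self.cache:
            return self.cache[query]
        if query not in self.pending:
            self.pending.append(query)  # first searcher: schedule the work, don't block
        return None                     # caller shows "try again shortly"

    def run_background_once(self):
        # Stand-in for a worker process: drain the queue and fill the cache.
        while self.pending:
            query = self.pending.popleft()
            self.cache[query] = self.analyze(query)


cache = LazyQueryCache(lambda q: f"summarized results for {q!r}")
print(cache.search("nikon vs canon"))  # None: first search only schedules the work
cache.run_background_once()
print(cache.search("nikon vs canon"))  # summarized results for 'nikon vs canon'
```

Checking `pending` before appending keeps several early searchers from queuing the same expensive job twice.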

~~~
hnriot
So, like Google except not anywhere near as fast?

I tried it. The results were OK, but I didn't see any result that was better
than Google's (or Bing's).

I'm suspicious of anything that starts out by saying search is broken. It
isn't.

All the things you have built, like sentiment analysis and summarization, do
nothing that PageRank doesn't already do. The reason is that the balanced
reviews and well-written pages are the same ones that humans have curated and
cited in their own writing. People do a much better job of understanding
language than simple sentiment analysis and the other things you do. Because
humans do it better, when Google comes along and leverages all that human
curation, the results are staggeringly good. Sure, there are easy-to-find edge
cases, just as there were with your engine when I tried it, but I don't see
anything new here. Maybe you could provide some example queries that
demonstrate it producing better results than Google or Bing?
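For reference, the PageRank idea hnriot is leaning on (links as a form of human curation) can be sketched as a power iteration over the link graph. This is the textbook formulation with a damping factor of 0.85 on a toy graph, not Google's production ranking; the graph and iteration count are illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to. Returns page -> score (sums to 1)."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c
```

Page "c" wins because three other pages link to it, which is the "curation by citation" signal the comment describes.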

If you could run comparison queries like "Nikon vs. Canon" and get back a
summarized report, that would be interesting, but it didn't do that.

Song lyrics are easy! That doesn't demo anything, nothing could be easier for
a search engine than song lyrics...

~~~
drakaal
Like Google? No. Google uses inlinks and keyword density.

Really, how do you know something is a song? It's not the finding that is
hard; it is knowing what to look for.

------
polynomial
Having tried Samuru, I think it looks like a great way to broaden a search;
however, the results seem to skew toward the random.

That is, when I searched for specific concepts and things, it did not return
the most relevant results, but instead results that were heuristically
interesting but not what I was looking for.

I'm also curious about the emphasis on the "Liquid Helium" name (from Language
Heuristics Engine), which, although clever, sounds a little silly.

Potentially useful (we do need more search tools) but a bit wide of the mark
for what they seem to be claiming.

------
arindone
I am really impressed. Given the results in their current state, I'm excited
to see how they improve with scale (i.e., as more and more people search and
add data to the machine learning algorithms).

I think a Bing vs. Google vs. Samuru challenge should be done, à la what Bing
did in their most recent marketing spree; I'm curious what the rankings would
be.

~~~
drakaal
Building one of those was harder than we expected; Google won't let you frame
their results page.

------
drakaal
I (Brandon Wirtz, CTO of Stremor) am glad we keep hitting the front page of
HN. The response has been a bit overwhelming, so I will do my best to keep up
with the comments.

Some answers are already on the other two threads.

<https://news.ycombinator.com/item?id=5579804>

<https://news.ycombinator.com/item?id=5579336>

