Samuru doesn't use link authority. It analyzes pages, matches your query to the type of page you are most likely looking for, and picks the best matches.
Let me give you an example.
You search for "How to Make cupcakes".
Google says, "Give me the pages that have the most inbound links (an oversimplification) and that contain all those words."
The winner is Brandon's Cupcakes (not really, but play along for a minute) because it says, "We know how to make the best cupcakes, because we have been doing it for 25 years."
That is not a useful result. Samuru, on the other hand, says "how to make cupcakes is a search for instructions" and looks for pages that match the words and are written as instructions.
We also weigh other factors: is there an author associated with the article, and do they routinely write about the topic?
We do this for reviews, products and other things as well.
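To make the idea above concrete, here is a minimal sketch in Python of matching a query's intent to a page's content type. It is not Samuru's actual code: the names `query_intent`, `looks_like_instructions`, and `score`, the cue words, and the bonus weight are all assumptions made up for illustration.

```python
import re

# Hypothetical cues; a real system would learn these rather than hard-code them.
INSTRUCTION_QUERY = re.compile(r"^\s*how\s+to\b", re.IGNORECASE)
STEP_CUES = re.compile(
    r"^\s*(?:step\s+\d+|\d+[.)]|(?:first|next|then|finally)\b)", re.IGNORECASE
)


def query_intent(query):
    """Guess what kind of page the searcher wants (assumed two-way split)."""
    return "instructions" if INSTRUCTION_QUERY.search(query) else "general"


def looks_like_instructions(text):
    """Treat a page as instructions if at least half its lines start with a step cue."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False
    steps = sum(1 for ln in lines if STEP_CUES.match(ln))
    return steps / len(lines) >= 0.5


def score(query, text):
    """Fraction of query words found on the page, plus an assumed bonus of 1.0
    when the query asks for instructions and the page is written as instructions."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    page_words = set(re.findall(r"[a-z]+", text.lower()))
    overlap = len(words & page_words) / len(words) if words else 0.0
    type_match = query_intent(query) == "instructions" and looks_like_instructions(text)
    return overlap + (1.0 if type_match else 0.0)


bakery = "We know how to make the best cupcakes, because we have been doing it for 25 years"
recipe = "How to make cupcakes\n1. Preheat the oven\n2. Mix the batter\nThen bake for 20 minutes"
query = "How to Make cupcakes"
```

Both pages contain every query word, so a pure keyword match cannot separate them; under these assumptions only the page written as steps gets the type bonus.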
To be a full replacement for Google we need driving directions, image search, and a lot of other things. But in order to do all the other things we are doing (related content, analysis, speed testing, building a corpus of words), we needed a search engine.
Responses get better if you search for something someone else has already searched for, or if you do a second search 30 seconds later. This is because we haven't deep indexed the entire Internet yet, so we don't have all the deep data.
I applaud this ambitious project but I'm skeptical you'll achieve what you aim for if you're way off the mark in understanding how Google is so successful...I mean, to even talk of replacing Google at this stage -- and saying it's just a matter of providing rich snippets and other ancillary features as if that was your engine's main deficiency compared to Google -- is quite bold and a little cart before horse, IMO
Edit: an example...I did a search for my own name, something I do habitually because I'm locked in an eternal struggle with a younger, better looking, more talented namesake for the top Google result. However, your search engine returns neither me nor my singing rival as the top result...instead you return the domain that is my first and last name with a hyphen, which is exactly the superficial result that Google was designed to avoid.
Considering his comment drew out Matt Cutts, I'd bet he understands Google pretty well.
But the OP's claim stands on its own and makes assertions that can easily be verified. Are you arguing that Google's search engine is as simple and literal as the OP claims?
While Google gets things right without all of our language stuff, a lot of that is because they have user data about what people are clicking on and which results they come back from after reading.
That data means if they can get the stuff onto the front page they can "crowd source" the rest.
We don't have that user base for a feedback loop. We have to get our results entirely based on software.
It's not like he was writing up this big detailed blog post.
Or know that it is a Review? (not just has a rating)
And you have to have all the words or a synonym of the words. But more importantly Google doesn't know what kind of content something is. Or what questions the content answers. Our system knows "this document answers what do aardvarks eat"
To answer your questions, yes and yes, Google can derive the meaning of my search without relying on literal interpretation of the search terms. In fact, Google can return what I want even if I deliberately spell every word in the query incorrectly:
"couk besr ribz" instead of "cook best ribs" brings up recipes of how to do good ribs:
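For readers wondering how a misspelled query can still be resolved, here is a minimal sketch using Python's standard `difflib`. This is not how Google does it (a production engine would lean on query logs and far larger models). The tiny `VOCABULARY` list and the `correct_query` helper are assumptions for illustration only.

```python
import difflib

# Hypothetical vocabulary; a real engine would draw on its index and query logs.
VOCABULARY = ["cook", "best", "ribs", "recipes", "make", "cupcakes"]


def correct_query(query, vocabulary=VOCABULARY):
    """Replace each word with its closest vocabulary entry, if one is close enough."""
    corrected = []
    for word in query.lower().split():
        # get_close_matches returns the best matches first; keep the word as typed
        # when nothing reaches the similarity cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.6)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(correct_query("couk besr ribz"))
```

Words with no close match, such as a rare surname, pass through unchanged rather than being forced onto a vocabulary entry.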
I'm willing to take you at your word, that there is a better way to interpret context and meaning from a search query than however Google does it...but if that is the entire raison d'etre for your search engine, can you come up with at least a few case examples where this is the case? The "cupcakes" example you've contrived is clearly hypothetical (and not at all close to what happens in reality), and the ones that I've tried don't seem to show any improvement on human-friendly results. Which is not to say that samuru is a bad product...what you claim to do is incredibly difficult and is exactly the feature that makes Google such a useful, ubiquitous engine...I would love to be surprised but I'm skeptical that a new engine with a fraction of Google's processing power, never mind the resources for test engineering and algorithm design, can compete with Google here...this isn't a "well, why hasn't anyone ranked search queries in this way before?" in the same way that PageRank/BackRub was 15 years ago...Search engines have been analyzing queries for intent, and their shortcomings in this area are due to it being a very hard problem, and not for lack of desire.
The 10 results give me:
2 hits for boat related stuff, with no food.
4 hits for restaurants, with no recipes.
1 hit about cervical ribs.
And then, in positions 5, 8, and 9 there are recipes.
So, even with all the smart people and computing power at Google this is still a very hard problem.
It's great to have different people trying different approaches.
Either way, it's a deliberately contrived example to show that Google was more sophisticated than the GP indicates...I have no comment on whether amazingribs.com really does have the best ribs preparation tips
While we get the same top result for this, we also show that WikiHow.com linked to the Python Docs. So you can see closely related articles grouped together.
This is useful for things like http://www.samuru.com/?q=samsung+apple+patent+press+release
But Google requires the words or the synonyms to be on the results page. So do we, but we know that a search for http://www.samuru.com/?q=cook+bbq+ribs is not just about how to make BBQ ribs but about how to make the best BBQ ribs. This is a subjective topic: I like honey BBQ, someone else likes mesquite. That factors into our results.
http://imgur.com/ghYKH7K shows my results. I can't reproduce the ones you describe.
Regional variations might account for the differences. Do you live in a place where people don't eat ribs? (If so, please accept my sincere condolences)
>Or know that it is a Review? (not just has a rating)
>But more importantly Google doesn't know what kind of content something is.
You don't think Google knows how to classify content?
Google can't tell that something is instructions. That's not something they do. It may know that ehow has a lot of pages with How To on them, but it doesn't know if those are pages with step-by-step instructions for how to do something, nor does it know that Rotten Tomatoes has pages with opinions, points, and conclusions that make up a review.
Are you using semantic web btw?
I could tag it review and have it be a sales copy page for a product.
Our stuff knows that a review expresses an opinion, backs that up with facts, and makes a conclusion.
I'm not sure I agree with this. Google may not know it explicitly but it can effectively know it, perhaps in some cases even better than the average human would if given the same search keys. It is the difference between knowing what a word means by the way people actually use it versus by looking it up in the dictionary.
Google essentially crowdsources its results, taking advantage of the fact that people entering similar queries probably have similar intentions. If your users are only using keywords, with little regard for word order, I don't see how you can do better than Google on average. You may do better for obscure queries where the best search result cannot be easily inferred from the keywords present, but how many pages are like this? Furthermore, if there is a set of keywords where semantic analysis suggests page A is the best result but the page that most people actually click on is page B, which do you return first? What if your index doesn't even have B? This will be a challenge you will face when trying to do better than Google on average. Nevertheless, I applaud your work and will definitely keep Samuru at the ready for the queries that Google struggles on.
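One way to think about the page A versus page B question is as a blend that trusts clicks more as evidence accumulates. The sketch below is an assumption, not anything either engine has described: `blended_score`, the constant `k`, and all the numbers are hypothetical.

```python
def blended_score(semantic, clicks, impressions, k=50.0):
    """Blend a semantic score in [0, 1] with an observed click-through rate.

    With no impressions the result is just the semantic score. As impressions
    grow past the assumed constant k, the click-through rate dominates.
    """
    if impressions <= 0:
        return semantic
    ctr = clicks / impressions
    weight = impressions / (impressions + k)
    return (1.0 - weight) * semantic + weight * ctr


# Page A: strong semantic match, rarely clicked. Page B: weaker match, often clicked.
a_new, b_new = blended_score(0.9, 0, 0), blended_score(0.6, 0, 0)
a_busy, b_busy = blended_score(0.9, 100, 1000), blended_score(0.6, 700, 1000)
```

With no traffic, A ranks first on semantics alone; once each page has a thousand impressions, B overtakes it. A page that is missing from the index never gets a score at all, which is the separate problem raised above.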
Who said we don't use similar search to infer intent?
We can't really use the click-throughs until we have users. We need feedback to improve results. But this also powers the related search in our TLDR Products ( http://www.tldrstuff.com ) and those have to work with much more abstract queries, because they aren't user queries, they are generated queries.
Perhaps you have solved this problem, but I just don't see how you can offer better results than Google on average (without having a similar sized index) when users are just throwing together a bunch of words related to what they are looking for. It seems to me that if we want to take advantage of systems like yours for search and if we want to get better results than Google, we need to change users' behavior; they need to learn to give more precise queries.
Now that we are both in the business of stopping spam we should grab lunch sometime.
I realize that this is a bit of a nitpick, but I felt the need to mention it.
Having just played with it, it feels both backwards and refreshing to go back to that. The results are different enough to feel good for the terms I used.
Better Social Media integration. We do Facebook, Twitter, and Google Plus, not just Google Plus, for showing authors.
Voice Input if you are on Chrome 25 or higher.
Results are returned with Summaries not Snippets.
With that I am falling asleep. I have enjoyed answering questions on this and the https://news.ycombinator.com/item?id=5579336 thread but 5 hours of it has worn me out. If you leave comments I promise to get back to them.
1) Are you sure that giving a "bonus" to domains containing part of a query is a good idea? I understand the reason behind it, and I know that you need time before you can turn this "bonus" off, but in the meantime are you really sure it is a good idea?
When I type "How to rank well on Google" the first result is www.google.com => http://www.samuru.com/?q=How+to+rank+well+in+Google
From the third position onward, however, the pages seem to be great.
2) How does the search suggest work?
I'm a French user, and in our language we have a lot of accents, like "é è ù à". Many people do not use them when typing a search query. When I correctly type a query with the accents, Samuru suggests the same query but without accents. This is wrong, and that's why I'm wondering where the data used by the search engine for these query suggestions comes from.
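A common way to handle the accent issue is to match on an accent-stripped key but always display the correctly accented form. Here is a minimal sketch using Python's standard `unicodedata`. The `strip_accents` and `suggest` helpers and the sample `KNOWN` list are hypothetical, since nothing in this thread says where Samuru's suggestions come from.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest(prefix, known_queries):
    """Match on the accent-free, lowercased form; return the stored spelling as-is."""
    key = strip_accents(prefix).lower()
    return [q for q in known_queries if strip_accents(q).lower().startswith(key)]


# Hypothetical stored suggestions, kept with their proper accents.
KNOWN = ["élection présidentielle", "électricité", "elephant"]

print(suggest("elec", KNOWN))
print(suggest("élec", KNOWN))
```

Typing with or without accents finds the same entries, and the suggestion shown is the accented one.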
I really hope you succeed with this project.
P.S.: I mistakenly typed HOT to make cupcakes
Something I find interesting is that one of the snippets Samuru gave me (on the 5th result) has a pretty good description of the Lysis as the item most likely to be the "plato dialogue concerning friendship": "the dramatically later Lysis presents Plato's more developed understanding of love and friendship than the dramatically earlier Symposium and Phaedrus". From this description of the Lysis one could gather that the text of the Lysis itself should be a very relevant result to the query; at the very least, that information about it should be weighted as more relevant to the query than info on the Symposium or the Phaedrus, and then info on those over all else. From this, I think, one could build a better representation of a good answer to the query than in Google or Samuru.
I think natural language analysis is very promising here. I hope work on this area yields good results, but it seems like a hard problem.
"baby features kept in adulthood" is the only one I've thought worth recording so far. You can compare the results in Google, Bing, DDG. Only Samuru and Google have it on page 1. Samuru has it as the first result. But this is just one example so I can't draw any conclusions. Curious to see how well it performs in general.
"baby features kept in adulthood" is a weird one too. You are right that Samuru yields the best result in first place if one meant to get info on neoteny, but then, at first sight, it is the only relevant result on the first page. And the same thing happens with Google.
For simple queries 'strncmp', 'giraffe', 'sound transit schedule' ...
Google, Bing and Samuru perform pretty well. But Samuru is extremely slow.
For more complex queries like 'seattle dumpling restaurant that is famous in singapore' or 'how to zip a list in ruby', I find that Google always comes out on top. Bing lacks the previous search history to personalize my searches and often thinks I mean zip as in zipfile. But Samuru gave me relevant results for all three, which is rather surprising.
Another type is one for people/social related searches... Bing's facebook/twitter/linkedin/yelp integration actually makes it better than google because the 'snapshot' bar it has is super helpful. However Samuru results are on par with Google and Bing results here (minus the snapshot bar).
Overall I was skeptical, but other than it being unbearably slow (Google spoilt us with speed), Samuru does have very good search results for what I assume is not a multibillion dollar product.
We actually have a /programming slashtag that is very useful for these kinds of queries.
Just for fun... results in tablet-friendly format:
"If it requires syntax it isn't user friendly" is our internal battle cry.
That's why blekko and izik both invoke that syntax "under the hood", automatically -- starting in November 2011.
Edit for context: original title read: "This search engine is better than google."
What is a casual user supposed to think when a new search engine claims that there are no results whatsoever for "Sex" on the Internet, period?
Disabling ads is actually pretty hard. We had AdSense running until we got kicked for having results on "Jail Bait". Those two words alone are not dirty, but I didn't focus on building long lists of dirty topics, so we were returning results on that.
Google is excellent. Bing is also excellent (with minor differences). DDG and Blekko are adding interesting and useful features.
But they all feel a bit like they're a mono-culture, and thus vulnerable to gaming. Black-hat seo seems to be something that Google is pretty good at dealing with. White hat SEO and ads have changed the web drastically from what I remember.
So it's really nice to have an alternative method of search that searches in a different way. Your post (https://news.ycombinator.com/item?id=5580321) highlights a few things I find frustrating in search at the moment.
It's odd that all the work they do isn't noticed.
But to keep the engine running, and keep hackers interested, you should explain what distinction Samuru is trying to achieve with its search engine.
And perhaps this query http://www.samuru.com/?q=porn should not be blocked by default, rather provide tools for safe search. Heard of the porn cookie guy? Just copy his footsteps, I'd say.
Disclaimer: I'm the guy behind Nuuton (a search engine).
We are focused on making interfaces that are Zero Learning Curve. Our goal is to allow you to ask for what you want and get it without having to know how to ask.
The only easy-to-game part is that we give brands a pretty big bonus for themselves. Sony.com/playstation will always be the top hit for Sony PlayStation, even if we should favor a .gov result that says they are recalled for bursting into flames. But as that rarely becomes an issue, we are OK with that being number 2.
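As a rough illustration of what a brand bonus like this could look like, here is a minimal sketch. It is a guess at the general shape, not Samuru's implementation: `brand_label`, `brand_bonus`, and the `BRAND_BONUS` weight are hypothetical, and taking the second-to-last host label is a naive assumption that breaks on suffixes like .co.uk.

```python
from urllib.parse import urlparse

BRAND_BONUS = 2.0  # assumed weight, large enough to outrank ordinary relevance scores


def brand_label(url):
    """Naively take the label before the top-level domain as the brand name."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    return parts[-2] if len(parts) >= 2 else host


def brand_bonus(query, url):
    """Give the bonus when the domain's brand label appears as a word in the query."""
    words = query.lower().split()
    return BRAND_BONUS if brand_label(url) in words else 0.0
```

Under these assumptions `brand_bonus("Sony PlayStation", "http://sony.com/playstation")` earns the bonus while a .gov recall page does not, which is exactly the trade-off described above. A rule of this shape would also hand the bonus to www.google.com for any query that happens to mention Google.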
Anyway, keep cracking at it; I'm sure you'll get it sharper as you go.
It's interesting, the results are not so far from what I want. I'll give it a look for my next searches.
WTF is this shit?
- you need a favicon, so it's possible to pull your site into an icon bar for bookmarking.
- you need a search engine registration, so it's possible to use it from the search engine tab in the browser
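The "search engine registration" is presumably an OpenSearch description document, which browsers read to add a site to their search box. Here is a minimal sketch that generates one with Python's standard `xml.etree.ElementTree`. The `{searchTerms}` placeholder and the namespace come from the OpenSearch 1.1 format, and the `?q=` URL pattern is the one used in the links in this thread; the `opensearch_description` helper and the description text are made up for illustration.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def opensearch_description(short_name, description, template):
    """Build a bare-bones OpenSearch description document as a string."""
    # Register the default namespace so elements serialize without a prefix.
    ET.register_namespace("", OPENSEARCH_NS)
    root = ET.Element(f"{{{OPENSEARCH_NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{OPENSEARCH_NS}}}Description").text = description
    ET.SubElement(
        root,
        f"{{{OPENSEARCH_NS}}}Url",
        {"type": "text/html", "template": template},
    )
    return ET.tostring(root, encoding="unicode")


# "Samuru web search" is placeholder text, not the site's real description.
xml_text = opensearch_description(
    "Samuru", "Samuru web search", "http://www.samuru.com/?q={searchTerms}"
)
print(xml_text)
```

The resulting file would be served from the site and advertised on each page with a `<link rel="search" type="application/opensearchdescription+xml">` tag so the browser can discover it.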