Can a small search engine take on Google? (onthemedia.org)
47 points by shortlived on Apr 13, 2013 | 43 comments

It is reasonable to take on Google in search; I think DDG has done a great job in both differentiating their product and delivering a great user experience. The challenge is not search per se. There are probably 100 really hard technical problems between idea and search engine infrastructure, but the real challenge is the ads.

Search engines need money to operate (we, Blekko, pay a ton of cash every month to keep our portion of a data center in Santa Clara humming) and search advertising is an excellent product, but it is also a finite market. Let me explain.

So let's assume for the sake of argument that the amount of money everyone in the world is willing to spend on advertising is fixed [1]. You know, like $32B/year. (I don't know the actual number; that is just made up for illustration.) These are "companies" (from single users to large multi-nationals) who are willing to pay money to a person who puts their advertisement in front of a potential customer.

So let's say Alice at BigCorp has an advertising budget of $1M/year. Maybe she is going to buy TV spots with most of that and spend $100K on "Internet" advertising. She can either talk to a bunch of "properties" (which is what she would have done in 1995) or she will buy ads on "Google", which means they might pop up on AdSense for Content pages, or via AdWords in various searches, perhaps on your Gmail window, or your News feed. And she'll only pay for them when they get clicked on, and she'll use her analytics to try to figure out how "impactful" that was. Or maybe she only has a $100 ad budget and she will blow it all on AdWords, putting her ad on search queries that people might make when they are looking for her business.

The small search engine is at a disadvantage, not from a technology perspective (the searches can be better than Google's pretty easily for highly contested searches) but from a revenue capture perspective.

What is worse, bad advertising networks are really bad: they can serve up malware, as a number of popular blogs have discovered. So people who are very brand conscious or burned by a bad ad network will shy away from those networks, making non-Google networks less effective (fewer advertisers, so less competition for ad insertions), and search engines that use them get commensurately less revenue per thousand searches.

The one redeeming factor is that when you have the ability to crawl and index enough of the web, that asset gives you the ability to do some very interesting things. Fortunately, these are things that others will pay for (because neither Google nor Microsoft/Bing will give you access to their index). The downside is that it's not as lucrative (on a $ per kilo-core-cluster millisecond level) as running the combination of the world's most used search engine feeding you the world's most used advertising network.

If you ever had any doubt, Google's advertising business funds them like Microsoft's Office business funds Microsoft. If you ever split Google in two, where its ads business offered services to anyone on a non-discriminatory basis, the world would be a more interesting place (and there would be several really interesting search engines with their own editorial slant, not just a few).

[1] This is largely true, although the "growth" in Internet advertising revenues has shown up as a decrease in other media advertising. From newspapers to radio those ad dollars are shifting to the net but the overall size of the pie is constant or shrinking slightly according to Advertising Age (http://adage.com/)

I cannot agree with this. Smaller ad networks currently have some advantages:

- Google takes a huge (also: normally not published/known - it was 69% the last time I saw a number, but they can adjust it as they like) chunk of the ad revenue, so advertising on the Google network will be very expensive in comparison

- Google has no/terrible customer service (e.g. you can get banned for their apparent misinterpretation of EMEA trademark laws, by using "iphone" in your ads that point to your page where iPhones are sold - good luck getting unbanned again)

The biggest problem is currently the market share of alternatives (including Bing e.g. in Europe) - and the fact that DDG does not have any ads.

Couldn't you just use Google or Bing to serve ads?

Not if you want to compete with them in the search space (and subsequently the search advertising space). It creates something of a channel conflict for them :-) Eventually, to actually succeed at it, you need your revenue not to be controlled by your competitor.

I don't really see why. You can always play Bing against Google, and trying to compete in both markets when you haven't made a dent in the first one doesn't seem reasonable.

When we switched to their ads our revenue increased 10x which made us super profitable (we were primarily a federated engine like DDG though).

This feels like history repeating. We tried the same thing 10 years ago (remember Clusty?). We got a lot of press with the same kind of silly titles - "Dump Google use Clusty", "Should you ditch Google? Clusty", "Will Clusty be the next Google". We played the privacy card. We had big fans and some significant traffic. But we never managed to make a significant dent in the market. And we grew convinced that nobody could take on Google head on.

At the end we sold it for little money, focused entirely on the enterprise (which had always been our main focus) and sold to IBM for real money.

Growing convinced might have killed it. If you're passionate about something, you might have found a way to beat Google... Not saying you did badly or anything; I never heard of the engine (before my time probably). I just mean to say it's a pity you grew convinced that Google was unbeatable.

Hmm... I spent 12 years building my business and you tell me I wasn't passionate. We actually did beat Google but in the enterprise space.

The best way to take on Google is to render web search obsolete (the same way Microsoft is becoming irrelevant because the PC is becoming irrelevant), not by trying to match with limited resources and knowledge what they do very very well with enormous resources and know-how. Thinking otherwise is not being passionate, it's being presumptuous.

Yes, change the way the game is being played, but don't play the game that Google is playing. Change the way people search for things. Right now everything is based on the keyword search, there has to be a better way. Make the web irrelevant the same way that gopher became irrelevant. I really don't want to be using a web browser 10 years from now.

Were you responsible for the moralist censorship of Clusty's results or did that come after you sold it?

No that was after we sold it.

Thank you. I enjoyed Clusty when it first came out; I thought the clustering feature at the heart of it was really innovative, and the UI was nice and clean. But after the new rules came out, I abandoned it completely.

> remember Clusty?


You just weren't born yet.

Cuil, enough said.

> But we never managed to make a significant dent in the market. And we grew convinced that nobody could take on Google head on.

Depressing. Mozilla can and should take them on. Google Search is biased, given that their real customers are the advertisers, and it is extremely ad heavy. Amazon could make a dent too, but they'd be biased as well.

Maybe Apple...

To take on Google people should take on a defined niche. Don't try and solve search, try to solve some subset of search.

We've seen this work when people take on CraigsList and Ebay and other "unstoppable" tech companies. Don't attack head on.

For example, I can think of a few popular types of searches that Google doesn't do super well: code search, product search, local search, genealogical search, real-estate search.

Blekko is of course working on searching specific namespaces, but that's not what I mean. I mean taking on a single underserved domain and really making it perfect.

>> We've seen this work when people take on CraigsList and Ebay and other "unstoppable" tech companies. Don't attack head on.

This is good. I used to work with an industry-specific portal that was perfected to work with said industry. Google would never be able to touch this space.

Despite it being a smallish industry, there were two large players and a few smaller players. The innovation even in such a small space was quite astounding.

There are definitely some areas that are too specific for Google to really work with. It is very good as a general search engine, but if your time is dependent on getting information fast in a specific industry, Google falls flat.

The main issue with this strategy is that you only have a small subset of the population and you have to be an expert in many domains to get it right. Thus, you'll never be as large or profitable as Google. Of course, you better have people to talk to on the phone. That'll kill this engine before it gets off the ground.

The only way I can think of toppling Google is if you created an engine that really focuses on productivity and gaining market share from the people who really need information and who are willing to pay for said information and offer them a free version that is better than their industry-specific tools, then after gaining penetration and toppling some of their industry players, branch into focusing on finding cute cat pictures and the like. There is more than one way to gain mind-share. Finding a better way to find links to movie reviews is definitely not it.

Don't you run the risk of limiting yourself to a local maximum? When Google launched, Yahoo and Lycos were the dominant search engines. Google was so immediately, obviously better (in both quality of results and presentation thereof) that switching was a no-brainer. If you're saying it's not worth taking them on head-on, then it's similar to arguing that their general product can't be improved upon. Of course, people said the same thing about the huge head start that incumbents had back in the 90s with building curated indices.

I'm not sure I agree with those as examples of things Google doesn't do well. Product search works great for me, as does local. Ditto for real estate search. Maybe your experience has been different, but try adding the zip code to your search for local. I'm not entirely sure I know what exactly you're looking for in code search. For coding problems with code examples, Google always finds the right SO page.

Is there a transcript of the talk?

I think there's a market for search engine competitors, maybe not in the "general public" category, but certainly so for some verticals - I was recently asked to build something that requires either a crawler or access to a search engine API, and I don't know if Google is what I want (probably will start with Bing if we end up doing the project).

> I was recently asked to build something that requires either a crawler or access to a search engine API

By the way, is it possible to get deep query results from Google or another search engine? Say I need to see top 1000 results for 100K keywords and use that as a seed for my own crawler. Is anyone offering that?

Does DDG do all their own crawling? If I remember correctly, they mix Bing and some of their own crawling to get you a search.

They do a bit of their own crawling. They take other people's crawls; Google and Bing spend a lot of money on crawling and it's odd to try to compete against those.

DDG also take a lot of other people's crawls.

DDG then add a layer on top. For example, 'official sites' are identified.

(I was reading about this just this morning, and frustratingly I cannot find the post again.)

Yes, it's more of a meta-search engine that fuses results from Bing, Yandex, etc.

I'd love to see some new search engines pop up and break up Google's monopoly. I'd like to see Apple release a better designed search engine, someone make an open-source search engine, IBM make a search engine with the tech behind Watson, etc. Right now there just aren't many good choices. Bing is basically Google with a face-lift.

I use Blekko. I like it, but it needs a lot more people creating topical lists of sites before it really offers something unique.

> someone make an open-source search engine,

That already exists in a sense. You could start with a bunch of ASF projects... Nutch, Solr/Lucene, ManifoldCF, Droids, Hadoop, OpenNLP, Tika, Mahout, UIMA, etc. and build a reasonably good search engine. The problem isn't writing search code; it's scaling the darn thing up to "Internet Scale" and other things that get ya. Can you imagine how much hardware and how much bandwidth it takes to continually crawl the web, download, parse and index pages on the scale of a Google?

The other problems are things like preventing spammers from gaming the system, etc. Whether or not a search engine where all the algorithms were public would be easier to "game" is, I suppose, an open question. I think most of us intuitively feel that it would be, but maybe not.

No it would not be reasonably good compared to Google. It wouldn't even match what AltaVista used to be (given the relevance algorithms used in these products).

Note that I said start with those projects, and not end with them. Of course it would take more work to get in the same category of relevance as Google. But the point was that a significant portion of the code needed to build an "open source search engine" exists. But, even if it all existed, the problem would still be hardware and bandwidth. You can't easily build a Google-like without some serious financial backing, even if everything is open source.

I know everyone loves DDG here but to reach a mass audience, the first thing I'd do is to get a new name.

Anyone remember AltaVista?

Whatever happened to cuil?

I don't think DDG qualifies as a small search engine. It may not have the same user base as Google, but it's not small. Though I have never seen usage metrics, and am probably wrong. I do think that a flat search engine does not work anymore. People have been moved from an unlimited web experience to a social-network-controlled web experience, meaning that a lot of search these days is done through walled gardens. I've been pondering the problem for a while now. Anyhow, search has changed so much that it's actually pretty difficult to offer something better. I can't define better because I can't define search that well. Anyone have any ideas?

1) I think better could be "more vertical".

2) SEO has made search for some items really lousy. Some way of filtering those results out might be handy.

3) Sometimes people just need a quick answer to a question. "What's the population density of Manhattan?" is reasonably easy to get an answer to, but "What's the name of that TV programme I used to watch in the 80s with a character called something like flimby or flombu and it was a steel egg thing" is much harder to search. (It's a demonstration of just how good search is that I can bash in a few keywords for such vague queries and get useful answers.) Some sites try to solve this by letting you ask a question and have humans answer it (Yahoo Answers; Stack Exchange), but a better search engine would still be better at finding these kinds of answers.

4) I make a post to Facebook. Or I see a post on FB. A few weeks, months, later I want to find that post again. But I have no hope.

5) I have about 3 million bookmarks. They are untagged and poorly named. I want something to crawl those and build an index, so that I can then find the URLs I want.
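Item 5 amounts to a small inverted index over the bookmarked pages. Here is a minimal sketch in Python, assuming the page text has already been fetched; the `pages` dict and the `tokenize`, `build_index`, and `search` names are hypothetical, made up for illustration, and a real tool would also need fetching, deduplication, and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query token (plain AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical, already-fetched bookmark contents (illustration only).
pages = {
    "http://example.com/pizza": "A simple pizza dough recipe",
    "http://example.com/bread": "Sourdough bread recipe and tips",
}
index = build_index(pages)
```

With this toy data, `search(index, "pizza recipe")` matches only the pizza bookmark, since both words must appear in the page text.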

Interesting, Nuuton currently covers all of that. But I haven't launched the ALPHA due to my incomplete understanding of the problem (and a minor funding problem (working on that at the moment))

1. It is more vertical. Allows for deep search with filters.

2. Ditto. It uses hashtags, forward slashes and bangs, each with a different function.

3. I'm up to my neck in machine learning to improve this.

4. It allows you to search your own posts. Yes, you can post to Nuuton. No, it's not a social network.

5. It allows tagging with hashtags. Say #pizza #recipe.

Plus it's available as an API.

I think anything with less than 1% market share qualifies as small.

I disagree about the flat search engine being outdated. I hate all the social crap, videos, and local listings the big search engines shove down your throat. I've heard a lot of other people voice the same concerns.

You can actually see some metrics at http://ddg.gg/traffic

Not until they have their own crawlers, fully dependent on their own crawls I mean. They can't compete with them using others' crawls.

I use DDG as my main search engine, but sometimes I just give up and have to fall back to Google, which I think is because of their reach or maybe filtering. It happens mostly with pages/searches specific to my country (IN).

Yes, in fact I think this is the perfect time for it. Google has hit a wall. It has not really gotten that much better in years.

Any new player cannot be just as good as google. It must be much better than google. If you manage to build something that is objectively better you can expect investors to shower you with money. Would not be surprised if google, facebook, MS would enter a bidding war to buy you. Don't know if Apple would be interested though.

The only way google can improve search is to drastically change their search algorithms. Perhaps throw away whatever they are doing right now and approach it from a new direction. This is the opening I believe new players have. A new approach to search, like contextual search will probably be enough to seriously threaten google.

Google Now and Knowledge Graph: that's their new direction, and the right one, I must say.

The all human approach:

I think human powered search could be achieved by Google. All they need to do is track our clicks on the serps and interpret which sites are good. Rotate all sorts of sites in the serps and gradually build a database. The whole thing would be like a wiki, with everyone contributing a little, and benefitting from the whole. The only way to accurately beat spam and low quality is to use human feedback. They probably do use human feedback already, just in a different way.
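The click-tracking idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of how Google or anyone else does it: the `ClickFeedback` class, its method names, and the smoothing priors are all hypothetical. It counts impressions and clicks per (query, URL) pair and reranks by a smoothed click-through rate, so rarely shown sites are not judged on one or two clicks.

```python
from collections import defaultdict


class ClickFeedback:
    """Toy click-through store; priors pull unseen pages toward a default rate."""

    def __init__(self, prior_clicks=1.0, prior_views=10.0):
        # Assumed smoothing priors: an unseen page scores 1/10 = 0.1.
        self.prior_clicks = prior_clicks
        self.prior_views = prior_views
        self.views = defaultdict(int)
        self.clicks = defaultdict(int)

    def record(self, query, url, clicked):
        """Log one impression of url for query, and whether it was clicked."""
        key = (query, url)
        self.views[key] += 1
        if clicked:
            self.clicks[key] += 1

    def score(self, query, url):
        """Smoothed click-through rate for this (query, url) pair."""
        key = (query, url)
        clicks = self.clicks.get(key, 0) + self.prior_clicks
        views = self.views.get(key, 0) + self.prior_views
        return clicks / views

    def rerank(self, query, urls):
        """Order candidate URLs by descending smoothed click-through rate."""
        return sorted(urls, key=lambda u: self.score(query, u), reverse=True)
```

Rotating different sites into the results, as the comment suggests, is what would give every page enough impressions for the counts to mean something; the sketch leaves that part out, along with any defense against click spam.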

I'm wondering if reddit and Facebook could use this approach to build search engines. They do have large databases of human preferences.

The machine learning (AI) approach:

Another idea would be to distill information from the web Watson-style and try to answer many questions directly instead of redirecting to external pages. So far Siri, Watson, Wolfram Alpha are ahead in this field.

It wouldn't need to be better, it would just need to be different.

Google is no longer about search.

Google is, and always has been, about organizing the world's information.
