
Ask HN: Is it economically possible to clone Google Search on the public cloud? - 19eightyfour
By this I mean: is it possible to build an economically independent (revenues outweigh costs) clone of Google Search, replicating basic functionality with a clone of PageRank and keyword indexing for relevance and a clone of AdWords for revenue, atop a public cloud like GCP or AWS (or some other public cloud option)?

For this question to focus on economic feasibility alone, assume that such a clone: 1) could provide some semblance of relevant search results without infringing any proprietary tech, 2) either would have sufficient adoption that keywords would be in demand and generate revenue (but don't assume there's necessarily enough to cover costs, since that is the point we're trying to answer), or would have some other way of funding (subscription, pay-as-you-go) that would generate revenue, and 3) would have enough of a market (just assume that, even though it seems unlikely that a less-amazing clone would have any market against a monopoly search provider).
======
19eightyfour
I guess I'm looking for a simple yes or no: yes, the costs of the public
cloud would not preclude such a service from being (even marginally)
profitable; or no, the public cloud is priced in such a way that a business
like Google Search could never exist atop it without heavy losses.

But I realize the reality may not be so simple. Hopefully I've constrained
this enough with assumptions that the question is answerable and clear. And I
don't know enough about the economics or infrastructure, or usage, to be able
to even begin to calculate this reasonably myself, nor to have a feel for it.
My feeling is "it would be close", which could be totally off the mark.

I guess my intention in asking this question is to expand my intuition for
the "lay of the land" of cloud computing economics in 2017, so that I can
better judge the types and feasibility of businesses based on this model.

Finally, I'm not sure if this is the best place to ask, and I apologize if
I've annoyed anyone here by asking what could be construed as a bold question.
It's not a super important question, it just popped into my head. I checked
out reddit but didn't see any cloud forums that were active. And this place is
probably better than asking on SE, because the answers, even if a bit irate or
confrontational sometimes, are probably more diverse and have a chance of
being more informed and in depth, in my experience.

~~~
hobofan
Considering that the profit margins of Google search are pretty high, yes, I
think it's possible to run it economically, though maybe not competitively. You
would also be limited to a 90-95% product, because a lot of optimizations are
not possible in the public cloud vs. your own data centers.

------
d--b
Well, not the biggest competitor, but DuckDuckGo runs mostly on Amazon AWS:
[http://highscalability.com/blog/2013/1/28/duckduckgo-archite...](http://highscalability.com/blog/2013/1/28/duckduckgo-architecture-1-million-deep-searches-a-day-and-gr.html)

~~~
jpalomaki
DuckDuckGo is not a very good comparison, since they don't do their own web
indexing. Instead they aggregate the web search results from various sources
like Bing, Yahoo and Yandex [1].

[1]
[https://duck.co/help/results/sources](https://duck.co/help/results/sources)

~~~
lucideer
One of the sources listed there is DuckDuckBot, which is them doing their own
web indexing.

But yes, they do augment it enormously with 3rd-party input. DuckDuckBot is
unlikely to be their primary source.

------
exolymph
It depends on what you mean by "economically" — if you mean cheaply, then no.

~~~
19eightyfour
Exactly as I said: revenues outweigh costs.

But you're right that "economically" can mean cheaply. I ought to change it
to "economically feasible", since that phrase more strongly connotes revenues
outweighing costs. Unfortunately I can't edit it now, though.

------
zhte415
You mention cost, but you don't mention quality. 'Clone of PageRank' seems the
largest cost in terms of quality.

~~~
19eightyfour
I did, but looking at it again it isn't that clear: "replicating basic
functionality", "some semblance of relevant search results". You're right that
emphasizing that more would be clearer.

So since the edit horizon for the post has expired I'll add it here.

Regarding the quality criterion: replicating PageRank is possible (it is a
centrality measure on the link graph, a variant of eigenvector centrality),
and AFAIK it's not this algorithm, useful as it is, that yields Google's
current edge; it's their proprietary embellishments. What I intended for this
question is a replication of the PageRank algorithm, which is still, AFAIK,
very useful. We can also assume that we add any low-cost improvements to it
that help to fight spam or otherwise improve results. Qualitatively, people
looking at these search results would say, "well, they're not quite as good as
Google, but they are good, there's little spam, they're aware of my search
history context." So basically, a lightweight version of Google Search.
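
For concreteness, the basic PageRank idea mentioned above can be sketched as a
power iteration over a link graph. This is only a toy, in-memory sketch: the
function name `pagerank`, the dict-of-lists graph format, the 0.85 damping
factor, and the even redistribution of rank from pages with no outgoing links
are illustrative assumptions, not a description of what Google runs, and it
ignores crawling, keyword indexing, spam fighting and scale entirely.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Toy PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes an equal share of its rank along each link.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        # Stop once the ranks have effectively stopped changing.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break

    return rank


if __name__ == "__main__":
    # Made-up four-page web, purely for illustration.
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 4))
```

In the toy graph, page "c" collects links from every other page and ends up
ranked highest, while "d", which nothing links to, keeps only the baseline
share. A real service would need a distributed version of this over billions
of pages, which is where the cloud compute and storage bill comes from.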

