

Google's Colossus Makes Search Real-Time By Dumping MapReduce - brown9-2
http://highscalability.com/blog/2010/9/11/googles-colossus-makes-search-real-time-by-dumping-mapreduce.html

======
stingraycharles
I don't get it. They were using Map/Reduce as a way to build the index, which
they were able to query in mere milliseconds. This article claims that in
order to facilitate Google Instant, they had to ditch the Map/Reduce-oriented
updating of the index.

How are these mutually exclusive? If you look at quotes like these:

"Goal is to update the search index continuously, within seconds of content
changing, without rebuilding the entire index from scratch using MapReduce."

To me, this seems as if this change has nothing to do with Google Instant.
This has more to do with being able to respond instantly to new content,
instead of being able to query the index quickly.

It sounds like they added support for distributed stored procedures on top of
BigTable, which reminds me a bit of the way MongoDB implemented Map/Reduce.
But I bet that they in no way at all have dumped Map/Reduce.
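
A toy sketch of the distinction in the quoted goal: building an inverted index by running map and reduce over the whole corpus, versus patching only the posting lists that one changed document touches. All function names are hypothetical and this is not Google's code. It only illustrates why an incremental path avoids a full rebuild, and that both styles can coexist.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: emit (term, doc_id) once for each distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(pairs):
    """Reduce: group doc ids by term into sorted posting lists."""
    grouped = defaultdict(set)
    for term, doc_id in pairs:
        grouped[term].add(doc_id)
    return {term: sorted(ids) for term, ids in grouped.items()}


def batch_build(corpus):
    """Batch style: rebuild the whole index from every document."""
    pairs = chain.from_iterable(map_phase(d, t) for d, t in corpus.items())
    return reduce_phase(pairs)


def incremental_update(index, doc_id, old_text, new_text):
    """Incremental style: touch only terms that entered or left one document."""
    old_terms = set(old_text.lower().split())
    new_terms = set(new_text.lower().split())
    for term in old_terms - new_terms:
        postings = index.get(term, [])
        if doc_id in postings:
            postings.remove(doc_id)
        if not postings:
            index.pop(term, None)  # drop terms no document uses any more
    for term in new_terms - old_terms:
        postings = index.setdefault(term, [])
        if doc_id not in postings:
            postings.append(doc_id)
            postings.sort()
    return index
```

The incremental path does work proportional to the size of the changed document rather than the size of the corpus; a periodic batch rebuild could still be used to check or repair the incrementally maintained index.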

~~~
fizx
> To me, this seems as if this change has nothing to do with Google Instant.
> This has more to do with being able to respond instantly to new content,
> instead of being able to query the index quickly.

Right, this was part of the Caffeine update, which happened months ago.

> But I bet that they in no way at all have dumped Map/Reduce.

The old way of doing calculations on the web graph was giant iterations on the
adjacency matrix via MapReduce. In the new system, they are probably doing
local walks in the instantiated graph. These local walks are simple
iterations, not map-reduce.
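
A minimal sketch of the two styles described, under simplifying assumptions (every page has at least one out-link, a damping factor of 0.85, all names hypothetical): one PageRank-style iteration written as a global map step and reduce step over the whole graph, and a random walk that only touches the pages it visits. This illustrates the speculation in the comment, not Google's actual ranking code.

```python
import random
from collections import defaultdict


def pagerank_iteration(graph, ranks, damping=0.85):
    """One global iteration: a map step and a reduce step over every page.

    Assumes every page has at least one out-link (no dangling pages).
    """
    n = len(graph)
    # Map: each page emits an equal share of its rank to every page it links to.
    contributions = [
        (target, ranks[page] / len(links))
        for page, links in graph.items()
        for target in links
    ]
    # Reduce: sum the shares arriving at each page.
    incoming = defaultdict(float)
    for target, share in contributions:
        incoming[target] += share
    return {page: (1 - damping) / n + damping * incoming[page] for page in graph}


def local_walk(graph, start, steps, damping=0.85, seed=0):
    """Random surfer that restarts at `start`; touches only pages it visits.

    Visit counts approximate a PageRank personalized to `start`.
    """
    rng = random.Random(seed)
    visits = defaultdict(int)
    page = start
    for _ in range(steps):
        visits[page] += 1
        if rng.random() < damping and graph[page]:
            page = rng.choice(graph[page])  # follow a random out-link
        else:
            page = start  # restart at the seed page
    return dict(visits)
```

The first function must read every edge on every iteration; the second does a fixed amount of work per step regardless of graph size, which is the trade-off the comment is pointing at.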

~~~
Anon84
> In the new system, they are probably doing local walks in the instantiated
> graph. These local walks are simple iterations, not map-reduce.

Actually, it's using Pregel, a variation on Map/Reduce:

<http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html>

<http://horicky.blogspot.com/2010/07/google-pregel-graph-processing.html>

<http://portal.acm.org/citation.cfm?id=1582716.1582723>
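
The linked Pregel paper describes a vertex-centric model: computation proceeds in synchronous supersteps, each vertex reads the messages sent to it in the previous superstep, updates its value, sends messages along its out-edges, and votes to halt when it has nothing more to do. Below is a single-process toy of that loop using the paper's maximum-value propagation example; the function and argument names are hypothetical and are not Pregel's API.

```python
def run_supersteps(initial_values, edges, max_supersteps=30):
    """Propagate the maximum value through a graph, Pregel style.

    initial_values: vertex -> starting value
    edges: vertex -> list of out-neighbours (all must be vertices)
    """
    values = dict(initial_values)
    inbox = {v: [] for v in values}
    active = set(values)  # every vertex runs in superstep 0
    for superstep in range(max_supersteps):
        if not active:
            break  # all vertices have halted and no messages are in flight
        outbox = {v: [] for v in values}
        for v in active:
            new_value = max([values[v], *inbox[v]])
            changed = new_value > values[v]
            values[v] = new_value
            if superstep == 0 or changed:
                # Messages sent now are delivered in the next superstep.
                for u in edges.get(v, []):
                    outbox[u].append(new_value)
            # A vertex that sends nothing has, in effect, voted to halt.
        inbox = outbox
        # Only vertices that received a message wake up for the next superstep.
        active = {v for v, msgs in inbox.items() if msgs}
    return values
```

Roughly speaking, vertex values stay in place between supersteps and only messages move, whereas chaining MapReduce jobs would re-read and re-write the whole graph on every iteration.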

------
tarvaina
The Register article linked by the blog post
(<http://www.theregister.co.uk/2010/09/09/google_caffeine_explained/>) is more
informative than the blog post itself.

To me, it looks like the blogger makes a few wrong assumptions. He confuses
Google Instant with Google Realtime. He assumes that "something like database
triggers" is actually very much like database triggers and could, e.g., be
used to check the integrity of the data. He also goes wildly off on a tangent
about the Internet DOM.
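
To make the "something like database triggers" idea concrete, here is a minimal sketch assuming the simplest reading: a callback registered on a column runs after each write to that column and may write further cells, so one change cascades into follow-on work. The class and method names are hypothetical; this is neither BigTable's API nor Google's implementation.

```python
from collections import defaultdict


class ObservedTable:
    """Toy row/column store that runs registered callbacks after a write."""

    def __init__(self):
        self.cells = {}
        self.observers = defaultdict(list)  # column -> list of callbacks

    def observe(self, column, callback):
        """Register callback(table, row, value) to run after writes to column."""
        self.observers[column].append(callback)

    def write(self, row, column, value):
        self.cells[(row, column)] = value
        # Fire every observer on this column; observers may write other
        # columns, which in turn fires their observers (a cascade).
        for callback in self.observers[column]:
            callback(self, row, value)

    def read(self, row, column, default=None):
        return self.cells.get((row, column), default)


# Hypothetical usage: storing a page's HTML triggers a derived link count.
def count_links(table, row, html):
    table.write(row, "link_count", html.count("<a "))


table = ObservedTable()
table.observe("raw_html", count_links)
table.write("example.com", "raw_html", '<a href="x">x</a> <a href="y">y</a>')
```

Here the hook does follow-on processing rather than enforcing a constraint, which is one reason not to read "something like database triggers" too literally.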

------
DougBTX
This article seems to confuse real-time search (producing results from web
pages put online moments ago, such as news articles and Twitter feeds) with
Google Instant (search-as-you-type results using Google Suggest).
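
A toy sketch of the search-as-you-type side of that distinction, assuming a pre-sorted list of candidate queries (names hypothetical, not how Google Suggest is actually built): each keystroke becomes a prefix lookup, which is independent of how quickly new pages reach the index.

```python
from bisect import bisect_left


def suggest(sorted_queries, prefix, limit=3):
    """Return up to `limit` stored queries beginning with `prefix`.

    `sorted_queries` must be sorted, so all matches sit in one contiguous run.
    """
    results = []
    i = bisect_left(sorted_queries, prefix)  # first position >= prefix
    while (
        i < len(sorted_queries)
        and len(results) < limit
        and sorted_queries[i].startswith(prefix)
    ):
        results.append(sorted_queries[i])
        i += 1
    return results
```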

------
fauigerzigerk
Today it's MapReduce, but wait a few years...

09/11/2020: Google has found that its BigTable database is very limited in
terms of the types of queries it can perform and that the kind of data
modeling it enforces leads to a number of nasty inconsistencies. Under lead
engineer Matt Codd, Google researchers have been working on a secret project
codenamed 'Join' in order to remedy this situation. At this year's I/O
conference the search behemoth will present the successor to its dated
BigTable system. The new database system is rumored to be named DB2.

------
mkramlich
I doubt they abandoned Map/Reduce; they probably just changed where and how
it's used in their architecture. It's far too powerful a technique, especially
at their scale, to abandon entirely.

Also, I didn't find much substance in this article. If anyone knows of a more
detailed description of the under-the-hood changes, please point us to it.

------
guelo
As an engineer, I'm blown away that they can pull this off at all. As a user,
one thing that's pissing me off about Instant is that it doesn't honor the
'Number of Results' search setting; it always returns 10 results.

I feel like the Louis C.K. airplane internet guy: "pfff, this is bullshit."
<http://www.youtube.com/watch?v=rOtEQB-9tvk>

