

Course: The Google Technology Stack - yarapavan
http://michaelnielsen.org/blog/lecture-course-the-google-technology-stack/

======
amix
My guess is that this stack is getting outdated, or at least has been vastly
improved, as Google tries to do index updates in near real time. The major
problem with map-reduce is that the results aren't available in real time
(i.e. you collect data and then run analytics on it), and running the
analytics can take a lot of time depending on the data set and the hardware
in play. Calculating PageRank is done via map-reduce, and this task can take
an awfully long time since the data set is huge. This has resulted in slow
index updates. I don't know how Google has solved this problem; my guess is
that they have thrown an awful lot of hardware at it or have improved their
stack.

What's a better way to do it? I think it's creating an algorithm that can be
updated in real time, where you don't have to recalculate the rank of every
page on each update. Such an algorithm would require a very different stack
than Google currently uses, and my guess is that their architecture will move
in this direction as they try to make their search real-time (which, from
what I have read and experienced, they are trying to do).

~~~
abecedarius
For the PageRank part, I'd expect the previous solution to make a great seed
for iteratively solving on the updated dataset.
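
A minimal sketch of that warm-start idea, assuming plain power iteration over
a toy link graph held in a dict. The names `pagerank`, `seed`, `old_links` and
`new_links` are hypothetical and only for illustration; nothing here is taken
from Google's actual stack or from map-reduce itself.

```python
def pagerank(links, damping=0.85, tol=1e-10, seed=None, max_iter=1000):
    """Power iteration for PageRank on a dict {page: [pages it links to]}.

    `seed` is an optional dict of starting ranks, e.g. a previous solution.
    Returns (ranks, iterations_used).
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    if seed is None:
        # Cold start: every page gets the same rank.
        rank = {p: 1.0 / n for p in pages}
    else:
        # Warm start: reuse old ranks; pages unknown to the seed start at the
        # uniform value, then everything is renormalized to sum to 1.
        rank = {p: seed.get(p, 1.0 / n) for p in pages}
        total = sum(rank.values())
        rank = {p: r / total for p, r in rank.items()}

    for iteration in range(1, max_iter + 1):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        # Stop once the total (L1) change between iterations is below tol.
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            return rank, iteration
    return rank, max_iter


# Toy data: the "index update" adds one page and one extra link.
old_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}
new_links = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": ["a", "c"],
             "e": ["a"]}

old_ranks, _ = pagerank(old_links)
cold_ranks, cold_iters = pagerank(new_links)
warm_ranks, warm_iters = pagerank(new_links, seed=old_ranks)
print("cold start iterations:", cold_iters)
print("warm start iterations:", warm_iters)
```

Both runs converge to the same ranks; the seed only changes where the
iteration starts. How many iterations it saves depends on how much of the
graph changed, so treat the counts printed here as illustrative only.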

I've seen some ideas about incremental mapreduce batted around and apparently
implemented: <http://www.google.com/search?q=incremental+mapreduce>

------
elblanco
It's great that people are looking at Google's tech as a source of teaching
material. But other big companies have equally interesting tech, and some
have been around longer, yet it can be very hard to learn anything about what
they do.

------
yarapavan
The FriendFeed Room for the course:
<http://friendfeed.com/lecture-course-on-the-google-techno>

