

Google: Large-scale Incremental Processing Using Distributed Transactions - yarapavan
http://research.google.com/pubs/pub36726.html

======
bravura
What I really want to know is: Given that I used to do one update of size n,
and now I can do k updates of size m (n=k*m) in the same time, I can
essentially do k times as many updates. What is k?

"By replacing a batch-based indexing system with an indexing system based on
incremental processing using Percolator, we process the same number of
documents per day, while reducing the average age of documents in Google
search results by 50%."

This statistic, about a 50% reduction, is difficult to interpret. It is hard
to understand what sort of improvement one would see on other measures, like
the one I proposed above.

~~~
equark
The 50% statistic is related to what matters to Google: changes in real search
results. It is pretty irrelevant for anybody else. The speed of adding a
single page to the index and updating it is, of course, much, much higher
(~1000x).

I can't parse the measure you propose above.

------
yarapavan
Percolator provides two main abstractions for performing incremental
processing at large scale: ACID transactions over a random-access repository
(which make it easier for programmers to reason about the state of the
repository), and observers, a way to organize an incremental computation.
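
To make those two abstractions concrete, here is a minimal, single-threaded
toy sketch. Every name in it (Repo, Cell, Transaction, Observer) is invented
for illustration; the real Percolator runs on Bigtable and implements
snapshot-isolation transactions via a lock-based two-phase commit, none of
which this toy captures.

    // Toy "repository": (row, column) cells mapped to string values.
    #include <functional>
    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    using Cell = std::pair<std::string, std::string>;  // (row, column)
    using Repo = std::map<Cell, std::string>;

    // An observer re-runs some computation whenever a watched column changes.
    struct Observer {
      std::string watched_column;
      std::function<void(const std::string& row)> run;
    };

    class Transaction {
     public:
      Transaction(Repo& repo, std::vector<Observer>& obs)
          : repo_(repo), obs_(obs) {}

      std::string Get(const std::string& row, const std::string& col) {
        auto w = writes_.find({row, col});  // read-your-own-writes
        if (w != writes_.end()) return w->second;
        auto r = repo_.find({row, col});
        return r == repo_.end() ? "" : r->second;
      }

      void Set(const std::string& row, const std::string& col,
               const std::string& val) {
        writes_[{row, col}] = val;  // buffered until Commit()
      }

      bool Commit() {
        // The real system takes locks and runs two-phase commit across
        // Bigtable tablets; this toy version just applies the buffered
        // writes and fires any observers watching the touched columns.
        for (const auto& [cell, val] : writes_) {
          repo_[cell] = val;
          for (const auto& o : obs_)
            if (o.watched_column == cell.second) o.run(cell.first);
        }
        return true;
      }

     private:
      Repo& repo_;
      std::vector<Observer>& obs_;
      std::map<Cell, std::string> writes_;
    };

    int main() {
      Repo repo;
      std::vector<Observer> observers;
      // Incremental step: whenever a document's "contents" column changes,
      // reprocess just that document instead of rerunning a batch job.
      observers.push_back({"contents", [](const std::string& url) {
        std::cout << "re-indexing " << url << "\n";
      }});
      Transaction t(repo, observers);
      t.Set("http://example.com", "contents", "<html>...</html>");
      t.Commit();  // applies the write and triggers the observer
    }

In the real system, an observer's function would start its own transaction to
read the changed cell and write derived results to other columns, and those
writes trigger further observers downstream; the paper describes Google's
indexing pipeline as exactly such a chain of observers.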

