The Anatomy Of Search Technology: Crawling Using Combinators 45 points by pathdependent on May 28, 2012 | 5 comments

 Ok, I'll bite. From the article:
 > Now let's see how combinators (which we discussed in the previous blog posting) might make doing some of these computations easier.
 So, how are they defined in the previous blog posting?
 > A combinator is an atomic operation on a cell of a database that is associative and preferably commutative.
 No, it's not. A combinator is a function with no free variables. Even the example is wrong:
 > "Add(n)" is an example of a simple combinator; it adds n to whatever number is in the cell.
 If "Add" was a combinator, the cell would have to be one of the parameters of "Add". Otherwise, the cell is a free variable.
 To be clear, I'm just complaining about the misuse of the term "combinator", since it's a word with a strict mathematical definition and no other common-language interpretation (like "function" or "operation" have). I'm not commenting on the actual content of the article.
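 To make the free-variable point concrete, here is a rough sketch in Python. The names `add`, `add_to_cell`, and `cell` are made up for illustration and do not come from the article; the first definition takes the cell as a parameter, the second reaches outside itself for it.

```python
# Closed form: both n and the cell value are parameters, so nothing
# inside the definition refers to an outside variable.
def add(n):
    return lambda cell: cell + n


# Open form: `cell` is not a parameter. It is looked up in the
# surrounding (module) scope, which makes it a free variable here.
cell = 10


def add_to_cell(n):
    return cell + n


closed_result = add(3)(10)    # the cell value is passed in explicitly
open_result = add_to_cell(3)  # the cell value comes from outside
```

 Both calls compute the same number, but only the first definition is closed in the sense used above.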
 The initial value of the cell is (eventually) one of the parameters of add(n), namely when you compute the final value. Before you get to that point, the various add(n) operations aimed at a given cell are combined. In the diagram in the first posting in the series, 18 add(1) operations on the same cell turn into a single add(18) operation. It's only when the cell is read (or the bucket is merged) that the final value of the cell is computed.
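 The merging described above can be sketched roughly as follows. This is only an illustration under assumptions: the `Add` class, its `combine` and `apply` methods, and the starting cell value of 100 are hypothetical and are not taken from the article or its system.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Add:
    """Hypothetical stand-in for the add(n) operation discussed above."""
    n: int

    def combine(self, other: "Add") -> "Add":
        # Merging two pending adds needs no cell value at all:
        # add(a) followed by add(b) is the same as add(a + b).
        return Add(self.n + other.n)

    def apply(self, cell: int) -> int:
        # The cell value only enters here, at read or merge time.
        return cell + self.n


# 18 pending add(1) operations aimed at the same cell.
pending = [Add(1)] * 18

# Collapse them into a single operation before touching the cell.
merged = reduce(Add.combine, pending)

# Assumed starting value of the cell, chosen only for the example.
final_value = merged.apply(100)
```

 Because `combine` is associative and commutative, the pending operations can be merged in any order or grouping before the cell is ever read.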
 I understand, but I still think it's iffy calling it a combinator. Maybe calling it lazy evaluation (sort of) would be better? (I'm not sure if you saw my edit before you replied. I added the last two sentences only a few minutes before your response.)
 I found this to be a remarkably bad article. From it you'd learn nothing whatsoever about what it takes to build a moderate-scale (10 million pages), never mind large-scale, web crawler.
 The goal of this article was only to talk about how combinators make crawling easier. If you'd like a more general introduction to the topic of crawling, I provided some references in the 3rd paragraph.
