How do you use a probabilistic approach to scraping data? Were you able to get a low number of false positives?

Sorry for the confusion. The probabilistic methods are used for "merging" scraped data from various sources, not in the scraping process itself. For example, they help determine whether similar-sounding listings on related websites refer to the same "thing".
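
To make the merging idea concrete, here is a minimal sketch of one common way to score whether two listings are the same entity: a simplified Fellegi-Sunter-style record linkage that treats fields as independent. It is only an illustration, not the method from any particular paper. The field names, the m/u probabilities, the prior, and the 0.8 similarity threshold are all made-up assumptions that would need to be estimated from real data.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field probabilities (invented for illustration):
#   m = P(field agrees | same entity)
#   u = P(field agrees | different entities)
FIELD_PARAMS = {
    "name":    {"m": 0.95, "u": 0.05},
    "address": {"m": 0.90, "u": 0.10},
    "phone":   {"m": 0.85, "u": 0.01},
}
PRIOR_MATCH = 0.01  # assumed prior that a random pair is the same entity


def agrees(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two strings as agreeing if they are similar enough."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def match_probability(rec_a: dict, rec_b: dict) -> float:
    """Posterior probability that two listings describe the same entity."""
    # Start from the prior odds, then add one log-likelihood ratio per field.
    log_odds = math.log(PRIOR_MATCH / (1 - PRIOR_MATCH))
    for field, p in FIELD_PARAMS.items():
        if field not in rec_a or field not in rec_b:
            continue  # missing field: no evidence either way
        if agrees(rec_a[field], rec_b[field]):
            log_odds += math.log(p["m"] / p["u"])
        else:
            log_odds += math.log((1 - p["m"]) / (1 - p["u"]))
    return 1 / (1 + math.exp(-log_odds))
```

Pairs scoring above some cutoff would be merged, and the cutoff is what trades false positives against false negatives. Real systems usually learn the per-field weights instead of hard-coding them.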

If you're interested, take a look at this paper (and related ones): http://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf

That makes more sense. Thanks! I'll check out the paper. I was hoping you had some revolutionary new scraping method.

