

How do you build an aggregator? - mg1313

Ok, I've seen this announcement:
--------
The service crawls 3,500 open source forges and gathers statistics and data on more than 300,000 open source projects and 300,000 open source developers.
--------

Or...

--------
This service crawls and aggregates travel data from thousands of websites...
--------

And so on...

How is an aggregator actually built? How does it work? Some insights would be helpful, because I'm trying to build a service that would be an aggregator (I won't build it myself, but I want to understand).

Thank you.
======
nreece
You could start by reading <http://en.wikipedia.org/wiki/Web_crawler>

~~~
mg1313
So, would using one of the crawlers mentioned in that article be enough? I
read something about Nutch, Lucene, and Sphinx, but I'm not sure how the
process of building an aggregator works: crawl some websites, get the data
into a database or onto disk, analyze the data, display the data according to
some specified criteria. Do those crawlers spider RSS feeds, or is a special
crawler needed for that? A diagram would be helpful...
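The crawl-store-query pipeline described above can be sketched in a few lines. This is a toy illustration, not how Nutch or Sphinx actually work: the "web" is a hard-coded dict standing in for real HTTP fetches (a real crawler would use an HTTP client and respect robots.txt), and the URLs, regexes, and `projects` table are all made up for the example.

```python
import re
import sqlite3

# Hypothetical "web": URL -> page HTML (stands in for real HTTP fetches).
FAKE_WEB = {
    "http://forge.example/projects": """
        <html><body>
        <a href="http://forge.example/p/widget">widget</a>
        <a href="http://forge.example/p/gadget">gadget</a>
        </body></html>""",
    "http://forge.example/p/widget": "<html><title>widget</title>A CLI widget tool</html>",
    "http://forge.example/p/gadget": "<html><title>gadget</title>A GUI gadget library</html>",
}

LINK_RE = re.compile(r'href="([^"]+)"')
TITLE_RE = re.compile(r"<title>([^<]+)</title>")

def crawl(seed, fetch):
    """Breadth-first crawl: fetch pages, follow links, yield (url, html)."""
    seen, frontier = set(), [seed]
    while frontier:
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:
            continue
        yield url, html
        frontier.extend(LINK_RE.findall(html))

def build_store(pages):
    """Store one extracted field (here just the <title>) in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE projects (url TEXT PRIMARY KEY, title TEXT)")
    for url, html in pages:
        m = TITLE_RE.search(html)
        if m:
            db.execute("INSERT INTO projects VALUES (?, ?)", (url, m.group(1)))
    return db

db = build_store(crawl("http://forge.example/projects", FAKE_WEB.get))
titles = [row[0] for row in db.execute("SELECT title FROM projects ORDER BY title")]
print(titles)  # ['gadget', 'widget']
```

The same shape works for RSS: the "fetch" step pulls feed XML instead of HTML, and the extraction step reads `<item>` entries instead of scraping titles out of pages.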

~~~
nreece
Aggregation and Search are two different components. Full-text search (
<http://en.wikipedia.org/wiki/Full_text_search> ) will help you index the
aggregated data, so that it can be searched through a UI.
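To make the distinction concrete, here is a toy version of the search side: an inverted index mapping each word to the set of documents containing it. Real engines like Lucene or Sphinx add tokenization, ranking, and on-disk storage, but this is the core data structure; the sample documents are invented for the example.

```python
from collections import defaultdict

# Aggregated documents (id -> text), as if produced by the crawl step.
docs = {
    1: "open source project statistics crawler",
    2: "travel data aggregator crawls thousands of websites",
    3: "open source developers and projects",
}

# Inverted index: word -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query):
    """AND-search: return ids of docs containing every query term."""
    sets = [index.get(w, set()) for w in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

print(search("open source"))  # [1, 3]
print(search("crawler"))      # [1]
```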

------
mg1313
Hmmm...no ideas?

