

Why Writing Your Own Search Engine is Hard - darragjm
http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=143

======
jkush
Is it just me or is this a silly article? I'm not really sure what the point
is. There's the outsized focus on hardware, which isn't what makes writing
a search engine hard. Then there's the pseudocode towards the end, which
really made the whole thing bizarre.

~~~
corentin
It's silly in that the author means, by search engine, a clone of Google's
architecture:

"A crawler gets the Web pages off of that pesky Web and onto your beautiful
disks. You'll need lots of disks.

Then you need to index these pages--say which page has which words. This will
tell you that Janet Jackson was found on the www.superbowl.com page. Usually,
indexing happens locally on the disks where your crawler dumped these Web
pages. Hey, why move them?"

It's just one possibility. An alternative (not necessarily the best one,
though) could be a personal spider that starts somewhere and is intelligent
enough to find what you're looking for without crawling the entire web. Or a
peer-to-peer search engine, to share the infrastructure between the users. Or
a search engine that only crawls the subset of the web you're interested in
(i.e. you manage a list of domains per user profile).
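
To make the last option concrete, here is a minimal sketch of the scoping step
only, assuming a per-user set of allowed domains. The names `in_scope`,
`filter_frontier` and `profile`, and the example URLs, are made up for
illustration; there is no fetching or politeness logic here.

```python
from urllib.parse import urlparse


def in_scope(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    # hostname is None for URLs without a network location (e.g. mailto:)
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def filter_frontier(links, allowed_domains):
    """Keep only the discovered links the crawler should follow."""
    return [u for u in links if in_scope(u, allowed_domains)]


# Hypothetical user profile: the domains this user cares about.
profile = {"example.org", "python.org"}

# Hypothetical links extracted from a page the crawler just fetched.
links = [
    "http://docs.python.org/tutorial/",
    "http://example.org/about",
    "http://notpython.org/",
    "mailto:someone@example.org",
]

print(filter_frontier(links, profile))
```

Matching on the host with a leading dot is a deliberate choice: it lets
subdomains through while keeping lookalike hosts such as notpython.org out.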

~~~
papersmith
A peer-to-peer search engine sounds good. The difficult part is convincing
users to donate the resources for your cause.

~~~
euccastro
How do Skype and bittorrent achieve that?

------
bharath
It seems that search (at least text search) is more or less a solved problem.
Trying to innovate in the search space today is more or less as futile as
trying to innovate in the operating systems space about 10-12 years ago. No
one really cares, given that the state of the art works quite well.

~~~
corentin
Just because operating systems all look the same from far enough away nowadays
doesn't mean no innovation is possible. As an example, we are slowly moving
from the old file/directory organization to a less hierarchical system
(tagging). Or maybe we could destroy the notion of files completely?

The horse and carriage worked quite well, too.

