
Write an Internet search engine with 200 lines of Ruby code - nickb
http://blog.saush.com/2009/03/write-an-internet-search-engine-with-200-lines-of-ruby-code/
======
sqs
Web search only becomes interesting when you scale it up. At small scale it's
just fetching pages and tying together library code, which is fine but not
particularly interesting or educational. Once you scale up, though, it
involves tons of complicated and fascinating topics (math, distributed
systems, compression, networking, query planning, databases, etc.). You also
start having to make trade-offs among features, which is often a good sign
that a problem is complex enough to be worthy of your time.

------
henning
Did he really need to engage in monkeypatching even in this small example?

Do you really pollute a class as fundamental as String as cavalierly as in
this code?

~~~
teej
There's absolutely no reason. Then again, there's plenty of bad code to go
around in this crawler. Just a few nuggets:

* Magic strings with no comments.

* Global variables.

* Attempting to write his own URI parser.
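
On the last point, Ruby's standard library already ships a URI parser. A minimal sketch of resolving links with it, assuming a hypothetical helper named absolute_link (not something from the article's code) and placeholder example.com URLs:

```ruby
require 'uri'

# Hypothetical helper: resolve a link found on a page against that
# page's URL. Returns an absolute http(s) URL string with the fragment
# removed, or nil when the link is malformed or not http(s).
def absolute_link(page_url, href)
  uri = URI.join(page_url, href)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  return nil unless uri.is_a?(URI::HTTP)
  uri.fragment = nil # "#section" anchors point at the same document
  uri.to_s
rescue URI::Error
  nil
end
```

Something like absolute_link("http://example.com/a/b.html", "../c.html#top") then resolves the relative path against the page, while mailto: links and malformed hrefs come back as nil.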

------
dhotson
It's actually not that difficult to write a text database from scratch
(without libraries like Lucene). Most text databases are based on inverted
indexes at their core. Basic boolean operators turn out to be pretty easy
to implement as well.

It's one of the exercises I like to do when learning a new language. I'd
recommend giving it a shot if you haven't tried it before.
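
For anyone curious what that looks like, here is a rough in-memory sketch, not a real text database. The class and method names (InvertedIndex, and_query, and so on) are made up for illustration, and the tokenizer is deliberately naive:

```ruby
require 'set'

# Illustrative in-memory inverted index: maps each term to the set of
# ids of the documents that contain it.
class InvertedIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
    @doc_ids = Set.new
  end

  # Naive tokenizer: lowercase, then split into runs of ASCII
  # letters and digits.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end

  # Index a document under the given id.
  def add(doc_id, text)
    @doc_ids << doc_id
    tokenize(text).each { |term| @postings[term] << doc_id }
  end

  # Ids of documents containing the term (empty set if unseen).
  # fetch avoids creating an entry for terms that were never indexed.
  def lookup(term)
    @postings.fetch(term.downcase, Set.new)
  end

  # AND: documents containing every term (set intersection).
  def and_query(*terms)
    terms.map { |term| lookup(term) }.reduce { |a, b| a & b } || Set.new
  end

  # OR: documents containing at least one term (set union).
  def or_query(*terms)
    terms.map { |term| lookup(term) }.reduce(Set.new) { |a, b| a | b }
  end

  # NOT: documents that do not contain the term (set difference).
  def not_query(term)
    @doc_ids - lookup(term)
  end
end
```

The boolean operators fall out of plain set operations on the posting lists, which is why they are cheap to add once the index exists. A real engine would keep the postings sorted on disk and compress them, but the shape is the same.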

------
aaronblohowiak
Or, you could use Y! BOSS in 3 lines.

<http://github.com/jpignata/bossman-gem/tree/master>

However, this is a good first toe-in-the-water for people who want to get more
into Ruby and web crawling.

