As far as search engines go, it's certainly worth playing around with Lucene. It's well implemented, and you'll learn a lot about what really matters when it comes to indexing and retrieval.
For the text processing side (classification, data extraction), it may also be worth brushing up on your stats (a good excuse to learn R) and checking out Mahout: http://lucene.apache.org/mahout/
In addition, there is a wealth of online video lectures that may inspire you: http://www.datawrangling.com/hidden-video-courses-in-math-sc... and http://videolectures.net/mlss04_hofmann_irtm/ and http://videolectures.net/Top/Computer_Science/