I think most of was was written also applies to any normal programming language. You could write this in Python, Ruby, Javascript, Java or C# without any problems. The code would probably be easier to read also. The only special thing is the web page scraping that could be done by a library but the same thinking about scalability and the use of a single computer instead of a hadoop cluster still holds even if you're reading from file systems or databases.