Good point: some of these datasets are open/free, others are not. If anyone is going to use a "seemingly free" dataset in open source code, try to get permission under the appropriate license first. There is a good overview of the difficulties here: http://en.wikipedia.org/wiki/Open_Data