Moreover, iRODS is written in C++, which can be an advantage over Node.js at several levels. First, because it is easier to provide interoperability with other languages. Second, because many data centers are very conservative (you often see CentOS/RHEL 3/4/5, or even SUSE Linux Enterprise Server) and will not be happy to install the relatively bleeding-edge Node.js stack.
In practice, a lot of scientific data is provided for non-commercial use only. This is often a necessity, because the data was originally provided by a commercial entity that doesn't want to give the same data to competitors. E.g. in NLP, many treebanks are based on newspapers: they can often be redistributed freely for non-commercial purposes, but not for commercial ones.
Dat is basically a generic, pluggable database replication tool.
No, it doesn't. Having to use buffers is not easy, and the BigNum OpenSSL stuff is slow and limited (integers only). I have personally had a hell of a time supporting PostgreSQL's numeric type in a Node web server. Can it be done with Node? Sure, but it is not easy or fast.
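To make the precision problem concrete, here is a minimal sketch (plain Node, no libraries, hypothetical values) of why a PostgreSQL `numeric` value can't simply be parsed into a JavaScript Number:

```javascript
// JavaScript Numbers are IEEE 754 doubles, so integers above
// 2^53 - 1 (Number.MAX_SAFE_INTEGER) can silently lose precision.
// A `numeric` column therefore has to stay a string (or a buffer)
// on the Node side, and arithmetic on it needs extra machinery.
const exact = "9007199254740993";         // 2^53 + 1, as text from the DB
const parsed = Number(exact);             // silently rounds to 2^53

console.log(parsed === 9007199254740992); // true: the last digit is gone
console.log(String(parsed) === exact);    // false: the round trip fails
```

And that is only the integer case; `numeric` also carries arbitrary decimal precision, which a double can't represent at all.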
If Dat were just for moving buffers around, it would probably be okay, but it wants to be the place for data transformations as well, which is what concerns me.
Here's a concrete example: A police department in a city hosts an Excel spreadsheet on their web server called Crime-2013.xls. It contains all of the reported crime so far this year and gets updated every night at midnight with all of the new crimes that were reported each day.
Say you wanted to write a web application that showed all of the crime on a map. To download the new data every night you'd have to write a custom program that downloads the .xls file every night at midnight and imports it into your application's MySQL database.
To get the fresh data imported you can simply delete your entire local crime database and re-import all rows from the new .xls file, a process known as a 'kill and fill'.
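A kill and fill can be sketched in a few lines. This is a toy version with an in-memory array standing in for the MySQL table (all names are hypothetical); the real thing would `TRUNCATE` the table and re-insert every row:

```javascript
// 'Kill and fill': throw away everything local, re-import everything incoming.
function killAndFill(localRows, incomingRows) {
  localRows.length = 0;              // "kill": drop all local rows
  for (const row of incomingRows) {
    localRows.push({ ...row });      // "fill": re-import each incoming row
  }
  return localRows;
}

// A row you cleaned up by hand after the last import...
const local = [{ id: 1, offense: "Theft (cleaned up by hand)" }];
const incoming = [
  { id: 1, offense: "THEFT" },
  { id: 2, offense: "ASSAULT" },
];

killAndFill(local, incoming);
console.log(local.length);           // 2 -- but the hand-cleaned row is gone
```

The simplicity is the appeal; the loss of any local changes is the catch, as described next.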
But the kill and fill method isn't very robust, for a variety of messy reasons. For instance, what if you cleaned up some of the rows in the crime data in your local database after importing it last time? Your edits would get lost.
Another option is a manual merge, where you try to import each row of the incoming Excel file one at a time. If the data in the row already exists in the database, skip it. If the row exists but the incoming data is a newer version, overwrite that row. If the row doesn't exist yet, insert a whole new row into the database.
The manual merge can be tricky to implement. In your import script you will have to write the logic for checking whether an incoming row already exists in your database. Does the Excel file have its own crime IDs that you can use to look up existing records, or are you searching for the existing record by some other method? Do you assume the incoming row should completely overwrite the existing row, or do you attempt a full row merge?
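The skip/overwrite/insert logic above can be sketched as follows. This is one possible set of answers to those questions, not the only one: it assumes the incoming rows carry their own `id` and an `updatedAt` version field (both hypothetical), and uses a Map as a stand-in for the application database:

```javascript
// Manual merge of one incoming row into the local store.
function mergeRow(db, incoming) {
  const existing = db.get(incoming.id);
  if (!existing) {
    // Row doesn't exist yet: insert a whole new row.
    db.set(incoming.id, { ...incoming });
  } else if (incoming.updatedAt > existing.updatedAt) {
    // Row exists but the incoming data is newer: overwrite (full row merge).
    db.set(incoming.id, { ...existing, ...incoming });
  }
  // Otherwise the row already exists and is current: skip it.
}

const db = new Map([[1, { id: 1, offense: "THEFT", updatedAt: 5 }]]);

mergeRow(db, { id: 1, offense: "BURGLARY", updatedAt: 9 }); // overwrites
mergeRow(db, { id: 2, offense: "ASSAULT", updatedAt: 1 });  // inserts
mergeRow(db, { id: 1, offense: "THEFT", updatedAt: 2 });    // stale, skipped

console.log(db.get(1).offense); // "BURGLARY"
console.log(db.size);           // 2
```

If the source file has no stable IDs or no version field, even this much doesn't work, and you are back to fuzzy matching on the row contents themselves.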
At this point the import script is probably a few dozen lines long and is very specific both to the police department's data and to your application's database. If you decide to switch from MySQL to PostgreSQL in the future, you will have to revisit this script and rewrite major parts of it.
If you have to do things like clean up formatting errors in the police data, re-project geographic coordinates, or change the data in other ways, there is no straightforward way to share those changes publicly. The best-case scenario is that you put your import script on GitHub and name it something like 'City-Police-Crime-MySQL-Import', so that other developers who want to consume the crime data in your city won't have to go through all the work you just did.
Sadly, this workflow is the state of the art. Open data tools are at a level comparable to source code management before version control.
Why did they need to say that? I don't want a tool that has a preference for something so stupid as academia.
But I'll probably forget this and start loving Dat if it manages to enable this "open data revolution".
"Although Ogden's background is in city government, the Dat team is now squarely focused on the needs of scientists. That's largely because of the Sloan Foundation's focus. 'I don't come from a scientific background and wasn't even thinking about science data,' he says. 'But they convinced me that I should.' He explains that scientists have to deal with many of the same issues with formats and tracking changes that city governments do. Using Dat, Ogden says, much of this complexity could be abstracted away, at least for some users of the data."
I don't think this is a reason for hating the authors or the project. Academic scientists face a lot of the same problems as users of open data, and if the Sloan Foundation wants to pay to solve those problems for science, the project moves forward more quickly, and people using open data in other ways still benefit.
So assuming that an individual fruit tree produces 20,000 calories of edible fruit annually, and there are a couple dozen fruit trees in a typical American city, we will have spent a hundred man-hours on app development and testing to turn half a million potential calories into a few thousand, while inflicting the tragedy of the commons on these public resources and encouraging people to pick immature fruit before someone else with the app does.
That idea is so stupid that by next week I expect to see 8 startups with a combined valuation of 80 million dollars, all attempting to monetize the 24 fruit trees on public property in Mountain View by selling ads targeting "urban nomads" (a.k.a. the homeless), or by paying the homeless to gather unripe fruit for one another in whatever Litecoin or Ripple clone is in vogue that week.
On the other hand, there are pretty sound reasons for planting edibles in public spaces instead of merely ornamental plants.