
It does not really compare; I guess that's also why it's not mentioned.

The whole premise of the paper is that data can be analyzed in situ, i.e., read from its original place in its original format without any prior transformation. This is in contrast to the traditional approach of database systems, where the data has to be loaded into the database first.

The paper describes how unprocessed, unindexed data can be used efficiently to answer queries with a database system. The novelty of the approach lies mostly in building an index on the fly that can be reused by later queries, and in examining the idea of operating directly on the raw files. As for the efficient loading of CSV files, which the NoDB paper also touches on, I think those details are better described in a later paper from TU München [1] that examines this aspect in more depth.
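To make the on-the-fly indexing idea concrete, here is a minimal Python sketch of a positional map, the auxiliary structure NoDB builds over raw CSV files: it caches byte offsets of rows as they are first scanned, so later queries can seek directly instead of re-parsing the file from the top. The class and method names are mine, and the real system (PostgresRaw) maintains this inside the DBMS executor at field granularity, not in application code:

    class PositionalMap:
        """Caches byte offsets of row starts in a raw CSV file so that
        repeated queries seek directly instead of re-scanning everything."""

        def __init__(self, path):
            self.path = path
            self.row_offsets = []  # offset of each row start, filled lazily

        def _extend_to(self, row_idx):
            # Continue the scan from the last known position; never
            # re-tokenize rows that are already in the map.
            with open(self.path, "rb") as f:
                if not self.row_offsets:
                    self.row_offsets.append(0)
                f.seek(self.row_offsets[-1])
                f.readline()  # re-read the last known row to reach the next
                while len(self.row_offsets) <= row_idx:
                    pos = f.tell()
                    if not f.readline():   # EOF: no such row
                        raise IndexError(row_idx)
                    self.row_offsets.append(pos)

        def field(self, row_idx, col_idx):
            # The first access pays the scan cost; later ones just seek.
            if row_idx >= len(self.row_offsets):
                self._extend_to(row_idx)
            with open(self.path, "rb") as f:
                f.seek(self.row_offsets[row_idx])
                row = f.readline().decode().rstrip("\r\n")
                return row.split(",")[col_idx]

    pmap = PositionalMap("raw_data.csv")
    print(pmap.field(999, 2))  # first call scans; repeating it only seeks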

HDT for RDF, or Parquet [2] and ORCFiles [3] in the Big Data space, are binary formats in which the data has already been processed and stored more efficiently than in plain old text CSV files. Creating these files is already comparable to loading the data into a database; the only difference is that the storage format is open and can be used by many systems. So it's a completely different setting.
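As a small illustration of that difference (nothing from the paper; pyarrow and the file name are just what I reached for), the parse cost is paid once at write time, and readers can then fetch single columns without touching any text:

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({
        "user_id": [1, 2, 3],
        "clicks":  [10, 25, 7],
    })
    pq.write_table(table, "events.parquet")  # the one-time "load" step

    # Readers decode only the requested column; no text parsing involved.
    clicks = pq.read_table("events.parquet", columns=["clicks"])
    print(clicks.column("clicks").to_pylist())  # [10, 25, 7]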

Still, it's an interesting thought to make databases aware of the indexing information in those file formats beyond CSV, so that they can also be used directly without loading.
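For a taste of what that could look like (again just an assumption-laden sketch, with made-up file and column names): pyarrow's read path can use the min/max statistics Parquet stores per row group to skip data entirely, which is essentially the "use the format's own index, skip the load" idea from the last paragraph:

    import pyarrow.parquet as pq

    # Row groups whose statistics rule out clicks > 20 are never read.
    hot = pq.read_table("events.parquet",
                        columns=["user_id", "clicks"],
                        filters=[("clicks", ">", 20)])
    print(hot.to_pylist())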

[1] http://www.vldb.org/pvldb/vol6/p1702-muehlbauer.pdf

[2] https://github.com/Parquet/parquet-format

[3] https://cwiki.apache.org/confluence/display/Hive/LanguageMan...



