

Ask HN: Data storage question - matthodan

How would you store user-submitted tabular data if you didn't know the column data types of the submitted data prior to it being uploaded?<p>Note: The data must be able to be queried.<p>One idea is to convert each cell in the table of data into a row in a database that holds the value, data type, and position of the cell in the uploaded table.<p>Any other thoughts?
======
mbrubeck
I used to work at blist (now called Socrata) where we had to solve this exact
problem.

You could use a schema-less database - either a document store like CouchDB,
or a schema-less table/column store like Cassandra or Tokyo Tyrant.

Or you could use a standard RDBMS and generate new tables on the fly. (This is
what blist did when I worked there; I think they may have a different storage
model now.) Or implement a column store or tuple store as a layer on top of
the RDBMS, like Infobase:

<http://openlibrary.org/about/tech>

~~~
matthodan
I've heard that the "create tables on the fly" route is bad practice for
reasons of _optimization_ , though, I have to admit I'm not sure what those
reasons are in practice.

------
nostrademons
Read in the file, then issue the appropriate CREATE TABLE commands through
your database connection to dynamically create the appropriate columns. Keep a
table with metadata on the tables you've created (you really only need the
table names, you can issue a DESCRIBE to get everything else, but there may be
a bunch of other metadata you'd like to store, like the original file name &
format, date uploaded, etc.) Query as normal.

I've been down the row-per-cell route before. It seems to be one of those
ideas that everyone comes up with, sounds really clever at first, and is the
wrong solution in 99% of cases. Problem is that it's really hard to get
efficient querying - almost every query requires a full table scan.

------
vyrotek
Windows Azure Table Storage supports schema-less data. You can give it any
sort of entity and store them in something similar to a table but without
schema restrictions. I use it for one of my projects and love it.

<http://www.microsoft.com/azure>

edit - Thought I would also point out you can whatever language you like to
use it. You use a REST API to perform all your queries.

------
johnm
This sounds perfect for a native XML "database". Check out e.g., MarkLogic
Server. That's what we built <http://markmail.org/> on top of.

------
matthodan
Thanks for all of the great comments. I'll need to do a little reading to get
up to speed on these ideas.

------
acangiano
You could use DB2 Express-C (which is free); you can then store the data as
XML and query it at will.

