

Document Stores: Please Give Me A Standard API - lecha
http://kirkwylie.blogspot.com/2009/12/document-stores-please-give-me-standard.html

======
moe
Yes. SQL should be that standard API.

Most document- and KV-stores only support a subset of what can be expressed
with SQL. In fact, I have yet to see a feature that can _not_ be expressed in
SQL.

Thus, the advantages of using SQL are obvious:

    
    
      * SQL is well understood and mature
    
      * People already know SQL
    
      * SQL parsers and clients are a dime a dozen
    
      * SQL lends itself reasonably well to interactive 
        use by humans. Much better than typing raw 
        JavaScript into a console (MongoDB) or having to write
        actual code for every little data-manipulation task
        (most others).
    
      * Existing SQL-infrastructure can be leveraged.
        For example your favorite ORM could easily grow an
        adapter for a "NoSQL"-Db when it's just a slightly 
        different SQL dialect instead of a completely
        distinct API.

~~~
evgen
> SQL should be that standard API.

Unlikely. How will SQL deal with low-level semantics that do not provide
transactions in the way SQL users understand them but require the user to
understand the concepts of the CAP principle? How well does SQL deal with
conflicting data being returned from a retrieval operation? If SQL is being
used as nothing more than a small set of verbs (a la memcache) then why bother?

I certainly hope that document-store developers do not take the lazy route and
prematurely converge on a standard like SQL. Fortunately it seems that few of
them are particularly interested in this route and the more likely path seems
to be providing multiple interfaces with RESTful HTTP, memcache, and a
programmable/functional interface using JavaScript and/or the language the DB
is coded in (Erlang and Java primarily.) Other possible candidates here are
things like LINQ, Hive's QL, and Pig.

~~~
moe
_Unlikely. How will SQL deal with low-level semantics that do not provide
transactions in the way SQL users understand them but require the user to
understand the concepts of the CAP principle?_

I'm not sure what you mean. SQL databases have varying feature sets already
and SQL copes just fine. A "NoSQL" backend would simply be yet another type of
database, lining up with e.g. MySQL and Postgres.

It's just a matter of applying the verbs (SELECT, CREATE, INSERT, UPDATE) in a
meaningful way.

The individual stores could even extend the language to a degree, just like
the RDBMS flavours do today. The important wins would be a common baseline
(select foo from bar), a standardized way for human interaction (ad hoc
queries in a console are very useful) and a much easier migration path from
traditional RDBMSs.

_How well does SQL deal with conflicting data being returned from a retrieval
operation?_

Again I wonder what you mean?

If you're referring to versioned stores, or stores with particular semantics,
then I'd imagine things like that to be simply embedded into the response.
I.e. all tables returned by a select-statement could simply contain an
additional column containing the tuple version. The user can then deal with
that meta-data using the usual, time-tested SQL machinery (select .. where
version=, group by, etc.)
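
A minimal sketch of that idea, using Python's sqlite3 as a stand-in for a versioned store. The table `bar`, its columns, and the sample rows are made up for illustration; a real store would decide for itself how to expose versions.

```python
import sqlite3

# In-memory SQLite stands in for a versioned "NoSQL" store.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (id TEXT, foo TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO bar VALUES (?, ?, ?)",
    [("user:1", "alice", 1), ("user:1", "alicia", 2), ("user:2", "bob", 1)],
)


def all_versions(row_id):
    """Return every stored (foo, version) pair for one id, oldest first."""
    return conn.execute(
        "SELECT foo, version FROM bar WHERE id = ? ORDER BY version",
        (row_id,),
    ).fetchall()


def latest():
    """Return (id, foo, version) for the highest version of each id."""
    return conn.execute(
        "SELECT b.id, b.foo, b.version FROM bar b "
        "JOIN (SELECT id, MAX(version) AS v FROM bar GROUP BY id) m "
        "ON b.id = m.id AND b.version = m.v "
        "ORDER BY b.id"
    ).fetchall()
```

Here the conflicting versions are just rows, so picking a winner (or showing all candidates) is an ordinary query rather than a special API call.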

Yes, this means every store needs slightly different treatment. But that's no
different to how we treat RDBMSs today - a common baseline with variance in
the more specific features (column types, indexes, triggers etc.).

 _I certainly hope that document-store developers do not take the lazy route
and prematurely converge on a standard like SQL._

That's a strange wish to have as it only makes life harder for everyone. I
don't see anything bad about SQL (the language) that would prevent it from
taking this role.

Moreover the query language doesn't need to have ties into the underlying
implementation - it's merely a common vocabulary.

SQL seems like a natural choice due to its ubiquity.

LINQ, in my book, is an ORM and operates at a different level of abstraction.
Pig and HiveQL otoh are exactly what I have in mind - SQL dialects.

~~~
evgen
_[Not converging on SQL] only makes life harder for everyone._

No, it makes life harder for people trying to use off-the-shelf ORMs and other
RDBMS tools. I would much rather make life harder for them than make it harder
for the developers of these databases.

 _I don't see anything bad about SQL (the language) that would prevent it from
taking this role._

I, for one, do not think that it is expressive enough to cover all of the
different paradigms being explored. It is based on a row/column view of the
data that may not be appropriate and which may require additional hoops to be
jumped through in order to get the data presentable for the assumptions that
SQL makes. Who would be responsible for these transformations? IMHO it should
be the end-user, but if SQL gains traction any time soon the db developers
will be repeatedly browbeaten by DBAs who can't comprehend why anyone would
not share their viewpoints regarding data structuring and could you please
make things look like the RDBMSs we are used to kthxbye...

Adding SQL seems to offer very little at this point except future headaches.

~~~
moe
_I, for one, do not think that it is expressive enough to cover all of the
different paradigms being explored._

You are not alone in that opinion; I've seen it a couple of times in similar
discussions. Yet I have still not seen a concrete example of a piece of
functionality that can't be reasonably wrapped.

Browbeating is irrelevant in either direction. What matters for adoption is a
sane interface for day-to-day use and that includes proper tooling for
interactive use - an area where many of the contenders are still sorely
lacking.

------
iamaleksey
A standard might lead to slower innovation at this point.

------
lecha
The author focuses on document stores, but the same argument can be made for
key/value stores.

It is time for the NoSQL community to come up with a standard key/value API.
Is anyone working on it?

As one commenter said, "JDBC came out ~25 years after SQL was created." By
today's standards 2.5 years should be about the right time to expect a
standard :)

~~~
antirez
Can't speak for the other NoSQL players, but for Redis this is more or less
impossible since the key-value business is something like 10% of the current
API. How to cover the other 90%?

Maybe in the future Redis will be able to listen on another port and speak the
memcached protocol, but this is not a high-priority feature for now.

What I think people should do is to abstract the interaction with the DB _in
their code_, so that it's just a matter of writing adapters to support new KV
stores, and these adapters can take advantage of peculiarities in specific
stores.

For instance, when using Redis the adapter can use Redis Sets to hold the list
of friends. When using something different it will serialize the data as JSON,
and so forth.
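
A rough sketch of two such adapters in Python. To keep it runnable without a server, a dict of Python sets stands in for Redis Sets and a plain dict stands in for a generic KV store; the class and method names are invented for this example and are not part of any real client library.

```python
import json


class SetFriendStore:
    """Adapter for a store with native sets (a dict of sets stands in for Redis Sets)."""

    def __init__(self):
        self._sets = {}

    def add_friend(self, user, friend):
        # With native sets, adding a member is a single set operation.
        self._sets.setdefault(user, set()).add(friend)

    def friends(self, user):
        return sorted(self._sets.get(user, set()))


class JsonFriendStore:
    """Adapter for a plain KV store: the whole friend list is one JSON string."""

    def __init__(self):
        self._kv = {}

    def add_friend(self, user, friend):
        # Without native sets, the adapter does read-modify-write itself.
        current = set(json.loads(self._kv.get(user, "[]")))
        current.add(friend)
        self._kv[user] = json.dumps(sorted(current))

    def friends(self, user):
        return json.loads(self._kv.get(user, "[]"))


def befriend(store, user, names):
    """Application code: works with either adapter."""
    for name in names:
        store.add_friend(user, name)
    return store.friends(user)
```

The application only ever calls `add_friend` and `friends`, so each adapter is free to use whatever its store does best.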

~~~
gthank
Since a lot of these different data stores have fairly different semantics,
I'm not sure how much value there is in trying to create a common set of
abstractions in your code.

~~~
antirez
There is a lot of value if the DB API is abstract enough, like:

    
    
        (id) addBookmark(title,url,taglist)
        (bookmark) getBookmark(id)
        (array) searchBookmarksByTag(taglist)
    

And so forth. You can switch from SQL to a NoSQL solution, or back, without
touching the app: you only change the single file that contains the whole DB
API.
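
Something like the following, as a sketch in Python. The snake_case method names mirror the three calls above; the two backends (a dict and an in-memory SQLite database), the returned dict shape, and the "match any tag" search semantics are assumptions made for the example.

```python
import sqlite3
from abc import ABC, abstractmethod


class BookmarkStore(ABC):
    """The application talks only to this interface."""

    @abstractmethod
    def add_bookmark(self, title, url, taglist): ...

    @abstractmethod
    def get_bookmark(self, bookmark_id): ...

    @abstractmethod
    def search_bookmarks_by_tag(self, taglist): ...


class DictBookmarkStore(BookmarkStore):
    """Stand-in for a document/KV backend: one dict entry per bookmark."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def add_bookmark(self, title, url, taglist):
        bookmark_id = self._next_id
        self._next_id += 1
        self._items[bookmark_id] = {
            "id": bookmark_id, "title": title, "url": url,
            "tags": sorted(set(taglist)),
        }
        return bookmark_id

    def get_bookmark(self, bookmark_id):
        return self._items.get(bookmark_id)

    def search_bookmarks_by_tag(self, taglist):
        wanted = set(taglist)
        return [b for b in self._items.values() if wanted & set(b["tags"])]


class SqliteBookmarkStore(BookmarkStore):
    """SQL backend: bookmarks and tags live in two tables."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, title TEXT, url TEXT)"
        )
        self._db.execute("CREATE TABLE tags (bookmark_id INTEGER, tag TEXT)")

    def add_bookmark(self, title, url, taglist):
        cur = self._db.execute(
            "INSERT INTO bookmarks (title, url) VALUES (?, ?)", (title, url)
        )
        bookmark_id = cur.lastrowid
        self._db.executemany(
            "INSERT INTO tags VALUES (?, ?)",
            [(bookmark_id, tag) for tag in sorted(set(taglist))],
        )
        return bookmark_id

    def get_bookmark(self, bookmark_id):
        row = self._db.execute(
            "SELECT title, url FROM bookmarks WHERE id = ?", (bookmark_id,)
        ).fetchone()
        if row is None:
            return None
        tags = [t for (t,) in self._db.execute(
            "SELECT tag FROM tags WHERE bookmark_id = ? ORDER BY tag",
            (bookmark_id,),
        )]
        return {"id": bookmark_id, "title": row[0], "url": row[1], "tags": tags}

    def search_bookmarks_by_tag(self, taglist):
        tags = sorted(set(taglist))
        if not tags:
            return []
        marks = ",".join("?" for _ in tags)
        ids = self._db.execute(
            "SELECT DISTINCT bookmark_id FROM tags "
            f"WHERE tag IN ({marks}) ORDER BY bookmark_id",
            tags,
        ).fetchall()
        return [self.get_bookmark(i) for (i,) in ids]
```

Swapping backends then means constructing a different `BookmarkStore` in one place; the rest of the app does not change.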

Even if it's at a very low level, like DB.get(), DB.set(), ..., it is still
useful as long as the application uses just a strict common subset of features
(get/set/exists/expire/incr).
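
For illustration, here is that common subset as a tiny in-memory Python class. This is only a sketch of the interface an adapter would expose, not any store's actual behaviour; the class name and the injectable `clock` parameter are assumptions added to make it self-contained and testable.

```python
import time


class MemoryKV:
    """In-memory stand-in exposing only get/set/exists/expire/incr."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> value
        self._expires = {}   # key -> absolute deadline from clock()
        self._clock = clock

    def _purge(self, key):
        # Lazily drop the key if its deadline has passed.
        deadline = self._expires.get(key)
        if deadline is not None and self._clock() >= deadline:
            self._data.pop(key, None)
            self._expires.pop(key, None)

    def get(self, key):
        self._purge(key)
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        self._expires.pop(key, None)  # a fresh set clears any old deadline

    def exists(self, key):
        self._purge(key)
        return key in self._data

    def expire(self, key, seconds):
        if not self.exists(key):
            return False
        self._expires[key] = self._clock() + seconds
        return True

    def incr(self, key):
        # A missing key counts as 0, so the first incr returns 1.
        self._purge(key)
        value = int(self._data.get(key, 0)) + 1
        self._data[key] = value
        return value
```

An adapter for a real store would implement the same five methods by delegating to that store's client.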

