

MongoDB Is Abusing JSON - sm_sohan
http://smsohan.com/blog/2013/01/17/abusing-json/

======
mglukhovsky
Here's what this query looks like in RethinkDB (also based on JSON documents):

    
    
      r.table('orders')
       .pluck('cust_id','ord_date','price')
       .groupBy('cust_id','ord_date', r.sum('price')).
       .filter(r.row('reduction').gt(250))
    

We use the hard-coded attribute 'reduction' because _groupBy_ automatically
gets compiled to our distributed map-reduce infrastructure. There is currently
no _as_ command (though it could easily be simulated with _map_ ). I'll add a
GitHub issue for this shortly, since we should add sugar for it.

~~~
greendestiny
The beauty of JSON is you could incredibly easily write such jquery like
chained expression to generate the mongo query. JSON is a good ascii
representation of structured data thats compact, quite readable and reasonably
human editable.

Its not an abuse of JSON to use it as a way of representing queries, but its
probably a shame that Mongo haven't provided a better way of generating
queries.

~~~
sm_sohan
Totally agree with you.

------
cheald
Aggregation is absolutely one of Mongo's weaknesses. It's not great at ad-hoc
aggregation like MySQL or whatnot is, and the fact that it tends to lend
itself to denormalized data makes SQL-style reporting clunky at best.

It does a lot of things better than SQL, too. Consider, for example, the query
"Give me a list of all posts with all of these tags, by any of these authors,
sorted by post date descending"

    
    
        db.posts.find({
          tags: {$all: ["foo", "bar", "baz"]},
          author: {$in: ["Joe", "Jane"]}
        }).sort({post_date: -1})
    

In SQL, you'd end up with something like:

    
    
        SELECT posts.* FROM posts
          INNER JOIN post_tags t1 ON t1.tag = "foo" AND t1.post_id = posts.id
          INNER JOIN post_tags t2 ON t2.tag = "bar" AND t2.post_id = posts.id
          INNER JOIN post_tags t3 ON t3.tag = "baz" AND t3.post_id = posts.id
          WHERE posts.author = "Joe" or post.author = "Jane"
          ORDER BY post_date DESC;
    

Its strength is denormalization; since you can denormalize entire lists or
maps of data into a document, and then index and query on them, you can end up
performing queries that would be ridiculously ugly and tedious in SQL.

~~~
djb_hackernews
not to take away from any of your great points in this post but isn't it the
same as:

    
    
        SELECT posts.* FROM posts
          INNER JOIN post_tags pt ON pt.post_id = posts.id AND pt.tag IN ('foo', 'bar', 'baz')
          WHERE posts.author IN ('Joe', 'Jane')
          ORDER BY post_date DESC;
    

My SQL is rusty, I could be missing something but they seem essentially
equivalent if you use the SQL helpers such as IN.

~~~
cheald
No, because that'll select any post that contains any of those three tags, not
the posts that contain all three tags. AND vs OR.

It's worth noting that I threw the SQL query a bone by denormalizing post_tags
into one table. In a properly relational DB, you'd have a tags table, a posts
table, and a post_tags join table, so the query gets even hairier (or you have
to do two queries).

    
    
        SELECT posts.* FROM posts
          INNER JOIN post_tags pt1 ON pt1.post_id = posts.id
          INNER JOIN tags t1 ON t1.tag = "foo" and pt1.tag_id = t1.id
          INNER JOIN post_tags pt2 ON pt2.post_id = posts.id
          INNER JOIN tags t2 ON t3.tag = "bar" and pt2.tag_id = t2.id
          INNER JOIN post_tags pt3 ON pt3.post_id = posts.id
          INNER JOIN tags t3 ON t3.tag = "baz" and pt3.tag_id = t3.id
          WHERE posts.author = "Joe" or post.author = "Jane"
          ORDER BY post_date DESC;
    

Yikes.

It's _also_ worth noting that this generates a _massive_ temp table to be
sorted, which is very likely going to end up causing you to have to do a
filesort. In practice, you'd probably break this down into 3 queries (forgive
my mixing languages):

    
    
        $tag_ids = SELECT id FROM tags WHERE tag IN ("foo", "bar", "baz")
    
        $post_ids = SELECT posts.id FROM posts
          INNER JOIN post_tags pt1 ON pt1.post_id = posts.id and pt1.tag_id = $tag_ids[0]
          INNER JOIN post_tags pt2 ON pt2.post_id = posts.id and pt2.tag_id = $tag_ids[1]
          INNER JOIN post_tags pt3 ON pt3.post_id = posts.id and pt3.tag_id = $tag_ids[2]
          WHERE posts.author = "Joe" or posts.author = "Jane"
    
        $posts = SELECT posts.* FROM posts WHERE id IN ($post_ids) ORDER BY post_date DESC;
    

Easily doable in both languages, but Mongo's denormalized structure makes this
sort of use case a _ton_ simpler.

~~~
SigmundA

      SELECT * FROM post 
      WHERE id IN
      (
         select post_id from post_tags
         INNER JOIN tags ON post_tag.tag_id = tag.id
         WHERE tag.tag in ("foo","bar","baz")
         GROUP BY post_id
         HAVING COUNT(*) = 3
      )
      AND author IN ("Joe","Jane")
      ORDER BY post_date DESC;
    

Of course normalizing tags is silly as they are natural keys so it should be
as you first stated:

    
    
      SELECT * FROM post 
      WHERE id IN
      (
         select post_id from post_tags
         WHERE tag in ("foo","bar","baz")
         GROUP BY post_id
         HAVING COUNT(*) = 3
      )
      AND author IN ("Joe","Jane")
      ORDER BY post_date DESC;

~~~
cheald
Are subselects still ridiculously inefficient in MySQL? It's been a long while
since I've used them. I do like how those queries read, though.

~~~
SigmundA
Sorry I only use proper DB's like PostgreSQL and MSSQL who optimizer's have no
problems with this ;).

Also what I like about the count approach is you can do "show me all the posts
with at least 2 out of the 3 tags matching" fuzzier search if you desire.

------
mattparlane
I've been through the hassle of programatically piecing together complex SQL
queries, and I'd far rather be able to just put together hashes that represent
my query.

SQL was originally designed so that people who were savvy but not necessarily
developers were able to query databases, but I can't think of the last time my
boss would have wanted to run some random query against our production
database.

~~~
msarchet
This is a daily occurrence at some companies

------
byoung2
There is always <http://querymongo.com/> which will convert SQL to a MongoDB
query.

~~~
cheald
I actually dislike this tool a fair bit, because it gets people to continue
thinking of Mongo as "Mysql plus WEB SCALE" or whatever. It's a totally
different database and doesn't do well with highly-normalized relational-style
data, and the idea of an "automated converter" seems to reinforce the idea
that it's just a drop in for MySQL that automatically solves all your scaling
woes, when that's just utterly and completely false.

Trying to shove a MySQL square into the Mongo triangle just _isn't going to
work out that well_.

~~~
Goopplesoft
You mean SQL not MySQL.

Also that might be one thing it does but it also allows people to transition
from SQL queries they know to mongodb queries. It helps the learning process.

~~~
cheald
No, I mean MySQL. The tool linked is specifically tooled for MySQL queries.

I fully support it as a learning tool, but Mongo's query language isn't all
that hard to pick up if you already know how SQL works. Getting stuck into the
mindset of "just write relational DB-oriented software and then generate some
Mongo queries for it" is an awful practice that is sure to lead to much pain
and suffering.

------
aidos
I'm just finishing off a project that was built using Mongo and I've run into
this as well.

Other gotchas too, like feeling like you can store any old json structure in
your db when you can't.

Dots are reserved because they're part of the query syntax. Fair enough, but
it's pretty crappy to have to unpick a whole data structure because it was
fine until a random bit of UGC was entered (that's where my last fews hours
just went).

It does feel like the data and the query syntax are too crossed over to me.

------
benologist
You picked an ugly mongo query, and there are many. You compared it to a
concise SQL query, and there are many that are not.

MongoDB's limit(x) and skip(y) are a shitload nicer than most of Microsoft's
ideas about pagination. It was only in SQL Server 2012 that they came up with
"OFFSET" instead of "google it"....
[http://stackoverflow.com/questions/2244322/how-to-do-
paginat...](http://stackoverflow.com/questions/2244322/how-to-do-pagination-
in-sql-server-2008)

~~~
corresation
There are extremely few modern cases where pagination at the database layer is
a good approach.

~~~
kogir
How about any cases where you don't want to transfer all the rows you're
skipping over the wire? As in, nearly all cases?

~~~
corresation
You wouldn't be "skipping" those rows because they would be transferred once.
The database is almost always the most constrained part of a solution, and the
various hack pagination techniques are almost always (not 100%. More like 98%)
a naive mistake.

~~~
anko
I am really interested in this argument, because I am currently converting a
mysql based web search to use apache solr. I have been thinking about all the
arguments about pagination a lot.

The thing I don't know about, though, is the "database is almost always the
most constrained part of a solution". On this simple site search, we tend to
notice the apaches taking a lot more cpu than the databases calls. I guess
when our site gets bigger, you'd imagine it is easier to scale out the apaches
than the database (with sharding), but we would still have a lot of room to
improve the mysql layer anyway (memcached for example).

------
samarudge
JSON is used as a query language because it's fast, easy to parse and easy to
generate dynamically. If you have a query interface for users, SQL is probably
a better choice, but Mongo chose JSON for performance reasons.

If you don't like dealing with it directly, use something like MongoEngine so
you're not working with the raw queries, or if having readable, easy to
understand queries is important, use a SQL database.

Everything is a compromise, with Mongo's query language you're sacrificing
readability for performance.

( This is not a comparison of a SQL database to Mongo, just the time it takes
for a SQL engine to parse the query into an execution plan )

~~~
marshray
> JSON is used as a query language

Maybe a little more accurate to say JSON is used as a base layer for the query
language.

JSON is "JavaScript Object Notation". But the "meaning" of the query is in the
objects being denoted, not the notation used to represent them as text. So
comparing Mongo's use of JSON to SQL is apples-to-oranges.

We could encode SQL as JSON too:

    
    
        {"query": "SELECT * FROM things;"}
    

or

    
    
        {"query": [
            {"SELECT": "*"},
            {"FROM": "things"} ] }
    

without affecting the expressive power of the SQL language one bit.

------
taylorbuley
I don't think it's fair to take JSON out of context from the rest of the query
API. Or maybe it's being overly generous, I'm not sure. Either way, this is
the same way object literals are constructed in JavaScript. So is the beef
w/JavaScript?

For me the Mongo shell is just enough so-called "richness" and
"expressiveness" (Try it yourself: <http://try.mongodb.org/>). There's a
certain magic to passing objects to functions (and being able to, say, read
the body of a function by typing that object into the CLI).

------
pestaa
TL;DR: JSON sucks for representing queries.

~~~
davidlumley
It's not JSON, but rather that MongoDB's query language sucks. I'm not
entirely sure how to fix it - perhaps make it slightly more verbose and
meaningful?

~~~
pestaa
Probably there are better ways to express a query in JSON compared to how
MongoDB does it, but I'd take a step back and ask whether a nested map is the
best approach to think about it.

~~~
cheald
I actually prefer Mongo's query language to SQL, because it makes sense to my
programmer brain. It's basically "push operand, push values", which makes a
ton of sense. It's just plain old Polish notation.

------
maxharris
Meh. I count roughly the same number of tokens either way. I really don't
think this is that big of a deal.

------
gumbo
I feel the same about the query language of Mongo. The first goal of a query
language should be ease of use. With mongo having to type all those extra
characters quotes, brakets, square brakets, colons is very annoying. You need
to type a lot to get any reasonable output.

~~~
woah
You might like coffeescript.

    
    
        db.orders.aggregate [
          $group:
            _id:
              cust_id: "$cust_id"
              ord_date: "$ord_date"
    
            total:
              $sum: "$price"
        ,
          $match:
            total:
              $gt: 250
        ]

------
andrewmunsell
MongoDB is a NoSQL-type database, so it wouldn't make sense to have a SQL
query interface... I think they did a good job with the API for not using SQL.

Plus, the API isn't really _abusing_ JSON. It isn't pretty, but it's not
abuse.

~~~
Firehed
People seem to be confusing "NoSQL" and "non-relational". Mongo happens to be
both, but there's nothing fundamental that puts the two together. You may not
get the full capabilities of SQL with non-relational data (JOINs, etc), but
there's no reason that non-relational data stores couldn't parse normal SQL
and execute the appropriate queries.

You can make a relational database that doesn't support the SQL syntax, and
you can use SQL syntax to interact with schemaless data (for added fun, try
throwing JSON in a mysql/postgres text field).

I'd agree with the article saying this is an abuse of JSON, though. It's a
format to represent data; more accurately, potentially-nested key:value
stores, arrays, and scalar types. A query is not data (unless you're one of
those "my database has a 'queries' table" types)

~~~
nslater
How is a query _not_ data?

~~~
Firehed
In the same way that code isn't data.

Yes, it's bytes in memory like any other data, but it's the means rather than
the end. You care about the data, and the query (or code) is what you use to
get it.

------
deepinsand
I think it's great. I've spent too much time parsing SQL strings into well
typed data structures, and you get it for free with Mongo.

~~~
anko
I kinda agree (about the free part), but I just wanted to say that if you are
parsing sql strings you really should get an ORM. Plug in a well tested,
optimised library and never think about parsing sql strings again. It also
makes things easier to test because you get nice separation of concerns.

------
coenhyde
Mongodb queries are much easier to dynamically build than SQL. And this is so
because Mongodb uses JSON for queries.

------
gummydude
have you seen Elasticsearch API?

