

Aggregating NBA data, PostgreSQL vs MongoDB - sanityinc
http://tapoueh.org/blog/2014/02/17-aggregating-nba-data-PostgreSQL-vs-MongoDB

======
sanityinc
> Let's ignore the [mongo] query format itself, as it's obviously meant to be
> generated by a tool rather than typed by a human being. Here's the same
> query in SQL, with the result this time

Priceless.

~~~
threeseed
Probably easier to read if you write it this way:

      db.games.aggregate([
      { $match : { date : { $gt : ISODate("1999-08-01T00:00:00Z"), $lt : ISODate("2000-08-01T00:00:00Z") }}},
      { $unwind : '$teams' },
      { $match : { 'teams.won' : 1 }},
      { $group : { _id : '$teams.name', wins : { $sum : 1 }}},
      { $sort : { wins : -1 }},
      { $limit : 5 }
      ]);
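For concreteness, here is roughly what that pipeline computes, sketched in plain Python over invented sample documents (field names follow the pipeline: date, teams[].name, teams[].won):

```python
from collections import Counter
from datetime import datetime

# Invented sample documents for illustration only.
games = [
    {"date": datetime(1999, 11, 2),
     "teams": [{"name": "Spurs", "won": 1}, {"name": "Lakers", "won": 0}]},
    {"date": datetime(2000, 1, 15),
     "teams": [{"name": "Lakers", "won": 1}, {"name": "Spurs", "won": 0}]},
    {"date": datetime(1998, 3, 1),  # outside the date window: filtered out
     "teams": [{"name": "Bulls", "won": 1}, {"name": "Jazz", "won": 0}]},
]

start, end = datetime(1999, 8, 1), datetime(2000, 8, 1)

wins = Counter(
    team["name"]                       # $group on teams.name, $sum: 1
    for game in games
    if start < game["date"] < end      # first $match (date range)
    for team in game["teams"]          # $unwind teams
    if team["won"] == 1                # second $match (teams.won)
)

top5 = wins.most_common(5)             # $sort by wins desc + $limit 5
print(top5)
```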

------
raphinou
I've migrated data I stored in mongo to PostgreSQL, but kept it in json
format. The ease of querying this brought is impressive, thanks to postgresql's
json support. The only problem I encountered was with nested json objects. I
keep telling myself I should blog about it, to illustrate a data collection
framework for smaller shops for which it would be overkill to deploy Hadoop or
Cassandra.
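A minimal sketch of that kind of json querying, using SQLite's JSON1 functions as a stand-in for PostgreSQL's json operators (`->` / `->>`); the table and field names here are invented:

```python
import sqlite3
import json

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute(
    "INSERT INTO events (doc) VALUES (?)",
    (json.dumps({"user": {"name": "ada", "plan": "pro"}, "clicks": 3}),),
)

# Nested access via a json path -- roughly PostgreSQL's
# doc -> 'user' ->> 'name'
name, clicks = conn.execute(
    "SELECT json_extract(doc, '$.user.name'), json_extract(doc, '$.clicks') "
    "FROM events"
).fetchone()
print(name, clicks)  # ada 3
```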

~~~
MetaCosm
Our team recently did something similar. We went from MongoDB to PostgreSQL.
We converted most of our well known fields over to proper PostgreSQL fields,
and by using UUIDs in PostgreSQL we still had opaque sharable ids. On top of
that, for developer flexibility, we included a little json dumping ground
column in most of our tables. This allows developers to quickly add in some
data they care about without dealing with migrations and database versioning.
Then, as needed (and only if needed) we move the data out of json to proper
columns, mostly for performance or relational reasons.
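A minimal sketch of that pattern, using SQLite in place of PostgreSQL (schema and names invented here): typed columns for well-known fields, a UUID as the opaque shareable id, a json "dumping ground" column, and the as-needed promotion of a json field to a proper column.

```python
import sqlite3
import json
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    TEXT PRIMARY KEY,   -- UUID: opaque, shareable
        email TEXT NOT NULL,      -- well-known field, proper column
        extra TEXT DEFAULT '{}'   -- json dumping ground
    )""")

uid = str(uuid.uuid4())
conn.execute("INSERT INTO users VALUES (?, ?, ?)",
             (uid, "a@example.com",
              json.dumps({"beta_flags": ["dark_mode"]})))

# Later, if 'beta_flags' turns out to matter, promote it out of the
# json into a real column (the "as needed" migration):
conn.execute("ALTER TABLE users ADD COLUMN beta_flags TEXT")
conn.execute(
    "UPDATE users SET beta_flags = json_extract(extra, '$.beta_flags'), "
    "extra = json_remove(extra, '$.beta_flags')"
)
row = conn.execute("SELECT email, beta_flags, extra FROM users").fetchone()
print(row)
```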

------
chris_wot
So if I understand this correctly, the original article [1] shows that the
mongodb queries being used rely on doing projections and selections on the
data, and then applying aggregations.

However, unlike a relational engine which uses a query plan, you need to
actually tell the aggregation engine in which order to process the data.

Doesn't this seem a little... backward? If the dataset is complex enough, the
two things that come to mind are:

1. There would be a lot of redundancy in the hierarchical dataset.

and

2. That seems like a lot of futzing around to get the optimal aggregation
path!

What am I missing here?

p.s. I do tend to agree that the SQL syntax is rather more readable, but then
again I've been looking at SQL for a lot longer than I have JSON style
queries. The first query is not _that_ unreadable!

1. http://thecodebarbarian.wordpress.com/2014/02/14/crunching-30-years-of-nba-data-with-mongodb-aggregation/

------
taylonr
Just to nitpick, the article says aggregates have been in SQL for 3 or 4
decades, but SQL wasn't an ANSI standard until 1986 (28 years ago, almost 3
decades). I get that a language exists before it is standardized, so were
aggregates in SQL in the 70s/early 80s, or is this just a hyperbolic
statement?

~~~
thraxil
COUNT() and SUM() are aggregates. I'd be pretty surprised if they, or
equivalents, weren't in very early versions of SQL.
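For illustration, the aggregates in question, runnable against essentially any SQL engine (SQLite here for convenience; table and data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (team TEXT, points INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("Spurs", 95), ("Spurs", 88), ("Lakers", 101)])

# COUNT() and SUM() collapse the rows of each group to one value.
rows = conn.execute(
    "SELECT team, COUNT(*), SUM(points) FROM scores "
    "GROUP BY team ORDER BY team"
).fetchall()
print(rows)  # [('Lakers', 1, 101), ('Spurs', 2, 183)]
```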

