
Announcing MoSQL - nelhage
https://stripe.com/blog/announcing-mosql
======
gfodor
FYI you can store unstructured data in PostgreSQL (and query it) with the
introduction of hstore. So knock one more reason to use MongoDB instead of
PostgreSQL off your list. (Disclaimer: the length of my list to use MongoDB
has always been a constant that is less than one.)

<http://www.postgresql.org/docs/9.1/static/hstore.html>

~~~
mrkurt
Wow, hstore really isn't a great alternative to an actual document DB. The
"better" Postgres option would be a JSON type and functional indexes.

~~~
shawn-butler
There is a JSON type but it just validates content.

HSTORE can be fully indexed (gIST and GIN). Just have to roll your own object
graphs for nesting if that's what you need to do.

I swear I have typed this exact same comment previously. Deja vu, maybe

~~~
mrkurt
JSON type gives you some typed values within the doc, multi-level nesting,
etc. You can add functional indexes
([http://www.postgresql.org/docs/9.1/static/indexes-
expression...](http://www.postgresql.org/docs/9.1/static/indexes-
expressional.html)) to index specific attributes within the JSON, do legit
sorts over values, reasonable array queries, etc. It seems much, much closer
to what Mongo does than anything you can do with hstore.

~~~
shawn-butler
I think you just restated my comment. Do you believe expression indexes do not
apply to HSTORE?

I consider both HSTORE (key/value) and the current JSON type and record
functions are just intermediate steps to a fuller API [0].

[0]: [http://www.postgresql.org/message-
id/50EC971C.3040003@dunsla...](http://www.postgresql.org/message-
id/50EC971C.3040003@dunslane.net)

------
physcab
This is pretty cool but I'm struggling to see what the use cases are, atleast
for analysis. There might be quite a bit of benefits for running application
code that I'm not aware of. With regards to analysis though, their own example
question is "what happened last night?" but then they go on to say that it is
a near real-time data store. Does it matter that it is a real-time mirror
then?

I've always liked the paradigm of doing analysis on "slower" data stores, such
as Hadoop+Hive or Vertica if you have the money. Decoupling analysis tools
from application tools is both convenient and necessary as your organization
and data scales.

~~~
nelhage
(I wrote MoSQL)

PostgreSQL scales surprisingly well for this purpose, and is much nicer for
interactive queries than Hadoop/Hive. We use Impala[1] for some larger
datasets, but Impala is comparatively new, and it's nice to have something as
battle-tested as postgres here.

As for the "why do we need realtime?": In my mind the benefit of a near-
realtime replica is not that you actually often _need_ it, but that it means
you never have to ask the question of "Was this snapshot refreshed recently
enough?", and never end up having to wait several hours for an enormous
dump/load operation, when you realize you _did_ need newer data.

[1] [http://blog.cloudera.com/blog/2012/10/cloudera-impala-
real-t...](http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-
queries-in-apache-hadoop-for-real/)

~~~
ozgune
Out of curiosity, did you guys consider running SQL on MongoDB using Postgres
foreign tables? What were the pros and cons of that approach for your use-
cases?

(In full disclosure, I wrote mongo_fdw for PostgreSQL.)

~~~
nelhage
I actually prototyped our PostgreSQL solution using mongo_fdw (Incidentally, I
throw together Debian packaging here, if you're interested:
<https://github.com/nelhage/mongo_fdw>).

Our experience was that mongo_fdw doesn't (yet?) give postgres enough
information and knobs to plan JOINs efficiently, which is one of the things we
wanted. I got a decent amount of leverage out of using mongo_fdw and then
cloning to native tables using SELECT INTO, though :)

~~~
ozgune
That's neat. :) For mongo_fdw, we're expecting better join performance once we
upgrade to PostgreSQL 9.2 FDW APIs and start collecting table statistics.

I'm sure there are other factors involved for MoSQL, but they are probably
outside the scope of this post. I'd love chat about them offline.

------
dugmartin
Reading the headline I thought they were introducing a SQL like interface to
their API, sort of like FQL for Facebook and I got a little excited. Something
like this to get the email addresses of all your active trial subscribers:

SELECT c.email FROM customers c, subscriptions s WHERE c.subscription_id =
s.id AND s.status = "active" and s.trial_start IS NOT NULL;

(where of course the customer and subscription tables would be a virtual view
on your customers and subscriptions)

~~~
pc
Hm, that could be pretty cool actually. Especially if we also added a REPL for
interactive queries at manage.stripe.com.

~~~
dugmartin
You're welcome Patrick. I'd recommend looking at Antlr4 to parse the
"StripeSQL" commands.

~~~
jaytaylor
Thanks for the Antlr reference; I've been meaning to learn more about it and
you've pushed me to finally start!

------
jabagonuts
At what point do you abandon mongodb and just use postgresql?

~~~
danielpal
At no-point. Stripe is doing it right. They are using the right tool for each
job. Mongo for storage speed etc and then postgres to analyze query etc.

This kind of comment shows how little knowledge you have about NoSQL and SQL.
Is not a SQL vs NoSQL, it's about using the right technology for the job.

~~~
dennis82
are you kidding me? There is absolutely NO reason whatsoever to use a NoSQL
database for a financial services company. Postgres is more than capable of
sustaining the necessary speeds of a startup.

Relational databases were created in the first place to solve these very
problems around transactionality and analytics for finance.

This library is a beautiful example of reinventing the wheel, and otherwise
creating a patchwork of unnecessary - and ultimately brittle - infrastructure.

~~~
gdb
(I work at Stripe.)

Where we use MongoDB, it's not because of speed. PostgreSQL is certainly
capable of fast performance. MongoDB is useful for its ability to log freeform
data as well as for its replication model. (We use sharded MongoDB in a few
places, but mostly use straight replica sets.)

We use MySQL, MongoDB, PostgreSQL, and Impala. They're all useful in different
places.

~~~
monstrado
(Clouderan here)

How are you liking Impala? We just dropped 0.5 release yesterday which
includes the JDBC driver :D!

Edit: Awesome job on the Ruby client, it's great!

~~~
gdb
It's been great -- setup was a bit of work (we're on Ubuntu, so had to build
from source), but once up and running it's allowed us to do lots of ad-hoc
analysis that would have been too hard otherwise.

I've been meaning to write a MoSQL equivalent for our Impala data, but at the
moment we're doing a more traditional ETL.

~~~
odellk
gdb - If you have Impala, Hadoop, and Hive right now. Why use MongoDB instead
of HBase and make it all work in a happy harmony?

------
nodesocket
10gen also has a nice python app which syncs by tailing the MongoDB oplog to
an external source. Most common is Solr.

[https://github.com/10gen-labs/mongo-
connector/tree/master/mo...](https://github.com/10gen-labs/mongo-
connector/tree/master/mongo-connector)

Seems to be high quality, and supports replica sets.

------
e1ven
Very neat project. I can see several use-cases for this where I work- It'd be
nice to have alternatives means of searching through data.

I'd also like to mention a project I've been contributing to, Mongolike

[My fork is at <https://github.com/e1ven/mongolike> , once it's merged
upstream, that version will be the preferred one ;) ]

It implements mongo-like structures on TOP of Postgres. This has allowed me to
support both Mongo and Postgres for a project I'm working on.

------
Ensorceled
Nice. Real businesses need a data warehouse and SQL is the right tool for that
job.

I thank them for releasing this.

~~~
taligent
> Real businesses need a data warehouse and SQL is the right tool for that
> job.

Honestly. I don't think you could be more misinformed if you tried.

Hint: Google "Big Data".

~~~
knightni
...data warehouses in general mostly use SQL, and lots of businesses use data
warehouses successfully. Teradata, Netezza, Oracle, DB2, etc. I'm not sure why
his statement was controversial - SQL's a great language for reporting and
analytics.

~~~
PommeDeTerre
I've had to deal with a lot of NoSQL advocates whose experience with SQL or
relational databases doesn't extend beyond MySQL.

Of course, it's understandable why they have a bad impression of SQL; they've
only ever used one of the most inept implementations around.

Those who are willing to try one of the more mature and sensible relational
database systems usually see quite quickly the value that such systems
provide.

~~~
bermanoid
Beyond a few terabytes of data, Postgres is just as worthless as MySQL, and
every other non-experimental SQL option comes with a "call us" price tag.

If there are production-ready options for biggish data other than NoSQL or
high priced commercial analytics dbs, please share...

~~~
petit_robert
Considering the space requirements of a DB with a properly designed schema, I
have to wonder what on earth you can do that generates beyond a few terabytes
of data, and how you can make any sense out of it.

Also, your comment is rather ambiguous, I certainly hope you're not calling
Postgresql experimental, because that would be laughable; and there are
several examples of multi-terabyte databases using it.

------
Ingaz
I thought that "young" NoSQLs sometime in will got SQL interface.

Look at old NoSQLs: Intersystems Cache got SQL interface, GT.M (in PIP-
framework) also got SQL.

My impression that MongoDB looks a lot like MUMPS storage with globals in
JSON.

------
andrewjshults
Is there currently support for "unrolling" arrays or hashes into tables of
their own? If not, would definitely be interested in helping to add that on
(we use arrays on documents quite a bit, but have run into a number of
situations where a simple SQL query for analysis could have quickly replaced a
bunch of mongo scripts).

~~~
dcraw
I've added that capability to mongo_fdw, which I use for getmetrica.com. I'll
be contributing it back soon (after that 9.2 API conversion). Would be happy
to talk to you about the wrapper or Metrica. Email's in my profile.

~~~
andrewjshults
FYI, the email field from the profile doesn't actually get displayed publicly.
Mine is (username) @gmail.com

~~~
dcraw
Whoops. Okay, emailing.

------
ElGatoVolador
If you need to make a tool(and use twice the amount of storage) to be able to
"query your data" in a SQL manner while using noSQL, it probably means you are
using the wrong tool for the job.

~~~
j-kidd
Actually, it is pretty common to replicate the transactional data into another
data store for analytical purpose. However, using PostgreSQL as the OLAP data
store may not be the wisest move.

------
hgimenez
Author of MoSQL, did you consider just using the MongoDB FTW instead?
<https://github.com/citusdata/mongo_fdw>

~~~
nelhage
(I wrote MoSQL)

I actually played with mongo_fdw. At this point, it's a really cute hack, and
useful for some things, but it doesn't give Postgres enough information and
knobs to really let the query planner work effectively, so it ends up being
really slow for complex things. I do love the concept, though.

~~~
BlackJack
What were your thoughts on MongoConnector? ([https://github.com/10gen-
labs/mongo-connector/tree/master/mo...](https://github.com/10gen-labs/mongo-
connector/tree/master/mongo-connector))

------
bryanjos
I love this idea. I can see myself using MoSQL pretty soon. Does it handle
geospatial data? Can it replicate geospatial data from Mongo to a Geometry
data type in Postgres?

------
danso
Out of curiousity, but what is the rest of Stripe's stack like? Ruby,
apparently, but I'm assuming they don't use any kind of Mongo ORM at all.

------
scragg
Someone should write a client library so you can do ad hoc data aggregation
queries without using SQL. You can call it NoMoSQL :)

------
meaty
Also useful when MongoDB blows chunks because it was a crap architectural
decision and you quickly port your app to raw SQL...

------
arthulia
Can't wait for NoMoSQL

------
Uchikoma
Waiting for BroSQL.

------
govindkabra31
how do you deal with sharded mongo clusters?

~~~
umur
(disclosure: I'm one of the founders at Citus Data)

hey, one way to do that is to use the MongoDB foreign data wrapper - also
mentioned in some of the earlier threads.

mongo_fdw (<https://github.com/citusdata/mongo_fdw>) allows you to run SQL on
MongoDB on a single node. Citus Data allows you to parallelize your SQL
queries across multiple nodes (in this case, multiple MongoDB instances) by
just syncing shard metadata. So you would effectively run SQL on a sharded
mongo cluster without moving the data anywhere else.

another idea could be to use MoSQL to neatly replicate each mongo instance to
a separate PostgreSQL instance, and then use Citus Data to run distributed SQL
queries across the resulting PostgreSQL cluster.

------
dschiptsov
_MongoDB is great for a lot of reasons_ \- record-level locking? multiple
concurrent writes? append-only journals?

I have read than in version 2.x they announce some features, so, it is
greatness?

