
Building a MongoDB Clone in Postgres - jerrysievert
http://legitimatesounding.com/blog/building_a_mongodb_clone_in_postgres_part_1.html
======
jeffdavis
Relational databases have a history of absorbing the advantages of other
systems when they come along, particularly changes to the model (cf. object
databases and XML databases). As the author shows, a similar thing will happen
quite quickly for document models.

Architectural changes are slower, but you can also start to see this happening
in postgres with features like unlogged tables (i.e. don't write to the
recovery log for changes to this table) and transaction-controlled async
commit (i.e. don't wait for this transaction to hit disk). More changes are in
the works.
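Both features mentioned above are a one-line opt-in. A minimal sketch (the table name and the surrounding transaction are made up for illustration):

```sql
-- Unlogged table: changes skip the WAL, so writes are much faster,
-- but the table is emptied after a crash (PostgreSQL 9.1+).
CREATE UNLOGGED TABLE session_cache (
    key   text PRIMARY KEY,
    value text
);

-- Transaction-controlled async commit: COMMIT returns before the
-- WAL record is flushed, trading a small durability window for speed.
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO session_cache (key, value) VALUES ('abc', 'xyz');
COMMIT;
```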

MongoDB will have a chance to have an impact, and will be successful if they
are able to keep innovating. If they just stand still or incrementally
improve, they will be marginalized. Now that Mongo has center stage in the
NoSQL movement, it will be interesting to see what they do next.

~~~
autarch
I'd phrase this differently. Once you implement Mongo in Postgres, what you
have isn't a relational database.

What you _are_ taking advantage of is the incredibly solid underlying
infrastructure of Pg, which provides reliability, scalability, transactions,
replication, etc.

But the data model itself is no longer relational from the application's
perspective.

~~~
jeffdavis
Every data model is just a degenerate special case of every other data model
;)

------
mtkd
I encourage any SQL user who hasn't already tried MongoDB to fire it up and
try it for themselves. Mongoid in Ruby is a fairly fast way to get started.

I've been using SQL since the early 90s. For web apps and large collections I've
started using MongoDB more recently. It's one of the most exciting
technologies I've used in a long time.

It takes a while to stop thinking SQL, but once you pass that it's really very
primitive (in a good way) - and frictionless for development. It really fits
what I want to do with web apps in particular.

So far I've not needed to scale it, but I have less apprehension about that
than I used to have about scaling SQL back in the day - the large denormalised
tables I used to build in SQL for performance are now the default.

SQL still has a place and MongoDB is no replacement for complex models, but
give it a try before buying the claim that you can do it all with Postgres tweaks.

~~~
bad_user
Experimenting with new technologies is encouraged, especially since cross-
pollination of ideas happens this way.

However I would actually advise developers to stop and think if they really
need MongoDB or the latest fad, because their current relational database,
such as PostgreSQL, does a mighty fine job for most of their needs.

Why? Because I have never seen angry opinions about PostgreSQL losing people's
data. Or about how the modeling tools and architecture exposed by PostgreSQL
are insufficient for certain problems ... not until you're operating at
Google's scale, and MongoDB won't save you there ;)

> give it a try before buying that you can do it all with Postgres tweaks

My main problem with most NoSQL solutions is that I have to tweak the problems
I have to fit the solution, instead of the other way around. Technologies that
can be tweaked simply rock.

~~~
taligent
I guess if only I "woke up" I would realise that PostgreSQL is the answer to
all my problems. Guess what. PostgreSQL is just another SQL database. It's
definitely one of the best ones, but it still suffers from all the same
issues, limitations and frustrations. Many of which stem not from the database
itself but from the relational modelling and tools to support it.

I use Java + MongoDB and life is significantly better now that I don't have to
worry about the domain model so much. I can have lists, maps etc and can add
classes or make changes seamlessly. It's worth the tradeoffs for me.

~~~
gbog
Ahem, lists and maps seem a very easy thing to do in SQL. Where you hit the
limits of the relational model is when the structure's diversity is out of
your hands, for instance a big bunch of parametrized messages.

~~~
taligent
Lists/Maps require extra tables which means more scripts, more migrations,
more backups, more worry.
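For concreteness, the "extra table" being objected to is the classic relational encoding of a string-to-string map (the `users` table here is hypothetical):

```sql
-- One map per user: each (key, value) pair becomes a row,
-- keyed by the owning record.
CREATE TABLE user_attributes (
    user_id integer NOT NULL REFERENCES users (id),
    key     text    NOT NULL,
    value   text,
    PRIMARY KEY (user_id, key)
);
```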

With MongoDB all I have to do is add Map<String,String> myMap to a Java class
and that's it.

~~~
dstorrs
Actually, Postgres has had an Array data type since at least v8.0.
<http://www.postgresql.org/docs/8.0/static/arrays.html>

But yes, you're right that maps require a join table.

~~~
masklinn
> But yes, you're right that maps require a join table.

Not in postgres: <http://www.postgresql.org/docs/9.1/static/hstore.html>
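A minimal sketch of hstore doing exactly the map-in-a-column job (table and key names are made up):

```sql
CREATE EXTENSION hstore;  -- 9.1+; earlier releases use the contrib script

CREATE TABLE profiles (
    id    serial PRIMARY KEY,
    attrs hstore
);

INSERT INTO profiles (attrs)
VALUES ('color => "blue", size => "L"');

-- -> pulls out a single value; ? tests key existence.
SELECT attrs -> 'color'
FROM   profiles
WHERE  attrs ? 'size';
```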

------
StavrosK
This is (almost) exactly what Goatfish[1] does, except it uses SQLite, and can
create SQL indexes on any arbitrary field. I needed a schemaless, embeddable
store, so I decided to go with that, since Postgres already has the hstore.
It's still very simple and preliminary, but it works, and it's very useful for
prototyping.

I'd like to develop it some more, if people found it useful and started using
it.

[1] <https://github.com/stochastic-technologies/goatfish>

------
roncohen
If you just need simple key/value pairs, a good way to go is HSTORE in
PostgreSQL. It allows only string keys and values, not complex structures. Use
the JSON datatype if you need lists, nested objects, etc.

With HSTORE you get indexes on keys.
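A sketch of that, assuming a table `profiles` with an hstore column `attrs`:

```sql
-- A GIN index lets key-existence (?) and containment (@>)
-- queries on the hstore column use the index instead of a scan.
CREATE INDEX profiles_attrs_idx ON profiles USING gin (attrs);

SELECT *
FROM   profiles
WHERE  attrs @> 'color => "blue"';
```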

~~~
k_bx
I wonder if postgres will adopt BSON instead of JSON and implement all the
querying -- that would be so much better.

p.s.: guys who disagree -- why would you do that? I mean, BSON lets you
quickly skip (embedded) documents you're not interested in, since it stores
their size. Also it has type info and some additional types. So it's just
"better JSON" for storing and navigating, and if postgres wants to have
querying in JSON, they will either take BSON or invent the wheel for something
similar.

------
DEinspanjer
It is interesting, but what I really need in the JSON functionality of PG is
some internal representation that allows fast, efficient exploration of the
JSON blob within the query, i.e. being able to refer to a single attribute
in the select/where/group-by clauses without paying the
serialization/deserialization toll every time.
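Later Postgres releases added exactly this: the 9.3-era `->` / `->>` operators address individual attributes of a json column directly in a query. A sketch, with a hypothetical `events` table whose `data` column is json:

```sql
-- ->> extracts one attribute as text, so it can appear in the
-- WHERE and GROUP BY clauses without deserializing the whole blob
-- in the application.
SELECT data ->> 'country' AS country,
       count(*)
FROM   events
WHERE  (data ->> 'status') = 'ok'
GROUP BY country;
```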

~~~
rhizome
Isn't this a category error? If you need to query within serialized data, you
don't want to serialize. Normalize your database for this.

~~~
randomdata
Wouldn't that kind of defeat the purpose of using a document database in the
first place? Being able to throw unstructured data into the system, and then
being able to query on that data once the space is better understood is where
the document databases really shine.

If you are able to start with rigid structure, you, in many cases, could have
just used a relational database to begin with.

~~~
rhizome
_Wouldn't that kind of defeat the purpose of using a document database in the
first place?_

Yes. Use the right tool for the job.

~~~
mgkimsal
Defining 'right', and being able to predict 'right' for future
iterations/versions of an app's life is where that mantra seems to fall down.

I'm not a huge NoSQL fan right now: I sometimes need document-style
schemaless data storage, but I always (eventually) need ad hoc reporting and
relational querying capabilities on projects. With that (fore)knowledge, I may
as well always choose a relational db.

------
peterhunt
This is the right way to think about building a "NoSQL" datastore.

I think that schemaless, eventually-consistent data stores have a place and
are useful. I just think that most of the current efforts are throwing away
years of investment in SQL datastores. Rather than thinking of NoSQL as a
brand-new paradigm shift that requires a ground-up reimplementation, we need
to think of it as a layer of abstraction on top of MySQL and memcache (or your
preferred setup). Re-implementing all of the work that has gone into these
projects is a bad idea and is contrary to The Unix Way.

Thinking of the SQL database as a storage engine rather than the kitchen sink
is the key to building a scalable system.
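In miniature, the "storage engine" framing looks something like this sketch (all names invented): the relational layer sees only an opaque blob plus whatever fields you choose to promote to indexed columns, and the document semantics live in the layer above.

```sql
-- A bare-bones document store on top of SQL: the database provides
-- durability, replication, and an index on the promoted column;
-- the application layer owns the document format.
CREATE TABLE documents (
    id         serial PRIMARY KEY,
    collection text NOT NULL,
    body       text NOT NULL,                    -- serialized JSON
    updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX documents_collection_idx ON documents (collection);
```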

------
baq
so, um, yeah... <http://www.postgresql.org/docs/9.1/static/hstore.html>

~~~
jerrysievert
still flat key/value, though: there's no inspecting inside nested structures.

------
nsanch
Unless he's planning to build sharding on postgres too, I think he's missing
the point.

~~~
jerrysievert
while sharding is an important aspect of mongodb, i don't consider it the most
important feature.

~~~
nsanch
I don't know if it's the _most_ important feature, but I wouldn't build a
serious site on top of anything that didn't have some sort of built-in
sharding story.

With postgres you have to roll your own. If you want to bridge the gap from
postgres to mongo, I think that's where you have to start.

~~~
jshen
"but I wouldn't build a serious site on top of anything that didn't have some
sort of built-in sharding story."

There are many serious sites that don't need sharding.

~~~
nsanch
Fair, my statement was overly broad. Sites that are read-only or store blob
data in something like S3 can often avoid sharding for quite a while and rely
on machines to just get bigger over time.

That said, if your site grows in some way you didn't originally anticipate and
you get to a point where you need to shard, but can only do so by changing
data stores, then it's sad.

~~~
jeltz
I would say most sites will do just fine without sharding. You can get very
far by just scaling up with a more expensive database server and caching the
most common read operations. Some of the largest websites in the world do not
need to do more than this.

~~~
taligent
Like what? I don't know of ANY decent-sized website that uses a single
database server.

~~~
nsanch
An example is stack overflow. They've been able to scale up instead of scaling
out.

[http://highscalability.com/blog/2009/8/5/stack-overflow-architecture.html](http://highscalability.com/blog/2009/8/5/stack-overflow-architecture.html)
is the link I can find right now.

------
NewtonsFolly
Sounds legit.

