I've been working on this project on-and-off for a few years now, and thought it was time to show it to the public.
The Gist:
BedquiltDB is a vaguely MongoDB-inspired JSON store built on top of PostgreSQL's jsonb column type. It does the things you'd expect: creating _id fields automatically, creating tables on the first write, etc.
It is implemented as a Postgres extension (a horrible lump of PL/pgSQL), with client libraries for Python, Node and Clojure.
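To give a feel for the idea, here's a conceptual sketch of what a jsonb document store boils down to — this is not BedquiltDB's actual schema, and the table and column names are made up:

```sql
-- Conceptual sketch only, NOT BedquiltDB's actual schema.
-- A "collection" is just a table with a jsonb body and an
-- extracted _id, created lazily on first write.
CREATE TABLE IF NOT EXISTS people (
    _id  text PRIMARY KEY,
    body jsonb NOT NULL
);

INSERT INTO people (_id, body)
VALUES ('55cfb', '{"_id": "55cfb", "name": "Sarah"}');

-- A mongo-style "find" becomes a jsonb containment query.
SELECT body FROM people WHERE body @> '{"name": "Sarah"}';
```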
While I'm not using it in production for anything, it is pretty well tested, and I had fun making it, which is what really matters :)
It seemed to be the most performant option available, aside from C.
I wouldn't be opposed to re-writing in a more-performant extension language if needed.
EDIT: also, PL/pgSQL is installed by default, and I wanted to see if I could get this project to work without any other dependencies aside from a base PostgreSQL 9.4 installation.
> It even is much faster compared to PL/pgSQL (at least in a project I've been working on lately)
That's good to hear. At the time I had read somewhere that PL/V8 can be slow because it needs to cast/convert data back and forth between javascript and native Postgres representations. It's possible that that assessment is wrong, however.
PL/V8 maintainer here - yes, JSON/JSONB structures are brought through the C++/Javascript membrane, but for the most part it doesn't seem too slow (full disclosure: I wrote a mongo clone on PL/V8 a few years ago).
But you can also simply query from inside a function using plv8.execute(), which lets you use the native JSONB operators, which are much faster.
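For example, a PL/V8 function can push the filtering down to a plain SQL query that uses the jsonb containment operator — this is just a sketch, and the `docs` table and `doc` column are hypothetical:

```sql
-- Sketch: delegate the heavy lifting to SQL from inside PL/V8.
-- The "docs" table and "doc" column are hypothetical names.
CREATE FUNCTION find_docs(query json) RETURNS SETOF jsonb AS $$
  // plv8.execute() runs a parameterized query and returns row objects;
  // the @> containment test runs on the native jsonb representation.
  var rows = plv8.execute(
    "SELECT doc FROM docs WHERE doc @> $1::jsonb",
    [JSON.stringify(query)]
  );
  return rows.map(function (r) { return r.doc; });
$$ LANGUAGE plv8;
```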
Of course, run your own benchmarks before taking PL/V8 into consideration.
In my case a PL/pgSQL FOR loop was particularly slow, and the lack of built-in data structures like map or stack (though a stack can be emulated in V8 via an array) caused performance problems – array operations are really slow in PL/pgSQL (compared to V8). Instead of a map I used an indexed temporary table (which is quite costly to create).
Still, you might have a point here – PL/pgSQL stores execution plans with the function, so there is a chance of it being more efficient here.
It provides a uniform, mongo-like interface over various storage backends, most notably SQL (Postgres, SQLite) via SQLAlchemy, MongoDB itself (well, naturally) and a flat-file based backend that doesn't have any external dependencies.
That's very nice. It reminds me of my own https://github.com/fiatjaf/pgjson, but of course yours is bigger, seems much more feature-complete and battle-tested. I think I'm going to use it for something and see how it goes.
Is there a list of people using it somewhere? Is someone using it for serious things?
I have to mention PostgREST here; I'm a big fan of it. To me it's a nice equilibrium between documents and relational. For complex queries I just create DB views that become very easy HTTP requests, returning JSON "documents".
I don't want to discredit the technical merit of BedquiltDB or similar approaches, but if you already know HTTP and SQL, what are the benefits of choosing BedquiltDB over PostgREST?
Since you're a user, I'm going to ask: what do you think of PostgREST with a UI working as a standalone app for boring data management and that kind of stuff? The case I have in mind is a small company that wants to get some data into the computer, for no specific reason, and doesn't want to pay for expensive software. PostgREST to the rescue! Or isn't it a good idea? (I have never used it aside from the demo)
Yes! postgrest + ng-admin for boring CRUD admin panels is perfect.
https://github.com/marmelab/ng-admin
And here is a repo that has the HTTP interceptor configured for PostgREST.
https://github.com/marmelab/ng-admin-postgrest
ng-admin is based on AngularJS, but marmelab also has a version based on React that I haven't had time to test yet.
[edit] Just to clarify, you said "its UI", but PostgREST and ng-admin are independent pieces of software that happen to work very nicely together.
There are more and more relational databases that are beginning to accomplish what MongoDB is (sort of) doing. And ironically, doing it better. They all seem to use PostgreSQL (one that springs to mind is ToroDB).
I'm trying to understand why anyone would pick MongoDB over PostgreSQL these days?
If you have an existing project that uses MongoDB, MongoDB has a lot of unusual behaviors that are difficult to reproduce (and not very well specified). Sorting on array values is tricky, for example.
Err, because things like this didn't even exist until 10 minutes ago and aren't production-ready. I don't really understand people who keep saying this. PostgreSQL supports JSON, but it doesn't have all of the code on top of it so that you can use it like Mongo — as in, so you can do projects.find or projects.insert instead of a bunch of SQL.
Then there are all kinds of other issues related to indexes and the differences between Mongo and PostgreSQL and so on.
There's an unwritten rule here on Hacker News that says that you are supposed to bash any piece of technology that received praise when it was originally introduced on HN, but is now 3 years old or older. The moment in time when something transitions from new and shiny to old and unfashionable is generally characterized by a comment along the lines of "Why would anyone use x when you can use y?".
Conversely, the more recent a piece of tech is, the more likely it is to go into production right away, with people only starting to question their decision after the 3-year new-and-shiny barrier is broken (and usually not because of its faults, but because something newer and shinier can now replace it).
Well, in Mongo's case, there has been some rather strong criticism leveled at it over time.[1][2][3] Some aspects have probably improved, but (as I don't follow Mongo) I can't say how proactive and public Mongo's owner has been in addressing criticisms.
That would be good advice, except that a) I couldn't give a crap if MongoDB started here on HN or not, and b) MongoDB is outperformed by ToroDB, and on the upside you can use regular SQL to do analytics if you use ToroDB. Also, MongoDB has a raft of issues that still haven't been addressed.
Edit: incidentally, don't you think it's ironic that the "new and shiny" thing that was supposed to knock relational databases (now over 36 years old) off its perch has been found to be lacking in a variety of areas, not least of which is that a "schemaless" document store almost always gains a de facto schema?
Seems to me that the ability to join and divide two relations is a fantastic application of relational algebra and not something you can do particularly well in MongoDB at all.
There are different types of indexes that Mongo supports within a document that are more complex than what PostgreSQL can do. Or at least that's what I remember from the last time I checked.
I mean that this person's code is not production-ready. As in, there is no chance it emulates all of the features of Mongo, or has been performance-tested, etc.
So, I haven't used Mongo, but here's a list of things I found to be bad about jsonb (use case was a project where we need to be able to quickly aggregate over large numbers of items with small-cardinality metadata):
1. Can't sort using a GIN index.
2. Stats are shit with GIN indices - Postgres assumes that every filter has 100 matches, every array has 99 elements or something. Usually this works well; sometimes (specifically in my case) it's terrible. It means that if you combine an aggregation over a GIN filter with an inner join, the performance will be awful, because Postgres picked a nested loop join for the expected 100 results instead of a hash join for the actual 50,000.
3. This also means that additional expression indices you add basically aren't getting used, because it'll always appear better to the query planner to use the GIN index first. This includes things like sorting.
4. Because of how jsonb is implemented (basically arrays containing fields ordered alphabetically by key), we found ourselves pulling more and more fields out of our jsonb column and placing them in other columns, which kinda missed the point of it (but it was very good that we could do so, yay relational data model).
5. Sorting can become an epic chore if you don't have stuff like enum types etc, especially when you try to combine it with tsvectors.
So, very happy with Postgres, but don't slam people who don't use Postgres because of insufficient JSON support - because Postgres' JSON support, although excellent, has quirks that will bite you with certain use cases. Very good, but not a panacea.
1. Yeah, the reason for that is that by default a GIN index doesn't support greater-than or less-than operators. However, you can introduce btree-like functionality by using the btree_gin extension, and I believe it should start working.
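For example (table and column names here are hypothetical), btree_gin lets a single GIN index cover both a jsonb column and an ordinary scalar column:

```sql
-- Hypothetical table/column names. btree_gin adds GIN operator
-- classes for scalar types, so one index can mix jsonb operators
-- with equality/range predicates on a plain column like created_at.
CREATE EXTENSION IF NOT EXISTS btree_gin;

CREATE INDEX items_meta_created_idx
    ON items USING gin (metadata jsonb_path_ops, created_at);
```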
2. That's (probably) occurring because the default statistics target is 100. Just increase them with:
ALTER TABLE gin_index ALTER COLUMN indexed_column SET STATISTICS 1000;
That should hopefully help with all the other issues you are seeing.
P.S. while looking at MongoDB a little further, I discovered that it doesn't support sorting by collation. This was logged as a JIRA bug in 2014, and as of yet I don't believe there is a solution.
Except that ToroDB, as an example, is a drop-in replacement for MongoDB and communicates using the MongoDB wire protocol. So basically, you can use ToroDB to use the MongoDB API as you would normally.
As an example? It is like the only thing that does it and it is so new that nobody has used it anywhere. It also looks like it just tries to map a document database back into a relational database implementation.
It might be worth taking a look at Citus, which provides a roughly equivalent distribution and sharding story for Postgres. It's also recently been reworked as an extension and is now available as open source [1], so that you can run it yourself.
Also, speaking as someone who was an operator of a large Postgres DB and who is now an operator of a large Mongo DB, although Mongo has a nominal lead in this one area for the time being, by going to Mongo you sacrifice a lot (I really can't emphasize this point enough; the distribution/sharding story is really the only thing that Mongo is doing a better job of right now, Postgres is a clear winner in every other way). I'm really looking forward to (and hoping for) extensions like Citus becoming more mature so that optimal choice of database will be obvious under all circumstances.
I didn't see anything about ToroDB dealing with sharding/replication on their site - it looked like it only covered the client protocol. If it can handle the replication side, can you help me find a doc for that part?
It's useful to be able to contain quotes in other quotes without escaping them.
i.e. '"hello there"' or "it's nice to see you"
Also, on many projects I've worked on, single quotes were for strings that were not meant to be shown to users (keys etc), and double quotes were for strings that would eventually be shown on a screen. It looks like that's what they're doing here.
> Also, on many projects I've worked on, single quotes were for strings that were not meant to be shown to users (keys etc), and double quotes were for strings that would eventually be shown on a screen. It looks like that's what they're doing here.
Interesting, I've never seen that convention before.
In Ruby, I use single-quoted strings for things that can't possibly, by their nature, contain any string interpolation, to indicate that this is so; and then double-quoted strings for more arbitrary text that either contains interpolation, or might, in the future, get interpolation added. The coder here might be used to Ruby programming.
Though, also, it occurs to me that—since this is a dictionary that will become JSON—its type is {String => Object}. The single-quoted strings sort of read to me as indicating strings that couldn't be any other type; while the double-quoted strings are strings just because strings happen to be the response there, but could be any other object.
I do this too, single quotes for simple/static strings, double quotes for (potentially) complex/dynamic strings. Nothing wrong with that as long as you are consistent, though I suppose it's a very subjective thing.
I wish this were defined by PEP 8, but I'm sort of glad it's not.
Much Python code uses single quotes, with double quotes for docstrings — except where the other is more convenient.
More convenient cases include strings that would otherwise need escaping: "Joe's easier, here's why." is better than 'Joe\'s easier, here\'s why.' Also structures that are JSON structures too (makes copy/pasta easier).
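The escaping trade-off above, shown concretely — both literals spell the same string, but the double-quoted one needs no backslashes:

```python
# Double quotes avoid backslash-escaping the apostrophes;
# the two literals are nonetheless identical strings.
a = "Joe's easier, here's why."
b = 'Joe\'s easier, here\'s why.'
print(a == b)  # → True
```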
Another commenter articulated this, but my general style was single quotes on 'atoms' (e.g. mostly dictionary keys or set values), double quotes on anything intended to be read by a person. For that reason: to avoid \' all over.
Though with py3, using alt-shift-] on my machine gives me ’, so I can do 'Joe’s easier, here’s why.'
If it becomes API compatible with MongoDB, it could be especially popular with Meteor developers.
I'm developing an app on Meteor now where the CRUD operations for the largest class of users work well with the MongoDB conventions, but I also need to prepare for the other portion of users who will need more complex queries to analyze the data produced by the first portion.
I like this. You can use traditional SQL alongside it. It proves NoSQL is a special case of SQL. I'd like to see Postgres eventually offer an official NoSQL API.
Oracle does have something - CONNECT BY (part of their hierarchical queries). Sadly, to do something similar in PostgreSQL you need to use a recursive CTE.
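For reference, here's what the Postgres equivalent of an Oracle CONNECT BY query looks like — a sketch with a hypothetical employees/manager_id hierarchy:

```sql
-- Sketch: walking a hierarchy with a recursive CTE in PostgreSQL.
-- The "employees" table and its columns are hypothetical.
WITH RECURSIVE subordinates AS (
    -- anchor member: Oracle's START WITH
    SELECT id, manager_id, name
    FROM employees
    WHERE manager_id IS NULL
  UNION ALL
    -- recursive member: Oracle's CONNECT BY PRIOR id = manager_id
    SELECT e.id, e.manager_id, e.name
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;
```

More verbose than CONNECT BY, but it's standard SQL and composes with the rest of the query.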
Questions welcome!