Hacker News
Show HN: BedquiltDB – A Mongo-like JSON doc store built on Postgres (bedquiltdb.github.io)
145 points by s_kilk on April 19, 2016 | 65 comments


Hi All,

I've been working on this project on-and-off for a few years now, and thought it was time to show it to the public.

The Gist:

BedquiltDB is a vaguely MongoDB-inspired JSON store built on top of PostgreSQL's jsonb column type. It does the things you'd expect: creating _id fields automatically, creating tables on the first write, etc.

It is implemented as a postgres extension (a horrible lump of PL/PgSQL), and client libraries for python, node and clojure.

While I'm not using it in production for anything, it is pretty well tested, and I had fun making it, which is what really matters :)
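Roughly, the auto-_id behaviour works like this (a toy in-memory Python sketch, not the real BedquiltDB client code; class and method names are invented for illustration):

```python
import uuid

class Collection:
    """Toy emulation of a document collection: insert() generates
    an '_id' field only when the document lacks one."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        doc = dict(doc)            # don't mutate the caller's dict
        if '_id' not in doc:       # generate an _id only when missing
            doc['_id'] = str(uuid.uuid4())
        self._docs[doc['_id']] = doc
        return doc['_id']

    def find_one(self, _id):
        return self._docs.get(_id)

projects = Collection()
doc_id = projects.insert({'description': "A ghastly hack."})
```

In the real extension this bookkeeping happens inside PL/pgSQL, against a jsonb column, but the contract is the same: the caller gets an _id back whether or not they supplied one.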

Questions welcome!


Any particular reason to choose PL/PgSQL over other extension languages?


It seemed to be the most performant option available, aside from C.

I wouldn't be opposed to re-writing in a more-performant extension language if needed.

EDIT: also, PL/pgSQL is installed by default, and I wanted to see if I could get this project to work without any dependencies other than a base PostgreSQL 9.4 installation.


Not requiring any external dependencies is a huge plus in my book.


+23478723497894235787345897435 for this!

I loathe and detest dependencies! Just saying.


But how else can you make sure your pad_left function is well written and widely tested?


Yes, it makes sense to write it in whatever was most performant.


Especially when PL/V8 seems natural when working on JSON(b).

It even is much faster compared to PL/pgSQL (at least in a project I've been working on lately).


> It even is much faster compared to PL/pgSQL (at least in a project I've been working on lately)

That's good to hear. At the time I had read somewhere that PL/V8 can be slow because it needs to cast/convert data back and forth between javascript and native Postgres representations. It's possible that that assessment is wrong, however.


PL/V8 maintainer here - yes, JSON/JSONB structures are brought through the C++/JavaScript membrane, but for the most part it doesn't seem too slow (full disclosure: I wrote a Mongo clone on PL/V8 a few years ago).

But there is also the ability to simply query from inside a function using plv8.execute(), which lets you use the JSONB operators, which are much faster.


Of course, run your own benchmarks before taking PL/V8 into consideration.

In my case a PL/pgSQL FOR loop was particularly slow, and the lack of built-in data structures like maps or stacks (though a stack can be emulated in V8 via an array) caused performance problems. Array operations are really slow in PL/pgSQL (compared to V8), so instead of a map I used an indexed temporary table (creating one is quite costly).

Still, you might have a point here – PL/pgSQL stores execution plans with the function, so there is a chance of it being more efficient.


Did you try using plain SQL with unnest()?


Years? But JSONB hasn't existed for that long.


Maybe they started with hstore and replaced that over time.


A while ago I wrote a similar system, called BlitzDB:

https://github.com/adewes/blitzdb

It provides a uniform, mongo-like interface over various storage backends, most notably SQL (Postgres, SQLite) via SQLAlchemy, MongoDB itself (well, naturally) and a flat-file based backend that doesn't have any external dependencies.

Feel free to check it out, feedback is welcome.


That's very nice. It reminds me of my own https://github.com/fiatjaf/pgjson, but of course yours is bigger, seems much more feature-complete and battle-tested. I think I'm going to use it for something and see how it goes.

Is there a list of people using it somewhere? Is someone using it for serious things?


We use it in production at QuantifiedCode, so it's pretty stable but still a bit undocumented (which we will fix soon).


I have to mention postgrest here; I'm a big fan of it. To me it's a nice equilibrium between documents and relational. For complex queries I just create DB views that become very easy HTTP requests, returning JSON "documents". I don't want to discredit the technical merit of BedquiltDB or similar approaches, but if you already know HTTP and SQL, what are the benefits of going with BedquiltDB over postgrest?


Since you're a user, I'm going to ask: what do you think of Postgrest with its UI working as a standalone app for boring data management and that kind of stuff? The case I have in mind is one in which a small company wants to get some data into the computer, for no specific reason, and doesn't want to pay for expensive software. Postgrest to the rescue! Or isn't it a good idea? (I have never used it aside from the demo)


Yes! postgrest + ng-admin for boring CRUD admin panels is perfect. https://github.com/marmelab/ng-admin And here is a repo that has the HTTP interceptor configured for postgrest: https://github.com/marmelab/ng-admin-postgrest ng-admin is based on AngularJS, but marmelab has a version based on ReactJS too, which I didn't have time to test yet.

[edit] Just to clarify, you said "its UI" but postgrest and ng-admin are independent pieces of software, that happened to work very nicely together.


Thank you!


There are more and more relational databases that are beginning to accomplish what MongoDB is (sort of) doing. And ironically, doing it better. They all seem to use PostgreSQL (one that springs to mind is ToroDB).

I'm trying to understand why anyone would pick MongoDB over PostgreSQL these days?


If you have an existing project that uses MongoDB, MongoDB has a lot of unusual behaviors that are difficult to reproduce (and not very well specified). Sorting on array values is tricky, for example.

For new projects, I agree with you!


That sounds like a pretty good reason to migrate away from it!


Err, because things like this didn't even exist until 10 minutes ago and aren't production-ready. I don't really understand people who keep saying this. PostgreSQL supports JSON, but it doesn't have all of the code on top of it to let you use it like Mongo – that is, so you can do projects.find or projects.insert instead of a bunch of SQL.

Then there are all kinds of other issues related to indexes and the differences between Mongo and PostgreSQL and so on.


There's an unwritten rule here on Hacker News that says that you are supposed to bash any piece of technology that received praise when it was originally introduced on HN, but is now 3 years old or older. The moment in time when something transitions from new and shiny to old and unfashionable is generally characterized by a comment along the lines of "Why would anyone use x when you can use y?".

Conversely, the more recent a piece of tech is, the more likely it is to go into production right away, with people only starting to question their decision after the 3-year new-and-shiny barrier is broken (usually not because of its faults, but because something newer and shinier can now replace it).


Well, in Mongo's case, there has been some rather strong criticism leveled at it over time.[1][2][3] Some aspects have probably improved, but (as I don't follow Mongo) I can't say how proactive and public Mongo's owner has been in addressing criticisms.

[1] http://cryto.net/~joepie91/blog/2015/07/19/why-you-should-ne...

[2] http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...

[3] https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-read...


That would be good advice, except for the fact that (a) I couldn't give a crap whether MongoDB started here on HN or not, and (b) MongoDB is outperformed by ToroDB, and on the upside you can use regular SQL to do analytics if you use ToroDB. Also, MongoDB has a raft of issues that still haven't been addressed.

Edit: incidentally, don't you think it's ironic that the "new and shiny" thing that was supposed to knock relational databases (now over 36 years old) off its perch has been found to be lacking in a variety of areas, not least of which is that a "schemaless" document store almost always gains a de facto schema?

Seems to me that the ability to join and divide two relations is a fantastic application of relational algebra and not something you can do particularly well in MongoDB at all.


3 years seems to be the time it takes from shiny wearing off to realizing stuff isn't so great after all.


I have never felt like I fit into this place.


What are the issues with indexes?

I would argue that JSON(b) support is now production ready in Postgres 9.5+. You can read, write, update, delete, and it is fast.


There are different types of indexes that Mongo supports within a document that are more complex than what PostgreSQL can do. Or at least, that is what I remember from the last time I checked.

I mean that this person's code is not production-ready. As in, there is no chance it emulates all of the features of Mongo, or has been performance-tested, etc.


In PostgreSQL 9.4, GIN indexes were introduced to allow for indexing components of complex jsonb data types. So yeah, what else is the issue?

http://www.postgresql.org/docs/9.4/static/datatype-json.html


So, I haven't used Mongo, but here's a list of things I found to be bad about jsonb (use case was a project where we need to be able to quickly aggregate over large numbers of items with small-cardinality metadata):

1. Can't sort using a GIN index.

2. Stats are shit with GIN indices - Postgres assumes that every filter has 100 matches, every array has 99 elements or something. Usually this works well, sometimes (specifically in my case) it's terrible. Specifically, it means that if you combine an aggregation over a GIN filter with an inner join, the performance will be awful because postgres picked a nested loop join for expected 100 results instead of a hash join for the actual 50,000.

3. This also means that additional expression indices you add basically aren't getting used, because it'll always appear better to the query planner to use the GIN index first. This includes things like sorting.

4. Because of how jsonb is implemented (basically arrays containing fields ordered by key alphabetical), we discovered that we were taking out more and more fields out of our jsonb field and placing them in other columns, which kinda missed the point of it (but was very good that we could do so, yay relational data model).

5. Sorting can become an epic chore if you don't have stuff like enum types etc, especially when you try to combine it with tsvectors.

So, very happy with Postgres, but don't slam people who don't use Postgres because of insufficient json support - because Postgres' json support although excellent has quirks that will bite you with certain use cases. Very good, but not a panacea.


1. Yeah, the reason for that is that by default a GIN index doesn't support greater-than or less-than operators. However, you can introduce btree-like functionality by using btree_gin, and I believe it should start working.

2. That's (probably) occurring because the default statistics target is 100. Just increase it with:

  ALTER TABLE gin_index ALTER COLUMN indexed_column SET STATISTICS 1000;

That should hopefully help with all the other issues you are seeing.

P.S. while looking at MongoDB a little further, I discovered that it doesn't support sorting by collation. This was logged as a JIRA bug in 2014, and as of yet I don't believe there is a solution.

https://jira.mongodb.org/browse/SERVER-1920


Except that ToroDB, as an example, is a drop-in replacement for MongoDB and communicates using the MongoDB wire protocol. So basically, you can use ToroDB to use the MongoDB API as you would normally.


As an example? It is like the only thing that does it and it is so new that nobody has used it anywhere. It also looks like it just tries to map a document database back into a relational database implementation.


It also looks like it just tries to map a document database back into a relational database implementation.

Which is a problem why, precisely?


Eh, Mongo still has a better distribution/sharding/failover story, which really matters for some.


It might be worth taking a look at Citus, which provides a roughly equivalent distribution and sharding story for Postgres. It has also recently been reworked as an extension and is now available as open source [1], so you can run it yourself.

Also, speaking as someone who was an operator of a large Postgres DB and who is now an operator of a large Mongo DB, although Mongo has a nominal lead in this one area for the time being, by going to Mongo you sacrifice a lot (I really can't emphasize this point enough; the distribution/sharding story is really the only thing that Mongo is doing a better job of right now, Postgres is a clear winner in every other way). I'm really looking forward to (and hoping for) extensions like Citus becoming more mature, so that the optimal choice of database will be obvious under all circumstances.

[1] https://www.citusdata.com/blog/17-ozgun-erdogan/403-citus-un...


Doesn't ToroDB do this as a drop-in replacement though?


I didn't see anything about ToroDB dealing with sharding/replication on their site - It looked like it was only client protocol. If it can handle the replication elements, can you help me find a doc for that part?


Try this presentation:

http://www.slideshare.net/8kdata/torodb-scaling-postgresql-l...

Edit: does this help? ToroDB certainly needs better documentation :(

https://github.com/torodb/torodb/wiki/Setting-up-ToroDB-as-h...


From their examples:

  projects.insert({
      '_id':   "BedquiltDB",
      'description': "A ghastly hack.",
      'quality': "pre-alpha",
      'tags': ["json", "postgres", "api"]
  })
Does anyone actually mix quotes like this in Python? It's horrible.


It depends.

It's useful to be able to contain quotes in other quotes without escaping them.

e.g. '"hello there"' or "it's nice to see you"

Also, on many projects I've worked on, single quotes were for strings that were not meant to be shown to users (keys etc), and double quotes were for strings that would eventually be shown on a screen. It looks like that's what they're doing here.
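A hypothetical illustration of that convention (the names and messages here are invented): single quotes mark internal keys, double quotes mark text a user might see, which also sidesteps escaping apostrophes.

```python
# Single quotes: internal keys, never shown to users.
# Double quotes: user-facing text (note the unescaped apostrophes).
ERRORS = {
    'not_found': "Sorry, we couldn't find that page.",
    'forbidden': "You don't have permission to do that.",
}

def message_for(code):
    # Fall back to a generic user-facing message for unknown codes.
    return ERRORS.get(code, "Something went wrong.")
```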


> Also, on many projects I've worked on, single quotes were for strings that were not meant to be shown to users (keys etc), and double quotes were for strings that would eventually be shown on a screen. It looks like that's what they're doing here.

Interesting, I've never seen that convention before.


Not sure where it comes from but I do the same. Single for most things like atoms etc and double for text (user facing / error messages / logging).


Well, I do it too. It's just for convenience; you use the apostrophe character much more frequently in ordinary language.


I have no idea why I'd do that in Python.

In Ruby, I use single-quoted strings for things that can't possibly, by their nature, contain any string interpolation, to indicate that this is so; and then double-quoted strings for more arbitrary text that either contains interpolation, or might, in the future, get interpolation added. The coder here might be used to Ruby programming.

Though, also, it occurs to me that—since this is a dictionary that will become JSON—its type is {String => Object}. The single-quoted strings sort of read to me as indicating strings that couldn't be any other type; while the double-quoted strings are strings just because strings happen to be the response there, but could be any other object.


Huh, good catch. I guess I default to using single quotes for dictionary keys, and double quotes for text strings.


I do this too, single quotes for simple/static strings, double quotes for (potentially) complex/dynamic strings. Nothing wrong with that as long as you are consistent, though I suppose it's a very subjective thing.


> It's horrible.

i can dig it. works decently as visual articulation.


Single quotes are the typical way to go. I do know that when I'm not linting I mix and match unintentionally. Easy to forget.


I wish this were defined by PEP 8, but I'm sort of glad it's not.

Much Python code uses single quotes, with double quotes for docstrings. Except where it is more convenient.

More convenient places include escaping strings: "Joe's easier, here's why." is better than 'Joe\'s easier, here\'s why.' Also structures which are JSON too (makes copy/paste easier).


Another commenter articulated this, but my general style was single quotes on 'atoms' (e.g. mostly dictionary keys or set values), double quotes on anything intended to be read by a person. For that reason: to avoid \' all over.

Though with py3, using alt-shift-] on my machine gives me ’, so I can do 'Joe’s easier, here’s why.'


Or py2 via # -*- coding: utf-8 -*-


This would be killer, indeed something I would pay for, if you make it API-compatible with MongoDB.

Meaning I can drop in node-mongo, mongo-ruby, or pymongo with your extension.


If it becomes API compatible with MongoDB, it could be especially popular with Meteor developers.

I'm developing an app on Meteor now where the CRUD operations for the largest class of users work well with the MongoDB conventions, but I also need to prepare for the other portion of users who will need more complex queries to analyze the data produced by the first portion.


I like this. You can use traditional SQL alongside it. It proves NoSQL is a special case of SQL. I'd like to see Postgres eventually offer an official NoSQL API.
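For instance, since the documents live in an ordinary table, document lookups and plain SQL can target the same data. A stand-in sketch using SQLite's JSON1 functions in place of Postgres jsonb (the table and column names here are invented, not BedquiltDB's actual schema):

```python
import json
import sqlite3

# Documents stored as JSON text in a regular relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (jdoc TEXT)")
conn.execute(
    "INSERT INTO projects VALUES (?)",
    (json.dumps({"_id": "BedquiltDB", "quality": "pre-alpha"}),),
)

# Plain SQL reaching into the document, no special client needed.
quality = conn.execute(
    "SELECT json_extract(jdoc, '$.quality') FROM projects "
    "WHERE json_extract(jdoc, '$._id') = 'BedquiltDB'"
).fetchone()[0]
```

In Postgres the equivalent would use the jsonb `->>` operator rather than json_extract, but the point stands: the "NoSQL" view is just a convenience layer over a relational table.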


The biggest issue with the traditional relational model is that it doesn't do trees very well. Aside from that, it's pretty robust.


Yes, that would be hard! Maybe you could use Postgres's newish JSON support for that. (Of course, that's not traditional SQL.)


Oracle does have something for this – CONNECT BY (part of its hierarchical queries). Sadly, to do something similar in PostgreSQL you need to use a recursive CTE.
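The recursive-CTE version looks roughly like this (run here on SQLite for convenience; the same WITH RECURSIVE syntax works in PostgreSQL, and the schema is invented for illustration):

```python
import sqlite3

# A small tree stored relationally: each row points at its parent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO categories VALUES
        (1, NULL, 'root'),
        (2, 1,    'child'),
        (3, 2,    'grandchild');
""")

# Walk the whole subtree under id=1 in a single query.
names = [row[0] for row in conn.execute("""
    WITH RECURSIVE subtree(id, name) AS (
        SELECT id, name FROM categories WHERE id = 1
        UNION ALL
        SELECT c.id, c.name
        FROM categories AS c
        JOIN subtree AS s ON c.parent_id = s.id
    )
    SELECT name FROM subtree
""")]
```

It's more verbose than CONNECT BY, but it does handle trees of arbitrary depth in one statement.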


I've been using Marten, a document DB built on PostgreSQL for .NET: https://github.com/JasperFx/marten

It's awesome, PostgreSQL is awesome.


Seems like something that would be built soon, by NoSQL enthusiasts, as Postgres implements JSONB with more and more cool features.


I have been looking for something like this for a while, but I didn't find it so I made my own, it has been a lot of fun.



