CouchDB 3.0 (couchdb.org)
266 points by ifcologne 38 days ago | 154 comments

CouchDB is awesome, full stop.

While it lacks some of MongoDB's popularity, and the wide adoption of things like Mongoose in lots of open source CMS-type projects, it wins for its (I believe) unique take on map/reduce: writing custom JavaScript view functions that run on every document, letting you really customize the way you query, slice, and access parts of your data...

Example: I'm building a document analysis app that does topic + keyword frequency vectorization of a corpus of documents, only a few thousand for now.

I end up with a bunch of documents that have "text": "here is my document text..." and "vector": [ array of floating point values ...].

What I can do with CouchDB is store that 20d vector and emit its integer parts as a query key:

    var intVectors = doc.vector.map(function(val){
      return Math.floor(val);
    });
    emit(intVectors, 1);
Then I can match an input document's vector (calculated the same as corpus documents), calculate a 'range' of those vectors, pass it as start and end keys, and super quickly get a result from the database of 'here are documents that have vectors similar to your input'...
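A minimal sketch of what the query side could look like (the ±1 tolerance, function name, and example vector are my assumptions, not from the original app):

```javascript
// Build startkey/endkey for the view above from an input vector.
// Widening each floored component by a tolerance gives a
// "similar vectors" range to pass to the view query.
function vectorRange(vector, tolerance) {
  var floored = vector.map(function (val) { return Math.floor(val); });
  return {
    startkey: floored.map(function (v) { return v - tolerance; }),
    endkey: floored.map(function (v) { return v + tolerance; })
  };
}

var range = vectorRange([0.7, 2.3, -1.2], 1);
// range.startkey -> [-1, 1, -3]
// range.endkey   -> [1, 3, -1]
```

One caveat worth knowing: CouchDB collates array keys lexicographically, element by element, so a startkey/endkey pair only strictly bounds the leading components; the later components act as a looser filter.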

Super fun, quick and flexible to work with!

> CouchDB is awesome, full stop.

I really like CouchDB. It is wonderful if you want that kind of DB. However, if you want a relational DB (and there are many, many, many reasons to want one), do not pick CouchDB. It works very poorly as a relational DB.

I have a legacy project that didn't quite understand this point, and we have ended up paying the price: a document-oriented DB that is hard to migrate and where we are constantly having to worry about the bandwidth to the view server. And then all the amazing, wonderful features of Couch? We don't use a single one :-P Fail all the way around. However, I still like it, and instead of retiring it, I've been slowly trying to start using the features that make it awesome, while mitigating some of the problems that have piled up over the years.

We're in a similar boat. My main goal this year is to retire it.

>CouchDB is awesome, full stop.

The problem I had with CouchDB is integrating it into a framework like Rails. CouchDB on its own does so much cool stuff. The "free" HTTP API and client replication via PouchDB are the two huge ones. But it just wasn't smooth enough to get the data out, use it where I wanted, and then save it back.

I had to write my own libs/helpers to interact and make it feel friendly to the developer when I used it with Rails in the past.

But after that, it was very nice.

Did you open source these by chance?


I came here to read about couchdb but ended up reading about grammar for 2 mins. Does HN have an offtopic flag and if so can I by default collapse all offtopic comments?

My favourite technique for remembering this is to say "it is an apostrophe" (because it's is "it is", and has the apostrophe (in case that isn't clear :-) )).

Yours is more amusing than mine, but...

I just remember that (possessive) pronouns don't have apostrophes. "My", "your", "his", "her", "their", and "our" don't have apostrophes, so why would "its"?

Meanwhile, contractions do have apostrophes. "Haven't", "should've", "that's", "you're", and "I'll" have apostrophes, so "it's" (meaning "it is" or "it has") should too.

Yep, thanks.

there's always one...as in there is always one who has to poke people for grammer. sighs.

Personally I wouldn't mind corrections on grammar. But I agree that the comments about grammar, even though in my opinion somewhat useful, still add noise to the discussion as a whole.

I've been thinking about this, and I think having a "grammar patrol" with the ability to alter other people's comments for grammar on a site like HN or Reddit might be useful. Maybe. Of course there is potential for abuse. So you'd need a second group of people at the very least, who were given the ability to vote on the edits themselves as good or bad.

This second group would be the most difficult to establish and maintain at scale I think. Because whereas you could in theory allow anyone to volunteer as grammar patrol, you'd need to have a trusted group of people for group two. Otherwise, a group of trolls could sign up for both groups and make bad edits and then approve them.

At the end of the day it would probably require a lot more effort than it's worth. Grammar mistakes are a bit of an eyesore, but they usually aren't the end of the world.

Also, language is ever-evolving. So if in 50 years 80% of the population is consistently making a common set of grammar mistakes then that just means that the way to spell those words has changed.

Alternatively, allow people to suggest corrections to a message, a lot like code review tools allow line-by-line feedback or questions on code.

In order to prevent people from abusing it as another way to get their reply seen by everyone, you could make it a private communication between the comment author and the user giving feedback. Or make it possible for other users to see it, but make it hidden by default.

That way, if the author wants to correct their comment, they can, but nobody else has to see the clutter.

We could avoid all that and just have a way of correcting comments (with history, diff, and all that) which would show up upon clicking, say, "show edited version" or whatever. By default it would display the original version, but if someone wishes, one could display the edited version. This way the noise would be hidden. Perhaps to be able to add such corrections to comments, you would have to have at least n_1 karma, be registered for n_2 months, and so forth as a defense mechanism against trolls.

I am sleep deprived. I hope my idea comes across properly. It sounds good to me right now, might not sound good tomorrow. :P It is definitely complicated though and not sure it is worth the effort to implement and maintain.

That would actually be an interesting idea to pursue because I don't think it's even decidable and may require an AI/human to write back to understand. It depends on context that may not even be there. (Think of the absurdity of chain-comment-memes on Reddit, for example. The question may not even be answerable until lots of other people have replied. Now, you may not permit that as "grammar", but it speaks to the complexity of even defining grammar. Words are repurposed, etc. etc.)

Maybe... just maybe memes could be the key to AI /s

I make frequent minor grammar and spelling mistakes. Spoken english would rarely pick them up, nor would the listener be genuinely confused, but written english is different and while our Chomsky grammer parser is just one brain, we hear and read things differently. It is and It(possessive)s are not the same in semantic intent, and even sometimes in spoken flow people have to ask "did you mean it is, or it possesses" in some manner.

Spelling checkers which cannot read do not help. here and hear are both legal in the sentence parse in some ways, so cannot be detected as the wrong form without a higher semantic model. Few systems have this. Therefore, many small mistakes can creep through, apart from the ones I mis-type the system may itself be making them, in ways which our own spellchecking brain do not pick up. Here what I say, Hear what I say...

"Grammar". Sorry!

“Grammar.” Thanks.

In his original message right after "it's" he put "(I believe?)", implying he was unsure about the usage and inviting feedback so I responded to that. If I misread, my bad, but I thought he was specifically asking.

One interesting thing you can do with CouchDB is that you can have a webapp where a user can specify their own database and credentials and it works over HTTP(s). That's pretty unique. I'd love to see a SaaS using CouchDB and their "on-premise" offering just means the user provides their own database. I'm not sure how payment would work though - perhaps some verification proxy?

Firebase is the gold-standard for offline apps (as a service). CouchDB replaces Cloud Firestore, and Keycloak replaces Authentication. I haven't seen OSS equivalents of Cloud Functions, ML Kit, and the other things (e.g. In-App messaging, and Cloud Messaging). It'd be nice to have the entire stack of Firebase bundled as a group of OSS projects, including CouchDB.

Sad to see that per doc access control didn't make it in 3.0. Hopefully it'll be in 3.1.

Cloudant on IBM Cloud is CouchDB API/replication compatible and offers support for Apache CouchDB (1). Also, OpenWhisk integrates nicely with CouchDB/Cloudant and can even be a backing persistence for it (2)

(1) https://www.ibm.com/cloud/blog/announcements/announcing-supp... (2)https://github.com/apache/openwhisk/blob/master/tools/db/REA...

Cloudant is awesome, but it's way too expensive IMHO.

Partitioned dbs are supposed to allow you to query more cheaply, haven't implemented those yet.

They’ve recently shifted their pricing scheme to be more on-demand; before that, you needed to do multi-tenant at very small scale, or buy dedicated clusters.

We have dedicated clusters on Cloudant and they’ve run quite smoothly for many years. Someday we might switch to the on-demand IBM Cloud pricing, but haven’t done it yet.

Send me an email (in profile.) Would love to chat and see what we can do for you.

If you indeed work for Cloudant, please consider trying to convince someone to invest in PouchDB. It looks mostly unmaintained and it would be in IBM's and the community's interest to keep it running!

I really can't wait for the per-doc permissions, because I'm building something very similar to what you're describing, and with CouchDB! Focusing on the database and auth side first and then adding functions.

So shameless plug if you're interested in signing up for the alpha: https://www.aspen.cloud

Yeah, I'm still disappointed that the MongoDB API outpaced the CouchDB Replication Protocol in general adoption. As nice as Cloudant can be some of the time, I know that my IT group would be a lot happier if we could use Cosmos DB (and/or if Cloudant would just directly support Azure data centers again).

Every now and again I wonder if I could implement the CouchDB Replication Protocol on top of Cosmos DB with a presumably hairy ball of Azure Functions and hoping someone beats me to needing that to exist and scratches that itch for me. (Cosmos DB's changes feed is so almost right for the job it hurts because it sounds like it should be easy, and yet I assume it won't be.)

For a Cloud Functions like project, OpenFaas seems like a promising project that I’ve been watching but have not yet had the chance to use.

I didn't understand. You mean it's unique to work over http(s)?

I mean your CouchDB instance itself is represented by a host and port, your application's data can be stored there, and there's a native HTTP-based API to access said data. This is contrasted with most databases, where you need a driver and the data is accessible only in the "back-end".

> itself is represented by a host and port and your application's data could be stored there

All databases are represented by a host and a port. I think you mean CouchDB offers an HTTP-based API that allows queries to be run without requiring a database-specific library, and that because it's HTTP, it can be accessed via a browser.
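For illustration, hitting a view really is just a URL; everything below (the database, design doc, and view names) is made up:

```javascript
// Build a CouchDB view query URL; any HTTP client (fetch, curl, or the
// browser address bar) can GET it -- no database-specific driver needed.
function viewUrl(base, db, ddoc, view, params) {
  var query = Object.keys(params).map(function (key) {
    // CouchDB expects JSON-encoded values for params like startkey/endkey.
    return key + '=' + encodeURIComponent(JSON.stringify(params[key]));
  }).join('&');
  return base + '/' + db + '/_design/' + ddoc + '/_view/' + view + '?' + query;
}

var url = viewUrl('http://localhost:5984', 'corpus', 'search', 'by_vector',
                  { startkey: [0, 1], endkey: [2, 3], include_docs: true });
// -> http://localhost:5984/corpus/_design/search/_view/by_vector
//      ?startkey=%5B0%2C1%5D&endkey=%5B2%2C3%5D&include_docs=true
```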

That was my point exactly. The HTTP API is very cool but very far from being unique to CouchDB (see InfluxDB, ClickHouse, Prometheus, etc.)

Haha, yes, exactly. I omitted the most important part - the [HTTP-based API].

To take that further, I really like the idea of running something locally like PouchDB and then letting it sync with a remote CouchDB using the replication protocol.

That's where I fell in love with the CouchDB world. Building offline-first databases in PouchDB, and letting that sync to any HTTP address that speaks the replication protocol is sometimes a dream. In practice there are so many hurdles, sadly. (CORS, CSPs, firewalls, not enough things speaking the replication protocol that should, ...)

I’m just getting into pouchdb and am liking it. I love the idea, for sure. I ran into a replication issue running through a proxy that had something to do with sessions being cached, but that was more the fault of the proxy.

My biggest current concern is client side search and size. I’ve been developing a private journalling/notes app with some fairly particular bells and whistles for my own personal use. Although it’s mostly just text, I want to store a lot of text. I would much prefer the search not happen on the server, as I’d like to encrypt all data that hits the server.

Have you used pouchdb quick search? If so, in your experience, can it handle full text search on about 1,000-5,000 documents with about 10kb worth of text each?

Ideally I’d also like to store data uri for png sketches, and maybe photos. But I know photos especially would balloon the database size quite a bit/am worried I’d hit client side storage limits extremely quickly (I think I read some mobile devices have a 50mb limit, but I haven’t researched it that thoroughly yet)

I've not tried quick search. For the most part in the applications I've worked on I've just relied on the main primary key index (the _id field) for most lookups.

Generally I'm using a `folder/structure/ULID` approach to keys, and it's really easy with start_key and end_key on allDocs to grab an entire "folder" at a time. I've had some pretty large "folders" and not seen too much trouble. At this point the biggest application I worked on pulls a lot of folders into Redux on startup and so far (knock on wood) performance seems strong. (ULIDs [1] are similar to GUIDs but are timestamp ordered lexicographically so synchronizations leave a stable sort within the folder when just pulling by _id order.)
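The folder trick can be sketched like this (the key values are invented; `'\ufff0'` is the conventional high-sorting sentinel for prefix ranges):

```javascript
// allDocs options that select every _id starting with a given prefix.
// '\ufff0' sorts after the characters that appear in normal ids, so the
// range [prefix, prefix + '\ufff0'] covers the whole "folder".
function folderRange(prefix) {
  return { startkey: prefix, endkey: prefix + '\ufff0', include_docs: true };
}

// With PouchDB this would be db.allDocs(folderRange('journal/2020/')).
// Simulated here against a sorted list of ids:
function idsInFolder(sortedIds, prefix) {
  var range = folderRange(prefix);
  return sortedIds.filter(function (id) {
    return id >= range.startkey && id <= range.endkey;
  });
}

var ids = ['journal/2019/01DXYZ', 'journal/2020/01EABC',
           'journal/2020/01EDEF', 'notes/01EGHI'];
// idsInFolder(ids, 'journal/2020/')
// -> ['journal/2020/01EABC', 'journal/2020/01EDEF']
```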

At least as far as my queries have been and what my applications needs have been, PouchDB is as fast or faster than the equivalent server-side queries (accounting for HTTPS time of flight), especially now that all modern browsers have good IndexedDB support. (There were some performance concerns I had previously when things fell back to WebSQL or worse, such as various iOS IndexedDB polyfills built on top of bad WebSQL polyfill implementations, and also a brief attempt that did not go well to use Couchbase Mobile on iOS only.)

Photos have been the bane of my applications' existence, but not for client-side reasons. I had PouchDB on top of IndexedDB handling hundreds of photos without breaking a sweat, and those size limits all have nice opt-in permission dialogs for IndexedDB if you exceed them.

Where I found all of the pain in working with photos was server side. CouchDB supports binary attachments, but the Replication Protocol is really dumb at handling them. Trying to replicate/synchronize photos was always filled with HTTP timeouts due to hideously bloated JSON requests (because things often get serialized as Base64), to the point where I was restricting PouchDB to only synchronize a single document at a time (and that was painfully slow). Binary attachments would balloon CouchDB's own B-Tree files badly, and its homegrown database engine is not great with that (sharding in 3.0 would help, presumably).

Other replication protocol servers had their own interesting limits on binary attachments; Couchbase in my tests didn't handle them well either, and Cloudant turned out to have attachment size limits that weren't obvious and would result in errors, though at least their documentation also kindly pointed out that Cloudant was not intended to be a good Blob store and recommended against using binary attachments (despite CouchDB "supporting" them).

(It sounds like the proposed move to FoundationDB in CouchDB 4.0 would also hugely shake up the binary attachment game. The 8 MB document limit already eliminates some of the photos I was seeing from iOS/Android cameras.)

I'd imagine you'd have all the same replication problems with large data URIs (as it was the Base64 encoding during transfers that seemed the biggest trouble), without the benefits of how well PouchDB handles binary attachments (because of how well today's browsers have optimized IndexedDB's handling of binary Blobs).

The approach I've been slowly moving towards is using `_local` documents (which don't replicate) with attached photos in PouchDB, metadata documents that do replicate with name, date, captions, ULID, resource paths/bucket IDs (and comments or whatever else makes sense) and a Blurhash [2] so there's at least a placeholder to show when photos haven't replicated, and side-banding photo replication to some other Blob storage option (S3 or Azure Storage). It's somewhat disappointing to need two entirely different replication paths (and have to secure both) and multiple storage systems in play, but I haven't found a better approach.
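To make the split concrete, here is a hypothetical pair of documents for one photo (every field name and value below is illustrative, not from an actual schema):

```javascript
// Replicates normally: small metadata plus a Blurhash placeholder.
var photoMeta = {
  _id: 'photos/01EXAMPLEULID',
  name: 'site-visit.jpg',
  taken: '2020-02-26T10:00:00Z',
  blurhash: 'LEHV6nWB2yk8pyo0adR*.7kCMdnj', // shown until the bytes arrive
  blob_url: 's3://my-bucket/photos/01EXAMPLEULID'
};

// Never replicates: _local docs stay on the device, so the heavy bytes
// ride the side-band (S3/Azure) path instead of the CouchDB protocol.
var photoLocal = {
  _id: '_local/photos/01EXAMPLEULID',
  _attachments: {} // the actual Blob would be attached here, client-side only
};
```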

[1] https://github.com/ulid/spec

[2] https://blurha.sh/

Awesome, really appreciate the detailed reply! Lots of info in there.

You’ve convinced me I should store photos separately; will probably do something similar to what it sounds like you did and have pouch just store a pointer either to S3 or some other backend for file storage. I don’t anticipate photos being that important a feature for me, so having them be disabled when offline might be acceptable, otherwise I’ll probably opt for storing them locally and only syncing metadata/pulling stuff from s3 when metadata changes indicate a change to the file like you’re doing.

Cloudant off IBM Cloud. Full disclosure: I utilized it to support the application layer on IBM Cloud.

I'm doing this with BigQuery for Logflare (logflare.app).

I built two products on CouchDB 1.x starting in 2010 ... version three is another amazing step forward! For my more recent projects, I've replaced CouchDB with clustered PostgreSQL using JSON columns, as I really enjoy the ability to write SQL queries against the JSON and to use the built-in full-text search capabilities. I think both CouchDB and clustered PostgreSQL are amazing tools and it's nice to be able to choose between them as needed. The best advice I've heard is to choose CouchDB when you know your queries ahead of time and the data "schema"[1] is variable, and choose PostgreSQL when you know your data ahead of time and your queries are variable.

[1] In this case, a JSON document but either with a JSON-schema or marshaled/unmarshaled into a strict type.

I've gotten the impression that clustered Postgres still isn't very straightforward to run. Do you mind elaborating on your ideal setup and point to some resources?


It's not straightforward at all but it's better than it was five years ago ... you can use something more "meta" like SymmetricDS (https://www.symmetricds.org/). I haven't used it personally but a dirt simple way to get an HA, scalable PostgreSQL instance would be to use Amazon's Aurora DB.

JSON Schema has been a big benefit for our use case. Our iOS, Android, and web app all pull in a schema from one repo, which serves up that schema via Cocoapods, Gradle, or npm. We built it years ago and it’s worked smoothly ever since.

CouchDB is awesome and feels way ahead of its time. Its design docs are extremely powerful, to the point that you can build entire web apps with CouchDB alone (not that that's recommended anymore). Plus with PouchDB you can create offline-first apps that sync with a remote CouchDB instance.

If you like PouchDB, you should also check out RxDB. It is built on top of PouchDB and is optimised for realtime applications where you can subscribe to queries and such.


Ahead of its time?

PL/SQL also allowed (and allows) you to create entire apps within a database.

I haven't heard of CouchDB in quite some time, great to see it still improving.

I used it years ago when I was experimenting with Ionic[0]. What appealed to me was that I could use CouchDB (cloud) and PouchDB[1] (device) to have a replicated copy of the data locally. The application was used in areas where network connection was very limited. Using this strategy I was able to ensure the mobile device's data was as recent as the last time it had a network connection.

[0] - https://ionicframework.com/

[1] - https://pouchdb.com/

I can confirm that the stack still works well :) We've been developing a cross-platform app for the German market - hence the need for offline capability - since 2017 and never had any real issues with Pouch/Couch; that part just worked. The upgrade from Ionic 3 to 4 was quite painful though.

For user authentication I've forked the now-unmaintained superlogin package [1], which still does a great job when keeping the dependencies up to date.

[1] https://github.com/LyteFM/superlogin

Reducing max document size from 4GB down to 8MB seems hyper-restrictive.

For those interested, looks like the guts of CouchDB are going to be swapped out for FoundationDB.


8MB is just the default, you can switch it back to 4GB if you want, but you won't have an easy time switching to 4.0 due to the 8MB limit imposed by FoundationDB.

It seems really odd to me that they are changing to an 8MB limit. A "document storage database" that can only hold 8MB per document. It can't hold full-text output for large documents... What am I missing here?

In CouchDB 4.0 the backend of CouchDB will be switching to FoundationDB, which has an 8MB limit, so they're preemptively making the change. You can remove the limit now if you'd like.

Yes, but how is an 8MB document size limit a good thing?

FoundationDB is a consistent (as per CAP) database; to make that work in a distributed fashion, there are limits on the size and duration of distributed transactions. CouchDB needs to adopt those in order to move to FoundationDB.

The 3.x series will be supported for a long time for folks who can’t move up. Until then, as others have stated, you can up the default limit.

In CouchDB terms, you’d store data like this in attachments, not as raw JSON.

If you're trying to store single GB documents in couch, you're doing it wrong... Unless those are binaries you can usually fragment data logically across many documents, then write custom views to aggregate however you need to.

Updates on huge docs would be painful!

I agree that 4GB is more than a sane person should probably be using. I don't think that going above 8MB is very hard or uncommon though. If I'm going to spread everything across many different documents and document types and then join them all together, I then have to make a case for why I'm still choosing Couch over a RDBMS.

I largely agree with your point that multi-GB documents are perhaps excessive, but this does create a heck of a migration problem for a lot of users who aren't even close to 4GB.

It does, but keep in mind for 3.0 this is a change to the default settings, not a hard cap. The idea is to give people lots of warning and time to do any migrations necessary prior to 4.0.

The Couch/Pouch combination is really slow when document + attachment are too large. Based on my experience, the practical limit was more like 20-30MB, and I'd suggest not even getting close to that. 8MB simply recognizes reality.

That’s another great point, and exactly why we are doing this. It is a best practice already.

At least since 2.0 (I haven't used Couch before that), the docs have always recommended only using small documents in CouchDB and using external storage for large files.

The IBM Cloudant free tier only allows docs up to 1 MB.

So this doesn't really come as a surprise or feel hyper-restrictive to me.

That still exists? I thought Apple bought them and shuttered them.

FoundationDB got re-released as an open source project: https://www.foundationdb.org/blog/foundationdb-is-open-sourc...

> – Updated to modern JavaScript engine SpiderMonkey 60

Yes ^^ !

Congrats to the team. These people are some of the nicest and most supportive devs I know of in the OSS community (or whatev').

They show a great deal of patience in their slack channel and are always welcoming and answering stupid questions from idiots like me.


At this point why would you use CouchDB over something like MongoDB?

Seriously asking...

Over the past 5 years MongoDB has gotten a great storage engine, transactions, distributed transactions, multi master replication, first class change streams and is very very solid as a foundational piece of infrastructure you can rely on while CouchDB has languished. I can’t imagine reaching for it in my tool belt when I need a document store over MongoDB but I’m obviously biased so I’m wondering if there is a lot I’m missing.

Obviously it’s cool from a more open source databases standpoint — I love learning about how things are built and evolve over time.

1. MongoDB is no longer Open Source.

2. MongoDB's design has historically been terrible; and, from my current experience with clients, is still a source of 'WTF's.

The main reason most people use CouchDB is because of the HTTP API and offline support with Couchbase Mobile and PouchDB. Doesn't CouchDB have most of those things already from 2.3?

I don't think Couchbase has CouchDB in mind for the mobile client anymore.

Correct. The newest version of Couchbase Mobile no longer supports CouchDB as a replication target. It can still be accomplished with the Couchbase Sync Gateway, but it gets complicated quickly.

I evaluated Couchbase Mobile about a year ago and found that although it worked well once set up, there was a lot of overhead, the docs seemed a little all over the place, and the fact that you can't also use the same DB on the web anymore with PouchDB meant I ultimately dropped it. It's a shame, because there isn't really anything open source / self-hosted like it for mobile.

MongoDB recently bought Realm, which is an amazing mobile database with first-class replication, so if I were starting a new project that also needs mobile I would definitely go with MongoDB.

Especially if you, jinjin2, are either a bot or a marketing shill??

I just reviewed every single post of yours on Hacker News in the last 12 months, and every one of them, in different DB discussions, makes the same shameful plug for Realm.

E v e r y o n e.

And not a single disclaimer from you, not to mention some of your posts (like here, as janl pointed out) are simply not true.

Disgraceful. I'm leaving this comment for anybody doing a check on you in the future to notice.

last time I looked, realm did “last write wins” “conflict resolution”, which is just marketing speech for “randomly losing customer data”, which is something CouchDB decidedly works very hard to never ever do ;)

MongoDB has _suspiciously_ amazing SEO and marketing. CouchDB's by contrast is awful.

As silly as that sounds as a reason to choose CouchDB, it demonstrates where the respective organizations' priorities lie.

MongoDB raised $311M in funding and is now a publicly traded US corporation: https://www.crunchbase.com/organization/mongodb-inc#section-...

CouchDB is a community-run Apache Software Foundation project (that has corporate contributors as well as individual contributors) at a much lower scale than MongoDB.

Of course MongoDB’s marketing is better, they are spending a lot of money on it. If you’d like to help out with CouchDB’s marketing, we can always use another helping hand :)

Similar story with RethinkDB which was actually a really nice database. I know it's not dead but I can't imagine starting a new project with it.

MongoDB is an actual for-profit company, so of course it has marketing. CouchDB is an open source Apache project. You are comparing apples to oranges.

It's not just CouchDB. I did a lot of searching a little while back, and I generally just don't like how tightly they've wrapped up almost all of the results. If you go by searching alone, it's so overwhelmingly MongoDB-positive that it's hard to believe it's organic.

It's definitely impressive marketing but when I'm deciding on which tool to use that arguably works against them as opposed to for them.

So you are choosing software on its technical merit and not just on popularity and hype!? What if there are no breaking changes every third month, what are you gonna do? Solve actual problems!? :P

It sounds to me like they are choosing their software based off their marketing techniques.

Interesting. So they are catching a ride on the hype train by adopting the hyped technology... So this is why the tech stack is picked by and marketed to the business/directors rather than engineering.

How tightly they wrapped up the results? Perhaps you could give examples?

Mongo reminds me of Tesla - early products were shoddy but had incredible hype.

"It's incredibly fast!" "It's web scale!" "It'll be able to drive itself cross country in a couple of months!"

And I think the unfounded/exaggerated hype bothered a lot of us, but like Tesla, the actual product seems to have improved quite a lot as they've had time and resources to throw at it. So sure, some of us remember Teslas and MongoDB from 2013 and scoff, but the current reality is much different.

CouchDB is licensed under Apache License 2.0, while MongoDB uses SSPL, which was rejected by the OSI https://www.zdnet.com/article/mongodb-open-source-server-sid...

The link explains that CouchDB can have replicas on mobile phones and websites, meaning clients don't always have to be connected to the internet.

> The Couch Replication Protocol lets your data flow seamlessly between server clusters to mobile phones and web browsers, enabling a compelling offline-first user-experience

Does MongoDB have multi master replication or the classic election of one master from a pool of candidates?

CouchDB has another pattern, each master is really a master and you can have live replication but also offline replication. You can connect two clusters every new moon and they will synchronize. For sure the clients may have to deal with potential conflicts but in practice it's very neat and that's what makes couchdb worth it if you need this feature.
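The key to "connect two clusters every new moon and they will synchronize" is that conflict winners are picked deterministically on every replica. A rough sketch (a simplification of CouchDB's actual algorithm, which only compares leaf revisions of live branches):

```javascript
// Pick the winning revision roughly the way CouchDB does: the revision
// with the longer edit history wins; ties break on the lexicographically
// higher rev id. Every replica computes the same winner with no
// coordination; losing revisions are kept as _conflicts for the
// application to resolve later.
function pickWinner(revs) {
  return revs.slice().sort(function (a, b) {
    var depthA = parseInt(a.split('-')[0], 10);
    var depthB = parseInt(b.split('-')[0], 10);
    if (depthA !== depthB) return depthB - depthA; // deeper history first
    return a < b ? 1 : -1;                         // then higher rev id
  })[0];
}

// pickWinner(['2-bbb', '3-aaa']) -> '3-aaa'
// pickWinner(['2-abc', '2-def']) -> '2-def'
```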

MongoDB does replica sets and sharding. As far as I know it doesn't support multi-master architectures or data syncing. Even the Atlas-style global data distribution is sharding, right?

They are tools that solve different problems, imo. In CAP theorem[0] you have 3 groups of DBs.

CA databases: SQL databases that are hard to scale ("partition") but are always consistent and available.

CP databases: MongoDB style databases that are consistent and partition tolerant, but trade availability (sometimes your queries will fail during high load).

AP databases: CouchDB style databases. They are always available and are partition tolerant, but you may be querying stale data.

[0]: https://en.wikipedia.org/wiki/CAP_theorem?wprov=sfla1

> At this point why would you use CouchDB over something like MongoDB?

They are very different databases. But since they came up around the same time, and because they look very similar on the surface, you might think you have to choose between them.

But when you look more closely at the technical details, at almost every decision point where CouchDB went one way, Mongo went the other.

I’m not saying either decision is better or worse; it’s just that they are very different databases that you should evaluate on their merits, not just superficially.

Has anyone here tried to use CouchDB directly within an Elixir/Erlang OTP application? As in, "mix install"? Would kill for CouchDB as a library!

Congrats to the whole team.

Looking forward to CouchDB 4.0/FoundationDB goodies. Do we have any roadmap details on this?

Oh wow, this is great news. I thought the project was effectively long dead. Is there a new/up-to-date "couchapp" too?

> Default installations are now secure and locked down.

More good news!

Anyone have recent experience with couchdb?

I see the (quickstart) docs use plain http - should one terminate ssl in front, eg with a recent version of haproxy?

I wish couch were used whenever users ask for an app to "sync to Dropbox". I don't know if this changes with 3.0, but couch is naturally database-per-user, took me five minutes to install on my rpi with docker, has a very good admin interface, the database is the frontend (no driver or separate process), and lets the application layer handle conflicts.

We use https://github.com/jo/couchdb-bootstrap successfully.

CouchDB does SSL natively, but we do recommend HAProxy.

For anyone else looking to quickstart but on Kube, https://operatorhub.io/operator/couchdb-operator. Should add 3.0 soon.

I'm surprised to see so much love for CouchDB in this thread. I don't think it's been widely adopted in corporate America, and it has lost the war to MongoDB, closed source or not.

I joined a company where it's being used backing a mobile app with couch/pouch in production. We can't wait to get off of it. Writes are slow. Reads are worse. Having a DB per user is a scaling and backup nightmare. If you run into any issues, it's a ghost town.

I'm glad the CouchDB Team is forging ahead, but who is really using this database?

Would you be willing to say more? Inquiring minds want to know.

I sadly can’t name names, but rest assured the Fortune 500 is heavily involved.

OTOH, publicly known big companies using CouchDB include Apple and IBM.

And I worked on a team that used CouchDB’s offline capability in the 2015 Ebola crisis in West Africa. That work also led to the first Ebola vaccine ever.

That’s why we do CouchDB :)

CouchDB/PouchDB looks very promising for offline-first apps, but I can’t understand how to restrict bad clients. A client could potentially insert a huge document or execute an expensive query and degrade the experience of other clients on the same server. Is there any way to prevent this?

A couple ways:

One: implement validation functions [1] on user databases to control what kind of data can be inserted into Couch. These functions can only be changed by database admins, not users, so they can act as a security mechanism controlling what goes in.
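
A minimal sketch of such a validation function, using CouchDB's real `validate_doc_update` signature and throw conventions; the document fields (`type`, `text`, `owner`) and the 1 MB limit are made-up examples:

```javascript
// Sketch of a CouchDB validate_doc_update function. In a design document
// it is stored as a string under the "validate_doc_update" key; CouchDB
// runs it on every incoming write. Throwing {forbidden: ...} rejects the
// write with a 403.
function validateDocUpdate(newDoc, oldDoc, userCtx, secObj) {
  // Deletions only carry _id/_rev/_deleted, so skip field checks.
  if (newDoc._deleted) return;

  // Only allow a known document "shape" (hypothetical type).
  if (newDoc.type !== 'note') {
    throw({ forbidden: 'Only documents with type "note" are allowed' });
  }
  // Cap field size so a client cannot stuff arbitrary bulk data in.
  if (typeof newDoc.text !== 'string' || newDoc.text.length > 1024 * 1024) {
    throw({ forbidden: '"text" must be a string under 1 MB' });
  }
  // Tie documents to the authenticated user writing them.
  if (newDoc.owner !== userCtx.name) {
    throw({ forbidden: 'owner must match the authenticated user' });
  }
}
```

Since only admins can modify design documents, a syncing client cannot simply remove these checks.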

As mentioned by others you can also implement a proxy. This doesn't have to interfere with sync functionality, you just have to make sure you proxy all the endpoints in the replication protocol [2]. Envoy [3] is one such proxy that essentially applies document level permissions to a CouchDB database without interfering with sync.

If the goal is just to limit document size, or throttle clients trying to hammer the API, this doesn't even have to be a custom proxy; any reverse proxy with the needed control knobs (such as NGINX) will do. You can of course combine this with validation functions, using validations to ensure everything that comes in is the right "shape" and using NGINX and its ilk to apply throttling and sane request limits.

At scale there's a decent chance you want a proxy in front of your Couch instance anyway, since Couch is truly multi-master, meaning you probably want to balance your clients across all your nodes.

[1] https://docs.couchdb.org/en/stable/ddocs/ddocs.html#validate... [2] https://docs.couchdb.org/en/stable/replication/protocol.html... [3] https://github.com/cloudant-labs/envoy

Thank you for pointing at validation, I'll check it. It's not completely clear whether it's possible to limit not only a particular document but the whole database, or how to handle a conflict if a document is changed in Pouch but rejected by the CouchDB server.

I'm not sure about the current state, but previously there was a problem where the CouchDB file grew until it hit some filesystem limit and CouchDB just crashed.

From the start of the Envoy readme: it's not battle-tested or supported in any way. Also, it doesn't do any validation apart from limiting permissions for different users.

It's easier to reimplement CouchDB than to create a smart proxy that estimates whether a query is expensive or not.

I'm not talking about a rate-limiting proxy or load balancing across different backends, which could be implemented with nginx or something else.

I'm not clear what you mean by limiting "not only a particular document but the database". As for a document changing in Pouch and being rejected on the server, that's one of two scenarios.

1) The client you wrote is bugged and generated bad data. This scenario can occur just as easily using Postgres and an application server. What does your app server do if a client tries to send bad data? (Answer: whatever you told it to do. Most likely throwing a 500 when your database refuses the incoming data.)

As for what happens when Pouch syncs to Couch: the server will let everything else sync, but not the bad document. The return value from the API call will tell you which documents didn't sync.

2) Someone is intentionally trying to shove bad data into your database. In this case it's worked as advertised and rejected the bad data. What do you care if a malicious client breaks?

What kind of "expensive" query are you envisioning? Mango queries don't support joins, only simple equality filters, so in general the worst thing someone could do is send a query that doesn't use an index. But why are you letting the client query the server in the first place? Just have the client sync and query client-side. Or don't allow access to the _find endpoint and restrict them to the map/reduce views you handwrote.

If you must let them send arbitrary queries (which to me implies a relatively trusted user, but let's pretend they're not), then run the query with a limit of 1 or 0, examine the execution stats to see if they are using an index, and check their query to see if their limit is reasonable. But at this point you've entered into a scenario that's going to be very difficult with a custom API too.
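
A sketch of that dry-run idea, assuming CouchDB's `/db/_find` endpoint with its `execution_stats` option; the base URL, database name, and the `buildDryRun` helper are illustrative, not an official API:

```javascript
// Wrap an untrusted Mango selector in a dry-run request: limit 0 so no
// data is returned, execution_stats so CouchDB reports what it scanned.
function buildDryRun(untrustedQuery) {
  return {
    selector: untrustedQuery.selector,
    limit: 0,               // don't return documents, just gather stats
    execution_stats: true,  // ask CouchDB to include scan counters
  };
}

// Run the dry run and hand back the stats (e.g. total_docs_examined)
// so the caller can decide whether the real query is acceptable.
async function dryRunStats(baseUrl, db, untrustedQuery) {
  const res = await fetch(`${baseUrl}/${db}/_find`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildDryRun(untrustedQuery)),
  });
  const body = await res.json();
  return body.execution_stats;
}
```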

> I’m not clear what you mean by limiting "not only particular document but database".

I’ve limited document size to 10MB and rate-limited updates to 10 per second. A client starts updating a document with random data at 10 requests per second. As far as I understand, Couch stores all versions at least for some time. This means this one client could fill space on my server at 100MB/s. There are no such issues with Postgres, and no one allows clients to execute raw queries on the database without an application server. A document is only 10MB, but the database is huge.

> What kind of "expensive" query are you envisioning?

I have never used Couch, so I don’t know what could be expensive. Maybe some lookup without an index or something like that.

Sorry for my ignorance, but is it true that if I limit Couch to replication only, there will not be any non-indexed lookups?

It looks like implementing a secure system with Couch is very hard, but I can’t find any best practices, mostly only authentication and basic validation.

> I’ve limited document size to 10MB and rate-limited updates to 10 per second. Client starts to update document with random data 10 requests per second. As far as I understand couch stores all versions at least some time. This means that this one client could fill space on my server 100MB/s. There are no such issues with PostgreSQL, and no one allows clients to execute raw queries on the database without an application server. Document only 10MB but database is huge.

Ah! Now we're getting somewhere! You're concerned about someone filling your disk.

OK, let's modify your scenario a little. Instead of updating an existing document, they create a new document. This is a malicious client; why do updates that'll get cleaned up in a few minutes when I can make it permanent?

So, CouchDB allows these writes, and now your disk is full.

What does Postgres with a custom API do? Allows these writes, and now your disk is full.

You're allowing 10MB documents because that makes sense for your application, right? So your Postgres table is going to have a binary column or some other column meant to hold bulk data, and your API is going to accept it.

If it doesn't make sense, lower the max document size. Apply validations to limit what fields can be written, and how big they can be. In Postgres this is called your "schema". Couch being "schemaless", it's now your validation function. Couch is no different from any other schemaless database such as Mongo, RethinkDB or FoundationDB in this regard.

Also, your rate limiting here is weak. If I can post to your server at 100MB/s, I can saturate a 1Gb link with only 10 clients. It doesn't matter if you reject my posts; if I can send them to the server, I can DoS you pretty easily.

The main thing Postgres gives you here is that it requires you to define your schema upfront (unless you use JSON columns, in which case it joins the schemaless club above). Couch will happily let you skip that, in which case someone wants to write a record of their car maintenance into your recipe-book app? Couch is good with that. But take a step back: what actually stops them from putting that in the "description" column of your Postgres recipe app? Not much. So you have to think about what's important. Do I actually need to make sure these are all the same "shape"? If so, I need a validation function. If I can just shrug and say "garbage in, garbage out", then I just need controls around how much data they can insert, but hey, I needed that for Postgres anyway.

> Sorry for my ignorance, is it true that if I limit couch only to replication it will not be any not indexed lookups?

Correct (enough). The entirety of CouchDB is built around efficient replication. While it's not going to use a formal "index" getting all of the changes after a specific rev is an efficient operation.
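
For illustration, the replication primitive boils down to polling the `_changes` endpoint with a checkpoint. A minimal sketch (the base URL and database name are placeholders; `_changes`, `since`, and `include_docs` are CouchDB's real API):

```javascript
// Build the URL a replicator would poll to fetch everything that
// changed after a given checkpoint ("since" sequence).
function changesUrl(baseUrl, db, since) {
  const params = new URLSearchParams({
    since: String(since),  // resume from the last checkpoint
    include_docs: 'true',  // ship the document bodies along
  });
  return `${baseUrl}/${encodeURIComponent(db)}/_changes?${params}`;
}
```

Because the changes feed is keyed by update sequence, catching up is proportional to what changed, not to the size of the database.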

It’s trivial to limit the number of created documents in Postgres, CouchDB or an application server through validation; I’m talking about updating a document, not creating a new one. In Postgres, if I update a 1MB document, the used space will not always grow. In CouchDB the situation is different. With a relational DB you have an application server with custom logic and validations; CouchDB, on the other hand, is accessible from outside.

My point is that it’s very hard to create a safe CouchDB-based system, and most recommendations are limited to setting up an nginx proxy and authenticating users, which is not enough.

> It’s trivial to limit the number of created documents in Postgres, CouchDB or an application server through validation; I’m talking about updating a document, not creating a new one. In Postgres, if I update a 1MB document, the used space will not always grow. In CouchDB the situation is different. With a relational DB you have an application server with custom logic and validations; CouchDB, on the other hand, is accessible from outside.

It is? It's unclear to me why I'm allowing 10 updates to a (largish, 10MB! Use a file or store it in S3!) document per second, but not 10 creates. Maybe I'm building Google Docs? Except I'd want old revisions, so those are creates. Plus 10MB is a huge spreadsheet. But sure, let's roll with it. Actually, Couch does not keep old versions of documents around, only old revision numbers. When a document is updated, the old version becomes eligible for compaction (basically garbage collection). So your attacker has to be fast enough to outrun the compactor, while being slow enough not to get temporarily banned from your service. It seems like less effort to me to use this power to flood your network I/O, which is almost certainly lower than your disk I/O. Or just choke your Postgres server on its 100MB/s disk I/O for updates plus whatever is required to maintain your indexes.

I'm not actually advocating for Couch over Postgres. In my mind Postgres should be the default choice, and you switch to something else if you have a reason. For Couch, the biggest reason is sync is built in, in such a way that you can leverage it for your own applications with minimal effort. In my experience sync can be devilishly hard for non-trivial cases, so depending on your app, that can be pretty compelling.

But so far you seem to be focused on DoS attacks; you're not going to find separate advice for Postgres vs Mongo vs Couch, because the backing system doesn't matter. The attacks and mitigations are identical no matter the back end, namely: stop the traffic before it consumes your resources.

Couch is not equivalent to Mongo or a relational DB, because it is accessible to clients if we want synchronisation. Securing an app server is a manageable problem, and there is a huge number of resources on how to do it correctly.

In the case of Couch I've not seen any secure open-source example.

I'm not focused on DoS attacks, I'm just proposing different attack vectors.

Is it trivial? Let’s say you have a back end and an app that lets you post comments, like this site. How do you stop someone from spamming comments? Each comment is represented by a row in a table so the space will grow.

If you need to limit the number of items, it is trivial. You write something like `has_many :things, :before_add => :limit_things` in the app server, or create a constraint in SQL.

Spam prevention is not trivial, but it's a mostly solved problem. You can find a lot of articles on this topic.

But making CouchDB secure looks very non-trivial.

Yeah... that's a Rails callback, not an SQL constraint, and it can't be relied upon in the face of multiple simultaneous requests. Which kind of demonstrates my point. With a custom API, you have to understand your system, its requirements, and its limitations. You can't just read a blog post on "securing your webapp" and assume it's good.

Couch is no different. You have to understand Couch, you have to understand its features and limitations, and build your system within those constraints.

You seem to be asserting because Couch is designed to be internet connected it can't be secure. If that's true, then I guess every customer on IBM Cloudant (Couch as a service), Realm (another database designed for mobile sync), and Firebase (Google database as a service) are all in trouble and just don't know it yet.

Security for all systems is non-trivial. Thinking it's trivial assures your system is not secure.

I'm not asserting that Couch is insecure. I need such a database, but the problem is that I can't find any resource that could help me design a secure production system.

You can check even a trivial Rails blog or todo example from some book; it will be limited in scope but more or less secure. I'm having a hard time finding a secure CouchDB example.

> Security for all systems is non trivial.

But not equally hard.

If you use Firebase, you should understand that you're getting vendor lock-in and in some cases you can spend much more money, but for some types of projects this platform is OK for me.

Same with CouchDB: I understand that if I get replication with the client, I have to pay by reorganising data or maybe spending more resources to make the system secure. There is no free lunch.

You are comparing apples to oranges. Again, are you imposing a hard constraint on the number of comments someone can make?

With the example I gave, you could have a constraint in CouchDB achieve the same effect, but there are simply other examples one could use.

No, I'm not assuming a constraint on the number of comments. The first example shows how easy it is to limit the number of created objects. Spam prevention is another topic, not so trivial but mostly solved.

Also, as far as database size goes, I don't believe there is a hard limit. I think you might be thinking of when MongoDB would silently corrupt databases larger than 2GB on its 32-bit version.

As far as I remember it was a filesystem limit, not a CouchDB limit. The problem was that the file always grew, and CouchDB crashed when the limit was exceeded. I can't find the particular issue, but googling shows some issues [1] that make me think we should be very careful with DB size.

[1] https://stackoverflow.com/questions/40752578/couchdb-views-c...

You resolve that issue the same way you would if you were using Postgres: you introduce some back end.

For your example specifically I'd use a proxy.

A custom backend means no synchronisation and no advantages over Postgres.

Do you propose creating a proxy that parses a query and estimates its complexity? I think this task is at least as hard as implementing CouchDB myself (actually harder).

Is there any secure open source code with pouchdb/couchdb integrations?

Your backend can be a reverse proxy that authenticates requests then passes them off to CouchDB (or PouchDB, since that also runs on the server). I have an example up @ https://github.com/daleharvey/noted. The server is 200 lines and does signup / email authentication etc.

This server can't prevent an authenticated user from uploading a huge document or running an expensive query.

Any reverse proxy can limit the size of a document upload. Even plain NGINX can do that. Just set the client max body size.
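
An illustrative NGINX fragment along those lines; the port, size, and rate numbers are examples only, not recommendations:

```nginx
# Cap request bodies and throttle per-client request rate in front of CouchDB.
limit_req_zone $binary_remote_addr zone=couch:10m rate=10r/s;

server {
    listen 443 ssl;
    client_max_body_size 10m;          # reject oversized documents early

    location / {
        limit_req zone=couch burst=20; # absorb small bursts, then throttle
        proxy_pass http://127.0.0.1:5984;
    }
}
```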

As for queries, it kind of depends on your model. Mango queries are pretty limited (no joins, no arbitrary filters), so it's not necessarily as easy as you think to write one that hoses performance. A client could of course write one that doesn't use an index, which may or may not be a concern.

An easy option, if it is a concern, is to just not expose the `_find` endpoint, which effectively limits your users to the map/reduce queries you've written (unless you give them admin, they don't have the ability to create their own).
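
A sketch of what such a handwritten view looks like: a design document whose view functions CouchDB stores as strings and indexes incrementally. The document shape (`type`, `owner`) is hypothetical; `emit` and the `_count` reducer are CouchDB built-ins:

```javascript
// Design document exposing one admin-controlled view, so clients that
// are cut off from _find can still query something you wrote.
const designDoc = {
  _id: '_design/notes',
  views: {
    by_owner: {
      // View functions are stored as strings and run inside CouchDB's
      // JS engine, where emit() is provided.
      map: `function (doc) {
        if (doc.type === 'note') {
          emit(doc.owner, 1);
        }
      }`,
      reduce: '_count', // built-in reducer: note count per owner
    },
  },
};
```

Clients would then query `GET /db/_design/notes/_view/by_owner`, optionally with `group=true` to get per-owner counts.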

A popular model is for the clients to run queries locally; the server doesn't need to expose any query endpoints, only the ones necessary for replication.

Are there any documents that describe a secure CouchDB architecture? Most of the articles I find are limited to authentication and basic permissions.

What kind of document are you looking for here? There is [1], but yeah, that covers access controls. As do the MongoDB [2] and Postgres [3] documents.

I feel like you're thinking about Couch as exposing your entire PostgreSQL DB to the internet, whereas with Couch, a common model is to have a single database per user. In the Postgres model, providing the end user with any direct access is a nightmare, because every other user's data is in there and I have to keep other users from viewing/modifying it. In Couch, you give them access to their database and only their database; that's how you isolate users.

[3] https://www.postgresql.org/docs/7.0/security.htm
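
The database-per-user isolation above is enforced with each database's `_security` object, which is CouchDB's real access-control document; the database name `userdb-alice` is a placeholder:

```javascript
// Per-database security object: only listed members (and server admins)
// can read or write the database. PUT this to /userdb-alice/_security.
const security = {
  admins:  { names: [], roles: [] },        // no extra per-db admins
  members: { names: ['alice'], roles: [] }, // alice alone has access
};

// Sketch of applying it (server URL and auth omitted):
// await fetch(`${baseUrl}/userdb-alice/_security`, {
//   method: 'PUT',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(security),
// });
```

With an empty `members` list the database would be world-readable, so setting this object is the key step when provisioning each user's database.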

> What kind of document are you looking for here? There is [1], but yeah, that covers access controls. As do the MongoDB [2] and Postgres [3] documents.

Mongo and Postgres are usually not accessible to clients, only to the backend. Security is mostly handled by the backend, and there are plenty of resources on how to implement secure server-side applications, discussing attack vectors and how to make apps secure. Thanks to this thread I’ve got a few good ideas that may help design a secure CouchDB architecture (such as removing the _find endpoint), but I’ve not seen any in-depth document about CouchDB.

> I feel like your thinking about Couch as exposing your entire PostgreSQL DB to the internet

No, why do you think so?

I wasn't worried about that since it is a basic proof of concept; adding that would make it ~210 lines of code.

There are plenty of proxies that do that with some config like nginx. Even if you were using a relational database with a backend you’d still have to solve the same problem.

If I use a backend, I can put all the validation logic in the application server. But in that case there's no automatic synchronisation.

One of the major selling points of CouchDB is the replication protocol for client-server data syncing. When you design a product with Postgres, you don't allow clients to execute raw SQL queries without an application server. But it looks like that is the recommended way to update data in the CouchDB world if you want synchronisation. I can't understand how this architecture can be secure.

CouchDB has options for controlling which documents are replicated. This may help depending on your use case.
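
One such option is a filter function: stored in a design document and referenced as `?filter=ddoc/name` when starting replication, it decides per document whether it replicates. A sketch, where the `public` flag is a hypothetical field:

```javascript
// CouchDB replication filter function: return true to let the document
// replicate, false to skip it. CouchDB calls it with each candidate doc
// and the request object.
function publicOnly(doc, req) {
  // Always let deletions through so removals propagate to replicas.
  if (doc._deleted) return true;
  return doc.public === true;
}
```

Note filters control what leaves a database during replication; they are not a substitute for the `_security` object or validation functions on the receiving side.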

nginx couldn't solve the "execute expensive query" part though, right? Only limit max size. I guess you could do a request timeout + blacklist, but that would also be hard to do right, since at heavy load some proper clients might get blacklisted.

Other than it being a great solution for some problems, I wanted to highlight that CouchDB has committed to SpiderMonkey (the Mozilla JS engine) since the very beginning and is one of the few projects helping to fend off the V8 monoculture.

Many commenters here still think CouchDB is the same thing it was many years ago.

CouchDB was a simple but very powerful idea (that still needed improvements), but it was coopted into something not very nice nor good nor useful.

See my old rant about it and why it failed: http://web.archive.org/web/20170530122143/http://entulho.fia...

Is the Lucene search indexer synchronous with CouchDB data updates?

I'm wondering how people solve the common search-after-create pattern when using external indexes.

Yup, works with clustering and everything: https://blog.couchdb.org/2020/02/26/the-road-to-couchdb-3-0-...

I once read that the right way to use CouchDB is for every user to have its own database. However, how does this work with BI? Or with public data that should be known by all users? Do I create a single centralized DB just for that kind of data? Maybe aggregate data from all users' DBs? Genuinely curious.

For public data, you can try to partition it in such a way that writes can be merged without any potential conflicts. E.g. a user's posts are in a separate partition.

I have never done this with CouchDB, but the technique is described in Martin Kleppman's __Designing Data Intensive Applications__.

You can replicate all per-user DBs into a central database today.

We are working on per-document access control at the moment, to support this use case out of the box.
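
The fan-in replication mentioned above can be driven by one document per user database in CouchDB's `/_replicator` database; a sketch, with placeholder database names and a hypothetical helper:

```javascript
// Build _replicator documents that continuously fan per-user databases
// into one central database (e.g. for BI / aggregate queries).
function fanInReplication(userDbNames, target) {
  return userDbNames.map((source) => ({
    _id: `fan-in-${source}`, // one replication job per user database
    source,                  // e.g. "userdb-alice"
    target,                  // e.g. "central"
    continuous: true,        // keep streaming new changes as they arrive
  }));
}
// Each returned document would be written into the /_replicator database,
// where CouchDB picks it up and runs the replication job.
```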

Can't wait to see what CouchDB 4.0 with FoundationDB at its core does for the db.

This is a great release too!

Anybody acquainted with the PouchDB devs? Just to know if there are plans to migrate already or not.

For those who don't want to follow the link, there are no changes to the replication protocol in Couch 3.0, so PouchDB already works.

I haven't used CouchDB in years. I just downloaded and installed it. Interesting that there are no apparent links to client libraries in different languages. Perhaps most people just use the HTTP API Reference and roll their own.

There are a couple of client libraries in .NET

Some are no longer maintained. Some still work.

In Ruby there is a CouchRest gem, which I've used, but to be honest a REST interface that talks JSON is so easy to use that I've often thought we'd be better off without anything specific.

Thanks, that makes sense.

CouchDB is good. Yes. I still dream of the day when the cluster will balance shards automatically and recover better from losing and replacing nodes. :D

Maybe I'm being petty, but it doesn't fill me with confidence when the ssl certificate on their website isn't even configured properly (valid for uberspace.de domain).

To clarify: seems their main site is on apache.org. But, their www.couchdb.org site (hosted on uberspace.de) doesn't have a correct cert.

For me I get a valid Let's Encrypt certificate that has blog.couchdb.org in its SAN list.

Their blog is hosted on Wordpress.com which seems to be using Let's Encrypt to generate one certificate for multiple different, unrelated custom domain names.

Maybe you encountered a bug where it served the wrong cert for a different batch of custom domains.

Update: someone has now fixed it.
