
CouchDB 3.0 - ifcologne
https://blog.couchdb.org/2020/02/26/3-0/
======
splatcollision
CouchDB is awesome, full stop.

While it lacks MongoDB's popularity and the wide adoption of things like
Mongoose in lots of open source CMS-type projects, it wins for its (I
believe) unique take on map/reduce: writing custom JavaScript view functions
that run on every document, letting you really customize the way you query,
slice, and access parts of your data...

Example: I'm building a document analysis app that does topic + keyword
frequency vectorization of a corpus of documents, only a few thousand for now.

I end up with a bunch of documents that have "text": "here is my document
text..." and "vector": [ array of floating point values ...].

What I can do with CouchDB is store that 20-dimensional vector and emit its
integer parts as a view key:

    
    
    function (doc) {
      var intVectors = doc.vector.map(function (val) {
        return Math.floor(val);
      });
      emit(intVectors, 1);
    }
    

Then I can match an input document's vector (calculated the same as corpus
documents), calculate a 'range' of those vectors, pass it as start and end
keys, and super quickly get a result from the database of 'here are documents
that have vectors similar to your input'...
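The query side of that trick can be sketched in plain JavaScript (the
function name, the sample vector, and the tolerance of 1 are mine, not from
the app): quantize the input vector the same way the map function does, then
widen each component to get the start and end keys.

```javascript
// Derive startkey/endkey for the view from an input document's vector.
// The tolerance controls how loose the "similar vectors" match is.
function rangeKeys(vector, tolerance) {
  // Quantize exactly like the map function: floor each component.
  var ints = vector.map(function (val) { return Math.floor(val); });
  return {
    startkey: ints.map(function (v) { return v - tolerance; }),
    endkey: ints.map(function (v) { return v + tolerance; })
  };
}

// e.g. yields startkey [0, -2, 3] and endkey [2, 0, 5], which you'd
// pass as ?startkey=...&endkey=... to the view endpoint
var params = rangeKeys([1.7, -0.3, 4.2], 1);
```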

Super fun, quick and flexible to work with!

~~~
treis
>CouchDB is awesome, full stop.

The problem I had with CouchDB is integrating it into a framework like Rails.
CouchDB on its own does so much cool stuff. The "free" HTTP API and client
replication via PouchDB are the two huge ones. But it just wasn't smooth
enough to get the data out, use it where I wanted, and then save it back.

~~~
teddyc
When I used it with Rails in the past, I had to write my own libs/helpers to
make interacting with it feel friendly to the developer.

But after that, it was very nice.

~~~
ahnick
Did you open source these by chance?

------
newfeatureok
One interesting thing you can do with CouchDB is build a webapp where a user
specifies their own database and credentials, and it all works over HTTP(S).
That's pretty unique. I'd love to see a SaaS using CouchDB whose
"on-premise" offering just means the user provides their own database. I'm
not sure how payment would work though - perhaps some verification proxy?
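Because CouchDB speaks plain HTTP with standard basic auth, the client side
of that idea is just request construction. A sketch (the URL, database path,
and credentials below are placeholders I made up):

```javascript
// Build a request description for whatever CouchDB instance the user
// pointed the app at; pass the result to fetch() or similar.
function couchRequest(baseUrl, path, user, pass) {
  return {
    url: baseUrl.replace(/\/+$/, '') + path,
    headers: {
      // CouchDB accepts standard HTTP basic authentication
      Authorization: 'Basic ' +
        Buffer.from(user + ':' + pass).toString('base64')
    }
  };
}

// GET /_all_dbs lists the databases these credentials can see
var req = couchRequest('https://couch.example.com/', '/_all_dbs',
                       'alice', 's3cret');
```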

Firebase is the gold-standard for offline apps (as a service). CouchDB
replaces Cloud Firestore, and Keycloak replaces Authentication. I haven't seen
OSS equivalents of Cloud Functions, ML Kit, and the other things (e.g. In-App
messaging, and Cloud Messaging). It'd be nice to have the entire stack of
Firebase bundled as a group of OSS projects, including CouchDB.

Sad to see that per doc access control didn't make it in 3.0. Hopefully it'll
be in 3.1.

~~~
Graphguy
Cloudant on IBM Cloud is CouchDB API/replication compatible and offers support
for Apache CouchDB (1). Also, OpenWhisk integrates nicely with
CouchDB/Cloudant and can even be a backing persistence for it (2).

(1) [https://www.ibm.com/cloud/blog/announcements/announcing-
supp...](https://www.ibm.com/cloud/blog/announcements/announcing-support-and-
a-kubernetes-operator-for-apache-couchdb)
(2) [https://github.com/apache/openwhisk/blob/master/tools/db/REA...](https://github.com/apache/openwhisk/blob/master/tools/db/README.md)

~~~
newfeatureok
Cloudant is awesome, but it's way too expensive IMHO.

~~~
Graphguy
Send me an email (in profile.) Would love to chat and see what we can do for
you.

~~~
mauflows
If you indeed work for Cloudant, please consider trying to convince someone to
invest in PouchDB. It looks mostly unmaintained and it would be in IBM's and
the community's interest to keep it running!

------
smoyer
I built two products on CouchDB 1.x starting in 2010 ... version three is
another amazing step forward! For my more recent projects, I've replaced
CouchDB with clustered PostgreSQL using JSON columns, as I really enjoy the
ability to write SQL queries against the JSON and to use the built-in
full-text search capabilities. I think both CouchDB and clustered PostgreSQL
are amazing tools and it's nice to be able to choose between them as needed.
The best advice I've heard is to choose CouchDB when you know your queries
ahead of time and the data "schema"[1] is variable and choose PostgreSQL when
you know your data ahead of time and your queries are variable.

[1] In this case, a JSON document but either with a JSON-schema or
marshaled/unmarshaled into a strict type.

~~~
jimstr
I've gotten the impression that clustered Postgres still isn't very
straightforward to run. Do you mind elaborating on your ideal setup and
pointing to some resources?

Thanks!

~~~
smoyer
It's not straightforward at all, but it's better than it was five years ago
... you can use something more "meta" like SymmetricDS
([https://www.symmetricds.org/](https://www.symmetricds.org/)). I haven't
used it personally, but a dirt-simple way to get an HA, scalable PostgreSQL
instance would be to use Amazon's Aurora DB.

------
knubie
CouchDB is awesome and feels way ahead of its time. Its design docs are
extremely powerful, to the point that you can build entire web apps with
CouchDB alone (not that that's recommended anymore). Plus with PouchDB you can
create offline-first apps that sync with a remote CouchDB instance.

~~~
code-is-code
If you like PouchDB, you should also check out RxDB. It is built on top of
PouchDB and is optimised for realtime applications where you can subscribe
to queries and stuff.

[https://github.com/pubkey/rxdb](https://github.com/pubkey/rxdb)

------
Phillips126
I haven't heard of CouchDB in quite some time, great to see it still
improving.

I used it years ago when I was experimenting with Ionic[0]. What appealed to
me was that I could use CouchDB (cloud) and PouchDB[1] (device) to have a
replicated copy of the data locally. The application was used in areas where
the network connection was very limited. Using this strategy I was able to
ensure the mobile device's data was as recent as the last time it had a
network connection.

[0] - [https://ionicframework.com/](https://ionicframework.com/)

[1] - [https://pouchdb.com/](https://pouchdb.com/)

~~~
lytefm
I can confirm that the stack still works well :) We've been developing a
cross-platform app for the German market - hence the need for offline
capability - since 2017 and never had any real issues with Pouch/Couch; that
part just worked. The upgrade from Ionic 3 to 4 was quite painful though.

For user authentication I've forked the now-unmaintained superlogin package
[1], which still does a great job as long as you keep the dependencies up to
date.

[1]
[https://github.com/LyteFM/superlogin](https://github.com/LyteFM/superlogin)

------
hajile
Reducing max document size from 4GB down to 8MB seems hyper-restrictive.

For those interested, looks like the guts of CouchDB are going to be swapped
out for FoundationDB.

[https://blog.couchdb.org/2020/02/26/the-road-to-
couchdb-3-0-...](https://blog.couchdb.org/2020/02/26/the-road-to-
couchdb-3-0-prepare-for-4-0/)

~~~
splatcollision
If you're trying to store gigabyte-sized documents in Couch, you're doing it
wrong... Unless those are binaries, you can usually fragment data logically
across many documents, then write custom views to aggregate however you need
to.

Updates on huge docs would be painful!

~~~
meddlepal
I agree in large part with your point that multi-GB documents are perhaps
excessive, but this does create a heck of a migration problem for a lot of
users who aren't even close to 4GB.

~~~
Volundr
It does, but keep in mind for 3.0 this is a change to the default settings,
not a hard cap. The idea is to give people lots of warning and time to do any
migrations necessary prior to 4.0.

------
johnchristopher
> – Updated to modern JavaScript engine SpiderMonkey 60

Yes ^^ !

Congrats to the team. These people are some of the nicest and most supportive
devs I know of in the OSS community (or whatev').

They show a great deal of patience in their slack channel and are always
welcoming and answering stupid questions from idiots like me.

~~~
janl
<3

------
tbrock
At this point why would you use CouchDB over something like MongoDB?

Seriously asking...

Over the past 5 years MongoDB has gained a great storage engine,
transactions, distributed transactions, multi-master replication, and
first-class change streams, and is very, very solid as a foundational piece
of infrastructure you can rely on, while CouchDB has languished. I can't
imagine reaching for it in my tool belt over MongoDB when I need a document
store, but I'm obviously biased, so I'm wondering if there is a lot I'm
missing.

Obviously it's cool from an open-source-databases standpoint - I love
learning about how things are built and evolve over time.

~~~
Quarrelsome
MongoDB has _suspiciously_ amazing SEO and marketing. CouchDB's by contrast is
awful.

As silly as that sounds as a reason to choose CouchDB, it demonstrates where
the respective companies' priorities lie.

~~~
dumbfounder
MongoDB is an actual for-profit company, so of course it has marketing.
CouchDB is an open source Apache project. You are comparing apples to
oranges.

~~~
Quarrelsome
It's not just CouchDB. I did a lot of searching a little while back, and I
generally just don't like how tightly they've wrapped up almost all of the
results. If you go by searching alone, it's overwhelmingly MongoDB-positive,
to such an extent that it's hard to believe it's organic.

It's definitely impressive marketing, but when I'm deciding on which tool to
use, that arguably works against them as opposed to for them.

~~~
z3t4
So you are choosing software on its technical merit and not just on
popularity and hype!? What if there are no breaking changes every third
month, what are you gonna do? Solve actual problems!? :P

~~~
dumbfounder
It sounds to me like they are choosing their software based off their
marketing techniques.

~~~
z3t4
Interesting. So they are catching a ride on the hype train by adopting the
hyped technology... So this is why the tech stack is picked by and marketed
to the business/directors rather than engineering.

------
anonyfox
Has anyone here tried to use CouchDB _directly_ within an Elixir/Erlang OTP
application? As in, "mix install"? Would kill for CouchDB as a library!

------
crudbug
Congrats to the whole team.

Looking forward to the CouchDB 4.0/FoundationDB goodies. Do we have any
roadmap details on this?

------
e12e
Oh wow, this is great news. I thought the project was effectively long dead.
Is there a new/up-to-date "couchapp" too?

> Default installations are now secure and locked down.

More good news!

Anyone have recent experience with couchdb?

I see the (quickstart) docs use plain http - should one terminate ssl in
front, eg with a recent version of haproxy?

~~~
mauflows
I wish Couch was used whenever users ask for an app to "sync to Dropbox". I
don't know if this changes with 3.0, but Couch is naturally
database-per-user, took me five minutes to install on my rpi with Docker,
has a very good admin interface, the database is the frontend (no driver or
separate process), and it lets the application layer handle conflicts.

------
couchdb_ouchdb
I'm surprised to see so much love for CouchDB in this thread. I don't think
it's been widely adopted in corporate America, and it has lost the war to
MongoDB, closed source or not.

I joined a company where it's being used backing a mobile app with couch/pouch
in production. We can't wait to get off of it. Writes are slow. Reads are
worse. Having a DB per user is a scaling and backup nightmare. If you run into
any issues, it's a ghost town.

I'm glad the CouchDB Team is forging ahead, but who is really using this
database?

~~~
staticautomatic
Would you be willing to say more? Inquiring minds want to know.

------
yatsyk
CouchDB/PouchDB looks very promising for offline-first apps, but I can't
understand how to restrict bad clients. A client could potentially insert a
document of huge size or execute an expensive query and degrade the
experience of other clients on the same server. Is there any way to prevent
this?

~~~
Volundr
A couple ways:

One: you can implement validation functions [1] on user databases to control
what kind of data can be inserted into Couch. These functions can only be
changed by database admins, not users, so they can act as a security
mechanism controlling what goes in.
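A minimal sketch of such a function (the "note"/"body" schema here is
invented purely for illustration; in a real deployment the source would be
stored under the validate_doc_update key of a design document):

```javascript
// CouchDB runs this server-side on every write to the database;
// throwing an object with a "forbidden" key rejects the document.
function validate(newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) { return; } // allow deletions through
  if (newDoc.type !== 'note') {
    throw { forbidden: 'only "note" documents are allowed' };
  }
  if (typeof newDoc.body !== 'string' || newDoc.body.length > 10000) {
    throw { forbidden: 'body must be a string under 10k characters' };
  }
}
```

Because only admins can change the design document, a malicious client can't
simply remove the check before writing.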

As mentioned by others you can also implement a proxy. This doesn't have to
interfere with sync functionality, you just have to make sure you proxy all
the endpoints in the replication protocol [2]. Envoy [3] is one such proxy
that essentially applies document level permissions to a CouchDB database
without interfering with sync.

If the goal is just to limit document size, or to throttle clients trying to
hammer the API, this doesn't even have to be a custom proxy; any reverse
proxy with the needed control knobs (such as NGINX) will do. You can of
course combine this with validation functions, using validations to ensure
that everything that comes in is the right "shape" and using NGINX and its
ilk to apply throttling and sane request limits.

At scale there's a decent chance you want a proxy in front of your Couch
instance anyway, since Couch is truly multi-master, meaning you probably
want to balance your clients across all your nodes.

[1]
[https://docs.couchdb.org/en/stable/ddocs/ddocs.html#validate...](https://docs.couchdb.org/en/stable/ddocs/ddocs.html#validate-
document-update-functions) [2]
[https://docs.couchdb.org/en/stable/replication/protocol.html...](https://docs.couchdb.org/en/stable/replication/protocol.html#)
[3] [https://github.com/cloudant-labs/envoy](https://github.com/cloudant-
labs/envoy)

~~~
yatsyk
Thank you for pointing me at validation; I'll check it out. It's not
completely clear whether it's possible to limit not only a particular
document but the database as a whole, or how to handle a conflict if a
document is changed on Pouch but rejected by the CouchDB server.

I'm not sure about now, but previously there was a problem where the CouchDB
file grew until it hit a filesystem limit and CouchDB just crashed.

The Envoy README starts by saying it's not battle-tested or supported in any
way. Also, it doesn't do any validation apart from limiting permissions for
different users.

It's easier to reimplement CouchDB than to create a smart proxy that can
estimate whether a given query is expensive or not.

I'm not talking about a rate-limiting proxy or load balancing to different
backends, which could be implemented with nginx or something else.

~~~
Volundr
I'm not clear what you mean by limiting "not only particular document but
database". As for a document changing in Pouch and being rejected on the
server, that's one of two scenarios.

1) The client you wrote is bugged and generated bad data. This scenario can
occur just as easily using Postgres and an application server. What does
your app server do if a client tries to send bad data? (Answer: whatever you
told it to do. Most likely throwing a 500 when your database refuses the
incoming data.)

As for what happens when Pouch syncs to Couch: the server will let
everything else sync, but not the bad document. The return value from the
API call will tell you which documents didn't sync.

2) Someone is intentionally trying to shove bad data into your database. In
this case it's worked as advertised and rejected the bad data. What do you
care if a malicious client breaks?

What kind of "expensive" query are you envisioning? Mango queries don't
support joins, only simple equality filters, so in general the worst thing
someone could do is send a query that doesn't use an index. But why are you
letting the client query the server in the first place? Just have the client
sync and query client-side. Or don't allow access to the _find endpoint and
restrict them to the map/reduce views you handwrote.

If you must let them send arbitrary queries (which to me implies a
relatively trusted user, but let's pretend they're not), then run the query
with a limit of 1 or 0 and examine the execution stats to see if they are
using an index, and check their query to see if their limit is reasonable.
But at this point you've entered a scenario that's going to be _very_
difficult with a custom API too.
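That pre-flight check could look something like this (the wrapper function
and the sample selector are mine; execution_stats is a real option on
CouchDB's _find endpoint):

```javascript
// Wrap a client's Mango query so it can be probed cheaply: limit 0
// returns no rows, but the execution stats in the response still
// report how much work the server had to do.
function preflight(query) {
  return Object.assign({}, query, { limit: 0, execution_stats: true });
}

// POST this to /db/_find, then compare, e.g.,
// execution_stats.total_docs_examined against results_returned to
// decide whether the real query would hit an index.
var probe = preflight({ selector: { type: 'note' }, limit: 500 });
```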

~~~
yatsyk
> I’m not clear what you mean by limiting "not only particular document but
> database".

I've limited document size to 10MB and rate-limited updates to 10 per
second. A client starts updating a document with random data at 10 requests
per second. As far as I understand, Couch stores all versions, at least for
some time. This means that this one client could fill space on my server at
100MB/s. There are no such issues with Postgres, and no one allows clients
to execute raw queries on the database without an application server. The
document is only 10MB, but the database is huge.

> What kind of "expensive" query are you envisioning?

I have never used Couch, so I don't know what could be expensive. Maybe a
lookup without an index or something like that.

Sorry for my ignorance: is it true that if I limit Couch to replication
only, there will not be any non-indexed lookups?

It seems that implementing a secure system with Couch is very hard, but I
can't find any best practices; mostly only authentication and basic
validation.

~~~
Volundr
> I've limited document size to 10MB and rate-limited updates to 10 per
> second. A client starts updating a document with random data at 10
> requests per second. As far as I understand, Couch stores all versions, at
> least for some time. This means that this one client could fill space on
> my server at 100MB/s. There are no such issues with Postgres, and no one
> allows clients to execute raw queries on the database without an
> application server. The document is only 10MB, but the database is huge.

Ah! Now we are getting somewhere! You're concerned about someone filling
your disk.

OK, let's modify your scenario a little. Instead of updating an existing
document, they create a new document. This is a malicious client; why make
updates that'll get cleaned up in a few minutes when I can make them
permanent?

So, CouchDB allows these writes, and now your disk is full.

What does Postgres with a custom API do? Allows these writes, and now your
disk is full.

You're allowing 10MB documents because that makes sense for your
application, right? So your Postgres table is going to have a binary column
or some other column meant to hold bulk data, and your API is going to
accept it.

If it doesn't make sense, lower the max document size. Apply validations to
limit what fields can be written to, and how big they can be. In Postgres this
is called your "schema". Couch being "schemaless", it's now your validation
function. Couch is no different from any other schemaless database such as
Mongo, RethinkDB and FoundationDB in this regard.

Also, your rate limiting here is weak. If I can post to your server at
100MB/s, I can saturate a 1GB/s link with only 10 clients. It doesn't matter
if you reject my posts; if I can send them to the server, I can DoS you
pretty easily.

The main thing Postgres gives you here is that it requires you to define
your schema upfront (unless you use JSON columns, in which case it joins the
schemaless club above). Couch will happily let you skip that, in which case
someone wants to write a record of their car maintenance into your recipe
book app? Couch is good with that. But take a step back: what actually stops
them from putting that in the "description" column of your Postgres recipe
app? Not much. So you have to think about what's important. Do I actually
need to make sure these are all the same "shape"? If so, I need a validation
function. If I can just shrug and say "garbage in, garbage out", then I just
need controls around how much data they can insert. But hey, I needed that
for Postgres anyway.

> Sorry for my ignorance: is it true that if I limit Couch to replication
> only, there will not be any non-indexed lookups?

Correct (enough). The entirety of CouchDB is built around efficient
replication. While it's not going to use a formal "index", getting all of
the changes after a specific sequence is an efficient operation.

~~~
yatsyk
It's trivial to limit the number of created documents in Postgres, CouchDB,
or an application server through validation; I'm talking about updating a
document, not creating a new one. In Postgres, if I update a 1MB document,
the used space will not always grow. In CouchDB the situation is different.
In the case of a relational DB you have an application server with custom
logic and validations; CouchDB, on the other hand, is accessible from
outside.

My point is that it's very hard to create a safe CouchDB-based system, and
most recommendations are limited to setting up an nginx proxy and
authenticating users, which is not enough.

~~~
Volundr
> It's trivial to limit the number of created documents in Postgres,
> CouchDB, or an application server through validation; I'm talking about
> updating a document, not creating a new one. In Postgres, if I update a
> 1MB document, the used space will not always grow. In CouchDB the
> situation is different. In the case of a relational DB you have an
> application server with custom logic and validations; CouchDB, on the
> other hand, is accessible from outside.

Is it? It's unclear to me why I'm allowing 10 updates to a (largish, 10MB!
Use a file or store it in S3!) document per second, but not 10 creates.
Maybe I'm building Google Docs? Except I'd want old revisions, so those are
creates. Plus 10MB is a huge spreadsheet. But sure, let's roll with it.
Actually, Couch does _not_ keep old versions of documents around, only old
revision numbers. When a document is updated, the old version becomes
eligible for compaction (basically garbage collection). So your attacker has
to be fast enough to outrun the compactor, while being slow enough to not
get temporarily banned from your service. It seems like less effort to me to
use this power to flood your network I/O, which is almost certainly lower
than your disk I/O. Or just choke your Postgres server on its 100MB/s disk
I/O for updates, plus whatever is required to maintain your indexes.

I'm not actually advocating for Couch over Postgres. In my mind Postgres
should be the default choice, and you switch to something else if you have a
reason. For Couch, the biggest reason is sync is built in, in such a way that
you can leverage it for your own applications with minimal effort. In my
experience sync can be devilishly hard for non-trivial cases, so depending on
your app, that can be pretty compelling.

But so far you seem to be focused on DoS attacks, and you're not going to
find separate advice for Postgres vs. Mongo vs. Couch, because the backing
system doesn't matter. The attacks and mitigations are identical no matter
the back-end: namely, stop the traffic before it consumes your resources.

~~~
yatsyk
Couch is not equivalent to Mongo or a relational database, because it is
accessible to clients if we want synchronisation. Securing an app server is
a manageable problem, and there is a huge number of resources on how to do
it correctly.

In the case of Couch, I've not seen any secure open-source example.

I'm not focused on DoS attacks; I'm just proposing different attack vectors.

------
pawelk
Beyond being a great solution for some problems, I wanted to highlight the
fact that CouchDB has committed to SpiderMonkey (the Mozilla JS engine)
since the very beginning and is one of the few projects helping to fend off
the V8 monoculture.

------
fiatjaf
Many commenters here still think CouchDB is the same thing it was many years
ago.

CouchDB was a simple but very powerful idea (that still needed
improvements), but it was co-opted into something not very nice, good, or
useful.

See my old rant about it and why it failed:
[http://web.archive.org/web/20170530122143/http://entulho.fia...](http://web.archive.org/web/20170530122143/http://entulho.fiatjaf.alhur.es/notes/about-
couchdb/)

------
LoSboccacc
is the Lucene search indexer synchronous with CouchDB data updates?

I'm wondering how people solve the common search-after-create pattern when
using external indexes

~~~
janl
Yup, works with clustering and everything:
[https://blog.couchdb.org/2020/02/26/the-road-to-
couchdb-3-0-...](https://blog.couchdb.org/2020/02/26/the-road-to-
couchdb-3-0-easy-fulltext-search/)

------
haolez
I once read that the right way to use CouchDB is for every user to have
their own database. However, how does this work with BI? Or with public data
that
should be known by all users? Do I create a single centralized DB just for
that kind of data? Maybe aggregate data from all users' DBs? Genuinely
curious.

~~~
CameronNemo
For public data, you can try to partition it in such a way that writes can be
merged without any potential conflicts. E.g. a user's posts are in a separate
partition.

I have never done this with CouchDB, but the technique is described in
Martin Kleppmann's _Designing Data-Intensive Applications_.

------
gigatexal
Can't wait to see what CouchDB 4.0 with FoundationDB at its core does for
the db.

This is a great release too!

------
agumonkey
anybody acquainted with pouchdb devs ? just to know if there are plans to
migrate already or not

~~~
gtirloni
[https://github.com/pouchdb/pouchdb/issues/7987](https://github.com/pouchdb/pouchdb/issues/7987)

------
mark_l_watson
I haven't used CouchDB in years. I just downloaded and installed it.
Interesting that there are no apparent links to client libraries in different
languages. Perhaps most people just use the HTTP API Reference and roll their
own.

~~~
mikekchar
In Ruby there is a CouchRest gem, which I've used, but to be honest a REST
interface that talks JSON is _so_ easy to use that I've often thought we'd be
better off without anything specific.

~~~
mark_l_watson
Thanks, that makes sense.

------
seigel
CouchDB is good. Yes. I still dream of the day when the cluster will balance
shards automatically and recover better from losing and replacing nodes. :D

------
canada_dry
Maybe I'm being petty, but it doesn't fill me with confidence when the ssl
certificate on their website isn't even configured properly (valid for
uberspace.de domain).

To clarify: seems their main site is on apache.org. But, their www.couchdb.org
site (hosted on uberspace.de) doesn't have a correct cert.

~~~
lars_francke
For me I get a valid Let's Encrypt certificate that has blog.couchdb.org in
its SAN list.

