Ask HN: What NoSQL database do you recommend in 2020?
11 points by pier25 12 days ago | 34 comments





Three out of five answers so far just question the legitimacy of using noSql.

There are definitely use cases for noSQL and I clicked on this thread hoping for information and war stories about cockroach, mongo, redis, couchdb, the current state of noSql in postgres, and a few names I may never have heard of.

Let me derail the conversation from berating this technology. I happen to have a requirement which needs a dynamically changing db structure (saving lots of json data from a dynamically changeable form). Which noSql db would you recommend for me? Any pitfalls? I'm primarily looking for self-hosted solutions.


Honestly Postgres is probably my first choice for storing some dynamic JSON and I have used it for exactly that in the past. In particular, I used it to build a simple time series db for reporting on data from JSON payloads that had the potential to hold arbitrary data.

The OP has harped quite a bit on connection limit issues and this is a valid concern but also something that you can mitigate by using connection pooling. Geographic replication is an issue and it's one of those things where I'm not really convinced any sql or nosql db offers a really good solution. For example, if you run mongodb in replica you must take extra steps to ensure your replicas do not suffer from split brain during a network partition. And as another commenter has pointed out, mongodb's transactions are flawed. I would not trust it for anything transactional beyond single document transactions in a non-replicated configuration.

DynamoDB is pretty decent for a document DB but I personally dislike working with it. It requires you to sacrifice many things that RDBMSs like postgres offer out of the box. For example, if you want to fetch all records you will have to do it in a loop because there are limits to the number of records that can be fetched at the same time. But of course you can't self host it.
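To make the "fetch in a loop" point concrete, here is a minimal sketch of that pagination pattern. It assumes a response shaped like a DynamoDB Scan result, where `Items` holds one page and `LastEvaluatedKey` is present only when more pages remain. `scan_all`, `make_fake_table` and the `offset` key are hypothetical names for illustration, and the fake table stands in for a real client so the example runs on its own.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# A page-fetching callable: takes the key to resume from (None for the first
# page) and returns a dict shaped like a DynamoDB Scan response.
ScanPage = Callable[[Optional[dict]], Dict[str, Any]]


def scan_all(scan_page: ScanPage) -> Iterator[dict]:
    """Yield every item, following LastEvaluatedKey until it is absent."""
    start_key: Optional[dict] = None
    while True:
        page = scan_page(start_key)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return


def make_fake_table(rows: list, page_size: int) -> ScanPage:
    """Hypothetical in-memory stand-in for a table (illustration only)."""
    def scan_page(start_key: Optional[dict]) -> Dict[str, Any]:
        offset = start_key["offset"] if start_key else 0
        page: Dict[str, Any] = {"Items": rows[offset:offset + page_size]}
        if offset + page_size < len(rows):
            # More rows remain, so tell the caller where to resume.
            page["LastEvaluatedKey"] = {"offset": offset + page_size}
        return page
    return scan_page


rows = [{"id": i} for i in range(7)]
items = list(scan_all(make_fake_table(rows, page_size=3)))
```

With a real client, `scan_page` would presumably wrap the Scan call and pass the resume key as `ExclusiveStartKey` whenever it is not `None`.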

Another hosted option is Datomic. I've heard great things but have never used it so I can't really comment.


> you can mitigate by using connection pooling

Can you do connection pooling to Postgres from cloud functions?

I know there is a Node driver that does it for MySQL [1] but I've never seen one for Postgres.

[1] https://github.com/jeremydaly/serverless-mysql


Run something like PgBouncer in front of your postgres instance. https://www.pgbouncer.org/

For what it is worth, the popular postgres and mysql libraries for nodejs support connection pooling out of the box. Cloud functions break this functionality because you are spinning up an instance per connection rather than handling multiple connections with a single instance of nodejs. In this case I think that longer lived containers or vms that can autoscale may be a better solution. That is not to say you should not use cloud functions. I understand they offer many benefits. Managing a fleet of vms or containers introduces its own challenges. In general I think this is a good example of how simplifying one thing can complicate other things. There are no silver bullets. It's all about determining the compromises you can and cannot accept.
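A rough sketch of why that matters, using only the Python standard library rather than any real database driver. `FakeConnection` and `Pool` are hypothetical stand-ins; the point is only to count how many connections get opened in a long-lived process with a pool versus one instance per request.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a database connection."""
    opened = 0  # how many connections have been opened so far

    def __init__(self) -> None:
        FakeConnection.opened += 1

    def query(self, sql: str) -> str:
        return f"result of {sql}"


class Pool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size: int) -> None:
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self):
        conn = self._idle.get()   # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next request


# Long-lived process: 100 requests share 5 connections.
FakeConnection.opened = 0
pool = Pool(size=5)
for i in range(100):
    with pool.connection() as conn:
        conn.query(f"SELECT {i}")
pooled_opened = FakeConnection.opened

# One instance per request (the cloud-function case): each invocation
# opens its own connection, so nothing is ever reused.
FakeConnection.opened = 0
for i in range(100):
    FakeConnection().query(f"SELECT {i}")
unpooled_opened = FakeConnection.opened
```

An external pooler such as PgBouncer moves the pool out of the application process, which is why it still helps when each function instance is short-lived.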


Thanks for the link to PgBouncer, I'll check that.

I'm using Cloudflare Workers which are distributed on the CF network and run at the edge to reduce latency. Achieving the same thing with containers would be probably complicated and expensive.

Right now I'm using Fauna DB which is also distributed and doesn't have connection limits, but I want to explore all my options.

Edit:

If anyone wants to know more about PgBouncer and pooling with postgres I found this article very informative:

https://www.compose.com/articles/how-to-pool-postgresql-conn...


Maybe Mongo if you don't need ACID transactions [1].

[1] https://twitter.com/jepsen_io/status/1261276984681754625


HN can be fairly rubbish at times. People here can get too tied to their tools to even consider that other tools may be useful to others. NoSQL has its uses and I personally love using one.

OP, try https://www.arangodb.com It is the best NoSQL IMHO. It's multi-model, extremely performant, has fantastic distributed/replication capabilities and good documentation. They even have a hosted offering of it.


Thanks for the suggestion.

It looks good but their cloud offering seems quite expensive starting at $0.20 per hour or about $150 per month.
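For reference, the hourly-to-monthly conversion can be checked with a quick sketch. It assumes the instance runs around the clock and takes the $0.20 figure from above; `monthly_cost` is a hypothetical helper name.

```python
HOURLY_RATE_USD = 0.20  # starting price quoted above
HOURS_PER_DAY = 24


def monthly_cost(hourly_rate: float, days: int) -> float:
    """Cost of running continuously for the given number of days."""
    return hourly_rate * HOURS_PER_DAY * days


cost_30 = monthly_cost(HOURLY_RATE_USD, 30)  # a 30-day month
cost_31 = monthly_cost(HOURLY_RATE_USD, 31)  # a 31-day month
```

Both results land a little under $150, so the rounded monthly figure is in the right range.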


Yep. The hosted option is fairly new and they are still finding their feet. Arangodb has so much amazing stuff about it: AQL, which is a query language that looks like code, Foxx, which works as an integrated web server, and so on.

This probably won't be a very popular opinion since it's not really a nosql database but if you have a relatively small dataset and not very complex querying needs you may be able to use Redis as a pretty decent datasource. Another solution may be Firebase Realtime Database although that will limit you vendor wise.

Redis only runs in memory and Firebase RTDB has many limitations and only works for the most simplistic use cases (I've been using it since 2016).

Firestore is better than the RTDB but still very limited compared to say Mongo or Fauna.


While Redis needs to keep the dataset in memory (which is why I added that it kind of depends on the size of your stuff) it does have quite robust persistence features [0] so it's very unlikely you'll ever lose your data even across reboots or crashes if it's configured correctly.

That being said, it's still not really a database engine in itself and would also require a slight paradigm change in how you think about your data and how you create your schema, so ymmv. But I've personally used it across a few non-data-heavy projects as primary datasource and have been quite happy with it. It was also famously used as primary datasource for a well known adult website generating 200M pageviews/day even back in 2012 [1] [2] although I don't know if that is still the case.

[0] https://redis.io/topics/persistence

[1] http://highscalability.com/blog/2012/4/2/youporn-targeting-2...

[2] https://news.ycombinator.com/item?id=3597891


Thanks for the info.

I'm already using Cloudflare Workers so it would make more sense to simply use Workers KV, which is stored at the edge, instead of Redis:

Heroku Redis:

- $15 per month

- 50MB of memory

- 40 connections limit

Workers KV:

- $5 per month

- 1GB of storage (then $0.5 per GB)

- 10M reads (then $0.5 per 1M reads)

- Unlimited connections (via API or REST)
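A small sketch of how the Workers KV numbers above add up as usage grows. It models only the storage and read prices listed here and ignores anything else that might be billed; `workers_kv_monthly_usd` is a hypothetical helper.

```python
HEROKU_REDIS_MONTHLY_USD = 15.0  # flat price listed above for 50MB


def workers_kv_monthly_usd(storage_gb: float, reads_millions: float) -> float:
    """Monthly cost using only the prices listed above."""
    base = 5.0                 # includes 1GB of storage and 10M reads
    included_gb = 1.0
    included_reads_m = 10.0
    extra_storage = max(0.0, storage_gb - included_gb) * 0.5
    extra_reads = max(0.0, reads_millions - included_reads_m) * 0.5
    return base + extra_storage + extra_reads


within_quota = workers_kv_monthly_usd(storage_gb=1, reads_millions=10)
heavier_use = workers_kv_monthly_usd(storage_gb=3, reads_millions=30)
```

Under these assumptions reads dominate the bill: going from 10M to 30M reads adds $10, while two extra GB of storage add $1.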


Yes that will probably be better in your case. I'm sorry I didn't really notice your other replies stating that you wish to go fully serverless.

On a side note, for the $15 a month Heroku charges for a 50MB Redis (wtf?) one could, for example, get a 4vCPU/8GB/160SSD server on Hetzner cloud with 20TB of traffic or multiple smaller servers and have a much better infrastructure and room to grow without any added costs. Oh well, I guess "managed" is where the big bucks are :) /rant


No worries.

I wish I could manage a VPS. I've dabbled with DO et al but I wouldn't sleep at night having a self configured VPS in production.


Yeah I get that, it does take a bit of work and time to learn and get comfortable with this stuff, time which a lot of the time could be better spent building or promoting your app. You can then just hire someone in the future to deal with that if needed.

Being used to configuring my own servers with relatively low traffic/processing needs on the cheap, I just get weirded out sometimes by the very expensive nature of these managed services like heroku, aws, etc and their "charge for every little thing" mentality.

Still, the "elastic" functionalities and auto growth features they offer are usually pretty awesome and not trivial to set up on a VPS so it's a great and worry free option in that regard.

Hope you can find the right solution for your case, good luck :)


Good luck to you too kind stranger!

First determine whether Nosql is really the solution you want. Next once you think nosql is the solution you want, have your experienced old hands slap you a few times.

If that still doesn’t convince you, then you may actually need nosql: go for Mongo.


Mongo was analyzed recently by Jepsen again and it didn't turn out well...

https://twitter.com/jepsen_io/status/1261276984681754625


What is your use case? SQL RDBMS are generally a sensible default and you should use NoSQL only in places where they cannot be used (this is mostly related to scale requirements that are too much for RDBMS to handle).

I'm looking for the best DB for a serverless backend (cloud functions).

Typically the problem with RDBMS is that it's very expensive to handle thousands of concurrent connections. NoSQL doesn't have that issue. For example FaunaDB is designed for serverless and has no practical connection limits, Mongo Atlas gives you 500 concurrent connections on the free tier [1], etc.

In comparison Postgres on Heroku only gives you 500 connections on the most expensive plans. Even the $50 per month Postgres plan only gives you 50 concurrent connections.

[1] https://docs.atlas.mongodb.com/reference/atlas-limits/


Most of the time when people think they need NoSQL they don’t. But if you really do, and value your data, FoundationDB

DynamoDB is pretty good, honestly

Why NoSQL?

Because generally speaking SQL databases don't work well with serverless and are a pain to distribute geographically.

Huh. A lot of us use serverless relational databases now, if by “serverless” you mean “runs on a remote server someone else manages.” AWS RDS, for example.

A pain to distribute geographically? What do you think big enterprises and banks use? Oracle or Mongo? If by “a pain” you mean “not free” then you’re right. Depends on how valuable your data is and how much you care about integrity.


No, by serverless I mean cloud functions.

Also I’m not a big enterprise nor a bank.


Distributing geographically generally comes up in large-scale systems. My point is that relational databases already support that. I know what cloud functions are. How do they not work with relational databases, any more than they don’t work with some NoSQL tool? Unless your cloud functions embed the database they’re using a remote service, right?

Because at peak traffic you could have thousands of cloud functions running, each with its own connection to the database. Typically SQL databases are much more strict on the number of concurrent connections as each connection consumes more memory than on NoSql databases.

Edit:

IIRC I read that each connection to Postgres consumes 10MB of RAM.
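Taking that recalled 10MB figure at face value (it is a recollection, not a measurement), a back-of-the-envelope sketch of what it implies at different connection counts. `connection_memory_gb` is a hypothetical helper.

```python
MB_PER_CONNECTION = 10  # figure recalled above, not measured


def connection_memory_gb(concurrent_connections: int,
                         mb_per_connection: int = MB_PER_CONNECTION) -> float:
    """RAM used by connections alone, in GB (1 GB = 1024 MB)."""
    return concurrent_connections * mb_per_connection / 1024


small_plan = connection_memory_gb(50)      # a 50-connection limit
peak_traffic = connection_memory_gb(5000)  # thousands of cloud functions
```

At 50 connections that is under half a GB, while 5000 direct connections would need roughly 49GB before the database does any actual work.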


Postgres core hackers are working to ease the limit on max allowable concurrent connections, for example:

https://www.postgresql.org/message-id/flat/20200301083601.ew...

Maybe the proposal above by itself isn't enough, but it's going to solve some of the many problems that put a restriction on max allowable connections in the first place. For example, I am not sure if there is anything in there to reduce the baseline per connection memory consumption, but maybe that will come up as the next problem to solve and will hopefully be solved sooner than later.


Could someone educate me on what you mean by "cloud functions" ? I do not understand the concept of "serverless".

To me, data has to "live" somewhere. When one computer is storing that and serving it, it is by definition a server.

Is there something I've missed??


Serverless means the infrastructure is completely abstracted from you and it scales on demand.

Cloud functions are triggered on demand so these scale up and down as needed.


Amazon Aurora seems to fit.

Can you access Aurora securely from outside AWS?


