
What I learnt from building 3 high traffic web apps on an embedded kv store - yarapavan
https://hackernoon.com/what-i-learnt-from-building-3-high-traffic-web-applications-on-an-embedded-key-value-store-68d47249774f
======
SahAssar
> A key value database stores a data record using one primary key. The key
> means the record is uniquely identifiable, and can be directly accessed.

So, like the primary key in a SQL database? Wouldn't just using a primary key
in a SQL database give you much the same performance if the key-value store
in question already needs ACID anyway? (I get that pure caching key-value stores
are a different beast)
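
To make the question concrete, here is a minimal sketch of a SQL table used as a key-value store, assuming SQLite through Python's standard `sqlite3` module. The table name `kv` and the helpers `put` and `get` are made up for illustration, and the sketch says nothing about how the two approaches compare on speed.

```python
import sqlite3

# Hypothetical schema: a single table whose primary key is the lookup key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kv (key TEXT PRIMARY KEY, value BLOB NOT NULL) WITHOUT ROWID"
)


def put(key, value):
    """Insert or overwrite one record inside a transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value),
        )


def get(key):
    """Fetch one record directly by its primary key, or None if absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


put("user:1", b"alice")
put("user:1", b"alice-updated")  # same key, so the record is overwritten
```

Every read is a point lookup on the primary key index, which is the same access pattern the quoted passage describes.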

> Unlike a relational database, a NoSQL key value database is not obliged to
> scale vertically. It can scale over several machines or devices by several
> orders of magnitude

So, just like partitioning can do with a SQL database?

> I came up with a strategy of having multiple badger databases, each
> representing a collection. This way, if I ever have a need to scale the
> system beyond a single server, I could isolate each individual database and
> its corresponding program logic into a separate micro-service.

So, that's just a self-built partitioning system, right?

> where it is lacking is in search, since you’re only able to query for items
> by their keys and key’s prefix. But when paired with an indexing engine like
> blevesearch in golang, or elasticsearch and lucene

Doesn't the need for an external search index for even the most basic of
queries negate the whole point? If you have to push to two different
datastores, keep them in sync, hand-write microservices just to answer DB
queries and to get partitioning, and still get none of the guarantees that a
normal DB gives, what is the point?
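
As a rough illustration of what "only keys and key prefixes" means in practice, here is a toy sketch. `TinyKV` is a made-up in-memory stand-in, not badger's API, and the key layout (`product:` and `idx:category:`) is an assumption. Any lookup by something other than the key needs a second, hand-maintained entry.

```python
import bisect


class TinyKV:
    """Toy in-memory stand-in for an embedded KV store with sorted keys."""

    def __init__(self):
        self._keys = []  # kept sorted so prefix scans are a range walk
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._data[self._keys[i]]
            i += 1


store = TinyKV()


def save_product(pid, name, category):
    # Two writes per record: the record itself plus a hand-built index entry.
    # Nothing here ties them together if the second write fails.
    store.put(f"product:{pid}", {"name": name, "category": category})
    store.put(f"idx:category:{category}:{pid}", pid)


def product_names_in(category):
    """Answer 'products by category' through the hand-built index."""
    index_prefix = f"idx:category:{category}:"
    return [store.get(f"product:{pid}")["name"]
            for _, pid in store.scan_prefix(index_prefix)]
```

Changing a product's category would also require deleting the old index entry by hand, which is the kind of bookkeeping a SQL index does for you.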

And all of this is before you get into any of the capital-h Hard problems that
databases have been dealing with and learnt to solve over decades.

------
amirouche
The monolithic nature of webapps built on embedded key-value stores makes them
a poor fit for scaling horizontally. But who cares?

Mind that, since the database is embedded in the webapp process, you need
some auth machinery any time you want to hit the database from the outside
world. That leads to more wheel reinvention, such as building your own
background job scheduling in the vein of Celery.

At the end of the day, embedded key-value stores are great because they
require _less_ administration and perform very well, but the community is
fragmented and the tooling is not there yet.

I think embedded KV stores still have a chance in niche applications, for
building expert systems that require persistence but don't change a lot. For a
web application whose specifications (and schema) change all the time, they
are a poor solution.

edit: also, with e.g. wiredtiger, having transactions that span tables, such
as the full-text search index and the plain tables, is great and saves quite a
few painful experiences when setting up a project.
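
A minimal sketch of why a transaction spanning the data table and the index table matters. This uses SQLite through Python's `sqlite3` as a stand-in, not wiredtiger's own API, and the tables `docs` and `terms`, the whitespace tokenizer, and the `fail` flag are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE terms (term TEXT NOT NULL, doc_id INTEGER NOT NULL, "
    "PRIMARY KEY (term, doc_id))"
)


def index_doc(doc_id, body, fail=False):
    """Write the document and its index entries in one transaction."""
    with conn:  # commits on success, rolls back both tables on an exception
        conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)", (doc_id, body))
        if fail:
            # Hypothetical failure injected between the two writes.
            raise RuntimeError("simulated crash between the two writes")
        conn.executemany(
            "INSERT INTO terms (term, doc_id) VALUES (?, ?)",
            [(term, doc_id) for term in set(body.lower().split())],
        )


def search(term):
    """Return ids of documents containing the term, in ascending order."""
    rows = conn.execute(
        "SELECT doc_id FROM terms WHERE term = ? ORDER BY doc_id", (term,)
    )
    return [row[0] for row in rows]
```

If the function fails halfway, the document row is rolled back along with the missing index rows, so the table and its index never disagree. With two separate datastores you would have to build that guarantee yourself.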

------
ageitgey
I'm all for choosing the best tool for the job (even if it's against
conventional wisdom), but I don't really follow the arguments made here.

He talks about how he launched a "high traffic" Aliexpress-style marketplace
using an embedded K/V store.

Is the marketplace website running off of a single machine with no failover
plan when the host machine dies? Is there no need to generate sales reports
from external applications that want to pull data from the same database? Is
having no fixed schema really an advantage when dealing with orders and
financial data? Are saving serialization time and "tcp transport costs" by
keeping the app and data on the same machine a big advantage given that
everything is a huge single point of failure?

I'm not trying to be too critical. If you want to launch an app very quickly
and only have the budget for one machine to ever run it on and you are the
only consumer of the generated data, I could see how this would be an
appealing approach. But it also dismisses the important reasons most services
aren't designed and deployed this way (redundancy, sharing data with other
applications, financial data integrity, extensibility, etc).

------
cryptonector
My take is and has been for a long time that the right solution is something
like a key/value store on the backend + a SQLite4-style frontend, with a
Lustre-style lock manager co-located with the k/v backend, and key/value pairs
distributed by key ranges. This is... a lot like Spanner I think, though I
don't know if Spanner uses a DLM like Lustre or if it does something else for
synchronization.

This way you get the scalability benefits of a key/value store with all the
benefits of SQL, and you get always-consistent, and even ACID, semantics
(depending on how many failures there are at once and how many you design to).
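
For what "distributed by key ranges" could look like, here is a rough sketch of range-based routing. The split points and node names are invented, and this is not a description of how Spanner or Lustre do it. It leaves out locking, replication, and rebalancing entirely.

```python
import bisect

# Hypothetical layout: keys < "g" live on node-a, "g" <= key < "p" on node-b,
# and everything from "p" upwards on node-c.
SPLIT_POINTS = ["g", "p"]
NODES = ["node-a", "node-b", "node-c"]


def node_for(key):
    """Return the node that owns a single key."""
    return NODES[bisect.bisect_right(SPLIT_POINTS, key)]


def nodes_for_range(start, end):
    """Return the nodes a scan over the half-open range [start, end) touches."""
    first = bisect.bisect_right(SPLIT_POINTS, start)
    last = bisect.bisect_left(SPLIT_POINTS, end)
    return NODES[first:last + 1]
```

Because neighbouring keys land on the same node, a range scan only has to visit the nodes whose ranges overlap it, which is what lets a SQL layer on top plan index scans sensibly.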

EDIT: Oh, and check out
[https://github.com/rqlite/rqlite/](https://github.com/rqlite/rqlite/) .

~~~
amirouche
What is Lustre?

