
ActorDB – Distributed SQL database - iamd3vil
https://github.com/biokoda/actordb
======
atombender
ActorDB is a neat project. Simplified, it's a distributed SQLite where
consistency is guaranteed by the Raft protocol.

However, it has a unique and initially confusing data model where the database
is divided into "actors", which are self-contained shards. For example, a
database modeled on Hacker News would probably have an actor per user and an
actor per story, and probably an actor per thread.

Every actor acts like a self-contained database, with its own set of tables.
When you want to query or update data, you first tell it which actor(s) to
operate on; but unlike systems like Cassandra, the sharding is explicit, in
that the shards have identifiers, and there's no automatic sharding function.
Indeed, you can have actors that act like a basic database, e.g. a global,
shared list of lookup values could be a single actor.

You have ACID transactions within actors, and you can also do transactions
across actors, and queries can also span multiple actors. You can't do joins
across actors, as far as I recall. Schema migrations also become interesting,
since each actor has its own, entirely separate schema.

~~~
dnomad
> You have ACID transactions within actors, and you can also do transactions
> across actors

I gotta say... this project strikes me as very strange. If you're going to go
through all the trouble of sharding your data across a set of HA actors... why
not also shard behavior? Co-locate data and behavior! Isn't that a key driver
Actor model? Then you wouldn't communicate with your actors using a very
limited language like SQL you could send them real domain-specific messages.
Most importantly, when it comes to transactions across actors... you don't
need it. All you need is guaranteed message delivery to the various actor
mailboxes and all that complexity melts away. You need this _anyways_ if you
want your actors to be able to actually collaborate and send each other
messages. This whole project seems to be designed to _eliminate_ many of the
benefits of the actor model...

(I don't like the all-too-common HN cynicism but I also worry when I see stuff
like this. Either I'm crazy and missing something obvious... or everybody else
is crazy. But then much of what comes out of the Erlang space is... strange to
me. That culture seems to have a very unique idea of distributed computing.)

~~~
atombender
I think you have to think of the "actors" in ActorDB as shards, not as actors
in the "actor model" sense. The nomenclature here is unfortunate, because
actors themselves are sharded.

This is a database that permits any application to read and write data without
coupling it to a specific language or platform, e.g. Erlang. You just write
SQL. People know SQL. The difference is that your app must be shard-aware,
which can be an acceptable compromise for apps that wants to scale far. The
sharding also gives you some measure of enforced encapsulation, since you
can't have foreign keys (that I know) across actors.

This project might be particularly suitable for SaaS-type multitenant systems
that need to have a clear boundary between each tenant.

~~~
bobwaycott
Admittedly, I read through all the docs in one rapid sitting, but I thought I
came across mention of foreign keys across actors being supported. I’ll have
to double-check that, though.

------
thanatos_dem
Digging through the docs, I can't find any actual information of the
consistency and isolation guarantees other than "consistent" and "ACID".
That's an immediate red flag for me. What are the actual isolation
characteristics of the database?

Does it use snapshot isolation? Is it serializable? Is it linerizable? With
all the great work Kyle Kingsbury (aka Aphyr) has done on the Jepsen tests,
it's pretty clear that claiming to be "ACID" with no additional info isn't
sufficient for a modern database.

~~~
thelastidiot
Most docs for technical projects are written that way these days. The author
of an api, project or solution will assume an unquestioned adoption "because
it's free" from the randome audience with no specifics on how they are solving
and more importantly what's different from other cases. I don't know if this
is by laziness and writing procrastination, but assuming that your users will
know as much as you do on your journey to assemble a solution has become a
frustration when I discover new projects like this one.

~~~
RhodesianHunter
Likely has a lot more to do with the fact that the intersection between the
population that can create such projects and the population that can write
clear effective documentation is exceedingly small.

~~~
bonesss
I would also point to Paul Grahams insight around hackers and painters[0]: the
population who wants to create such projects is fantasizing about _coding_ ,
_hacking_ , publishing, bug squashing, and launches. Documentation isn't
necessarily pulling them to their PC after a heavy dinner.

-

[0] [http://www.paulgraham.com/hp.html](http://www.paulgraham.com/hp.html)

------
malisper
ActorDB sounds like an interesting piece of software. It looks like it's good
for simple applications that have large amounts of data. Some thoughts I have
based on the docs:

ActorDB has no concurrency at the actor level. This makes it a poor fit for
applications that have lots of concurrency around the same pieces of data. A
long running read or write on a single actor will lock out any other reads or
writes.

Likewise, distributed transactions lock all actors involved in the transaction
until the transaction completes.

It seems like reads have to go through a round of Raft. This increases latency
for reads. It also decreases throughput, although I'm not sure how big of a
deal that is given the lack of concurrency.

It's unclear to me how ActorDB guarantees serializability for multi-actor
transactions. You need some way to guarantee two multi-actor transactions will
execute in the same order on every actor. Based on the docs, Raft is performed
at the actor level and not across multiple actors. ActorDB does use two-phase
commit to guarantee atomicity across multiple actors, but there's no
description of how it handles serializability.

Based on my reading, ActorDB is good if you have lots of data and your queries
have either low concurrency and you don't require high throughput. If you have
high concurrency or require high throughput, my guess is ActorDB will be a
poor fit.

------
gregwebs
They have an introductory blog post that explains the database fairly well
(summary: it was designed for a shard per user):
[http://blog.biokoda.com/post/112206754025/why-we-built-
actor...](http://blog.biokoda.com/post/112206754025/why-we-built-actordb)

If you are looking at distributed SQLite solutions there is also rqlite/dqlite
and bedrock.

------
adamluzsi
I used to work with Pivotal's Greenplum, which is also a distributed db, and I
quiet liked it. Basically a postgresql with syntax sugar on partition , and
distribution across servers. I had the pleasure to never need any index in the
database.

This sounds to be an interesting project as well.

The question to me always about "how this will makes the project that use it
have little learning curve for the new recruits, easy to understand
integration in the code level and low maintenance on the long run"

------
codeadict
Great to see more databases written in Erlang.

~~~
alecco
Why would Erlang be good for SQL/relational databases?

~~~
giancarlostoro
The core strengths of Erlang: Erlang has built-in support for concurrency,
distribution and fault tolerance. There's many languages I enjoy, Erlang I've
never learned (syntax and nobody to pay me to work with it) it fully to it's
potential. I would also love to see a full blown SQL relational database in
Rust and D, and maybe a NoSQL one as well.

See:

[http://erlang.org/faq/introduction.html](http://erlang.org/faq/introduction.html)

~~~
antoaravinth
I don't quite get this, you are saying Erlang is fault tolerant? I'm confused
here because fault tolerance is something that the application using a
particular language needs to make sure right? Sorry if I mistaken your
statement.

~~~
nickpsecurity
ferd did a great write-up on how Erlang approaches reliability:

[https://ferd.ca/the-zen-of-erlang.html](https://ferd.ca/the-zen-of-
erlang.html)

It's more about assuming things will fail by default with good ways of
handling it built-in at the language level. A lot of people also find its
inventor's thesis enlightening. Here it is:

[http://ftp.nsysu.edu.tw/FreeBSD/ports/distfiles/erlang/armst...](http://ftp.nsysu.edu.tw/FreeBSD/ports/distfiles/erlang/armstrong_thesis_2003.pdf)

And for those concerned, there's also at least one project to write the native
functions in a safer language to reduce _their_ bugs a bit:

[https://github.com/hansihe/Rustler](https://github.com/hansihe/Rustler)

~~~
bonesss
A lot of these concepts have been carried forward to some popular frameworks
and languages. Akka (or Akka.Net & Akkling), is a great example. Railway
Oriented Programming
([https://fsharpforfunandprofit.com/rop/](https://fsharpforfunandprofit.com/rop/)),
has gained some traction in the .net sphere and, along with F#'s _Result
<'success, 'failure>_ type, emphasizes structured and exception-less error
handling.

It may be a poor mans Erlang, but better than nothing ;)

------
01walid
How does this compare to CockroachDB?

~~~
atombender
Apples to oranges, really. Cockroach presents a single database view that
happens to be sharded and replicated in a fault-tolerant, consistent way.
ActorDB doesn't.

With ActorDB you have to design your data model to shard at a high level. For
example, you could shard it by user; every user would have its own database
that ActorDB will shard and replicate for you. That database is for the most
part separate from everything else -- it has its own tables and indexes.

ActorDB provides tools to operate on multiple actors, but they're explicit.
For example, you can query across multiple actors, but this requires a small
SQL declaration at the beginning of the query to select which actors to query,
a bit like a "for each <actors> do <some SQL>".

You can also do transactions across multiple actors, though this uses two-
phase commit (which coincidentally is the strategy used by Google Spanner),
and requires some locking.

So Cockroach pretends to be a classic RDBMS (databases have tables and
indexes, but most apps just use a single database per app), allowing an
existing app to be ported with little effort. It would be harder to port an
app to ActorDB.

------
mappu
Another popular SQLite + Raft project is rqlite (
[https://github.com/rqlite/rqlite](https://github.com/rqlite/rqlite) ), but
the goals seem slightly different (rqlite only uses distribution for fault-
tolerance, not for sharding).

------
sessy
Any idea if the creators will offer this "as-a-service" in the near future.
How does this compare to Google's Spanner?

~~~
davewritescode
Spanner seems to solve a different problem. It’s effetively a multi-master
database where the leader election algorithm can be tuned to ensure that your
masters are geographically close to where writes/reads are actually happening.

Many databases are “partitioned” by user anyway so in this case the DB can be
smarter if it doesn’t have to handle cross partition queries.

Seems like the idea of partitioning a MySQL database taken to the next level.

------
pknopf
What part of CAP is being sacrified here? I'm not seeing it.

~~~
biokoda
It's raft; thus firmly CP.

~~~
sriram_malhar
That does not always follow. A db designer still has the flexibility to give a
client less than serializable/linearizable consistency if the client so
wishes.

For example, it is possible for the db to allow direct access to any replica,
not just the leader, and if the replica is not part of the quorum, the client
will see outdated information. This may be just fine for data that seldom
changes, or where a certain amount of staleness is just fine.

~~~
biokoda
From that point of view yes you are right. There is a ton of nuance in CAP.
Usually in these discussions the CAP question is if it is eventually
consistent or not.

~~~
thanatos_dem
That's the problem... There's _not_ a ton of nuance in CAP. CAP has a formal
definition and a system is provably at most CP or AP. Other terms that people
often associate with CAP is where the nuance comes in. The reality of the
matter is that systems _claim_ to be CP or AP, but most have assumptions or
bugs that violate it, so in the end, most systems are _just_ P.

Stale reads technically violate the definition of Consistent in CAP. Using
raft means unavailability of some subset of nodes can make the entire system
unavailable, violating the definition of Availability in CAP.

So the parent comment's suggestion of serving reads from any replica would
result in _only_ a partition tolerant system, without any formal availability
or consistency guarantees.

------
thomasfedb
An 'explicitly sharded' SQL database - effectively as many databases as you
need with some special sauce to let you run transactions across multiple
shards. Neat.

~~~
hestefisk
It’s like having your cake and eating it too.

~~~
davewritescode
I suspect that running transactions over multiple partitions would incur more
coordination overhead and probably won’t perform well.

~~~
StavrosK
Probably, but what's the alternative?

~~~
verelo
I don't get the feeling they had an alternative (I don't either!), i suspect
it was just commentary.

~~~
StavrosK
Yeah, it must be. I don't think there's a way to generically perform well on
distributed queries, so this is probably as good as it gets.

------
crudbug
I think the data model is similar to virtual actors [0] which have persistent
state. There is also CLR and JVM specific implementations of this model.

[0] [https://www.microsoft.com/en-us/research/project/orleans-
vir...](https://www.microsoft.com/en-us/research/project/orleans-virtual-
actors/)

------
vincentmarle
> Think of running a large mail service, dropbox, evernote, etc. They all
> require server side storage for user data, but the vast majority of queries
> is within a specific user.

I have this exact use case for a new project I’m working on so I’m fine with
the 1-db-per-user approach. ActorDB definitely sounds very interesting, but
are there any alternatives to compare it with?

~~~
antoncohen
Vitess ([https://vitess.io/](https://vitess.io/)) is the closest alternative I
know of. Vitess came from YouTube, and is used at some other sizable
companies. They have a similar model of sharding, they both support the MySQL
wire protocol and a binary RPC protocol (gRPC for Vitess, Thrift for ActorDB).

------
coleifer
Anybody using this in production? What particular problem were you solving
with actordb?

------
gigatexal
The readme sounds like it promises the end to all databases. I’d like to play
with it.

~~~
AtlasBarfed
CAP guarantees there isn't "one database to end them all"

You pick your poison.

Glossing over / ignoring / implying they solved CAP is very typical for
database marketing.

------
tybit
How do schema upgrades work? It looks like a specific actor type is tied to a
specific schema but I assume the schema upgrades must be eventually consistent
unless they are using 2 phase commit across all actors of the type?

~~~
biokoda
Schemas are per actor type and you can have many types. Actors are active or
inactive. When you run a query against one, when it becomes active it will
check if schema has changed and apply that before processing any query.

------
aisofteng
SQLite with Raft. Nice project.

------
EGreg
ActorDB might be great for distributed systems. How does it compare to
CocktoachDB? The latter does transaction planning across shards?

None of these DBs seem to be byzantine fault tolerant, though. Any examples of
BFT databases?

~~~
zzzcpan
> None of these DBs seem to be byzantine fault tolerant, though. Any examples
> of BFT databases?

Not sure what you mean. Obviously we can't have blockchain level of byzantine
fault tolerance against malicious nodes. Only against byzantine failures. But
then this is pretty much what CAP answers, where AP systems can handle
byzantine failures, because they don't need consensus, while CP can't, because
they do. Quorum logic is in the middle here, where you can handle some
byzantine failures and have some consistency.

------
gigatexal
Building on SQLite means no windowing functions which is a deal killer for me.

------
alecco
> with the scalability of a KV store, while keeping the query capabilities of
> a relational database.

So not doing good query performance for OLAP/DS. And "core" DB is in Erlang.

~~~
atombender
The core database is SQLite.

~~~
alecco
I love SQLite. But it is not good for OLAP or TCP-DS kind of workloads.

------
nine_k
I wonder how hard would it be to replace SQLite with Postgres or Maria. It
would add interesting in-actor query capabilities.

------
polskibus
I wonder what are the benefits in comparison to using Akka Persistence with
whatever database you want?

~~~
shuoli84
Then you locked in Akka?

~~~
haggy
The author of this project is locked into Raft. What's the difference?

~~~
shuoli84
They are different layers of abstraction.

Storage and computing are two layers in a normal application. You are free to
switch computing layers as long as storage layer speaking some standard
protocol.

AKKA + AKKA persistent, how can you switch the computing layer?

------
jacksmith21006
Another option for scaling SQL is the Vitess OSS. Might give it a look. Uses a
similar pattern of using an actor up front and then scaling MySQL on the back-
end.

[https://vitess.io/overview/](https://vitess.io/overview/)

Would like a similar solution but with Postgres on the back end instead.

------
yeukhon
Yet another distributed database. How many more do we need? _Time to just hide
in the cave again._ ¯\\_(ツ)_/¯

