
ActorDB: A distributed SQL database with the scalability of a KV store - ddorian43
http://www.actordb.com/
======
angrybits
So yet another database (for certain definitions of database) built for the
tiny fraction of applications that will:

a) warrant this kind of complexity b) be able to give up ad-hoc queries

So I guess if I ever need to build a highly scalable, fault tolerant,
distributed todo list.. I'll look into it. Not sure how I will aggregate data
to run the dashboards that my MBAs will be up my butt about though. Guess we
can cover that detail later?

~~~
biokoda
a) ActorDB is not actually all that complex.

b) ad-hoc queries tend to be the first thing out the window once you hit a
certain size (query load or dataset size). ActorDB arguably gives you the
largest amount of query flexibility out of the new breed of databases, while
still being strongly consistent and persistent (not memory only).

~~~
angrybits
The ways in which it can (and will) fail are monumentally complex. Everything
looks simple when it's working.

------
bra-ket
How do you run a JOIN across a bunch of actors

~~~
biokoda
You can run queries across actors (even acid transactions), but not actual
joins. This is the tradeoff to make it distributed.

~~~
AlexEatsKittens
It's a relational database without joins? Can you join within an actor?

~~~
ddorian43
1 actor is 1 sqlite db so yes

------
lmm
So can I create one actor for each of my clients, and add/remove machines to
my cluster and have them automatically balance across it? And I can do normal
SQL queries including joins as long as I never need to join across several
clients?

If so, this might be the first one of these new databases that actually helps
with my use case. (Though I'll wait to see what jepsen makes of it first).

------
jedberg
I don't intend this to put down the creators of this product, but Postgres is
a SQL database with the scalability of KV store if used in a way that makes it
scalable like a KV store. :)

It's a great idea, but I think the "mission statement" needs to be restated.

~~~
sbt
I also don't mean to put down the creators of this product, it probably has
its use cases. But every time I see something like "SQL with the scalability
of KV" (which happens a lot these days), it's like someone promising a
perpetual energy source. In other words, there is usually some kind of catch
involved.

In this case it's that JOINs are still expensive. Well, this is actually a
pretty big deal, because JOINs are typically the reason I settle for _either_
a NoSQL or a SQL store.

Again, I'm sure this has other merits that may distinguish itself from the
rest of everyone and their grandma building a database these days.

~~~
biokoda
Only JOINs across actors are sacrificed. Most problem sets fit quite well into
the actor model and you can keep JOINs where it matters most (inside an
actor).

------
nemothekid
I've been meaning to try ActorDB out for a while now to see how well it
performs on large data sets. IIRC, the basic premise is that every actor is a
sqlite instance which causes some interesting limitations that might even have
more interesting workarounds.

~~~
jamiesonbecker
Is it in fact sqlite? Is it erlang? What kind of network is required between
nodes? Can I encrypt it via e.g. autossh on a particular port on cloud
instances outside of an Amazon-style VPC?

~~~
biokoda
encryption of communication is high on the feature priority list. It is erlang
and sqlite. Why sqlite and more on how it works is explained in the
documentation. Network should be within the same datacenter.

------
rpedela
Another question. How is this different or better than having a set of
relational databases with a load balancer between them?

~~~
biokoda
Because it's a truly distributed SQL database. I presume you mean having a
master-master setup? ActorDB does not limit you to the capabilities of a
single machine. Master-master still means your dataset and query load needs to
fit into 1 or 2 servers.

~~~
rpedela
"Decoupled data. You can't run arbitrary queries over your data. But you can
still have a full relational database within actors. So you need to organize
and split your data model into actors. Read the query model and examples
sections to understand how."

When I saw that in the README, it said to me that I can have a distributed SQL
database as long as I did not need JOIN or REFERENCES which are commonly used
features of a relational database. Am I incorrect?

~~~
biokoda
You can have JOINs, but within individual actors. So for instance if this
entire conversation tree was a single actor, you can have a full relational
model to make queries with. But you can't do joins on comments over multiple
threads.

This is the sacrifice ActorDB makes to be distributed.

~~~
rpedela
Yeah so I see two use cases for this:

1\. A NoSQL database but with SQL as the API and some support for relations.
This is quite nice because I have often found NoSQL APIs to be lacking in many
aspects.

2\. A set of fully relational databases that need to be managed and/or load
balanced. This is a use case I have. If the system administration is
significantly easier than the tools natively provided by Postgres, MySQL, etc,
then I can see this being a valid use case. The use of SQLite does not concern
me since it has legitimate downsides [1][2].

[1] [http://www.sqlite.org/omitted.html](http://www.sqlite.org/omitted.html)

[2]
[http://www.sqlite.org/whentouse.html](http://www.sqlite.org/whentouse.html)

------
andypants
I'm a bit confused about the concept of actors. Should I create an actor type
for each table, and an actor for each row? They seem to be equivalent in some
of the examples.

~~~
biokoda
It very much depends on your use case. But no you do not split by tables or
rows generally. You split your problem into actor types.

For instance if you were to create your own news.ycombinator.com you would
require 3 types of actors: frontpage, thread, user

frontpage:

\- CREATE TABLE threads (storyid INTEGER PRIMARY KEY, author INTEGER, title
TEXT, url TEXT)

thread:

\- CREATE TABLE comments (id INTEGER AUTO INCREMENT, parent INTEGER, user
INTEGER, txt TEXT)

user:

\- CREATE TABLE info (id INTEGER PRIMARY KEY, email TEXT, password TEXT)

\- CREATE TABLE posts (id INTEGER PRIMARY KEY, threadid INTEGER)

There would be many thread and user actors, but only 1 frontpage actor.

------
Xdes

        >Use case: reliable distributed counters
    

I am not understanding the example provided for this use case. In my
implementation of this use case I use Redis back by an RDBMS. Redis provides a
cache and a distributed counter while the RDBMS engine handles all the OLTP
and OLAP.

I currently have a job that takes snapshots of the real time Redis counter at
given intervals and inserts them into my analytics table.

How would the ActorDB way simplify or improve this model?

~~~
biokoda
Author here. With actordb you get:

\- no single point of failure (redis and likely the rdbms as well)

\- no need to maintain two very different systems

\- scalable. As in plug in a new cluster and it will proportionally increase
capacity.

We are working on more detailed use case examples. As well as a pretty big
bugfix release due out tomorrow or the day after.

~~~
Xdes

        >no single point of failure (redis and likely the rdbms as well)
    

Your README mentions replication in the Operational characteristics, but it
does not cover partition tolerance?

\- Is there one master per cluster?

\- What happens when the network becomes segmented?

It is also unclear of how this affects multiple clusters.

\- How is a global consensus of the data reached?

\- What happens when the network becomes segmented between clusters?

    
    
        >no need to maintain two very different systems
    

SQLite isn't a full featured RDBMS. It mainly lacks support for stored
procedures and concurrency. It is a very nice solution for small, self
contained, horizontally scalable usage scenarios. However, once the use case
involves generating custom globally unique identifiers (like a 7 character
alphanumeric string) it generally falls apart.

    
    
        >scalable. As in plug in a new cluster and it will proportionally increase capacity.
    

Redis and RDBMS have well known use cases for replication and partition
management. Redis itself is single threaded, so it is horizontally scalable by
your budget (and likewise supports sharding and replication albeit the network
segmentation can lead to issues). More full featured RDBMS like Postgres,
MariaDB, Oracle, and SQL Server support concurrency and granular locking with
similar scaling strategies.

I do not see the value proposition in consolidating the infrastructure in this
case.

~~~
biokoda
There is one master per actor. An actor lives within a cluster. As long as a
majority of configured servers are accessible, writes will succeed.

All queries (reads and writes) go through the master and are not committed if
master does not reach a majority of configured servers in the cluster.

Individual actors are not meant to be scalable. They are meant to be fast
enough. For instance if you were to create your own news.ycombinator.com an
actor would be this entire conversation tree. Sqlite would be sufficient. You
don't need concurrency on a per thread level.

ActorDB provides a global uniqueid generation feature (independent of sqlite).

------
m_mueller
Very interesting, I like the idea of a distributed sqlite. Does it support
long term offline replicas the way couchdb does? If yes, do you have mobile
support on the roadmap? I think this should be the natural succession as I
find the choices for offline replication on mobile devices still very limited,
especially cross platform ones that fully integrate with a RDBMS landscape.

------
cordite
Although wire-compatible, it does not look like it can be a drop-in
replacement, since the query model needs you to explicitly define actors.

[http://www.actordb.com/docs-querymodel.html](http://www.actordb.com/docs-
querymodel.html)

~~~
biokoda
Yes that is correct. Not a drop-in replacement. You can't go from a
traditional RDBMS and magically turn it into a distributed database. Some
constraints need to be loosened. ActorDB tries to loose as little as possible
to turn it into a distributed database and still have a rich query model
(without being memory only or just key/value).

------
stephenhuey
Powerful red on your website! Just wanted to mention the white text looks
great in Safari, but really faint in Chrome.

~~~
michaelmior
Same here. I can barely read the text in Chrome.

~~~
jamiesonbecker
Only verdana is specified as a font-family for most of the white text on red.
This is probably sub-optimal, consider Verdana, Helvetica, sans-serif;. (Also,
font-weight: bold would help on the smaller text.)

------
rpedela
What was the specific reason for using sqlite? Why not Postgres or MariaDB or
whatever?

~~~
biokoda
Because sqlite is an embeddable isolated database. Each individual actor is
completely independent. Sqlite naturally fits into this use case.

Using postgres or mariadb would be very problematic or even impossible,
because they are not designed for this kind of use. ActorDB needs to be able
to move actors to new clusters when they are added, while still execute
queries on them at the same time.

------
asdasf
Really a shame they used the mysql protocol. The mysql command line client is
terrible compared to psql, and the mysql lib is much less pleasant to use than
libpq.

~~~
AaronBBrown
Can you give an example of where mysql (the client) is terrible compared to
psql (the client)? I've had the opposite experience.

~~~
cdavid
for once, you get proper reverse search, which is not available in mysql (last
time I checked, which has now been a while TBH).

~~~
AaronBBrown
Works for me. I use this feature nearly all day.

[http://showterm.io/03de0c8a4992dc7008ef2](http://showterm.io/03de0c8a4992dc7008ef2)

