
Usefulness of mnesia, the Erlang built-in database - motiejus
http://erlang.org/pipermail/erlang-questions/2015-October/086429.html
======
jerf
It is what it is. It's useful to prototype with in Erlang. It may be useful to
ship with. If Mnesia turns out not to fit your problem, here in 2015, you've
got literally _dozens_ of choices of alternate DB, with all sorts of
consistency and distributability and performance characteristics.

My guess is that if somehow Erlang was where it was in 2015 except it didn't
have Mnesia, nobody would really perceive much of a hole there, and nobody
would write it, because of the database explosion we've seen in the past 10
years. But it is there, and if it works for you, go for it.

My only slight suggestion is that rather than inlining all your mnesia calls,
you ought to isolate them into a separate module or modules or something with
an interface. But, that's not really because of Mnesia... I tend to recommend
that anyhow. In pattern terms, I pretty much _always_ recommend wrapping a
Facade around your data store access, if for no other reason than how easy it
makes your testing if you can drop in alternate implementations. And then, if
mnesia... no, wait... And then, if $DATABASE turns out to be unsuitable,
you're not stuck up a creek without a paddle. With this approach it's not even
all that hard to try out multiple alternatives before settling on something.
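(For illustration, a minimal sketch of such a facade in Erlang; the module name, table name, and fields are all made up, and the mnesia calls are the simple dirty ones:)

```erlang
%% A thin facade over the data store: callers never touch mnesia
%% directly, so swapping in ets, postgres, or a test double only
%% means replacing this one module.
-module(user_store).
-export([init/0, store/2, fetch/1]).

init() ->
    ok = mnesia:start(),                      % ram-only schema is fine here
    {atomic, ok} = mnesia:create_table(user,
        [{attributes, [id, name]}, {ram_copies, [node()]}]),
    ok.

store(Id, Name) ->
    mnesia:dirty_write({user, Id, Name}).

fetch(Id) ->
    case mnesia:dirty_read(user, Id) of
        [{user, Id, Name}] -> {ok, Name};
        []                 -> not_found
    end.
```

A unit test can then hand the business logic a fake `user_store` and never start mnesia at all.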

~~~
seiji
mnesia is great in isolation with bounded datasets. erlang is great for
operational observability of both the data flow and your application. they go
great together, sometimes, if you know all your requirements up front.

It's always good to keep in mind erlang+mnesia were designed to run in a
"network" consisting of a unified blade chassis (each blade is a new node, and
the network is a physical backplane). So, things like network partitions
around erlang distribution and remote mnesia tables don't have easy/proper
error recovery strategies.

Even with unbounded datasets, mnesia is great if you devote an engineering
team to managing the scaling of it, because mnesia can't do simple things like
move a DB from one node to another without semi-advanced erlang+mnesia
knowledge. But, whatsapp was (is?) 100% mnesia last I recall, even though they
had to reengineer some of it:
http://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf

But, mnesia still relies on dets for disk-only tables, and dets still, in
2015, has a 2 GB max file size. If your data grows beyond 2 GB, you have to
use mnesia fragmentation, which is just an operational burden.
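(For reference, fragmentation is configured per table via `frag_properties`; a sketch, with a made-up table name and an arbitrary fragment count:)

```erlang
%% Assumes a disc schema already exists (mnesia:create_schema/1) and
%% mnesia is started. Splits one logical table into 8 dets-backed
%% fragments so each stays under the 2 GB dets limit; reads and
%% writes then go through mnesia:activity(transaction, Fun, [],
%% mnesia_frag) instead of the plain table API.
{atomic, ok} = mnesia:create_table(event,
    [{attributes, [id, payload]},
     {disc_only_copies, [node()]},
     {frag_properties, [{n_fragments, 8},
                        {node_pool,   [node()]}]}]).
```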

My scaling thoughts tend towards: sqlite -> postgres -> riak

~~~
toast0
If you can fit all your data in ram, you can use disc_copies, which doesn't
use dets. schema is always disc_only_copies, but if it gets to be 2gb, wow!

Edit to add: Adding nodes isn't that hard? Connect to dist, add the new node
to the extra copies of the schema table, then the tables that you want. It's
more complex if you're trying to merge together different sets of tables into
one schema, though.
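(Roughly like this, as a sketch run from a node already in the cluster; the node name 'service@node2' and the user table are made up, and both nodes must be up:)

```erlang
%% Pull the new node into the mnesia cluster over distribution...
{ok, _} = mnesia:change_config(extra_db_nodes, ['service@node2']),
%% ...make its schema copy disc-resident...
{atomic, ok} = mnesia:change_table_copy_type(schema, 'service@node2',
                                             disc_copies),
%% ...then replicate the tables you want onto it.
{atomic, ok} = mnesia:add_table_copy(user, 'service@node2', disc_copies).
```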

~~~
seiji
_Edit to add: Adding nodes isn't that hard?_

Adding nodes isn't hard, but try restoring a mnesia table from service@node1
to service@node2. You have to do something like
http://stackoverflow.com/questions/463400/how-to-rename-the-node-running-a-mnesia-database

It's not a great solution for modern AWS-style API-driven-deployment
operational models.

~~~
toast0
Oh yeah... I would handle that by having node1 and node2 running at the same
time: add copies to node2, delete copies from node1, and done. But that
requires both nodes to be able to run at the same time (which sounds like it
wasn't really an option in the stack overflow post).

------
mononcqc
I'm the Fred mentioned in the quoted text. As mentioned by FLGMwt, Mnesia
being a 90s database was an offhand remark made at a workshop, in a discussion
during a break.

The reason I considered it a DB of the 90s is that back then, it could have
been state of the art, but by today's standards, in its current form, it makes
sense to use it mostly on fixed cluster sizes with reliable networks and a
fairly stable topology.

Any fancier cases and you quickly have to dive into the internals when it
comes to recovering from failures and partitions, doing repairs, and so on.
The DB has 3 standard backends: all in RAM (ram_copies), all on disk
(disc_only_copies, with a 2GB limit), or as a log-copy (disc_copies: no disk
size limit, but also bound by memory size).

That ends up leaving you with a DB that needs its whole dataset to fit in
memory and supports distributed transactions, but can't deal with network
failures well out of the box (you need something like
https://github.com/uwiger/unsplit).

Mnesia gaining new backends (Klarna is currently open-sourcing code for an
experimental postgres backend and is using a leveldb one) would fix a lot of
issues as a single-node DB, but another overhaul would be required for the
rest.

The problem I see is that it was a very cool database back then, but it
lagged behind for a long while and now has to play a catch-up game. Its model
and Erlang interface are still extremely nice, and I wish it made more sense
to use it in production without committing to learning its internals in case
of trouble.

------
daleharvey
This has little to do with databases, erlang or mnesia; it's just a moan
against people writing ad tech.

mnesia is a database for the 90's because it was written by smart people in
the 80's and, like most of the rest of the otp stack, was fairly underused
and undermaintained.

I have a huge amount of respect for Klacke and the original authors behind a
lot of this tech, however the erlang community that followed seems to suffer
some cognitive dissonance around what problems it solves and how well they are
doing them. It would be hard to pick a database less suitable for SMB use than
a domain specific database in a niche ecosystem.

------
i_feel_great
I have much the same sentiment with SQLite. Much dismissed as a toy database,
but absolutely appropriate for 99% of my clients - small and medium
businesses, the same as mentioned in thread.

~~~
TeMPOraL
SQLite is amazing. It should definitely get more popular - it gives you
relational DB features while still storing everything in a file. Which means
portability, performance, and no bloating the entire OS with another database
driver / server.

~~~
davidw
It's extremely popular: it's on pretty much every smartphone, IIRC.

~~~
bosky101
apple's native object-graph persistence framework, "core data", is also built
on sqlite.

mnesia lets you decide on a per table basis where it should be stored:

    1) ram only
    2) disk only
    3) disc copies (cached in ram, and persisted to disk)


as well as the option of how it should be accessed in a cluster:

    1) replicated (to a set of nodes you want)
    2) or location transparent (accessible from any node you want)


i doubt many other dbms give you these kinds of distributed-systems-friendly
primitives.
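(e.g., a sketch; the table names, fields, and node names are made up:)

```erlang
%% Per-table placement: 'account' is cached in ram and persisted on
%% two replica nodes; 'session' lives in ram only, on the local node.
%% Any connected node can read either table; mnesia fetches remote
%% rows transparently.
{atomic, ok} = mnesia:create_table(account,
    [{attributes, [id, balance]},
     {disc_copies, ['a@host1', 'b@host2']}]),
{atomic, ok} = mnesia:create_table(session,
    [{attributes, [token, pid]},
     {ram_copies, [node()]}]).
```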

my only peeve against mnesia is corruption. if you change these nodes, or one
of them is down, or you call it with the wrong list of nodes (eg: you may now
need a distributed store like zk just to get this list of nodes right), many
different things can go wrong, which is why most folks use it as an
idempotent store: something that can be recreated if needed, and then used to
great effect, but never as "the" persistent store. additionally there are
limits on the size of these tables, and workarounds.

once bootstrapped, it works like a dream. mnesia is still my go-to db for a
distributed cache or a router for actors.

~B

------
twsted
Many valuable considerations inside; read the post. Starting with this thought
in the question: 'I hate transiting syntactic boundaries when I'm
programming'.

But the answer is a much broader evaluation of the utility of the tools we
are using, in relation to what we use them for.

And some rants that I share: "but really boil down to adtech, adtech, adtech,
adtech, and some more adtech, and marketing campaigns about campaigns about
adtech."

------
lectrick
Erlang (and by association Elixir) tooling has a nice progressive approach to
managing state.

Agent -> ets -> dets -> mnesia -> riak (or sql tooling etc.)

(Agent, http://elixir-lang.org/docs/v1.1/elixir/Agent.html, is just a
state-holding process. Erlang folks can probably write one of these in their
sleep; Elixir added a bit of wrapping-paper around it.)
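(For the ets step of that progression, for instance, a minimal sketch:)

```erlang
%% ets: fast in-memory key/value storage owned by a process.
%% No persistence, no distribution; the data dies with its owner,
%% which is why dets/mnesia are the next steps up.
T = ets:new(sessions, [set, public]),
true = ets:insert(T, {<<"abc">>, user_1}),
[{<<"abc">>, user_1}] = ets:lookup(T, <<"abc">>),
true = ets:delete(T, <<"abc">>),
[] = ets:lookup(T, <<"abc">>).
```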

If you're writing an app, I think it's best to be storage-agnostic from the
get-go. You shouldn't be building up queries in your core app code; push them
to the edge of your code, because otherwise you're not separating concerns. All
your app (business logic) code should delegate to some wrapper to work out the
specifics of retrieving the data; your app code should just be calling
something like
Modelname.specific_function_returning_specific_dataset(relevant_identifier)
and let that work out the details. That way, if you ever upgrade your store,
you just have to refactor those queries but your app code remains the same. On
top of that, in your unit tests you can pass in a mimicking test double for
your store to do a true unit test, and avoid retesting your store over and
over again wastefully. (You'd still of course have an integration test to
cover that, but it wouldn't be doing it _on every test_.)

------
FLGMwt
Not that this hasn't generated good talking points, but I was there at the
workshop the OP mentioned and Fred's remark was very much said in passing to a
small group of people during a break and he didn't seem to be making an
intentionally negative remark. He certainly wasn't stating it as instructional
fact.

------
eddd
Well, you could summarise this article by saying: "Just because something was
invented 25 years ago doesn't mean it is useless". On the contrary: it is
worth taking a look at technology that has survived 25 years in the wild.

------
zaphar
The article is light on actual details and heavy on rants about the current
state of products built on the web.

However, there is one thing that mnesia got absolutely and totally right:
database schema upgrades. You can create an mnesia database and upgrade its
schema on the fly as a part of its operation, without once bringing it down
or running a script. I did this[0] for a toy project in erlang that I
unfortunately never finished, since the need for it disappeared.

[0]: https://github.com/zaphar/iterate/blob/master/src/db_migrate_tools.erl
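(The core of such an on-the-fly migration is mnesia:transform_table/3; a minimal sketch, with made-up record fields:)

```erlang
%% Live schema upgrade on a running table: every {user, Id, Name}
%% record is rewritten to {user, Id, Name, Email} in place, without
%% stopping mnesia or taking the table offline.
Upgrade = fun({user, Id, Name}) -> {user, Id, Name, undefined} end,
{atomic, ok} = mnesia:transform_table(user, Upgrade, [id, name, email]).
```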

------
ninjakeyboard
What? Mnesia is not fully-webscale graph-db so it's not useful in the 2010s??
Didn't get much from this piece.

------
davidw
By the same argument, though, why not just use Postgres? And I write that as a
fan of Erlang. Indeed: https://github.com/epgsql/epgsql

------
lucozade
I can't say I found the piece particularly insightful.

It seemed to imply that mnesia is the DB of the future as soon as everyone
realises that everything they are doing is completely wrong and they should be
doing things that are more suited to mnesia. Without saying what those things
are.

I actually found one of the child comments [1] was pushing in a better
direction. Essentially, the vast drop in $/TB of storage means that
persistence of time series/ event type data is practical for the masses now.
Sure it's found a niche in ads on the web, but it has much wider applicability
than that. I personally think that Erlang is particularly well suited to this
space.

[1] http://erlang.org/pipermail/erlang-questions/2015-October/086432.html

~~~
jacquesm
The reason why the comment is insightful is that suddenly Erlang has found
itself in the crosshairs of an industry that it is particularly useful for
(ad-tech) but there are _many_ more applications that have nothing to do with
adtech which erlang is _also_ very well suited for. So for adtech to start
pushing the direction in which erlang/mnesia evolve would be wrong.

As for the drop in $/TB of storage, that drop has been steadily going on since
the 70's of the previous century and for a very large chunk of that period
Erlang/mnesia have been adapting bit by bit to take advantage of that price-
decrease.

Persistence of time series and event type data does not (usually) require the
kind of storage solution that mnesia offers; a much simpler storage medium
would probably suffice and/or be a better choice for that application anyway.
Storing such data without further processing in a relational database is a
bit of a cop-out.

Most of those applications could benefit from digesting and compression which
is something you're not going to easily retro-fit onto an existing database.

~~~
lucozade
> Persistence of time series and event type data does not (usually) require
> the kind of storage solution that mnesia offers

Indeed. That's precisely my point (just stated better than me). Erlang looks
like being a very good fit for a large class of applications of this form.
mnesia, on the other hand, is not well suited as the persistent storage for
them. It would follow then that it could make sense for the "next gen" storage
for Erlang to be more in line with this sort of thinking in the way you
describe.

As to the drop in $/TB, it's been pretty steadily exponential. But the cost of
consuming and analysing TB of data has only really been practical for non-
specialist houses for 5-10 years. When you're in the low GB range then a
static data picture makes sense. When you're able to play with TB of data then
a dynamic picture makes sense and Erlang really shines.

------
jacquesm
> This is much more interesting than chasing click statistics in the interest
> of brokering ad sales at the speed of light so that users can continue to
> ignore them.

That comment really packs a punch and should get much wider visibility. Ad
tech and related software is where _way_ too much of our collective efforts
are going.

~~~
creshal
Because it's one of the most important sources of funding for tech companies
(justified or not).

~~~
njharman
Not really. There is a niche of social companies (twitter, facebook, google's
search engine) that make money from ads. There is a niche of media companies
(youtube) that make money from ads. There is a niche of startups which make
money from funding/IPOs. These niches are a focus of HN and get talked about
way out of proportion to their representational percentage.

Most tech companies make products and get funding from sales (netflix,
amazon, apple, ibm, microsoft, 99% of the Enterprise and 99% of the
small-medium market).

