
Now that people are considering NoSQL, will more people consider no-DB? - fogus
http://martinfowler.com/bliki/MemoryImage.html
======
dgreensp
After reading the article and all the comments here, and from my own
experience, I just don't think it's possible to not have a DB. At best, you
write your own basic DB, because you don't need anything fancy.

For example, you write S-expressions to files like Hacker News does. This is
clever, because the file system has some of the features of a database system,
and files and S-expressions are abstractions that already exist. You do have
to manage what data is in memory and what data is on disk at any given time,
but the complexity and amount of code are low.

The idea that "event sourcing" somehow keeps you from needing a DB is
ridiculous. By the time you've defined the event format, and written the
software to replay the logs, etc., which if you're smart will be fairly
general and modular, congrats, you've just written a database. At best, you
keep complexity low, and it's another example of a small custom DB for a case
where you don't need a fancy off-the-shelf DB. Maybe it's the perfect solution
for your app, but it's still a database.

"Memory images," as a completely separate matter, are an abstraction that
saves you some of the work of making a DB. Just as S-expressions can save you
from defining a data model format, and files can save you from a custom key-
value store, memory images as in Smalltalk could save you from having to deal
with persistence. And if your language has transactions built in, maybe that
saves you from writing your own transaction system. In general, though, it's
very hard to get the DB to disappear, as there is a constellation of features
important to data integrity that you need one way or another. It's usually
pretty clear that you're using a DB, writing a DB, or using a system that
already has a DB built in. If you think there's no DB, there's a high chance
you're writing one. Again, that could be fine if you don't need all the
features and properties of a robust general-purpose DB.

Funnily enough, in EtherPad's case, we had full versioned history of
documents, and did pretty much everything in RAM and application logic -- a
pretty good example of what the article is talking about -- and yet we used
MySQL as a "dumb" back-end datastore for persistence. Believe me, we tried not
to; we spent weeks trying alternatives, and trying to write alternatives.
Perhaps if every last aspect of the data model had been event-based, we could
have just logged the events to a text file and avoided SQL. More likely, I
think, we would use something like Redis now.

~~~
joshu
Of course, filesystems are a kind of database as well.

~~~
silverbax88
Yes, agreed, which is why I find Fowler's (and others') documents a little hard
to take seriously. They ARE writing to disk, and doing a lot of other work to
try and keep the data intact in case of failure...which is what a DB does. I'm
all for the idea of keeping some data in memory for speed, but moving it all
to resident memory is just moving the same components around.

------
guygurari
In many applications, data outlives code. This is certainly the case in
enterprise applications, where data can sometimes migrate across several
generations of an application. Data may also be more valuable to the
organization than the code that processes it.

While I'm no fan of databases, one obvious advantage is that they provide
direct access to the data in a standard way that is decoupled from the
specific application code. This makes it easy to perform migrations, backups
etc. It also increases one's confidence in the data integrity. Any solution
that aims to replace databases altogether must address these concerns. I think
that intimately coupling data with the application state, as suggested in the
article, does not achieve this.

~~~
jcromartie
When people set out to design a SQL database, they usually end up updating and
deleting records. This is bad because it destroys history, and nothing that
you can add to your SQL architecture will fix it at a fundamental level.

By basing your system on a journaled event stream, you start with a foundation
of complete history retention, and you can build exactly the sort of reporting
views you need at any time (say, by creating a SQL database for other
applications to query).
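
A minimal sketch of that foundation in Clojure (the journal file name and the
event shapes are invented for illustration): every change is appended to a log
as plain data, and the current state, or any reporting view, is just a fold
over that log.

    ;; A minimal event journal: append events as EDN lines, rebuild any
    ;; view by folding over them. Event shapes here are hypothetical.
    (require '[clojure.java.io :as io])

    (defn record-event!
      "Append one event (a plain map) to the journal file."
      [path event]
      (with-open [w (io/writer path :append true)]
        (.write w (prn-str event))))

    (defmulti apply-event (fn [_state event] (:type event)))

    (defmethod apply-event :deposit [state {:keys [account amount]}]
      (update state account (fnil + 0) amount))

    (defmethod apply-event :withdraw [state {:keys [account amount]}]
      (update state account (fnil - 0) amount))

    (defn replay
      "Rebuild state (or any reporting view) from the full journal."
      [path]
      (with-open [r (io/reader path)]
        (reduce apply-event {} (map read-string (line-seq r)))))

    ;; (record-event! "journal.edn" {:type :deposit :account :a :amount 100})
    ;; (record-event! "journal.edn" {:type :withdraw :account :a :amount 30})
    ;; (replay "journal.edn") ;=> {:a 70}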

~~~
matwood
_When people set out to design a data driven application, they usually end up
updating and deleting records._

FTFY...

It's not hard to build history into a SQL table design. You can even store
events in a...wait for it... SQL database. I have built numerous systems
backed by SQL databases that have complete history retention. Questions like
'who had id X on this date 3 years ago?' are easily answered with basic,
standard SQL.

I certainly don't believe SQL databases are perfect or the tool for every job,
but in many cases they work just fine until you get into very large datasets.
Admittedly, I only deal with databases in the 100s of GB range, so I have yet
to personally run into the scaling problems that a Google or Facebook has,
and the SQL-backed systems I have built work just fine.

~~~
jcromartie
It's not hard, no, but it _usually doesn't happen_ in the average application.
That's the issue: it's not built in, it's not standardized, and every SQL
database is fully mutable by default.

If your system operates in this journaled/event-sourcing way at the _most
basic level_ then you have the ultimate future-proof storage layer. You could
decide to completely change the way the data is stored and represented (in-
memory or otherwise) at any time, as long as you have that raw history.

~~~
kfool
That's one of the things ChronicDB does. It makes historical values available
in the average application.

------
kragen
I've written some programs like this, even to the point of replaying the
entire input history every time my CGI script got invoked. It's surprising
what a large set of apps even that naïve approach is applicable to, and there
are some much more exciting possibilities under the surface.

To the extent that you could actually write your program as a pure function of
its past input history — ideally, one whose only O(N) part (where N was the
length of the history) was a fold, so the system could update it incrementally
as new events were added — you could get schema upgrade and decentralization
"for free". However, to get schema upgrade and decentralization, your program
would need to be able to cope with "impossible" input histories — e.g. the
same blog post getting deleted twice, or someone commenting on a post they
weren't authorized to read — because of changes in the code over the years and
because of distribution.
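
Here is roughly what that shape looks like in Clojure (the blog-post events
are invented): state is a pure fold over the input history, and the handlers
shrug off "impossible" events instead of failing.

    ;; State as a pure fold over input history. Handlers tolerate
    ;; "impossible" events instead of assuming a well-formed history.
    (defn step [state event]
      (case (:type event)
        :post-created  (assoc-in state [:posts (:id event)] (:body event))
        ;; deleting a post that is already gone is a no-op, not an error
        :post-deleted  (update state :posts dissoc (:id event))
        ;; ignore comments on posts that don't (or no longer) exist
        :comment-added (if (get-in state [:posts (:post-id event)])
                         (update-in state [:comments (:post-id event)]
                                    (fnil conj []) (:text event))
                         state)
        state))

    ;; A full rebuild is one fold over the history...
    (defn rebuild [history] (reduce step {} history))

    ;; ...and because it's a fold, new events apply incrementally,
    ;; not O(N) per event:
    ;; (def state (atom (rebuild old-history)))
    ;; (swap! state step new-event)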

I called this "rumor-oriented programming", because the propagation of past
input events among the nodes resembles the propagation of rumors among people:
[http://lists.canonical.org/pipermail/kragen-tol/2004-January...](http://lists.canonical.org/pipermail/kragen-tol/2004-January/000749.html)

I wrote a bit more on a possible way of structuring web sites as lazily-
computed functions of sets of REST resources, which might or might not be past
input events: [http://lists.canonical.org/pipermail/kragen-tol/2005-Novembe...](http://lists.canonical.org/pipermail/kragen-tol/2005-November/000810.html)

John McCarthy's 1998 proposal, "Elephant", takes the idea of writing your
program as a pure function of its input history to real-time transaction
processing applications: <http://www-formal.stanford.edu/jmc/elephant/elephant.html>

The most advanced work in writing interactive programs as pure functions of
their input history is "functional reactive programming", which unfortunately
I don't understand properly. The Fran paper <http://conal.net/papers/icfp97/>
is particularly influential, and there's a page on HaskellWiki about FRP:
[http://www.haskell.org/haskellwiki/Functional_Reactive_Progr...](http://www.haskell.org/haskellwiki/Functional_Reactive_Programming)

------
zwischenzug
Perhaps I am an old dinosaur, but this article merely annoyed me.

"The key element to a memory image is using event sourcing, which essentially
means that every change to the application's state is captured in an event
which is logged into a persistent store."

That is a key element of a database. It's called a logical log.

"Furthermore it means that you can rebuild the full application state by
replaying these events."

Yup, logical log.

"Using a memory image allows you to get high performance, since everything is
being done in-memory with no IO or remote calls to database systems. "

This is _exactly_ what sophisticated old-school databases do. You can have
them write to disk on commit, or just to memory, and have a thread take care
of the IO in the background.

"Databases also provide transactional concurrency as well as persistence, so
you have to figure out what you are going to do about concurrency."

Righty-ho.

"Another, rather obvious, limitation is that you have to have more memory than
data you need to keep in it. As memory sizes steadily increase, that's
becoming much less of a limitation than it used to be."

So why not store your old-school DB in memory?

I can understand the argument that you don't want to lock into a big DB
vendor's license path, but the technical arguments here look distinctly weak
to me.

Maybe old-fashioned DBs are hipper than people think?

~~~
zzzeek
These are all good points but the core of Fowler's article is that the
persistence is against the application's object structures directly, with no
translation to relational concepts needed (note I am the author of a very
popular object-relational library, so I'm not in any way opposed to object-
relational mapping...it's just interesting to see this approach that requires
none). That it's stored in memory and reconstructed from an event log is
secondary to this.

~~~
fauigerzigerk
Initially he makes it sound like there is no translation into a different
model, but at the end of the article he takes it all back, and for good
reason:

 _Also it's important to keep a good decoupling between the events and the
model structure itself. It may be tempting to come up with some automatic
mapping system that retrospects on the event data and the model, but this
couples the events and model together which makes it difficult to migrate the
model and still process old events._

So, you're right that he doesn't envision a translation to a relational model
but it's not just object structures either.

~~~
zzzeek
Right, but the event system in question could be built up nicely in a couple
of hours most likely, and the level of "translation" would be minimal compared
to OR mapping - no columns/rows/joins/tables/anything else like that.

With such an application I'd probably still be writing the events themselves
to a relational database for archiving and potentially sending out report-
oriented data as well. I'm not sure how all of that would work out, given
that ultimately the whole app needs to be stored in an RDBMS anyway for
various reasons, but it seems interesting to try.

------
IgorPartola
Wouldn't this system have a bunch of drawbacks:

\- Long startup times as the entire image needs to be loaded and prepared.

\- It would be hard to distribute the state across multiple nodes

\- What happens in case of a crash? How fault tolerant would this be?

\- Does this architecture essentially amount to building a sort-of-kind-of
datastore into your already complex application? Without a well-defined, well-
tested existing code base, is this just re-inventing the wheel for each new
project?

\- How do you enforce constraints on the data?

\- How do transactions work (debit one account, [crash], credit another
account)?

\- How do you allow different components (say web user interface, admin
system, reporting system, external data sources) to share this state?

Just curious.

EDIT:

\- Isn't this going to lead to you writing code that almost always has side-
effects, causing it to be really hard to test? How would you implement this
system in Haskell?

~~~
wpietri
\- The startup times can be a problem if you have a lot of data. Modern disks
are pretty fast for streaming reads, though, and you can split the
deserialization load across multiple processors.

\- Mirroring state is easy; you just pipe the serialized commands to multiple
boxes.

\- It's very fault tolerant. Because every change is logged before being
applied, you just load the last snapshot and replay the log.

\- It didn't seem that way to me.

\- In code. In the system I built, each mutation was packaged as a command,
and the commands enforced integrity (see the sketch after this list).

\- Each command is a transaction. As with DB transactions, you do have to be
careful about where you draw your transaction boundaries.

\- Via API. Which I like better, as it allows you to enforce more integrity
than you can with DB constraints.
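
A rough sketch of that command style in Clojure (not the actual system
described here; the account command is invented): each command is validated in
code, journaled before it is applied, and applied in a single swap!, which
makes it atomic.

    ;; Commands are validated, journaled, then applied -- each one an
    ;; atomic transaction against the in-memory state.
    (def state (atom {:accounts {:alice 100}}))

    (defn apply-command [s {:keys [account amount] :as cmd}]
      ;; integrity lives in code, not in DB constraints
      (when (< (get-in s [:accounts account] 0) amount)
        (throw (ex-info "insufficient funds" cmd)))
      (update-in s [:accounts account] - amount))

    (defn execute!
      "Write-ahead: journal the command, then apply it atomically.
       Assumes a single writer thread, as these systems typically do."
      [journal! cmd]
      (apply-command @state cmd) ; dry run: reject invalid commands early
      (journal! cmd)
      (swap! state apply-command cmd))

    ;; (execute! #(spit "commands.edn" (prn-str %) :append true)
    ;;           {:type :withdraw :account :alice :amount 30})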

~~~
IgorPartola
Thanks for the informative response. Just a couple more questions:

> \- The startup times can be a problem if you have a lot of data. Modern
> disks are pretty fast for streaming reads, though, and you can split the
> deserialization load across multiple processors.

Reading data, at even a GB/second from disk (which is currently not possible),
is going to mean a second spent per GB of data, just to read, let alone
deserialize. That's with reading a snapshot, not replaying old transactions.

> \- Mirroring state is easy; you just pipe the serialized commands to
> multiple boxes.

That's not distributing the load. I'm talking about having more data than fits
in a reasonable amount of RAM (say 1TB). Also mirroring is nice for when you
want read-only access to your data. You'll have the same problem as any other
data store when you want multiple writers. Also, is replication synchronous or
asynchronous (which end of CAP do you fall on)?

> \- It's very fault tolerant. Because every change is logged before being
> applied, you just load the last snapshot and replay the log.

So it's going to go at the speed of the disk then
([http://smackerelofopinion.blogspot.com/2009/07/l1-l2-ram-and...](http://smackerelofopinion.blogspot.com/2009/07/l1-l2-ram-and-hdd-latencies-infographic.html)).
Don't get me wrong, this is still faster than writing to the network, but
then writes are _way_ slower than reads.

My other question is how much of a pain in the ass is it to debug such a
system? I suppose if you have a nice offline API to look at your data, change
something, revert back, etc, it would work well, but if it's deep within your
normal application, it could become nightmarish.

~~~
wpietri
> Reading data, at even a GB/second from disk (which is currently not
> possible), is going to mean a second spent per GB of data, just to read,
> let alone deserialize.

If that's just saying that startup time can be an issue, I agree. There are a
variety of techniques to mitigate that, though. The simplest is to compress
snapshots and/or put them on RAID, boosting read speed. The most complicated
is just to have mirrored servers and only restart the one not in use right
now.

> I'm talking about having more data than fits in an reasonable amount of RAM
> (say 1TB).

For something where you need transactions across all of that? This
architecture's probably not a reasonable approach, then. The basic
precondition is that everything fits in RAM. However, sharding is certainly
possible if you can break your data into domains across which you don't
require consistent transactions.

> So it's going to go at the speed of the disk.

Sort of.

Because it's just writing to a log, mutations go at the speed of streaming
writes, which is very fast on modern disks. And there are a variety of
techniques for speeding that up, so I'm not aware of a NoDB system for which
write speed is the major problem.

Regardless, it's a lot better for writes than the performance of an SQL
database on the same hardware.

> My other question is how much of a pain in the ass is it to debug such a
> system?

It seemed fine. A big upside is that you have a full log of every change, so
there's no more "how did X get like Y"; if you want to know you just replay
the log until you see it change.

Last I did this we used BeanShell to let us rummage through the running
system. It was basically like the Rails console.

------
courtewing
No matter how skilled I become as a developer, there is always something
lurking around the corner to make me feel more naive than ever. As I was
reading this article, I realized that my whole career and knowledge about the
way applications work is based around the one core idea that when non-binary
data needs to be persisted, you use a database.

The idea that you can reliably use event sourcing in memory to persist your
data is as foreign to me as it is impressive. Is anyone familiar with major
applications (web apps, ideally) that use this method for their data
persistence?

~~~
wpietri
You're already familiar with a couple of things that can be built this way:
word processors and multiplayer game servers. In both cases SQL databases are
too slow and too awkward.

Financial trading is another area where databases are too slow. I know of one
place that uses this approach to keep pricing data hot in RAM for their
financial models. And Fowler previously documented using this for a financial
exchange:

<http://martinfowler.com/articles/lmax.html>

~~~
silverbax88
I wonder about this concept. The reality is that you need enough RAM to power
NASDAQ, and then you have to be able to accurately reproduce the state of the
data following a crash from input kept in a durable store - which effectively
is IO to the disk, which is the same as, well, just writing to a DB to begin
with.

Of course, Fowler talks about 'snapshotting' the data, which, again, makes me
wonder whether all of this playing with resident memory, and the systems
needed to make that happen, hasn't already been solved by...um...databases.

------
nickyp
For those who want to take this kind of approach (object prevalence) in Common
Lisp see <http://common-lisp.net/project/cl-prevalence/>

Sven Van Caekenberghe (the author of cl-prevalence) and I used this approach
to power the back-end/cms of a concert hall back in 2003. A write-up of our
experiences can be found at
<http://homepage.mac.com/svc/RebelWithACause/index.html>

The combination of a long-running Lisp image with a remote REPL and the
flexibility of the object prevalence made it a very enjoyable software
development cycle. It's possibly even more applicable with the current memory
prices.

I especially liked the fact that your mind never needs to step out of your
object space. No fancy mapping or relationship tables, just query the objects
and their relations directly. I guess that's what Smalltalk developers also
like about their programming environment.

~~~
larve
we started with cl-prevalence and then of course (NIH-syndrome) implemented
our own approach to this back in 2003, which you can find at
<http://bknr.net/> . We used it back then to run eboy.com, and it still is
powering <http://quickhoney.com> <http://www.createrainforest.org/> and
<http://ruinwesen.com/> amongst others. Those transaction logs + images are
now some 6+ years old, and have gone through multiple code rewrites and
compiler changes and OS changes and what not. It is good fun, has drawbacks,
has advantages, definitely widens your horizon.

------
jkkramer
Using the no-DB approach is particularly tempting with a language like
Clojure. Clojure can slice & dice collections easily and efficiently. It has
built-in constructs for managing concurrency safely.

I actually have a couple Clojure apps that rely on a hefty amount of in-memory
data to do some computations. Even the cost of pulling the data from Redis
would be too high. The in-memory data grows very slowly, so it's easy to
maintain. Moving faster-growing data in-process would be trickier, but this
article makes me want to try.

------
giardini
This has limited use because of:

Maintenance: I can easily give a 10% raise to everyone with a single SQL
statement. Fowler's method requires that I first create an entire
infrastructure (transaction processing, ACID properties) in code for this
particular application. And it had better be as reliable as the transaction
processing available in modern relational databases (so says my boss) or I'll
be looking for a new job.

Support: you get to teach the new guy how "Event Sourcing" works for this
application A and also applications B, C, ....

That said, I _have_ done this with great success. But the work involved a
single application (a minicomputer-based engineering layout system). The ease
with which versioning could be included was a selling point.

And don't get me started on reporting or statistics.

~~~
wpietri
The "give everybody a 10% raise" case can be looked upon either as a bug or a
feature. Sometimes it's nice that anybody can do anything; sometimes it isn't.

As to creating the infrastructure and worries about reliability, there are a
number of frameworks for this. E.g., Prevayler. It gives you all the ACID
guarantees, but has about three orders of magnitude less code than a modern
database.

Supporting it could definitely be a problem. That's true for anything novel,
so I'd only do this where the (major) performance benefits outweigh the
support cost.

Some kinds of statistics are easier with this. For example, if you want to
keep a bunch of up-to-date stats on stocks (latest price, highs, lows, and
moving averages for last hour, day, and week) it is almost trivially easy in a
NoDB system, and much, much faster than with a typical SQL system.
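
For a flavor of why, a sketch in Clojure (the tick events and the 20-tick
window are invented): each incoming price updates the stats in one in-memory
operation, with no query round-trip.

    ;; Per-symbol stats kept hot in RAM; each tick is one cheap update.
    (def stats (atom {}))

    (defn update-stats [s price]
      (let [prices (take 20 (cons price (:recent s)))] ; last 20 ticks
        {:last price
         :high (max price (:high s price))
         :low  (min price (:low s price))
         :recent prices
         :moving-avg (/ (reduce + prices) (count prices))}))

    (defn on-tick [{:keys [sym price]}]
      (swap! stats update sym update-stats price))

    ;; (on-tick {:sym "ACME" :price 101.5})
    ;; (get @stats "ACME") ;=> last, high, low and moving average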

For other stats and reporting, though, dumping to an SQL database is great.
For many systems you don't want to use your main database for statistics
anyhow, so a NoDB approach mainly means you start using some sort of data
warehouse a little earlier.

------
mmatants
I agree 100% with the article, with one caveat.

Database engines are not just for storing - each is basically a "utility
knife" of data retrieval - indexing, sorting and filtering are available via
(relatively) simple SQL constructs. If your app uses an index right now,
ditching the DB will mean re-implementing it manually. It's not hard, but it's
extra code.

So basically, the DB engine might still be a necessary "library", at least for
data retrieval. A middle-of-the-road take on this is e.g. using an in-memory
SQLite instance to perform indexing, etc. - seeding it at run-time to help
with data searches, but then still not using it for storing persistent
information, and discarding the data at the end.
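
That middle road might look like this in Clojure over plain JDBC, assuming
the xerial sqlite-jdbc driver is on the classpath (the table and data are
invented): seed an in-memory SQLite at startup, use it for indexed lookups,
and let it vanish on shutdown.

    ;; In-memory SQLite as a disposable "index library", not a store.
    (import '(java.sql DriverManager))

    (def conn (DriverManager/getConnection "jdbc:sqlite::memory:"))

    (defn seed! [users]
      (with-open [st (.createStatement conn)]
        (.executeUpdate st "CREATE TABLE users (id INTEGER, name TEXT)")
        (.executeUpdate st "CREATE INDEX users_name ON users (name)"))
      (with-open [ps (.prepareStatement
                      conn "INSERT INTO users VALUES (?, ?)")]
        (doseq [[id nm] users]
          (.setInt ps 1 id)
          (.setString ps 2 nm)
          (.executeUpdate ps))))

    (defn find-id [nm]
      (with-open [ps (.prepareStatement
                      conn "SELECT id FROM users WHERE name = ?")]
        (.setString ps 1 nm)
        (let [rs (.executeQuery ps)]
          (when (.next rs) (.getInt rs "id")))))

    ;; (seed! [[1 "ada"] [2 "grace"]])
    ;; (find-id "grace") ;=> 2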

~~~
Duff
The SQL constructs are great, but the biggest advantage to relational
databases is that the engine handles your data consistency issues for you.
Consistency isn't just about rolling the datastore back to a specific moment
in time -- you have to handle locking, concurrent reads/writes, etc.

If you're building a trading platform that handles 6M transactions/second, you
have the money to handle this in the application layer and the load to justify
the expense. But for many other tasks, you may be wasting money or putting
data at risk.

~~~
wdrury
I agree. Unless you know you are building something that will start out
needing millions of transactions per second, you are more likely over-
designing if you are building a bespoke database.

Standard tools are useful because you can get to working code fast ... this is
why LAMP is still such a powerful framework upon which to build. While it may
make sense to consider adding a search indexer (Solr) or key-value cache
(Redis), for almost every use case, rewriting data storage is a waste.

Also, to paraphrase Ted Dziuba, it probably doesn't matter if your product
doesn't scale, because nobody cares, or will ever use it. So I think it is
better to get something up and running quickly to see if anyone cares before
you bother trying to optimize for the rare case where your product turns out
to be the next Twitter.

------
alexro
I think that the best thing DBs provide is the separation of skills. I can
fully concentrate on the programming side and just be aware of the DB side,
and the DBAs will handle setup, replication, migration, analytics, ad-hoc
queries, backups, etc.

If, on the other hand, I had to do it all myself, I'd most probably have lost
my last hair.

------
wpietri
I'd encourage everybody to try this out; building an app like this really
broadened my way of thinking about system design.

Compared with a database-backed system, many operations are thousands of times
faster. Some things that I was used to thinking of as impossible became easy,
and vice versa. Coming to grips with why was very helpful.

~~~
andrewcooke
what implementation did you use? what others exist? thanks.

~~~
gte525u
Isn't this essentially what prevayler <http://en.wikipedia.org/wiki/Prevayler>
is?

------
yesimahuman
I think it's interesting that you can move more of your "persistent" state to
in-memory storage and then write out snapshots throughout the day. Online
game servers often rely on state being in memory rather than being queried on
demand. Achieving high performance otherwise is difficult.

However, I wouldn't call this "no-DB." Rather, it's "less-db." Ultimately,
historical and statistical data needs to be stored and databases are great for
that (and for a stats team).

------
cachemoney
I spent about a year as a maintainer of FlockDB, Twitter's social graph store.
If you don't know, it's basically a sharded MySQL setup. One of the key pain
points was optimizing the row lock over the follower count. Whenever a Charlie
Sheen joined, or someone tried to follow-spam us, one particular row would get
blasted with concurrent updates.

Doing this in-memory in java via someAtomicLong.incrementAndGet() sounds
appealing.

~~~
jcromartie
> Doing this in-memory in java via someAtomicLong.incrementAndGet() sounds
> appealing.

Just for fun, in Clojure:

    ;; an atom holds the counter; swap! applies inc atomically,
    ;; with no row to lock
    (def current-id (atom (long 0)))
    (defn get-id [] (swap! current-id inc))

------
emehrkay
I didn't finish the article because I read the one on Event Sourcing
(<http://martinfowler.com/eaaDev/EventSourcing.html>), pretty good pattern. I
like that every time he describes a new one (to me), I feel like I have to use
it.

~~~
discreteevent
"I feel like I have to use it". Sure, as long as it's in a prototype, and I
assume that that is what you mean. The problem is that it's this "see a new
thing and feel like I have to use it" impulse that seems to be the single
biggest source of accidental complexity in production software. So by all
means use something new, but not because you feel like it - rather because
you have thought long and hard about it, tested it, and can really justify
why you want to use it in your situation.

------
nivertech
Nothing new here. I remember working with the TED editor on a PDP-11. The
machine crashed sometimes. After a restart, TED would restore the text by
replaying all my key presses.

Another example is vector graphics editors: they replay vector graphics
primitives instead of storing pixel bitmaps.

------
jeffdavis
What about ROLLBACK? And no, going back in time by replaying logs is no
substitute, because you lose other transactions that you want to keep (and
perhaps already reported to the user as completed).

What about transaction isolation? How do you keep one transaction from seeing
partial results from a concurrent transaction? Sounds like a recipe for a lot
of subtle bugs.

And all of the assumptions you need to make for this no-DB approach to be
feasible (e.g. fits easily in memory) might hold at the start of the project,
but might not remain valid in a few months or years. Then what? You have the
wrong architecture and no path to fix it.

And what's the benefit to all of this? It's not like DBMS do disk accesses
just for fun. If your database fits easily in memory, a DBMS won't do I/O,
either. They do I/O because either the database _doesn't_ fit in memory or
there is some better use for the memory (virtual memory doesn't necessarily
solve this for you with the no-DB approach; you need to use structures which
avoid unnecessary random access, like a DBMS does).

I think it makes more sense to work _with_ the DBMS rather than constantly
against it. Try making simple web apps without an ORM. You might be surprised
at how simple things become, particularly changing requirements. Schema
changes are _easy_ unless you have a lot of data or a lot of varied
applications accessing it (and even then, often not as bad as you might think)
-- and if either of those things are true, no-DB doesn't look like a solution,
either.

~~~
smokinn
In an event-based system, especially a large distributed one, ROLLBACK as a
single command to revoke all previous attempts at state mutation becomes
impossible to support. Instead of supporting distributed transactions you have
to change to a tentative model. The paper Life beyond Distributed
Transactions: an Apostate's Opinion (Available here:
<http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf> ) describes this well.

Basically, instead of making a transaction between 2 entities, you send a
message to the first reserving some data, and a message to the second
reserving the data; once you get confirmation from both (or however many
entities are involved in the transaction) you send a commit to them.

These reservations can be revoked though. Your rollback has to be managed by
an "activity".

Ex: Bank transfers. You have an activity called BankTransfer. It manages the
communication between entities and the overall workflow. It starts by sending
messages to entities Account #1, with $100 in it, and Account #2, also with
$100. To #1 it says debit $500. To #2 it says credit $500. #2 responds first
and says Done. #1 responds second and says Insufficient Funds. BankTransfer
sends another message to #2 saying Cancel event id 100 (the crediting).

Other activities that want to read the state of #1 will still see $100 in
it. But say the (as yet unconfirmed) transfer had been for $50 rather than
$500: if another debit of $75 came in, #1 would respond insufficient funds.
At this point it's the activity's job to decide what to do. Wait and try
again? Fail entirely and notify any other entities relevant to the workflow?
That's up to the business rules. Also, since the credit has not yet been
confirmed, reading the balance on #2 would still say $100, not $600.

Of course, depending on your use case, you may want reads to return the
balance including unconfirmed transactions. That's entirely up to the
application code and business rules, but the example should be explanatory as
to how rollback is implemented.
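
A toy, single-process version of that reservation dance in Clojure (the
entity and function names are invented; a real system would pass these as
messages between independent entities): reservations reduce the available
balance without touching the visible one until confirmed.

    ;; Tentative operations: reserve, then confirm or cancel. The visible
    ;; balance only moves on confirm; availability moves on reserve.
    ;; This models the debit side; assumes one writer per entity.
    (defn account [balance] (atom {:balance balance :reservations {}}))

    (defn available [acct]
      (let [{:keys [balance reservations]} @acct]
        (- balance (reduce + 0 (vals reservations)))))

    (defn reserve! [acct id amount]
      (if (>= (available acct) amount)
        (do (swap! acct assoc-in [:reservations id] amount) :ok)
        :insufficient-funds))

    (defn confirm! [acct id]
      (swap! acct (fn [{:keys [reservations] :as s}]
                    (-> s
                        (update :balance - (get reservations id 0))
                        (update :reservations dissoc id)))))

    (defn cancel! [acct id]
      (swap! acct update :reservations dissoc id))

    ;; (def acct1 (account 100))
    ;; (reserve! acct1 :xfer-1 50) ;=> :ok -- balance still reads $100
    ;; (reserve! acct1 :xfer-2 75) ;=> :insufficient-funds (only $50 left)
    ;; (cancel! acct1 :xfer-1)     ;   the rollback path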

Eventual consistency is the only scalable way to go for very large systems.

------
Mordor
Why not go for a NoDisk solution too - just RAID your memory and back it up
with ultracapacitors?

~~~
kragen
You don't even need the capacitors or battery backup if your memory is
sufficiently distributed.

~~~
Mordor
Hmm, not so sure about that - power outages can affect several city blocks and
then you're introducing latency?

~~~
kragen
Put your memory on separate continents. Yes, it introduces latency, even more
latency than a disk seek, but not that much.

------
superuser2
Running Redis in journaling mode is essentially this. Mongo can run like that
too.

~~~
jorangreef
Re: Redis, only if you never compact the append-only file.

------
rb2k_
While this article wants to establish additional layers above the filesystem,
I've always wondered how comparable modern filesystems are to key-value
datastores.

As far as I can see, they seem to be comparable to b+tree indexed key value
stores. A key would e.g. be "/home/user/test.txt". Thanks to the B+Tree
indexing you can do a prefix scan and list folders (e.g. "ls /home/user/" -->
all keys starting with "/home/user/").
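
The mapping is direct enough to sketch in a few lines of Clojure (the paths
are invented): keys are file paths, values are file contents, and a "prefix
scan" is a directory listing.

    ;; The filesystem as a key-value store: put = write a file,
    ;; get = read a file, prefix scan = list a directory.
    (require '[clojure.java.io :as io])

    (defn kv-put [root k v]
      (let [f (io/file root k)]
        (io/make-parents f)
        (spit f v)))

    (defn kv-get [root k]
      (let [f (io/file root k)]
        (when (.exists f) (slurp f))))

    (defn kv-scan
      "All keys under a prefix, like `ls /home/user/`."
      [root prefix]
      (map #(.getName %) (.listFiles (io/file root prefix))))

    ;; (kv-put "/tmp/kv" "home/user/test.txt" "hello")
    ;; (kv-get "/tmp/kv" "home/user/test.txt") ;=> "hello"
    ;; (kv-scan "/tmp/kv" "home/user")         ;=> ("test.txt")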

In the case of e.g. ReiserFS they actually use B+Trees. They have a caching
layer managed by the OS. Most of them have journaling which would be the
equivalent of a "write ahead log".

Map-reduce-based "view" generation can easily be done with pipes and
utilities like grep. We might even be able to do some sort of simplistic
filtering/views/relations using symlinks.

I guess the main difference is that they aren't optimized for this database-
like behavior from a performance standpoint and that the network interfaces to
them are SMB/AFP/NFS.

~~~
jorangreef
Facebook was using the filesystem for storing photos and then moved to
Haystack, which is essentially an append-only log similar to BitCask. The
problem with using the filesystem as a KV store is that for every item stored,
you're storing a whole lot of filesystem specific meta-data: created, updated,
permissions etc.

------
overshard
Any time you have someone who is not a programmer who wants to maintain code,
you will have a DB. And this is almost all the time.

------
yk_42
EventSourcing/CQRS (Command/Query Responsibility Segregation) is gaining a bit
of traction in the .NET community. There are some great presentations[1],
blogs[2][3] and projects[4] related to this architecture.

[1]: [http://www.infoq.com/presentations/Command-Query-Responsibil...](http://www.infoq.com/presentations/Command-Query-Responsibility-Segregation)

[2]: <http://www.udidahan.com/?blog=true>

[3]: <http://blog.jonathanoliver.com/>

[4]: <https://github.com/joliver/EventStore/>

~~~
darylteo
Greg Young himself asserts that CQRS is NOT an architecture, and also that
CQRS itself has nothing to do with Event Sourcing.

[http://codebetter.com/gregyoung/2010/02/16/cqrs-task-based-u...](http://codebetter.com/gregyoung/2010/02/16/cqrs-task-based-uis-event-sourcing-agh/)

"CQRS is not eventual consistency, it is not eventing, it is not messaging, it
is not having separated models for reading and writing, nor is it using event
sourcing."

It is gaining a lot of traction simply because it seems like a complicated and
cool way to solve an uncommon problem. I fear that Event Sourcing will soon
become an over-engineered hammer for the wrong nail.

If it were me, I would use CQRS principles, with a relational backend as my
source model. Then, when the need for scale arises, use ETL to move to either
a non-relational DB or a no-DB approach for queries.

------
mrich
That sounds interesting. However, as soon as you have several distinct
applications that share e.g. the same master data, how do you interface them?
You will have to design the in-memory transactional store as a kind of global
component. Then, not much is left until you end up with a real database.

I am working on a team that is building an in-memory SQL database. It features
a custom language that makes it possible to push the time-critical, data-
processing parts of the application directly to the database, which allows for
the same speed as this no-DB approach. But you don't have to build your own DB
and do everything yourself (correct persistence, backup, transactions...)

~~~
jcromartie
UUIDs should let you merge any two streams of events without conflict (unless
there are other business-layer data constraints that would be violated).

------
adelevie
This seems like an excellent idea. Is it possible to preserve most of the
APIs/design patterns we're used to working with for (no)SQL when using Event
Sourcing?

------
artsrc
Having a database model and an application model in separate processes, and
mapping between the two, is expensive in many ways.

There are many ways to get rid of this problem other than getting rid of the
database: for example, putting all knowledge of the domain in the database via
stored procedures, couchapps, and object stores.

------
71104
answering directly to the subject: i do hope so. SQL too often introduces only
a layer of complexity between the server-side application and the storage,
while most of the time an application could be designed to just use the
filesystem, which is a database on its own by the way: it's a big, usually
efficient lookup table that maps keys (file paths) to values (file contents).

why store passwords through SQL when a server application could just use a
specific directory containing one file for each user, each file named with the
username and containing his password (without any file format, just the
password, possibly hashed or encrypted)? the operating system I/O cache should
be able to handle that efficiently, and the advantage would be eliminating the
dependency on another piece of software, the DBMS.
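
What this describes is only a few lines in any language; a Clojure sketch,
with the directory location invented and the hashing left out:

    ;; One file per user, containing only the (hashed) password.
    (require '[clojure.java.io :as io])

    (def passwd-dir "/var/myapp/passwords") ; hypothetical location

    (defn set-password! [user hashed]
      (let [f (io/file passwd-dir user)]
        (io/make-parents f)
        (spit f hashed)))

    (defn check-password [user hashed]
      (let [f (io/file passwd-dir user)]
        (and (.exists f) (= (slurp f) hashed))))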

~~~
wvenable
> each file named with the username and containing his password (without any
> file format, just the password, possibly hashed or encrypted)?

Because your advertisers want to know how many users signed up last month,
last six months, and last year. When you only consider one use-case for your
data, it's easy to consider using NoSQL or the file system to store your data
but in doing so you fail to imagine all the other ways you might want that
very same data.

~~~
yesbabyyes
ctime, mtime, atime.

~~~
wvenable
Add one other criterion to that -- say location -- and it's already useless.

------
ivanhoe
It looks more like a different DB engine implementation than a no-DB
system... you still need structured data persisted somewhere; the difference
here is in the way you store it and how you buffer it, but IMHO all of that
can be seen simply as an in-memory "DB engine".

------
dendory
I personally use SQLite all the time. I prefer it to any of the other
solutions.

------
drKarl
Two years ago I used an IMDB (In-memory database or Main memory database) in a
project. I think it was CSQL. I think this is a nice way to have full ACID and
great performance!!

------
jberryman
Is "event sourcing" similar to the acid-state business used in Happstack?

<http://happstack.com/index.html>

------
klauswuestefeld
Martin Fowler to usurp Prevalence pattern: <https://gist.github.com/1186975>

------
yesbabyyes
I use Redis for this. I imagine it would be possible to create certain JS
objects that automatically persist to Redis.

------
pointyhat
I've always wanted "no-DB" to the level of it being part of the
platform/language.

I've always thought that software-transactional memory and persistent
distributed heaps would get us there. Unfortunately the nearest things have
been Redis and Terracotta plugged into Clojure. It should be:

Insert? new an object. Delete? dispose an object. Look up? Hash table.

Solved problems that just require persistence.

~~~
jcromartie
There are some neat little libraries that help with this in Clojure. The basic
idea is that when you introduce changes through transactions, the actual
transaction (code) is appended to disk (which is very fast), and this becomes
your "database" file. So, what is persisted is just a list of state changes
that are "replayed" to restore state.

The individual transactions could also easily be distributed to multiple nodes
via a message queue.

~~~
mnemonicsloth
link?

------
kahawe
...and on a mildly related subject: more people should consider LDAP.

~~~
arethuza
You mean using data stores that support access through LDAP as general purpose
databases?

~~~
kahawe
Under certain circumstances, using LDAP as a directory or data-store, and not
just for authentication alone, can make sense: if you want to benefit from its
very well standardized, open and stable interface, if some sort of
multi-master scenario is needed, or if you want very rigid control over who
can see what portion of the data. The most popular LDAP servers offer a lot of
very cool ways of "modelling" and managing your data.

One drawback to keep in mind is that LDAP is generally not meant for lots and
lots of writes, so it is by no means a substitute for DBs. But it is great for
looking up data; if that data somehow fits a sort-of "file card" paradigm
anyway, there are way more reads than writes on it, and several different
applications should be able to access it, then all the better.

The major and most popular applications of LDAP, however, are certainly always
somehow connected to authenticating users and that is also where it really,
really shines and that was another reason I brought it up. If applicable,
personally I would prefer managing users and their logins in an LDAP server
over keeping all that in a database.

Luckily nowadays most (web) applications offer some sort of support for using
LDAP anyway, however dodgy those implementations sometimes are. (One of my
favorite examples here is netscape navigator/mozilla/thunderbird and the
addressbook schema shenanigans...)

I just think it gets too little credit or news these days, but that probably
stems from the fact that it is a pretty stable system without lots of
innovations; it has been around a looong time, it is not so "sexy" anymore,
and most HN hackers wouldn't have to deal with it most of the time anyway.

But I cannot recommend looking into LDAP, playing around with it, and
learning how to get a directory going highly enough. It is a bit confusing
(sometimes frustrating) at first, because it is so different from typical
databases, but it is fun once you get the hang of it and learn to appreciate
its simple and efficient beauty; some of the things you can do in huge
directories with e.g. the Sun LDAP server are nothing short of amazing.

~~~
vetler
I worked with LDAP servers for a few years, but never liked it much. Perhaps I
missed something. What can you do in huge directories that is so amazing?

