
Rich Hickey's new project: datomic.com - indy
http://datomic.com/
======
jamii
"Datomic is not an update-in-place system. All data is retained by default."

I'm becoming more and more convinced that your canonical data store should be
append-only whenever possible (see eg [1][2] for detailed arguments). It's
nice to see first class support for this.

[1] <http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html>

[2] <http://martinfowler.com/articles/lmax.html>

EDIT: Just read through the whitepaper. Looks like the indexes / storage
engine form an MVCC
([http://en.wikipedia.org/wiki/Multiversion_concurrency_contro...](http://en.wikipedia.org/wiki/Multiversion_concurrency_control))
key-value store, similar to Clojure's STM. Peers cache data and run datalog
queries locally.

This could be either an available or consistent system, depending on how cache
invalidation in peers works. In the available, eventually-consistent case you
have the added benefit that all queries see a consistent snapshot of the
system, even if that snapshot is not totally current.

Like most of Hickey's work, the whole thing seems really obvious in hindsight.
It also bears a lot of similarity to Nathan Marz' recommendations for data
processing and schema design.

~~~
lurker17
It's rather standard already:
<https://en.wikipedia.org/wiki/Slowly_changing_dimension>

------
breckinloggins
It looks like a very cool product/service, but there's something... off...
about this landing page. I can't quite put my finger on it. Two things I can
think of right off the bat:

1. The use of the term "whitepaper". It's very "enterprisey".

2. It took me a bit of perusing to figure out what the product IS. I think
the lead paragraph may need some tweaking.

In all, the landing page makes the product feel intimidating. Contrast to
Parse's landing page (<https://www.parse.com/>) where it feels like I'm free
to jump right in and tinker with it, but I also get the impression that it
will scale up if I need it to. (Yes, I know the two services aren't offering
the same thing).

~~~
Scriptor
I agree with you on the landing page. The introductory paragraph seems rather
"fluffy". That combined with the fact that it uses a whitepaper immediately
gave me the feeling that it's not really meant as something for regular
programmers to check out and hack with. It's surprising, since that's how many
Clojure programmers get their start.

On the other hand, it's very new so maybe they'll add more developer-friendly
pages soon. Or maybe it's only meant for "enterprise" environments? Time will
tell.

~~~
nickik
Maybe it's so new and unique that it's hard or impossible to explain in a
paragraph.

~~~
rjn945
I think this has a lot to do with it. After an hour of reading, watching and
thinking, I can't come up with any way to put it into one paragraph.

Here's the shortest what and why I could come up with:

 _Questioning Assumptions_

Many relational databases today operate based on assumptions that were true in
the 1970s but are no longer true. Newer solutions such as key-value stores
("NoSQL") make unnecessary compromises in the ability to perform queries or
make consistency guarantees. Datomic reconsiders the database in light of
current computer set-ups: millions of times larger and faster disks and RAM,
and distributed architectures connected over the internet.

 _Data Model_

Instead of using table-based storage with explicit schemas, Datomic uses a
simpler model wherein the database is made up of a large collection of
"datoms" or facts. Each datom has 4 parts: an entity, an attribute, a value,
and a time (denoted by the transaction number that added it to the database).
Example:

      John, :street, "23 Swift St.", T27

This simple data model has two main benefits. It makes your data less rigid
and hence more agile and easier to change. Additionally, it makes it easy to
handle data in non-traditional structures, such as hierarchies, sets or sparse
tables. It also enables Datomic's time model...
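
A minimal, hypothetical Python sketch of this fact-based model (the names and
helper are invented for illustration; this is not Datomic's API):

```python
# Hypothetical sketch: the whole database as an append-only list of
# 4-part facts ("datoms"). Not Datomic's API; all names are invented.
from collections import namedtuple

Datom = namedtuple("Datom", ["entity", "attribute", "value", "tx"])

db = []  # the database is just a growing collection of facts

def assert_fact(entity, attribute, value, tx):
    db.append(Datom(entity, attribute, value, tx))

# Sparse, hierarchical, or set-like data needs no table schema up front:
assert_fact("john", ":street", "23 Swift St.", 27)
assert_fact("john", ":employer", "acme", 27)
assert_fact("acme", ":name", "Acme Corp.", 12)
```

Rows that would be NULL-heavy in a table simply have no corresponding datoms
here.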

 _Time_

Like Clojure, Datomic incorporates an explicit model of time. All data is
associated with a time and new data does not replace old data, but is added to
it. Returning to our previous example, if John later changes his address, a
new datom would be added to the database, e.g.

      John, :street, "17 Maple St.", T43

This mirrors the real world where the fact that John has moved does not erase
the fact that John once lived on Swift St. This has multiple benefits: the
ability to view the database at a point in time other than the present; no
data is lost; the immutability of each datom allows for easy and pervasive
caching.
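
To make the time model concrete, here is a hypothetical sketch of an "as-of"
read over such an append-only log (Python with invented names, not Datomic's
API):

```python
# Hypothetical sketch of "as-of" reads over an append-only fact log.
# Tuples are (entity, attribute, value, tx); nothing here is Datomic's API.
log = [
    ("john", ":street", "23 Swift St.", 27),
    ("john", ":street", "17 Maple St.", 43),
]

def value_as_of(log, entity, attribute, tx):
    """Latest value asserted for (entity, attribute) at or before tx."""
    matches = [(t, v) for (e, a, v, t) in log
               if e == entity and a == attribute and t <= tx]
    return max(matches)[1] if matches else None

# The past is never overwritten:
# value_as_of(log, "john", ":street", 30) -> "23 Swift St."
# value_as_of(log, "john", ":street", 43) -> "17 Maple St."
```

Because every datom is immutable, any such answer can be cached forever.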

 _Move Data and Data Processing to Peers_

Traditionally databases use a client-server model where clients send queries
and commands to a central database. This database holds all the data, performs
all data processing, and manages the data storage and synchronization. Clients
may only access the data through the interface the server provides -
typically SQL strings, which may include a (relatively small) set of functions
provided by the database.

Datomic breaks this system apart. The only centralized component is data
storage. Peers access the data storage through a new distributed component
called a transactor. Finally, the most important part, data processing, now
happens in the clients, which, considering their importance, have been renamed
"peers".

Queries are made in a declarative language called Datalog which is similar to
but better than SQL. It's better because it more closely matches the model of
the data itself (rather than thinking in terms of the implementation of tables
in a database). Additionally, it's not restricted like SQL. It allows you to
use your full programming language. You can write reusable rules that can then
be composed in queries. Additionally, you can call any of your own functions.
This is a big step up in power, and it's made practical by the distribution.
If you ran your query on a central server, you'd have to be concerned
about tying up a scarce resource with a long-running query. When processing
locally, that's not a concern.

When a query is performed, the data is loaded from central storage and placed
into RAM (if it will fit). Later queries can then run fast against this
locally cached data.
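
A sketch of why in-process querying changes what a query can do: the predicate
below is an ordinary function from the application, something a SQL server
could not call (cache contents and all names are invented for illustration):

```python
# Hypothetical locally cached facts: (entity, attribute, value, tx).
cache = [
    ("order-1", ":total", 120, 10),
    ("order-2", ":total", 45, 11),
    ("order-3", ":total", 300, 12),
]

def big_order(total):  # any function from your own program
    return total > 100

# A SQL server only offers its built-in functions; here the query engine
# runs in your process, so it can call big_order directly.
big_orders = [e for (e, a, v, t) in cache
              if a == ":total" and big_order(v)]
# big_orders -> ["order-1", "order-3"]
```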

----

That's definitely not all it does or all the benefits, but hopefully that's a
good start.

~~~
anamax
> a time (denoted by the transaction number that added it the database).

Do transaction numbers have total order or just partial order? Total order is
serializing. (And no, using real time as the transaction number doesn't help
because it's impossible to keep an interesting number of servers time-
synched.) Partial order is "interesting".

~~~
programnature
It is totally ordered.

The transactor is a single point of failure.

However, since its only job is doing the transactions, the idea is it can be
faster than a database server that does both the transactions and the queries.

~~~
alkby
I think their statement about ACID is too bold.

How does somebody do read-modify-write style transactions?

Say I want to bump some counter. So I delete the old fact and establish a new
fact. But the new fact needs to be exactly 1 + the old value of the counter.
With transactions as a simple "add this and remove that", you seemingly cannot
do that. So it's not ACID. Right?

~~~
danieljomphe
From what I remember, compare-and-swap semantics are in place for that kind of
case.

If that were not the case, you could still model such an order-dependent update
as the fact that the counter has seen one more hit. Let the final query reduce
that to the final count, and let the local cache implementation optimize that
cost away for all but the first query, and then incrementally optimize the
further queries when they are to see an increased count.

That said, I'm pretty sure I've seen the simpler CAS semantics support. (The
CAS-successful update, if CAS is really supported, is still implemented as an
"upsert", which means old counter values remain accessible if you query the
past of the DB.)
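
A hypothetical sketch of those compare-and-swap semantics over an append-only
store (invented Python, not Datomic's API; in Datomic the check would happen
inside the transactor):

```python
# Sketch of compare-and-swap on an append-only fact log.
# Tuples are (entity, attribute, value, tx); all names are invented.
log = [("page", ":hits", 0, 1)]

def current(log, entity, attribute):
    vals = [(t, v) for (e, a, v, t) in log if e == entity and a == attribute]
    return max(vals)[1]

def cas(log, entity, attribute, expected, new, tx):
    """Append the new fact only if the latest value is still `expected`."""
    if current(log, entity, attribute) != expected:
        raise ValueError("conflict: retry with the fresh value")
    log.append((entity, attribute, new, tx))

old = current(log, "page", ":hits")
cas(log, "page", ":hits", old, old + 1, tx=2)  # succeeds
# cas(log, "page", ":hits", 0, 7, tx=3) would now raise: the value is 1
```

Note that the old counter values remain in the log, so past states stay
queryable.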

~~~
danieljomphe
Forget my last paragraph. Anyways, richhickey answered. :)

------
jasonkolb
I have to admit I'm a little confused about what this is. I'm taking a coffee
break and not really into reading a whitepaper, so take that with a grain of
salt, but I'd call that a landing page failure.

That said, it _sounds_ like a database-as-a-service? If so, is the primary
benefit the reduced database management load? Or is there some special sauce
in here that makes it more capable than other RDBMS or NoSQL databases?

~~~
nickik
It's something quite unique and new, not just another K/V store; it will
take time to really understand this. I watched the video and things became
clearer, but I wouldn't know how to describe it in a paragraph.

The "special sauce" is that much of the work is done locally (in memory), you
can use very powerful data manipulation, and it's ACID. That's my understanding
so far.

------
puredanger
Rich will be discussing Datomic in his keynote at Clojure/West next week in
San Jose (Friday March 16th). Schedule: <http://clojurewest.org/schedule>

Tickets for the conference are available, including Friday-only tickets for
$250. Friday will include Rich's keynote and a keynote by Richard Gabriel as
well as lots of other Clojure-y goodness.
<http://regonline.com/clojurewest2012>

------
gfodor
One question I have is the cold start problem. How can I ensure dropping in a
new peer is not going to have a large negative effect on response times? With
memcache, you can just prewarm a new node or have clients only round-robin it
a few times per request to warm it up. It seems like pre-warming here is going
to be more cumbersome since it's not a simple k-v store but will require you
to pre-emptively run _queries_ to get there. (Similar to Lucene.)

Edit: Rich's response here:

[http://blog.fogus.me/2012/03/05/datomic/comment-page-1/#comm...](http://blog.fogus.me/2012/03/05/datomic/comment-page-1/#comment-48817)

Seems to imply that non-cached performance won't be so bad anyway. Looking
forward to seeing some benchmarks.

------
fogus
Stuart Halloway provides more information in his Datalog querying in Datomic
screencast:
[http://www.youtube.com/watch?feature=player_embedded&v=b...](http://www.youtube.com/watch?feature=player_embedded&v=bAilFQdaiHk)

~~~
nickik
This is really cool. Highly recommended. You see some query code in Clojure
and in Java.

~~~
omaranto
I'd say all the query code is written in Clojure, but if you insist on using
Java, you can put the queries inside Java strings.

~~~
nickik
I would rather say the syntax of the DSL is the same as Clojure data
structures, which makes it easier to work with from Clojure.

------
Confusion
Won't this just be another leaky abstraction[1] in which the remoteness of the
data will be impossible to ignore[2]? I like the idea of a transparent local
LRU 'query' cache for a remote database[3], but I fear Hibernate-like (or
Haskell-like) problems in locating performance bottlenecks.

[1]
[http://www.joelonsoftware.com/articles/LeakyAbstractions.htm...](http://www.joelonsoftware.com/articles/LeakyAbstractions.html)

[2] "A Note on Distributed Computing"
(<http://labs.oracle.com/techrep/1994/smli_tr-94-29.pdf>)

[3] Please correct me if that synopsis is wrong

------
politician
The product seems to share characteristics with triplestores, the SPARQL
query language, and the append-only persistence mechanisms from the Linked
Data sphere/movement. Could someone more knowledgeable comment on this
similarity?

Some differences:

1. No concept of inference/reasoning
2. No mention of a graph
3. Interesting use of client-side caching / data peering
4. Clojure serialization vs. N3/Turtle/RDF

Some similarities:

1. Quad stores are parameterized by graph, Datomic by time
2. Subject-predicate-object model
3. Query anything (including [?s ?p ?o]?)
4. Query anywhere (sending RDF to a client for local query seems similar)

edit- I give up trying to get HN to render an ordered list. Any help would
be... helpful.

------
brianm
If I read correctly, it is pretty expensive. $0.10 / connection (peer) / hour,
plus dynamodb and transactor instance charges. For 100 clients, and not
including the dynamodb or transactor instance(s), this makes it a hair more
per year than a quad-core Oracle instance.

~~~
mapleoin
You have to factor in the DBAs that come with that Oracle db, too.

------
DanWaterworth
This is pretty cool, it's very similar to a project I'm working on: Siege, a
DBMS written in Haskell [1]. Siege uses roughly the same approach; I didn't
know anyone else was working on a distributed immutable DBMS, so this is
really exciting.

[1] <https://github.com/DanielWaterworth/siege>

~~~
gtani
this was mentioned in /r/cloj

[http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-17...](http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-173.html)

------
rbarooah
I'd like to know how its model of transaction isolation works given that reads
and writes are claimed to be independent.

It seems as though a 'transaction' is defined as an atomic set of updates, but
doesn't involve reads.

~~~
psykotic
> I'd like to know how its model of transaction isolation works given that
> reads and writes are claimed to be independent.

Any MVCC-style model allows full concurrency between readers and writers. The
bigger problem is managing concurrency between conflicting writers in what
amounts to a distributed database system. None of the material on Datomic's
website explains how they intend to tackle that issue, which seems especially
tricky with their model of distributed peers. All they say is that the
Transactor is responsible for globally linearizing transactions and that this
is better than existing models. However, if there is a genuine conflict, the
loose coupling among peers seems to make the problem much worse than existing
models, not better.

I'd love to know more details.

~~~
jamii
The FAQ says that writes favour consistency over availability, so I guess that
means synchronous calls to the transactor.

~~~
olivergeorge
Some kind of compare-and-set! operator which occurs at the transactor perhaps.

Update:

1. You can do synchronous transactions.

[http://datomic.com/docs/javadoc/datomic/Connection.html#tran...](http://datomic.com/docs/javadoc/datomic/Connection.html#transact\(java.util.List\))

2. Transactions can include data functions.

"The database can be extended with data functions that expand into other data
functions, or eventually bottom out as assertions and retractions. A set of
assertions/ retractions/functions, represented as data structures, is sent to
the transactor as a transaction, and either succeeds or fails all together, as
one would expect."
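
The quoted description can be sketched as follows (hypothetical Python with an
invented format, not Datomic's wire protocol): a transaction is plain data
that either commits as a whole or not at all:

```python
# Sketch: a transaction is a set of assertions/retractions applied all
# together or not at all. Storage format and names are invented.
log = []  # tuples: (entity, attribute, value, tx, added?)

def transact(log, ops, tx):
    """ops: iterable of ("assert" | "retract", entity, attribute, value)."""
    ops = list(ops)
    for kind, _, _, _ in ops:          # validate everything first...
        if kind not in ("assert", "retract"):
            raise ValueError("bad op: whole transaction fails")
    log.extend((e, a, v, tx, kind == "assert")  # ...then commit as one
               for kind, e, a, v in ops)

transact(log, [("assert", "john", ":street", "23 Swift St.")], tx=1)
transact(log, [("retract", "john", ":street", "23 Swift St."),
               ("assert",  "john", ":street", "17 Maple St.")], tx=2)
# log now holds three datoms; the retraction is itself a recorded fact
```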

------
spitfire
Okay, I don't quite get this. The processing gets moved to the client. But
what if the dataset involved is too large for the client to hold?

~~~
edwardw
They get it from the server over the network, of course. In the meantime,
peers cache "facts" using an LRU replacement policy.

~~~
swalsh
So let's suppose I have several billion integers sitting in a data store, and I
want to sort, count, and sum them. Do I have to collect all this data to my
local cache first? What if millions of people are using my application who
want the same value?

~~~
lukev
Remember, the 'peer' doesn't have to be embedded in your front-edge
application (even though that's one use case). You could have a single 'peer'
which sits on its own beefy server expressly for this kind of calculation.

~~~
sausagefeet
But then what is the benefit over just using Postgres there, and tossing a
transaction ID sequence number onto your rows?

~~~
snprbob86
It's about writing software which is simpler and more flexible.

If you write a traditional shared-nothing web app client with a traditional
bag-o-sprocs database server, you'll probably be good as long as your workload
doesn't change much. Assuming your write volume never exceeds that of a single
(as beefy as necessary) server (which seems to be working out so far for
Hacker News!) then you're ok.

However, products/services evolve and requirements change. Let's assume, for
example, that you want to do some heavy duty number crunching. This number
crunching involves some critical business logic calculations. Some of those
calculations are in sprocs, but some of them are in your application code's
native language. How do you offload that work to another server? You may have
to juggle logic across that sproc/app boundary back and forth. It's pretty
rigid; change is hard.

You can think of Datomic as a way of eliminating your sproc language and
moving the query engine, indexes, and data itself into your process.
Basically, you get everything you need to write your own database server.
Furthermore, you can write specialized database servers for specialized
needs... as long as you agree to allow a single Transactor service to
coordinate your writes.

Back to the big number crunching. You've got the my-awesome-app process
chugging along & you don't want to slow it down with your number crunching, so
you spin up a my-awesome-cruncher peer & the data gets loaded up over there.
Now you have the full power of your database engine in (super fast) memory and
you can take your database-client-side business logic with you!

Now let's say you're finding that you're spending a lot of CPU time doing HTML
templating and other web-appy like things. Well, you can trivially make
additional my-awesome-app peers to share the work load.

You can do all this from a very simple start: One process on one machine.
Everything in memory. Plain-old-java-objects treated the same as datums. No
data modeling impedance mismatch. No network latency to think about. You can
punt on a lot of very hard problems without sacrificing any escape hatches.
You get audit trails and recovery mechanisms virtually for free.

Again, all this assumes the write-serialization trade offs are acceptable.
Considering the prevalence and success of single-master architectures in the
wild, that's not a hugely unacceptable tradeoff. Furthermore, the append-only
model may enable even higher write speeds than something like Postgres' more
traditional approach.

I hope my rant is helpful :-)

------
nickik
Nice little find in the comments of a blog. Rich Hickey himself speaks about
some of the things people probably care about:
<http://blog.fogus.me/2012/03/05/datomic/>.

------
nuttendorfer
Can't see content on any of the pages in Opera.

~~~
vetler
Yeah, me neither. Pretty strange.

~~~
rml
I'm also unable to read in Opera. Though I don't have JS enabled by default.

------
dpritchett
Neat to see that there's a VM appliance available on launch day. Downloading
that now, gonna give it a spin!

------
JulianMorrison
This strikes me as yet another NoSQL with a niche in which it will be great.
In this case, it's good for a read-heavy application with minimal writing,
where its working set is a small subset of the total data set and you care a
lot about write consistency. It would fail in a smoking heap under heavy write
load (single global lock, and the need to push every write to every client
cache). It would blow the cache if you tried to do a range scan.

~~~
nickik
The idea is that the transactor does a very small amount of work and so can
scale much better than other "single point bottlenecks". The problem is still
there, but smaller that way.

Read the comment here; it gives some information on the problem you're
describing: <http://blog.fogus.me/2012/03/05/datomic/>

------
mark_l_watson
Reading through the site reminds me of append-only CouchDB (or even better,
BigCouch): both Datomic datoms and CouchDB documents have time stamps, so the
state of data is available for different times.

This looks new: local query peers that cache enough data to perform queries (I
don't understand how that works, but it looks like indices might be local,
with some data also cached locally).

Also interesting that it seems to use DynamoDB under the hood.

~~~
lukev
It _can_ use DynamoDB under the hood, as a data store.

The data storage system is strongly decoupled from the transactor and the
peer, so there are a number of options, ranging from a filesystem to S3 to
DynamoDB.

------
danieljomphe
Thus Datomic would be very great for centrally-operated systems, but not so
much with highly distributed systems where many peers are often partitioned
out because, for example, they have no Internet connectivity for a few days,
and they still need to operate within their limited universe.

So if such a highly distributed system were to use Datomic, it would be harder
to guarantee that each peer can work both for reads and (local) writes while
partitioned from the transactor. One would need to program the software
to log those new facts (writes) locally before submitting (syncing) them to
the transactor, and make that durable. One might also need to make the
query/read cache durable, since there's no network to fetch it back in case of
a reboot of the peer. So it seems there's a missing local middleman/proxy that
needs to be implemented to support such scenarios. At least, thanks to
Datalog, the local cache would still be able to be used with this log, using
db.with(log).

What do you think: is this use case simply implementable over/with Datomic,
without asking it to do something out of its league?
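
Such a middleman could be sketched as a durable local write log that is
replayed on reconnect (entirely hypothetical Python; nothing like this exists
in Datomic today):

```python
# Entirely hypothetical sketch of the "local middleman" described above:
# buffer writes while partitioned, replay them when the transactor is back.
pending = []  # durable local write log (here just a list)

def submit(tx_data, transactor_up, send):
    if transactor_up:
        send(tx_data)            # normal path: straight to the transactor
    else:
        pending.append(tx_data)  # partitioned: log locally for later

def reconnect(send):
    while pending:               # replay in order once connectivity returns
        send(pending.pop(0))

sent = []  # stand-in for the real transactor connection
submit({"assert": ("john", ":street", "17 Maple St.")}, False, sent.append)
reconnect(sent.append)
# sent now contains the buffered transaction
```

The hard part, of course, is that replayed writes can conflict with what other
peers transacted during the partition, which is exactly the consistency
question raised elsewhere in this thread.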

~~~
weavejester
I don't believe there's such a thing as local writes in Datomic. All writes
appear to go through the transactor to maintain atomicity.

~~~
danieljomphe
Right. So the only way to make Peers resilient to network partitions is to
install a middleman between them and the DB/Transactor. One whose
responsibility is to ensure this Peer's app always has durable access to
everything it's ever going to need to be able to read for its queries, and
always has durable access to some local write log that doesn't exist in the
current implementation.

Thus my question is: is introducing such a middleman into the system going to
denature Datomic?

~~~
weavejester
I don't believe Datomic is designed to operate in a scenario where Peers don't
have network connectivity. The local cache Peers keep is to cut down on
network traffic and improve performance, not as a reliable "offline mode".

~~~
nickik
Seems to be true, but the interesting part is that peers can be run in
parallel, and if one datacenter explodes you can go to another without losing
information. The only "single point of failure" is the transactor, and only
for reads.

~~~
weavejester
You mean only for writes :)

------
justindocanto
From a UX point of view, I didn't realize there was a menu until I scrolled
down and your JS menu popped in at the top. Once that happened I scrolled back
up to see where the menu initially was, because why would they do a pop-in
menu if there wasn't one initially? Ah ha! I see it. My eyes completely looked
over it. Yes, I realize it's giant, but it's also about the same size as an ad
banner (which my eyes typically just ignore). Also, the colors are quite bland
and do not set any type of priority. Just some constructive criticism for ya.
Good luck!

------
swalsh
Clojure is an amazing language, so I'm willing to go the extra mile to attempt
to understand this work. However, there's one thing that I can't get over. From
my understanding, the big idea is that the query engine is brought local, and
the storage would eventually come local too. It seems like for smallish DBs
this is fine. What happens, though, if you're working with a rather large
database?

Additionally, if local means the user's client, how is security of the data
ensured?

~~~
nickik
What does this have to do with Clojure? This system could have been built in
any language; Clojure only uses some of the same ideas (working with values).
I can't help you with your question, sorry.

~~~
swalsh
From my understanding the same guy wrote both. My point was that I personally
view Clojure as a bit of genius.

~~~
nickik
Right. Agree.

------
bilalhusain
As a developer, I find datomic easy to use. The getting started, running
examples, tutorial, reference, the in-memory environment, the downloadable
appliance - everything is so smooth. Last time I had a similar feeling was
when I tried CloudFoundry.

Things should be like this - intuitive, some seed data and kickstart code w/
just enough documentation for when you get stuck.

------
vdm
Webdevs will be all over this when the Peer runs on JavaScript runtimes. Who's
taking bets that it's written in ClojureScript?

~~~
drcode
I'm pretty sure the whole thing is built on the JVM, but I agree with you that
having a peer run inside a js browser app via clojurescript would be a logical
next step. (and arguably really useful)

~~~
edwardw
From FAQ:

Is Datomic just for JVM languages? At the moment, yes. We have ideas for how
to enable Datomic on non-JVM languages while preserving as much of the
embedded power as possible.

~~~
snprbob86
That would be freaking _awesome_.

However, my big concern here would be security. You'd need to be able to
supply a predicate for which datums are allowed to be synced to the client.

------
makepanic
Somehow the page is broken in Opera. Disabling the content: " ."; rule in
#main resolves the problem.

------
yvdriess
Datomic reminds me a lot of the tagged-token dataflow architectures of the
day. Really cool.

------
gtani
Very interesting. It occurs to me that Clojure will be widely adopted without
a single killer app, but via a few near-killers (I thought Incanter and a web
app framework around Ring and Enlive would be the first).

------
locopati
The idea seems very interesting, but the non-free aspect of this seems likely
to limit its uptake. I cannot install a version of this for small-scale,
personal, or not-for-profit needs other than using a non-durable VM that saves
state only when suspended. Even if I buy into the Datomic pricing model and
that pricing is not prohibitive, I am still bound to Amazon's pricing model
(though hopefully that will expand over time to other cloud services to
prevent vendor lock-in).

~~~
drcode
I think Rich & the gang need to focus on a niche market for now as this
technology matures. I expect they will work with a handful of enterprise
clients for now but roll out "convenience features" in the future (i.e. easy
to use and inexpensive hosting for smaller customers.)

~~~
locopati
Certainly that's one approach. The one that seems more likely to generate
widespread use is to put a tool in the hands of lots of developers, making it
easy for more people to get involved and for unexpected uses to happen (e.g.
MySQL v Oracle - though I'm not suggesting that the cost scales are equivalent
here, it's merely the first example that comes to mind).

~~~
jwhitlark
Rich & co. have historically been more interested in being correct than being
popular, especially at the early stages. I would expect to see more components
show up fairly quickly, but not before they're ready for use.

------
twodayslate
Can someone explain what I can do with this thing? Can I use it to backup all
my files and make a dropbox sorta thing? I don't understand.

------
jfarmer
Why is this interesting? It sounds like yet-another data store.

~~~
nickik
I disagree its not at all like anything I have seen befor. It has a very
powerful querylanguage you can use your hole programming language not just
what the designers wanted (like in SQL).
[http://www.youtube.com/watch?feature=player_embedded&v=b...](http://www.youtube.com/watch?feature=player_embedded&v=bAilFQdaiHk)

There is other novel stuff.

