Datomic Free Edition (datomic.com)
132 points by zachallaun on July 24, 2012 | 78 comments



My nutshell answer to "What is datomic?":

It is a database system for Java or Clojure where not all of the database code runs on the server. Instead, the server is fast and "dumb", and all the work for your queries runs on each client individually; the clients simply cache chunks of data from the servers as needed. It turns out that having your queries run on the clients has nice properties in terms of query language power and performance.
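To make the client-side caching concrete, here is a minimal Java sketch. It is not Datomic's actual peer code. It assumes a hypothetical storage function that returns immutable chunks ("segments") by id. Because a segment never changes once written, the client can keep it in an LRU cache without any invalidation protocol.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical peer-side cache. Segments are immutable, so a cached copy can
// never go stale. It only needs evicting when the cache is full.
class PeerSegmentCache {
    private final Function<Long, String> storage; // hypothetical: fetches a segment by id
    private final Map<Long, String> lru;
    private int storageReads = 0;

    PeerSegmentCache(Function<Long, String> storage, int capacity) {
        this.storage = storage;
        // accessOrder = true: iteration runs from least- to most-recently used
        this.lru = new LinkedHashMap<Long, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > capacity; // drop the least-recently-used segment
            }
        };
    }

    String segment(long id) {
        String cached = lru.get(id);
        if (cached != null) {
            return cached; // served locally, no round trip to storage
        }
        storageReads++;
        String fetched = storage.apply(id);
        lru.put(id, fetched);
        return fetched;
    }

    int storageReads() {
        return storageReads;
    }
}
```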

Additionally, it has an extremely powerful and elegant way of tracking historical information: no data is ever forgotten, so if you want to say "run this query against the database as it was last Wednesday", you can do that. In a typical production application, much of the complexity in the database comes from special auditing tables that keep track of historical info. In Datomic you don't need such tables.
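A toy Java model of the "as it was last Wednesday" idea follows. It assumes a hypothetical in-memory log in which each fact carries the transaction that added or retracted it. An update appends new facts and never overwrites, so you can rebuild any past state by replaying the log up to a given transaction. This illustrates the concept only. It is not Datomic's storage format or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// One immutable fact; `added` is false for a retraction.
record HistFact(long entity, String attribute, Object value, long tx, boolean added) {}

// Hypothetical append-only history: facts are appended, never overwritten.
class FactLog {
    private final List<HistFact> facts = new ArrayList<>();
    private long nextTx = 1;

    // An "update" appends a retraction of the old value plus an assertion of the new one.
    long set(long entity, String attribute, Object newValue) {
        long tx = nextTx++;
        valueAsOf(entity, attribute, tx - 1)
            .ifPresent(old -> facts.add(new HistFact(entity, attribute, old, tx, false)));
        facts.add(new HistFact(entity, attribute, newValue, tx, true));
        return tx;
    }

    // The value visible as of transaction `asOfTx`, found by replaying the log up to it.
    Optional<Object> valueAsOf(long entity, String attribute, long asOfTx) {
        Object current = null;
        for (HistFact f : facts) {
            if (f.tx() > asOfTx) {
                break; // the log is kept in transaction order
            }
            if (f.entity() == entity && f.attribute().equals(attribute)) {
                current = f.added() ? f.value() : null;
            }
        }
        return Optional.ofNullable(current);
    }
}
```

After "Berlin" is set in transaction 1 and "Paris" in transaction 2, asking for the value as of transaction 1 still returns Berlin. The old fact was never destroyed.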

Finally, it can do all of these things while still being highly scalable (with the small caveat that it might not fully scale in the rare case of an extremely write-heavy app).


Also in the revolutionary category, all data is treated as immutable, and what we traditionally consider an update to a piece of stored data (with the change being made in-place of the original datum) is treated as a new fact with just the time being different (a new record with the change in value is created along with the time of change). Since data is never changed, only added, there is no need for the traditional transaction locks, etc., thus providing a whole new way to manage and process your data and providing a perfect audit trail in the process.


I did say "no data is ever forgotten" which is a way of explaining immutability to a layman :)


Yes you did. As well as the automatic audit trail. I just wanted to emphasize the implications this has for the design of the Datomic engine. :)


How is that revolutionary? Time series databases have been around for years (have a look at KDB as an example).


I stand corrected. Please replace "revolutionary" with "great feature". :)


Great summary!

I'd also add a sentence about transactions being first class. Everything is wrapped in transactions, you can add metadata to transactions (who did it, when did it happen), and since no data is lost, you can use transaction IDs to see what your database looked like 1 week ago, etc.
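Here is a hedged sketch of what "first-class transactions" can mean. The transaction id is itself an entity, so metadata such as the author is stored as ordinary facts about it. You can then join against that metadata like any other data. The attribute names (`:tx/author`, `:account/balance`) and the id numbering are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// What the caller asserts; the database attaches the transaction id.
record TxAssertion(long entity, String attribute, Object value) {}

// A stored fact: every datom records the transaction that created it.
record TxDatom(long entity, String attribute, Object value, long tx) {}

// Hypothetical in-memory database, for illustration only.
class TxAwareDb {
    private final List<TxDatom> datoms = new ArrayList<>();
    private long nextTx = 1000; // hypothetical id range reserved for transaction entities

    // The transaction is itself an entity, so its metadata is stored as ordinary facts about it.
    long transact(List<TxAssertion> assertions, Map<String, Object> txMetadata) {
        long tx = nextTx++;
        for (TxAssertion a : assertions) {
            datoms.add(new TxDatom(a.entity(), a.attribute(), a.value(), tx));
        }
        txMetadata.forEach((attr, v) -> datoms.add(new TxDatom(tx, attr, v, tx)));
        return tx;
    }

    // "Who changed this?": join data facts to metadata facts through the tx id.
    List<Object> authorsOf(long entity, String attribute) {
        List<Object> authors = new ArrayList<>();
        for (TxDatom d : datoms) {
            if (d.entity() == entity && d.attribute().equals(attribute)) {
                for (TxDatom m : datoms) {
                    if (m.entity() == d.tx() && m.attribute().equals(":tx/author")) {
                        authors.add(m.value());
                    }
                }
            }
        }
        return authors;
    }
}
```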


Yes, and there are tricks where you can add speculative future data, which won't actually be stored in the DB. That no doubt helps with projections.
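You can model the speculative-data trick with immutable database values. A `with`-style operation returns a new value that includes the tentative facts. The original value and the stored data stay untouched. As far as I know, Datomic's own operation for this is also called `with`. The classes below are a hypothetical Java illustration, not its API.

```java
import java.util.ArrayList;
import java.util.List;

// A hypothetical monthly sales figure, in cents to avoid floating-point rounding.
record Sale(String month, long cents) {}

// Hypothetical immutable database value: its fact list is never modified in place.
final class DbValue {
    private final List<Sale> sales;

    DbValue(List<Sale> sales) {
        this.sales = List.copyOf(sales);
    }

    // Speculative "what if": returns a new value that includes the extra facts.
    // Nothing is persisted, and this value is left unchanged.
    DbValue with(List<Sale> speculative) {
        List<Sale> combined = new ArrayList<>(sales);
        combined.addAll(speculative);
        return new DbValue(combined);
    }

    long totalCents() {
        return sales.stream().mapToLong(Sale::cents).sum();
    }
}
```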

Also I like the idea that queries can invoke your own code. (Which is feasible since the queries execute on the same machine as your app.)
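Here is a small Java illustration of why custom functions are cheap when the query runs in the application process: a filter clause can simply be an application method. This is plain Java streams over hypothetical meter readings, not Datomic query syntax. The `isSuspicious` rule is invented for the example.

```java
import java.util.List;
import java.util.function.LongPredicate;
import java.util.stream.Collectors;

// Hypothetical fact: one meter reading in kWh.
record MeterReading(long meterId, long kwh) {}

class LocalQuery {
    // The query runs inside the application, so a clause can be any application
    // function rather than only operators the database happens to support.
    static List<Long> metersWhere(List<MeterReading> readings, LongPredicate test) {
        return readings.stream()
            .filter(r -> test.test(r.kwh()))
            .map(MeterReading::meterId)
            .distinct()
            .sorted()
            .collect(Collectors.toList());
    }
}

class Tariffs {
    // Invented business rule for the example: zero or implausibly high usage.
    static boolean isSuspicious(long kwh) {
        return kwh == 0 || kwh > 10_000;
    }
}
```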


This ought to be front and center on their site. What you wrote there turned a bunch of meaningless buzzwords into a useful product description. I'm now actually interested in the product. Well done!


I think if you're aiming for any sort of broad adoption (are you?), then you'll have to add... SQL.

This may sound like heresy, but in practice the following line is going to be a showstopper for most people:

  Peer.q("[:find ?entity :where [?entity :db/doc \"hello world\"]]", db);

I don't want to learn a new query language. And I'll most certainly not even try to re-train my team on it. Maybe later I'll become curious about that fancy datalog-language to enable advanced features. But in the beginning, if you want my mindshare, you'd better spoon-feed me.

Take this lesson from Cassandra. They started out with Thrift and an extremely cumbersome query interface, which fit the Cassandra data model perfectly but not the brains of the developers.

Recently they exposed their query language as a SQL dialect, and suddenly Cassandra is a joy to use.

You should do the same. It will be a lossy abstraction. You'll need weird, non-standard constructs to accommodate the peculiarities of your model. Purists will cry in horror.

BUT: It will look like SQL and roughly work like SQL. Everybody and their dog already knows SQL. People can jump in and bang out "select foo from bar ..." without thinking. People can re-use an entire galaxy of SQL-related tooling and knowledge that has evolved over decades. Most importantly: People can start with something they know and then adapt at their own pace to the great new things that your datastore enables.

Don't underestimate this if you're aiming for the mainstream.


I'm pretty sure SQL will be added to datomic over Rich Hickey's dead body, and that's a good thing IMO.

There are already enough SQL databases in the world. rhickey isn't looking to make a 'me too' database, he's interested in improving the state of software development.

SQL is a terrible language and needs to die. It's fine that we disagree, but don't tell the people who are convinced it's a bad idea that they need to 'join' the mainstream.


It's not about making a "SQL-Database". It's about providing an interface that resembles something people are familiar with and that can be reasonably fluently typed in a REPL without breaking fingers. Just like Cassandra did with CQL and hbase does with HQL. SQL-syntax just happens to have stood the test of time for this kind of application.

> SQL is a terrible language

No disagreement here. Just out of curiosity, what would you call this language then: [:find ?entity :where [?entity :db/doc \"hello world\"]]?


Clojure exists because rhickey didn't like the current state of programming languages. Datomic exists because rhickey didn't like the current state of DB design. SQL is an anti-goal.

"Familiarity" is a terrible metric for new things. New things are not always familiar. As rhickey said in one of his talks, there's a difference between what's simple and what's familiar: "I don't know German, does that make it unreadable?" German may or may not be the ideal thing, but the fact that it isn't English is not an argument against it.

> Just out of curiosity, what would you call this language then: [:find ?entity :where [?entity :db/doc \"hello world\"]]?

Datalog (or a dialect of it). Datalog is a variation on Prolog, designed for querying databases. Just as there are many Lisps, there are many Datalogs. This one is interesting because Datomic queries are valid Clojure data structures. It's a great example of code as data. It's trivial to write Clojure functions that return data that can be passed as a query to Datomic. If you wanted, you could probably even write Datomic queries that return other queries. Try doing that with SQL ;-)
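The following Java analogue shows what "queries are data" buys you; the thread's own example is Clojure data. An ordinary function builds the query as a nested list, and a toy evaluator matches `?variables` against [entity attribute value] triples. The evaluator handles only conjunctions of patterns, with no rules or recursion. It is a hypothetical illustration, not Datomic's query engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, hypothetical illustration. A query is plain data: a list of
// [entity attribute value] patterns, where strings starting with "?" are variables.
class QueryAsData {
    // An ordinary function that returns a query, the way a program might generate one.
    static List<List<Object>> docQuery(String doc) {
        return List.of(List.<Object>of("?e", ":db/doc", doc));
    }

    // Naive evaluator: find every variable binding that satisfies all patterns.
    static List<Map<String, Object>> run(List<List<Object>> where, List<List<Object>> triples) {
        List<Map<String, Object>> results = new ArrayList<>();
        results.add(new HashMap<>());
        for (List<Object> pattern : where) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> binding : results) {
                for (List<Object> triple : triples) {
                    Map<String, Object> extended = unify(pattern, triple, binding);
                    if (extended != null) {
                        next.add(extended);
                    }
                }
            }
            results = next; // bindings that survive this pattern (an implicit join)
        }
        return results;
    }

    // Returns the extended binding, or null if the triple does not match the pattern.
    private static Map<String, Object> unify(List<Object> pattern, List<Object> triple,
                                             Map<String, Object> binding) {
        Map<String, Object> out = new HashMap<>(binding);
        for (int i = 0; i < 3; i++) {
            Object p = pattern.get(i);
            Object v = triple.get(i);
            if (p instanceof String s && s.startsWith("?")) {
                Object bound = out.putIfAbsent(s, v);
                if (bound != null && !bound.equals(v)) {
                    return null; // variable already bound to something else
                }
            } else if (!p.equals(v)) {
                return null; // constant does not match
            }
        }
        return out;
    }
}
```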


Thanks, I understand (now).

I made my first comments under the assumption that this aims to be a general purpose database, but multiple people have made clear now that this is not the case.

Obviously it makes no sense to argue for an intermediate QL (and one as half-baked as SQL) when the project is ultimately aimed at Lisp-purists[1].

[1] This is not meant to be derogatory; it's just a critical distinction from a DB that, say, my junior admin who knows his SQL and Python and not much else could be expected to get along with.


> Just out of curiosity, what would you call this language then: [:find ?entity :where [?entity :db/doc \"hello world\"]]?

What exactly do you see as the problem with this example? Or with SQL for that matter?

Honestly, I'm interested in what your criticism is. Is it the syntax?


Yes, the syntax (or lack thereof, whichever you prefer).

Obviously I'm not entitled to tell anyone how to design their databases. I'm just saying there's good reasons why so many DBs stick to a SQL-like syntax, and that is because the alternatives are usually worse (think: familiarity, tooling, scripting, REPL, etc.).

For example, using the MongoDB REPL, which is probably close to what a datalog-REPL would look like, is rather painful.

This may all be a non-issue when a LISP-language is exclusively used on the client-side. That query-style probably just snaps in naturally there (I don't know lisp).

But if the database is supposed to be general purpose, accessed with lesser languages, scripted, quickly fixed by half-drunk humans at 4am in a REPL... then it certainly matters to have a sane intermediate lingo. Otherwise, at the least, every client-platform is going to invent their own.

I think the example of Cassandra is really a good one to study. They went through an interesting learning-process that seems very applicable here.

Edit: Please see my other comment above. It seems I have misunderstood the project goals and this is in fact not meant to become a general purpose DB. Under that premise my concerns obviously don't apply...


Datomic is, as far as I'm aware, a general purpose database. What it isn't designed to be is a familiar database.

SQL is what people are familiar with, and even some NoSQL databases have distinctly SQL-inspired query languages (e.g. SimpleDB). The problem is that SQL is a godawful query language, and if we want to do better we need to do something different, and therefore unfamiliar.

SQL tries to look like natural language, which has resulted in a syntax that is complex, inconsistent and monolithic, just like the language it tries to imitate. If we want a syntax that is simple, consistent and modular we need to throw away the idea that a query needs to read like English.


This isn't obvious if you don't know Clojure, but that's actually a data structure. In Datomic, queries are data you pass to the database, not a special syntax at all.


Knowing the guys on the development team for datomic, I think their #1 concern is to produce a highly reliable and well-designed system.

Also, they are big on developing systems that are very modular, and since datomic is built on Lisp, adding a SQL frontend would certainly be very easy (like a weekend project.)

I'm sure once these guys have higher priority features addressed, you'll see a SQL feature as well (if folks from the clojure community don't beat them to it and create it first.)


Davy Suvee created a Datomic Blueprints implementation (http://datablend.be/?p=1641) so you can run Gremlin (https://github.com/tinkerpop/gremlin/wiki) queries on it.


Great that this is now free for small projects. I've been pondering two questions about Datomic: (1) immutable data is great, but what if we have to delete some past data, for privacy or regulatory reasons? Does this screw everything up? (2) can we modify values as-at past points in time? Example, an electricity company records usage monthly for customers, but sometimes past usage is incorrect and needs to be edited. What would be the best way to structure this to get the "updated" view as-of six months ago? And also the "original" view as-of six months ago?


>> (1) immutable data is great, but what if we have to delete some past data, for privacy or regulatory reasons? Does this screw everything up?

I recall reading about the ability to clear out old transactions, to account for fast-growing databases. For example, "delete all transactions and facts older than 2 months". 2 minutes of googling didn't yield any results though, sorry.

>> (2) can we modify values as-at past points in time? Example, an electricity company records usage monthly for customers, but sometimes past usage is incorrect and needs to be edited. What would be the best way to structure this to get the "updated" view as-of six months ago? And also the "original" view as-of six months ago?

If I understand you correctly, you do this by adding a new fact with the correct usage. Your queries will now return the latest number, and you can use `as-of` and so on to get the old value.
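Here is one way to structure the electricity example, as a sketch that follows the approach in the reply above. The billing month is stored as ordinary data, and the transaction number records when each value was learned. A correction is just a newer fact for the same month. Asking "as of now" returns the corrected figure, while asking "as of the original transaction" returns the first one. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical fact: the billing month is ordinary data, while `tx` records
// when the database learned the value.
record UsageFact(String customer, String month, long kwh, long tx) {}

class UsageHistory {
    private final List<UsageFact> facts = new ArrayList<>();
    private long nextTx = 1;

    // Original readings and corrections are recorded the same way: as new facts.
    long recordUsage(String customer, String month, long kwh) {
        long tx = nextTx++;
        facts.add(new UsageFact(customer, month, kwh, tx));
        return tx;
    }

    // Usage for `month` as known at transaction `asOfTx`: the latest fact
    // recorded at or before asOfTx wins.
    OptionalLong usage(String customer, String month, long asOfTx) {
        OptionalLong result = OptionalLong.empty();
        for (UsageFact f : facts) {
            if (f.tx() <= asOfTx && f.customer().equals(customer) && f.month().equals(month)) {
                result = OptionalLong.of(f.kwh());
            }
        }
        return result;
    }

    long latestTx() {
        return nextTx - 1;
    }
}
```

Passing `latestTx()` gives the "updated" view of a past month. Passing the id returned by the original `recordUsage` call gives the "original" view.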


For (1), use a "mark as deleted" option, with extremely strict permissions on viewing deleted data, to protect privacy.
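Here is a hypothetical sketch of the mark-as-deleted approach. Deleting an entity asserts a flag instead of removing history. A filtered view then hides flagged entities from callers who lack the right permission. Note that this hides the data rather than erasing it, which may not be enough where the rules require actual removal. The `:person/deleted` attribute and the boolean permission check are assumptions for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical fact about a person.
record PersonFact(long entity, String attribute, Object value) {}

class DeletionAwareView {
    // Hypothetical marker attribute: "deleting" an entity asserts this flag
    // instead of removing any history.
    static final String DELETED = ":person/deleted";

    // Callers without the (hypothetical) permission never see flagged entities.
    static List<PersonFact> visibleFacts(List<PersonFact> all, boolean canSeeDeleted) {
        if (canSeeDeleted) {
            return all;
        }
        Set<Long> deleted = all.stream()
            .filter(f -> f.attribute().equals(DELETED) && Boolean.TRUE.equals(f.value()))
            .map(PersonFact::entity)
            .collect(Collectors.toSet());
        return all.stream()
            .filter(f -> !deleted.contains(f.entity()))
            .collect(Collectors.toList());
    }
}
```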


That's great - Thanks Rich and Stu :-)

I am using Datomic on a friend's project right now. We will certainly use the Pro version, but a free edition will help promote the platform.


Hope your friend's project will give us a chance to meet in person. I have enjoyed your blog.


He is in your office right now.


that was quick!


For those curious on what the heck datomic is, I found this podcast helpful:

http://thinkrelevance.com/blog/2012/04/26/thinkrelevance-the...

The key to me is that you can run multiple queries against the same point in time, thereby having a sound basis for the data you are working with. Compare this to working with traditional RDBMS having to get it all done in a single query to have a coherent view of the world, or roll the dice by running multiple queries knowing that they all might not be running against the same view of the world.
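You can sketch the "same point in time" property with immutable snapshots. The connection publishes a new snapshot on every write. A reader that captured one snapshot runs all of its queries against it, with no read locks. Here `basisT` stands in for a basis point, a single long that identifies the snapshot. The classes are hypothetical, not Datomic's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot; basisT identifies it and is just a long.
record Snapshot(long basisT, List<Long> balances) {
    long total() {
        return balances.stream().mapToLong(Long::longValue).sum();
    }
}

class SnapshotConnection {
    private final AtomicReference<Snapshot> current =
        new AtomicReference<>(new Snapshot(0, List.of()));

    // Readers take the current value once; it never changes, so no read lock is needed.
    Snapshot db() {
        return current.get();
    }

    // Writers publish a brand-new snapshot instead of mutating the old one.
    void addAccount(long balance) {
        current.updateAndGet(old -> {
            List<Long> next = new ArrayList<>(old.balances());
            next.add(balance);
            return new Snapshot(old.basisT() + 1, List.copyOf(next));
        });
    }
}
```

Every query run against the same `Snapshot` sees the same facts, no matter how many writes land in the meantime.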


How is this different than isolation / repeatable reads in SQL transactions?


a) It can occur over time and place - i.e. I can communicate a basis point (tiny, a long) to you, and 3 days later you can look at exactly what I saw

b) It doesn't require any coordination, read locks, MVCC, etc.


I have to say that the introduction of a turn of the century enterprise software pricing model is not a welcome thing.

I think the subscription model is a much better approach in today's market, particularly since I think Datomic's early adopters are going to be mainly startups and not larger enterprise customers.

Their initial hosted transactor model was a good idea as well. The problem I think was that it was hard to come up with a way of pricing it in a way that people could understand.

Here's hoping they will return to, or add back, either the subscription model or the hosted transactor in the near future.


Maybe I'm just lazy, but the biggest impediment to using Datomic right now is that I don't want to deal with the hosting and configuration that's required.

If I could just go to datomic.com, log in and click "create new database" and call it "my_new_app_database" and then immediately use that database from my app that would be fantastic.

(In short, I wish there was a heroku for datomic... admittedly, paying customers probably aren't impeded by such newb issues.)


I agree I would like this.


I think the neatest outcome is that this may encourage more people to play around with datalog. Coming from a SQL background, I used to say that I wanted first class functions in SQL. Who knew that this little restricted version of prolog would enjoy such a renaissance? See also cascalog . . .


I have no idea what Datomic does. This is a good page to reach new users.


The pricing (& "Get Datomic!") link is broken for me, forwards to https://sites.zoho.com/sitesetup?domain=datomic.zohosites.co... which opens blank.

I had to copy the link (http://www.datomic.com/pricing.html) and paste it.


Links were updated, you might need to refresh


yup, works now


Great to see a second tier of Datomic that allows broader use of the platform and expansion of the community. If I wasn't using Datomic in dev already this would definitely make me more apt to try it out and experiment.

The compatible APIs and relatively straightforward upgrade process also make it appealing. Well played.


Just thought I'd risk proposing some possible architectures to confirm I see how the licenses (and datomic) applies.

Standard web application using free edition. Three servers. Architecture is DB storing data locally, a web server and another server (say for slow report queries, an API or web worker). Could use beefy servers to scale up.

Business application using pro license allows better redundancy and scalability. Can use different data storage options. Can break beyond limits of 2 servers for processing and request handling.

The pro's peer license cost per process (which I think means cost per grunty server) is at worst $800 up front plus $400 per year.

Note: This is unlikely to be quite right.

Updated: thanks to RichHickey.


The free topology is 3 servers, one of which is the transactor (the one hosting the db in free edition). You get 2 peers _in addition to_ the transactor (connecting to it). So you could have one additional server beyond the web server, or 2 API servers serving a non-peer web tier etc.


Thanks Rich. Updated.


Hmph. They started out with the "pro" version free for opensource projects.


Could someone explain what an "embedded durable storage engine" provided by the transactor-local storage is? Where is the data actually stored durably? Could I run my personal website with pet projects with datomic using the free edition and have the data stored safely (e.g not an in memory demo database?)


Transactor local storage is to a directory on the transactor's local disk, and is as safe as you think disks are.

Also: you can backup from one storage system and restore to another. So if you decide you want to upgrade from local disk to DynamoDB, no problem.


Thanks for the clarification. Sounds like one could get away with using the free edition for a pet project website along with some periodic backup of the directory on the transactor machine, but any serious project would wish to upgrade to the DynamoDB version.


It's really not a different story than it is with any traditional filesystem-based database, and as durable.


If you want to try the free version they released yesterday make sure that you grab a fresh copy (datomic-free-0.8.3343 is working fine for me). I suffered greatly with the initial datomic-free-0.8.3331 release before my customer gave me a heads up to refresh.



This looks like a cool product. Consider linking to your FAQ or including a blurb about what Datomic is in the announcement -- I had to go digging to figure out WTF it was as opposed to what it was compared to the Pro version. Cheers :)


So basically, it is a triple store that keeps track of time and never mutates the past.

They have their own query language but it looks to me that they did a great job at making it as close to SPARQL as possible, which personally I am familiar with.

Pretty neat stuff.


It is important to understand that datalog is not a new query language. It is an old one, with a strong pedigree. It has power equivalent to the relational model plus recursion.
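Recursion is what separates Datalog from plain relational queries. A fixed relational-algebra expression cannot compute ancestors of arbitrary depth, but two Datalog rules can. Below is a minimal Java sketch of naive bottom-up evaluation, which reapplies the rules until no new facts appear. Real engines use smarter strategies such as semi-naive evaluation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of bottom-up evaluation for the two rules
//   ancestor(X, Y) :- parent(X, Y).
//   ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
// Illustrative only; not how Datomic evaluates its rules.
class Ancestors {
    record Pair(String from, String to) {}

    static Set<Pair> ancestors(List<Pair> parent) {
        Set<Pair> result = new HashSet<>(parent); // first rule: every parent is an ancestor
        boolean changed = true;
        while (changed) { // naive fixpoint: reapply the recursive rule until nothing new appears
            changed = false;
            for (Pair p : parent) {
                for (Pair a : List.copyOf(result)) { // iterate a copy while adding to result
                    if (p.to().equals(a.from())) {
                        changed |= result.add(new Pair(p.from(), a.to()));
                    }
                }
            }
        }
        return result;
    }
}
```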

SPARQL is cool, and they did a good job making it look a little like datalog.


SPARQL 1.0 is equivalent to Datalog without recursion. I wonder what SPARQL 1.1 with property paths is equivalent to. Anyway, a hobby project idea I won't have time for is to implement the Datomic API on top of a SYSTAP Bigdata quad store.


Is there a hard memory cap for transactor local storage in the free version? Or is it more along the lines of: "we'll politely ask you to switch over to the paid version if you're eating up massive amounts of disk space"?


Nope - no limit other than available disk space.


Why do people keep running blogs that are totally inoperable with mobile browsers? I'd like to have a more intelligent comment, but I was unable to scroll without activating links.


Ugh, per-process licensing? What is this, 1995?


I saw this page, and was not surprised that it doesn't say what Datomic is. A lot of people make that mistake.

So, I clicked the link at the end of the page to go to your website. I followed some more links. I found three links that talked about features, benefits, and the architecture, and clicked all three, opening them in tabs.

I read the first part of every page and then skimmed the rest.

I have no damn clue what this is. It's some kind of database. It has immutable facts from the past. Why would I use it?

All of your pages talk about features that might be important to someone who's considering the software, and about how it works, and stuff like that.

But never, in clear terms, what it is.

I have the same problem with what we're working on here: what it is is new, so there's no obvious "We're an X" or "We're an X with a Y" that I can give.

If that's the case for you, the best bet would probably be something like "Say you've got a Q and you need to W it while your Z is already at X; if that's the case, then Datomic will do P, O, and R for you, giving you result Z!"

Or whatever is appropriate for whatever it is datomic is.

Just a constructive suggestion. Who would use it and why?


This is exactly what happened to me. I wanted to know what it is, opened those pages in tabs, read half of first one and skipped to the second one. It was soooo long (http://docs.datomic.com/tutorial.html) I knew I couldn't finish reading it in less than 5 hours and came here to ask: What is this, and why/who would want to use it? It could be useful for a project I'm working on, but I just couldn't understand what the damn thing is. And the fact that I have a headache obviously doesn't help.


Probably the best way to get your mind around Datomic conceptually would be to watch these introductory videos[1] Rich and Stu put together when first introducing Datomic. It's pretty revolutionary, IMO.

Edit: In response to those too pressed for time to watch the videos, you can read the rationale[2] for Datomic. However, it is a bit long. Rich and Stu may want to summarize it for the really pressed for time. :)

[1] http://www.datomic.com/videos.html

[2] http://www.datomic.com/rationale.html


I've seen a number of comments over the last few weeks to the effect of "if you don't understand this product, watch the intro video".

In my opinion, if a product can't be described in a couple of concise sentences, it's going to run into some major roadblocks in getting traction.

Watching a video incurs a number of costs - the time taken to watch the video, the effort of plugging in headphones, the pain of buffering content. All of these are minor obstacles, but you want to eliminate as many reasons to leave as possible when attracting a new customer.


This isn't meant as a slight to you, but I think that Clojure in general and Datomic specifically (as a newer product) isn't looking for the most customers. They're looking for the right ones that share their vision for programming.

It still requires watching a video, but the video "Simple Made Easy" by Rich Hickey (http://www.infoq.com/presentations/Simple-Made-Easy/) describes it best. If that doesn't appeal to you, then don't worry about what you're missing in Datomic.


Actually, someone did a great job succinctly explaining it: http://news.ycombinator.com/item?id=4286701

The attitude that Clojure is only for some enlightened few who are worthy enough to understand it is extremely alienating. Lisps aren't that hard, homoiconicity isn't that opaque, the benefits of using Clojure can be explained in practical terms that most developers can understand, if not at first be convinced by. All these concepts can be explained succinctly in text.

Saying that Clojure is only for those who share some 'vision' is hand-wavy at best, insulting at worst.

Edit: My guess is that Datomic is targeted towards Clojure devs because they are already experienced with Datomic's philosophy and will understand the system better. It also provides a smaller and highly receptive market. They can then focus on perfecting the software instead of training a large number of people in the philosophy behind Clojure and Datomic.


I didn't say that Clojure is for enlightened few or super hard to learn, I meant that Datomic/Clojure is currently targeted to people who already agree with Hickey and the other core developers. The 'vision' isn't some grand thing, just a strict adherence to simplicity and immutability as a way to improve program correctness.

I think we're in violent agreement.


Anyone here who knows the product want to have a go at providing a simple tagline, a short "about" blurb, that would be suitable for their inbound marketing blog?


I think the guy who'll be interested in Datomic isn't going to be looking for a data store describable in a few sentences, like "NoSQL with feature X". It looks to me like a NoSQL-ish Datalog with Clojure-style concurrency, reified by its distribution model. You would only go with that sort of thing if you know how it works, or are interested in experimenting and learning about it. It's not something the guy looking for "data store, alternative to SQL, meeting requirements X Y and Z" would choose.


> It's not something the guy looking for "data store, alternative to SQL, meeting requirements X Y and Z" would choose.

Actually it seems like it's something that this hypothetical guy would be very interested in, given that Datomic is being pitched as an alternative to SQL.


I watched the first video. It wasn't much better than the docs; it was still a little obscure and was just "talk" without any solid examples or use cases. But finally, at around 13:30, they get to say what they mean by "fact". If you want to know whether it's something that can be useful to you, you can just skip to this part of the video: http://www.youtube.com/watch?v=RKcqYZZ9RDY&feature=playe...


Ok, but what does it do? In one sentence.


OK. The FAQ is better, but not ideal: http://www.datomic.com/faq.html


This talk by Rich Hickey is what got me into Datomic.

http://www.infoq.com/presentations/The-Design-of-Datomic

I would probably have used it for the system I'm making now, as it's a very good match to our requirements, but 0.1 is 0.1, and the docs are kind of sparse so the learning curve for Datalog queries was too steep.


Same. Scanned the page hoping to find "Datomic is a ______ that helps you _____..." sort of sentence, and after not seeing it in about 5 seconds or so, I clicked back, and then clicked on the comments link to see if someone had described it.


Well, I got stopped as soon as I hit the Javascript-dependent template.

I realize I'm a bit on the fringe, here, and I've complained about it before. But I do not like "basic" web pages, e.g. this blog post, becoming dependent upon local execution of arbitrary code (in which camp I'm not including HTML and CSS).

For me, it's a matter of security. And I'll continue to whitelist, based upon established trust.


I realize this probably doesn't address the premise of your post, but Datomic is, by way of my personal understanding/analogy, a giant, multi-indexed SQL table which has 4 columns - a subject column, an attribute column, a value column for the attribute, and a transaction identifier column. The table is streamed to consumers into an LRU cache.

The transaction identifier column represents time, which enables you to query the state of the database as of a point in time (e.g. yesterday, last month, etc.) for reporting. The subject-relation-object structure supports ad hoc NoSQL-style changes to the schema. Transactions themselves are first-class concepts, so you can add relation-object pairs to them as well.
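The four-column analogy can be sketched as one set of datoms kept in more than one sort order. In the sketch below, EAVT order turns "everything about entity e" into a contiguous range. AEVT order does the same for "every value of attribute a". Datomic maintains indexes named along these lines (EAVT, AEVT and others). The classes here are a hypothetical simplification, and the values are strings only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// One row of the "four-column table": entity, attribute, value, transaction.
// Hypothetical simplification: values are strings to keep the sketch short.
record Datom(long e, String a, String v, long tx) {}

class DatomIndexes {
    // The same datoms kept in two sort orders.
    private final NavigableSet<Datom> eavt = new TreeSet<>(
        Comparator.comparingLong(Datom::e).thenComparing(Datom::a)
            .thenComparing(Datom::v).thenComparingLong(Datom::tx));
    private final NavigableSet<Datom> aevt = new TreeSet<>(
        Comparator.comparing(Datom::a).thenComparingLong(Datom::e)
            .thenComparing(Datom::v).thenComparingLong(Datom::tx));

    void add(Datom d) {
        eavt.add(d);
        aevt.add(d);
    }

    // "Everything about entity e": a contiguous range in EAVT order.
    List<Datom> entity(long e) {
        Datom lowest = new Datom(e, "", "", Long.MIN_VALUE);
        return eavt.tailSet(lowest, true).stream()
            .takeWhile(d -> d.e() == e)
            .collect(Collectors.toList());
    }

    // "Every entity's value for attribute a": a contiguous range in AEVT order.
    List<Datom> attribute(String a) {
        Datom lowest = new Datom(Long.MIN_VALUE, a, "", Long.MIN_VALUE);
        return aevt.tailSet(lowest, true).stream()
            .takeWhile(d -> d.a().equals(a))
            .collect(Collectors.toList());
    }
}
```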

Their hypothesis seems to be that querying client-side data is faster than querying server-side data (obvious), and that updates can be effectively synchronized by a dedicated transaction coordinator (potentially). Although this is a single point of failure, the claim is that this code can be audited and perfected.
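You can sketch the single-coordinator idea with a single-threaded executor. Writes may be submitted from many threads, but they are applied one at a time, so every transaction gets a position in one total order. This is a hypothetical illustration of the serialization idea only. It says nothing about how Datomic's transactor handles durability or failover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical single-writer transactor: all writes run on one thread,
// so transactions are applied one at a time in a single total order.
class Transactor implements AutoCloseable {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final List<String> log = new ArrayList<>(); // touched only on the writer thread
    private long lastTx = 0;                            // touched only on the writer thread

    // May be called from any peer thread; completes with the assigned tx id.
    CompletableFuture<Long> transact(String txData) {
        return CompletableFuture.supplyAsync(() -> {
            lastTx++;
            log.add(lastTx + ":" + txData);
            return lastTx;
        }, writer);
    }

    // Reads the log on the writer thread, after every previously submitted write.
    CompletableFuture<List<String>> logSnapshot() {
        return CompletableFuture.supplyAsync(() -> List.copyOf(log), writer);
    }

    @Override
    public void close() {
        writer.shutdown();
    }
}
```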


So I had the same situation. I kind of get what it is; however, I figured that if it turns out to be popular, I will hear about it again and can then spend some time with it. Taking into account the novelty factor, they should focus a lot on explaining it to people.


Same exact response I had. No clue what this service actually DOES.


Who cares! It's free, evidently, and free must mean worth having, right?


It's a project that allows the Clojure guys to monetize their work.

Go ahead HN Lisp lovers, kill my karma!



