
Datomic Pro Starter Edition - puredanger
http://blog.datomic.com/2013/11/datomic-pro-starter-edition.html
======
sgrove
Wow, very cool indeed. We worked on an open-source community[1] tool using
Datomic/Codeq[2], and the ops portion was/has been pretty unpleasant. I'd love
to see Datomic-as-a-Service pop up, even if it were only meant for starter
projects, just so that experimenting with it could be that much easier. That
might be possible with the new license, depending on how the Cognitect team
sees it.

[1] The service is a frontend to Codeq, having imported many Clojure repos:
[http://www.jidaexplorer.com/](http://www.jidaexplorer.com/)

[2]
[http://blog.datomic.com/2012/10/codeq.html](http://blog.datomic.com/2012/10/codeq.html)

EDIT: Added links.

~~~
dustingetz
I would like to know more about datomic from an ops perspective if you feel
like elaborating?

~~~
cgag
Yes, same, I think a lot of people would be interested.

~~~
sgrove
I would ask coolsunglasses, as he's been using it in production far longer,
and in a bigger installation, than we have.

------
coolsunglasses
Using Datomic Pro at work for a data warehouse and soon for another project
for <REDACTED>.

I'm working on a migration toolkit for Datomic. (Migrating schemas, not
between database kinds.)

Ask me questions about the API, my experience with ops, or anything else if
you like.

~~~
dustingetz
Can you ramble for a moment about what Datomic is like from an ops
perspective, if you will? Thanks.

~~~
coolsunglasses
Just to clarify: I don't know what problems sgrove ran into. I've asked him in
IRC and will elaborate if he tells me, but at present I don't know.

So here's what I have encountered:

Datomic's primary limitation is at the transactor level. Storage is a
non-issue, especially if you're using DynamoDB. Even with PostgreSQL, write
throughput was ridiculously good because of the way they use the database.

I'm not sure I'd take transactor fail-over too seriously, but I don't think
they'll drop any writes on the floor either.

You shouldn't attempt to run a Datomic transactor on a cheapo VPS. Try to put
together at least 4 GB of RAM, which is what's currently standard for consumer
laptops. No excuses.

Deploying Datomic itself is straightforward. The config is super simple, and
all you need as a dependency is Java. I'm actually a little puzzled that
people want "Datomic as a Service". Datomic and PostgreSQL were the simplest
parts of my stack to deploy.

You want beefy peers as well, so as to get the most out of the peer limits on
the license. This isn't as big of a deal as people think, because you're not
structuring your app the way Rails and Django apps are structured on Heroku.
You don't do 1 beefy database + 5,000 terrible VPSes. Not only is that
management overhead (automation or not), but you're better off vertically
scaling a couple of peers before running a shitload of dumb VPSes running your
service.

The running assumption is that you should use DynamoDB, but PostgreSQL was
seriously efficient for our purposes.

Indexing adds write overhead, but given the current limitations on what you
can change in the schema online (nothing), you want to plan ahead here.

If you fail to anticipate something in your schema, you'll have to use a
migration toolkit such as the one I plan to release soon.

The componentized nature of Datomic makes vertical scaling fairly pleasant as
you can more precisely target what you're upgrading/improving.

You probably want to include a pass-through querying API in the peers
accessing Datomic so that you don't have to run the REST service. This will
enable access to the data from a long tail of servers that don't necessarily
need to be that close to the data and are performing dumb queries. This solves
the "but my webapp needs 100 crappy VPSes to perform!" problem.

Any kind of heavy-duty aggregation/analytical workload against a Datomic
dataset should be performed on a big peer with memcached. Use roll-ups, for
Pete's sake! You can use a bi-temporal timeline with analytics data, but I
haven't fully explored the implications beyond Datomic making querying along
that dimension nicer.
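
A roll-up here just means pre-aggregating raw facts into coarser summaries so
that analytical queries hit a small table instead of scanning every fact. A
minimal sketch in Python (this is a conceptual illustration, not Datomic's
API; the attribute name and data are made up):

```python
from collections import defaultdict
from datetime import datetime

# Raw facts, modeled as (entity, attribute, value, timestamp) tuples.
# The "order/total" attribute is a made-up example.
events = [
    ("order-1", "order/total", 30.0, datetime(2013, 11, 4, 9, 15)),
    ("order-2", "order/total", 45.0, datetime(2013, 11, 4, 17, 40)),
    ("order-3", "order/total", 20.0, datetime(2013, 11, 5, 8, 5)),
]

def daily_rollup(facts):
    """Pre-aggregate per-day totals so analytics queries read the
    summary instead of scanning every raw fact."""
    totals = defaultdict(float)
    for entity, attr, value, ts in facts:
        if attr == "order/total":
            totals[ts.date().isoformat()] += value
    return dict(totals)

print(daily_rollup(events))
# {'2013-11-04': 75.0, '2013-11-05': 20.0}
```

The roll-up would typically be computed once per period and stored, so the
heavy query only touches a handful of summary rows.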

The Doc Brown array store presented at the Clojure meetup last night is a very
interesting option for people doing multi-dimensional slicing of analytics
data.

I'd personally use Datomic as a stand-in for any SQL database doing an OLTP
workload, where previously I would've used PostgreSQL. _Especially_ if history
or reproducible results are important. That having been said, it appears to be
adaptable to workloads I wouldn't have expected (analytics).

------
espeed
This is great. Having a free Datomic version that matches Datomic Pro's
architecture and supports all of its external storage engines certainly
simplifies the scale path and makes design decisions easier.

BTW: The pricing page has more details
([http://www.datomic.com/pricing.html](http://www.datomic.com/pricing.html))
-- scroll down to see a comparison.

------
rdtsc
Can someone clarify something for me? If data is never deleted or updated, one
can build a very nice architecture (rich caching clients and all that), which
is what they've done. I like that.

What happens in these two cases? (Sorry for repeating myself; I already
mentioned them below.)

1) Runaway data generator. Someone either messed up a test or confused the
units on a timer, and now you are logging at thousands of times the rate you
expected. All of this ends up in your database. Does just adding a deletion
record for each one of those "fix" the problem? If the data is immutable,
doesn't it still eat up the storage?

2) Sensitive data. Someone somehow shoved plaintext passwords, social security
numbers, or ICBM launch codes into the database. What do you do in that case?

~~~
puredanger
Datomic offers excision to cover cases like these (or other common cases where
you are required by law to forget things).
[http://blog.datomic.com/2013/05/excision.html](http://blog.datomic.com/2013/05/excision.html)

Note that even in this case, you can remember that you decided to forget, as a
key premise of Datomic is that you always know how and why you came to record
(or forget) a fact.
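
The "remember that you forgot" idea can be sketched over a toy append-only
fact log in Python (a conceptual model only, not how Datomic implements
excision; the `db/excised` attribute name is made up for illustration):

```python
# Toy append-only fact log: each fact is a dict of
# entity / attribute / value / transaction id.
facts = [
    {"e": "user-1", "a": "user/ssn", "v": "123-45-6789", "tx": 1001},
    {"e": "user-1", "a": "user/name", "v": "Alice", "tx": 1001},
]

def excise(log, entity, attribute, tx):
    """Physically remove every matching value from history, but append
    a new fact recording that an excision happened (and when)."""
    kept = [f for f in log if not (f["e"] == entity and f["a"] == attribute)]
    kept.append({"e": entity, "a": "db/excised", "v": attribute, "tx": tx})
    return kept

log = excise(facts, "user-1", "user/ssn", tx=1002)

# The SSN value is gone from every point in time...
assert not any(f["v"] == "123-45-6789" for f in log)
# ...but the record that user/ssn was excised at tx 1002 survives.
assert {"e": "user-1", "a": "db/excised", "v": "user/ssn", "tx": 1002} in log
```

The point is that the audit trail records the act of forgetting without
retaining the forgotten value itself.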

~~~
rdtsc
Alright, that makes sense. Now I'm curious how the fact that you decided to
forget would work. I guess if one always accesses the data via the official
API, and at the latest known state, that will work. But presumably the data
will still be stored in binary form in the back end.

Another, perhaps related, question: does one have the option to "travel in
time"? Say I record that I forgot my mistakenly added passwords, but an
attacker knows when that happened. Can they go back and inspect the data state
right before the point when I forgot the data?

~~~
danneu
Excision removes the data.

In other words, you can remember that you forgot, but you can't remember what
you forgot.

[http://blog.datomic.com/2013/05/excision.html](http://blog.datomic.com/2013/05/excision.html)

What you're describing is plain ol' retraction. If you retract your credit
card number, then someone can just surf back in time to grab it. That's why
you'd excise it.
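
The difference can be shown with a toy fact log in Python (an illustrative
model, not Datomic's API): a retraction only appends a "no longer true" fact,
so an as-of read before the retraction still sees the secret.

```python
# Facts as (entity, attribute, value, tx, added) tuples; a retraction
# appends the same value with added=False. Data is made up.
facts = [
    ("user-1", "card", "4111-1111", 1, True),
    ("user-1", "card", "4111-1111", 2, False),  # retracted at tx 2
]

def as_of(log, entity, attribute, tx):
    """Replay assertions and retractions up to tx, return the value."""
    value = None
    for e, a, v, t, added in sorted(log, key=lambda f: f[3]):
        if e == entity and a == attribute and t <= tx:
            value = v if added else None
    return value

assert as_of(facts, "user-1", "card", 2) is None          # current view: gone
assert as_of(facts, "user-1", "card", 1) == "4111-1111"   # time travel finds it
```

Excision, by contrast, deletes the value from every historical view, so no
as-of read can recover it.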

------
VaedaStrike
This opens so many doors all around.

I'm literally giddy at this news.

I want to leave my day job right now and go home and work on my Cognitect
stack web app.

I hope I can quickly get to the point where I need to (and then will be far
more able to) purchase Datomic.

------
Jemaclus
Companies really need to do a better job of making ELI5 descriptions. This
means nothing to me:

> Datomic is a database of flexible, time-based facts, supporting queries and
> joins, with elastic scalability, and ACID transactions. Datomic can
> leverage highly-available, distributed storage services, and puts
> declarative power into the hands of application developers.

What does that even mean? I'm pretty good at databases, and I know what
transactions and queries and joins and all that are, but this explanation does
a pretty poor job of explaining what Datomic is and why I should use it.

What's a "time-based fact"? What does putting "declarative power in the hands
of application developers" mean? Don't all databases support queries? Don't
most relational databases support joins and transactions?

It just seems like a buzzword-filled fluff phrase that says "Datomic is a
database," but I don't know why I would use it over, say, MySQL or Mongo or
whatever.

I had to go to Cognitect's site
([http://cognitect.com/datomic](http://cognitect.com/datomic)) to get a better
explanation:

> Datomic is built on immutable data; facts are never forgotten or overwritten.
> This means complete auditability is built in from the start - including the
> transactions that modify your database. And because Datomic is built on
> immutable data, you can explore possible future scenarios by issuing
> transactions against a database and decide to commit them only after
> verifying the results.

Ok, that makes a _little_ more sense, but it's still not clear what
differentiates this from other database systems.

I'm not affiliated with them at all, but I think SiftScience.com does a great
job of explaining their product simply and efficiently:
[https://siftscience.com/](https://siftscience.com/)

> Fight Fraud with Machine Learning
>
> Sift Science monitors your site's traffic in real time and alerts you
> instantly to fraudulent activity.

Simply detects fraud. Done.

Or EasyPost.com:

> Shipping for developers
>
> EasyPost allows you to integrate shipping APIs into any application in
> minutes.

Don't just spit buzzwords and technical terms at me. Tell me what it _does_.

> Datomic is a database that, among other things, specializes in tracking data
> over time, allowing you to test a transaction before saving the data. True
> accountability from the beginning!

Probably not even accurate (again, I don't understand Datomic's premise), but
that sounds more explanatory than their current buzzword-filled blurb.

To any startups watching: have _something_ on your front page that is an ELI5
description of your product.

~~~
brandonbloom
You're right, their description should be clearer, or have an elaboration or
something. But for onlookers:

> flexible

Minimal, non-tabular schema

> time-based

All historical values are preserved.

> facts

Triple store: Entity / Attribute / Value

with time:

Quad store: Entity / Attribute / Value / TIME

> supporting queries and joins

Datalog query language

> with elastic scalability

Reads are uncoordinated, so they scale horizontally

> ACID transactions.

You can trust the database won't lose stuff you write to it.

> leverage highly-available, distributed storage services

Storage engines are pluggable. You can choose backends such as filesystems,
Riak, Dynamo, traditional SQL databases, etc.

> puts declarative power into the hands of application developers.

Reads occur in-process, like an embedded database. Queries, traversals, and
other reads occur locally, without network round trips.
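
The "facts with time" idea above can be sketched in a few lines of Python (an
illustrative model, not Datomic's storage format or query language): each fact
is an entity/attribute/value triple plus the transaction that asserted it, and
an "as of" read just filters on that fourth component.

```python
# Each fact is (entity, attribute, value, tx): an E/A/V triple plus
# the transaction that asserted it. Data is made up for illustration.
facts = [
    ("alice", "email", "a@old.com", 1),
    ("alice", "email", "a@new.com", 2),
]

def as_of(log, entity, attribute, tx):
    """Value of an attribute as of transaction tx: the latest
    assertion at or before tx wins."""
    matches = [f for f in log
               if f[0] == entity and f[1] == attribute and f[3] <= tx]
    return max(matches, key=lambda f: f[3])[2] if matches else None

assert as_of(facts, "alice", "email", 1) == "a@old.com"
assert as_of(facts, "alice", "email", 2) == "a@new.com"
```

Because old assertions are never overwritten, every historical view of the
database remains queryable.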

~~~
cordite
There's definitely a lot more to ACID than the part you mention, which covers
only the D, or Durability, aspect.

Though any competent database engineer already knows what ACID is.

The videos on Datomic's website help clarify things, but they aren't as
accessible as well-written text.

~~~
brandonbloom
> (map eli5 (acronym-expand "ACID"))

Atomic: Transactions are all-or-nothing. You never see half of a transaction.
You either get the world _before_ the transaction, or _after_ it.

Consistent: The before and after worlds are always valid, never corrupt.

Isolated: From the outside, it appears that transactions occur one after
another, never overlapping in time. In Datomic, this is actually the case:
transactions are fully serialized.

Durable: If a transaction succeeds, you can be confident that the consistent
"after" world is safely on disk.
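
The atomic, all-or-nothing property can be sketched in Python (a conceptual
toy, nothing like Datomic's transactor; the validation rule is made up):
apply a batch of updates to a copy, and swap it in only if every step
succeeds.

```python
# Sketch of all-or-nothing semantics: you observe the world either
# before the transaction or after it, never halfway through.

def transact(db, updates):
    """Return a new db with all updates applied, or raise and leave
    the original untouched."""
    candidate = dict(db)                 # work on a copy
    for key, value in updates:
        if value is None:                # made-up validity rule
            raise ValueError(f"invalid value for {key}")
        candidate[key] = value
    return candidate                     # commit: the new "after" world

before = {"balance": 100}
try:
    transact(before, [("balance", 150), ("audit", None)])  # 2nd step fails
except ValueError:
    pass
assert before == {"balance": 100}   # the half-applied state never escapes
```

Readers never see the mutated copy until the whole batch has succeeded, which
is the essence of atomicity.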

------
stolio
Can anybody explain how the 2-peer restriction works with web-apps? It's my
biggest fear in making the jump.

~~~
coolsunglasses
Two JVM processes, as far as I'm aware. In practice, a single peer gets you
pretty far, especially if you're a Clojure user.

~~~
stolio
I guess where I'm getting lost is the terminology - I was under the impression
that end-users are peers[0], so I'm worried about maxing out at two users. But
if the limit is whatever a transactor can handle then I doubt I'd ever hit it
and I might as well commit.

I'm looking at using it as part of a Clojure backend, if that matters.

[0] -
[http://docs.datomic.com/javadoc/datomic/Peer.html](http://docs.datomic.com/javadoc/datomic/Peer.html)

edit: grammar

~~~
redinger
From
[http://docs.datomic.com/architecture.html](http://docs.datomic.com/architecture.html)

"A Peer is a process that manipulates a database using the Datomic Peer
library. Any process can be a Peer - from Web server processes that host sites
or services, to a daemon, GUI application or command-line tool. The Datomic-
specific application code you write runs in your Peer(s)."

~~~
stolio
OK, thanks. It just clicked.

For anybody wondering what clicked: I'd have Datomic running with one peer,
and that peer would serve as many end-users as it could handle. If necessary I
could add a second peer and double the read-query power. And that's more than
enough for my little web-app.

Given the nature of Datalog
([https://www.youtube.com/watch?v=bAilFQdaiHk](https://www.youtube.com/watch?v=bAilFQdaiHk))
I'm getting excited.

~~~
coolsunglasses
Datalog has been revelatory for me and my coworkers.

------
tony_landis
I really like what the guys at cognitect are doing.

------
lucian1900
The name is a bit silly. It should be called just "Datomic Starter".

I'm still hoping for an open source edition :)

------
rsanders
Does the Datomic Console app count against the 2 peer limit when running?

~~~
redinger
Yes, the Datomic Console counts as a connected peer.

