
The Impedance Mismatch Is Our Fault (2012) - simonpure
https://www.infoq.com/presentations/Impedance-Mismatch/
======
jacobobryant
Great video. I watched it about a year ago and it really helped me start to
grok Datomic. I've built several apps on Datomic since, and it's really nice.
For anyone who wants to try it out, I'd also recommend taking a look at
Crux[1]. It's a bit easier to get started with. I'm in the process of moving
my startup over from Firebase to DigitalOcean + Crux right now.

[1] [https://opencrux.com/](https://opencrux.com/)

------
peter_d_sherman
This guy is brilliant!

Excerpt:

16:08

"And locks don't compose... I can't take two arbitrary pieces of code that
each do locking, and then combine them together without knowing the
implementation details.

_Which means that the entire notion of reuse in OO is broken._"

Never thought about that before, but it makes a lot of sense...

Now, in thinking about it, I'd change that sentence, slightly, to "the whole
notion of reuse in OO is broken in objects which depend on databases and
specific locking strategies or objects which depend on other objects that do",
but the idea is still a very valid one...

To-do: Rewatch this video in the future...

~~~
AnimalMuppet
OK, but what _does_ compose in that circumstance? FP? The locking is an
external effect; can you just compose two arbitrary functions that each do
locking?

I suspect it's more true that the entire notion of reuse or composability is
broken in _code_ which depends on databases and specific locking strategies or
_code_ which depends on other _code_ that does. It doesn't have anything to do
with objects.

~~~
cryptonector
Transactions compose.

And yes, purity composes (because it requires no locks), and since FP pushes
you to purify, FP helps. You don't have to do FP to have composability, but FP
has great composability.

> I suspect it's more true that the entire notion of reuse or composability is
> broken in code which depends on databases and specific locking strategies or
> code which depends on other code that does. It doesn't have anything to do
> with objects.

Mutable objects impose locking requirements, and locks don't compose easily
because _lock-taking order_ matters (to avoid deadlocks). Yes, you can add
deadlock detection and rollback and retry/abandon on deadlock, but it's a
complication you have to handle because OOPs generally don't provide all of
that.
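
A minimal sketch of the lock-ordering point, assuming two mutable objects that each own a lock. `Account`, `rank`, and `transfer` are hypothetical names made up for illustration. If each caller took the locks in argument order, the two threads below could deadlock; sorting by a global rank is one common way around that, and it only works because `transfer` knows about both locks.

```python
import threading


class Account:
    """A mutable object guarded by its own lock (hypothetical example)."""

    _next_rank = 0

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        # A global rank gives every lock a fixed place in the acquisition order.
        self.rank = Account._next_rank
        Account._next_rank += 1


def transfer(src, dst, amount):
    # Take both locks in rank order, regardless of argument order.
    first, second = sorted((src, dst), key=lambda acct: acct.rank)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


a, b = Account(100), Account(100)


def worker(x, y):
    for _ in range(1000):
        transfer(x, y, 1)


# The two threads move money in opposite directions at the same time.
threads = [
    threading.Thread(target=worker, args=(a, b)),
    threading.Thread(target=worker, args=(b, a)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The ordering rule is an implementation detail that every piece of code touching these locks has to share, which is the sense in which the locks do not compose.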

~~~
AnimalMuppet
> Transactions compose.

But they compose in OO too, right?

> Yes, you can add deadlock detection and rollback and retry/abandon on
> deadlock, but it's a complication you have to handle because OOPs generally
> don't provide all of that.

But FP doesn't provide all of that either, does it?

~~~
cryptonector
Within pure paths, you just don't have this problem. You only have this
problem at the boundary with the storage system (where, indeed, you might have
to retry, e.g., if you have SERIALIZABLE transaction semantics). So FP gains
composability if you can express your program as purely as possible as
compared to an OOP w/ ORM system. And if you have transactions, you can
compose FP programs that construct them because transactions compose. But if
your ORM deprives you of transactions, you won't have that composability in
your ORM.
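
A rough sketch of "programs that construct transactions", assuming a toy in-memory store. The names `rename_user`, `add_to_group`, `compose`, and `transact` are hypothetical and do not come from any real library. The pure functions only return descriptions of writes, so combining them is list concatenation; the single impure step applies the whole list or nothing.

```python
def rename_user(user_id, new_name):
    # Pure: returns a description of the write and performs none.
    return [("assert", user_id, "name", new_name)]


def add_to_group(user_id, group_id):
    # Pure as well; it knows nothing about rename_user.
    return [("assert", user_id, "member-of", group_id)]


def compose(*txs):
    # Composing transactions is just concatenating their operations.
    return [op for tx in txs for op in tx]


def transact(db, tx):
    """The impure boundary: apply every op to a copy, then return the copy.

    If any op is invalid, the exception propagates and the original db
    is left untouched, so the transaction is all-or-nothing.
    """
    new_db = dict(db)
    for op, entity, attribute, value in tx:
        if op == "assert":
            new_db[(entity, attribute)] = value
        elif op == "retract":
            new_db.pop((entity, attribute), None)
        else:
            raise ValueError(f"unknown op: {op}")
    return new_db


db = transact({}, compose(rename_user(1, "ann"), add_to_group(1, "admins")))
```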

~~~
AnimalMuppet
OK, that's fine for "within pure paths". But even in FP, you have to take the
lock somewhere, right? And now it's not pure, right? And now you have to pay
attention to lock order, right?

If your ORM doesn't give you transactions, then you don't have transactions.
But FP needs some kind of layer to talk to a database, too, doesn't it? And if
that layer doesn't give you transactions, you're in the same place, aren't
you?

Am I missing something? Or are you missing my point?

~~~
cryptonector
You're assuming that transactions require locks. That's not necessarily the
case with MVCC and SERIALIZABLE semantics, but you may have to retry the
transaction (from scratch) if it fails. This looks a lot like what you'd get
if you were using locks.

_Somewhere_, though, there will be locks, indeed. Pushing the locking to the
edges of the system helps keep the core composable. There's no panacea, of
course.
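
A toy sketch of the retry-from-scratch pattern, assuming a single versioned value in place of a real MVCC database. `VersionedStore`, `Conflict`, and `run_transaction` are hypothetical names. The transaction body is a pure function of the snapshot, and the only lock sits at the commit edge.

```python
import threading


class Conflict(Exception):
    """Raised when another commit landed since our snapshot was taken."""


class VersionedStore:
    """Toy optimistic store: one value plus a version counter."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._commit_lock = threading.Lock()  # the lock lives only at the edge

    def snapshot(self):
        with self._commit_lock:
            return self._version, self._value

    def commit(self, read_version, new_value):
        with self._commit_lock:
            if read_version != self._version:
                raise Conflict()
            self._value = new_value
            self._version += 1


def run_transaction(store, fn, max_retries=1000):
    for _ in range(max_retries):
        version, value = store.snapshot()
        try:
            store.commit(version, fn(value))
            return
        except Conflict:
            continue  # retry from scratch against a fresh snapshot
    raise RuntimeError("too many conflicts")


store = VersionedStore(0)


def worker():
    for _ in range(200):
        run_transaction(store, lambda n: n + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_version, final_value = store.snapshot()
```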

Also, an RDBMS can order locks because once you've decided what
INSERTs/UPDATEs/DELETEs you're doing, they can be sorted and run in a
different order than they were requested (after all, if the transaction is
atomic, then the internal order doesn't matter).

You can absolutely build all of that (transactions, lock ordering, ...) in any
OOP (or FP or other) language of your choice. But that's essentially
recreating part of an RDBMS -- if you're after reuse, just use an RDBMS so you
don't have to recreate its functionality. Using an RDBMS w/o an ORM helps you
avoid all the issues raised by TFA. Further, using an EAV design gives you
many benefits as outlined in TFA and elsewhere in the commentary here and on
the web.

~~~
AnimalMuppet
Well, TFA was a video, so I've just been going off of the comments here. So,
if I understand you correctly, the issue is with ORM, rather than with OOP or
with the RDBMS. And you wouldn't use an ORM in FP, so you don't have the
problem there.

~~~
cryptonector
TFA also had slides. I'm not going to say TFV... :)

> So, if I understand you correctly, the issue is with ORM, rather than with
> OOP

Well, there is certainly an issue with ORM. But OOP, even absent databases and
persistence, when used as originally intended (with objects abstracting state
and behavior), has concurrency issues.

------
cryptonector
I'm half-way through the video, and he's pushing the Entity Attribute Value
(EAV) model. He's right. He's absolutely right.

There is a duality between the much-hated EAV and table-oriented
representations of data, especially if you are disciplined in your schema
design so that the mapping between your EAV schema and a SQL table-oriented
schema is natural.

I really would like to see automatic duality between EAV and relational models
in RDBMSes. This would give users great query expressive power. EAV is perfect
for "graph" queries, which are essentially recursive queries that chase
relations without having to specify which relation types -- as long as all of
them have the same shape, you can do this with EAV. That means that you can
write queries like "give me all the things this user has access to", or "all
the users that have access to this entity" while traversing arbitrary
relations like "user belongs to group", "group belongs to group", "access
grant", etc.
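
A small sketch of such a graph query, using Python's built-in `sqlite3` and a recursive CTE over one EAV table. The table name `eav`, the attribute names, and the sample rows are made up for illustration. The query follows every relation out of `alice` without naming the relation types, which works because all rows share the same three-column shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        ("alice", "member-of", "devs"),      # user belongs to group
        ("devs", "member-of", "staff"),      # group belongs to group
        ("staff", "access-grant", "wiki"),   # access grant
        ("devs", "access-grant", "repo"),
        ("bob", "member-of", "staff"),
    ],
)

# Chase every relation reachable from alice, whatever its attribute is.
reachable = conn.execute(
    """
    WITH RECURSIVE reach(node) AS (
        SELECT 'alice'
        UNION
        SELECT eav.value FROM eav JOIN reach ON eav.entity = reach.node
    )
    SELECT node FROM reach WHERE node <> 'alice' ORDER BY node
    """
).fetchall()

names = [row[0] for row in reachable]
```

A real access-control query would presumably filter on which attributes count as grants; this version just shows the traversal.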

EAV is, essentially, the most normal form you can construct. And the more
normal the form, the easier it is to apply CRDT techniques should you want to.
Also, if you normalize the keys (see SQLite4[0], yes, _4_, an abandoned
project, but the docs are still there) then you get to trivially use any
distributed ordered key/value store, and you can distribute the key ranges,
and it's trivial.

Now, EAV is much hated. But that hatred is wrong. The problem is that RDBMSes
aren't exposing a dual of EAV _and_ tables. If you have an EAV schema you can
create VIEWs that look like equivalent table-oriented schemas -- the reverse
is also possible, but somewhat harder. Using VIEWs to bridge the gap imposes a
heavy burden on the query optimizer. But SQLite4 and similar database designs
essentially have EAV stores internally that they could expose with a lot less
complexity than VIEWs impose.
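
As an illustration of the VIEW direction, here is a sketch using Python's built-in `sqlite3`. The `eav` table, the `users` view, and the sample data are assumptions made up for the example. The view pivots attribute rows into columns, so queries against it look like queries against an ordinary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT);
    INSERT INTO eav VALUES
        (1, 'name', 'Alice'),
        (1, 'email', 'alice@example.com'),
        (2, 'name', 'Bob');

    -- Pivot: one row per entity, one column per known attribute.
    CREATE VIEW users AS
        SELECT entity AS id,
               MAX(CASE WHEN attribute = 'name'  THEN value END) AS name,
               MAX(CASE WHEN attribute = 'email' THEN value END) AS email
        FROM eav
        GROUP BY entity;
    """
)

rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
```

Every column costs an aggregate over the entity's rows, which hints at the burden the optimizer has to carry when views like this are stacked or joined.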

We can haz EAV and table orientation at the same time. We should have it.

Plus, as TFA says, you can combine EAV with timeseries, which allows you to
have time travel in your queries, and even transactions (e.g., run a
transaction now that has an effect in the future), which makes your data
naturally copy-on-write, which greatly improves read performance. Granted, you
have to clean up unnecessary history at some point, otherwise it piles up
(vacuum), but that's OK.
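
A minimal sketch of the time-travel idea, assuming facts are stored append-only as `(tx, entity, attribute, value, added)` tuples. That layout and the `as_of` helper are illustrative assumptions, not the actual storage format of any particular database. Nothing is overwritten; an update is a retraction plus a new assertion at a later transaction time.

```python
def as_of(facts, t):
    """Return {(entity, attribute): value} as seen at transaction time t."""
    view = {}
    # sorted() is stable, so ops within one transaction keep their order.
    for tx, entity, attribute, value, added in sorted(facts, key=lambda f: f[0]):
        if tx > t:
            break
        if added:
            view[(entity, attribute)] = value
        else:
            view.pop((entity, attribute), None)
    return view


facts = [
    (1, "user1", "email", "old@example.com", True),
    # Transaction 2 changes the email: retract the old value, assert the new.
    (2, "user1", "email", "old@example.com", False),
    (2, "user1", "email", "new@example.com", True),
]

before = as_of(facts, 1)
after = as_of(facts, 2)
```

Readers of `as_of(facts, 1)` are never disturbed by transaction 2, since the older facts are still there.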

[0]
[https://www.sqlite.org/src4/doc/trunk/www/design.wiki](https://www.sqlite.org/src4/doc/trunk/www/design.wiki)

~~~
GnarfGnarf
EAV = Entity-attribute-value model.

It's a courtesy to explain an acronym when it is first introduced.

~~~
cryptonector
Indeed. Updated. Thanks for asking!

------
mst
Slides available without a loginwall here:
[https://gotocon.com/dl/goto-cph-2012/slides/iconoclasts/Impe...](https://gotocon.com/dl/goto-cph-2012/slides/iconoclasts/ImpedanceMismatch.pdf)

------
slowmovintarget
Rich Hickey's talk on the design of Datomic is also something every programmer
should see at least once:
[https://youtu.be/Cym4TZwTCNU](https://youtu.be/Cym4TZwTCNU)

------
Groxx
At 40:00: Time-based data, horizontally scalable, dumb storage, replicas don't
need to be synchronized, you can contact any replica. Sounds like the dream.

But that's a _drastic_ over-simplification of time and consistent responses.
Consistent responses in particular seem like a hard requirement given how
much attention he's given to "why N+1 queries are wrong and you need to do
them as a single operation" (which I totally agree with). And I don't think
this is something that can be glossed over _at all_ because the whole concept
hinges on history being immutable... which means a _major_ synchronization
time bottleneck.

And while he _does_ address this at 46:40, he seems to treat inconsistent
timestamps as "this is an interesting space, but if you have [inconsistent
time in your db] you have a problem" as if it's a _solved_ problem. Spanner
has arguably the most sophisticated approach at scale right now, with atomic
clocks all over the place, and it _still_ uses MVCC to address the
discrepancies because to do otherwise means major scaling problems.

\---

There's a lot to unpack in here, delivered rapidly and concisely (yay!), and
some good detailed chunks of info, but also what feels like a lot of "these
are wrong and/or difficult for specific and largely-correct reasons" -> "this
could be great because [hands waving furiously] it has none of those
problems!".

Values are great, documents are terrible, programmability is king, I
completely agree. But none of this sounds like it leads to a solution, just a
"these are the tradeoffs we're making, how about we stop trading them off?".
As a thought-provoker it's interesting, but it doesn't seem to directly _lead_
to anything.

~~~
dustingetz
Datomic works, all queries have a time-basis and all results are strongly
consistent, full stop. At a mile high view, it works like git. There is a
central transactor coordinating time, so inconsistent timestamps is not an
issue. To deal with staleness, query nodes connect to the transactor and
subscribe to the transaction log. If you send a query with a recent time-basis
to a stale query node, it can block until it's caught up enough. The lag time
is <10ms for query nodes in same data center as the transactor (and for web
applications, normal server response times are already slower than 10ms)
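
A rough sketch of the "block until caught up" behaviour, assuming a query node that receives log entries in order. `QueryNode`, `apply`, and `query` are hypothetical names and this is not Datomic's API; it also returns the node's latest value rather than a true as-of view, to keep the example short.

```python
import threading


class QueryNode:
    """Toy replica that tracks how far along the transaction log it is."""

    def __init__(self):
        self._cond = threading.Condition()
        self._basis = 0   # highest transaction time applied so far
        self._data = {}

    def apply(self, t, key, value):
        """Called as entries arrive from the transaction log, in order."""
        with self._cond:
            self._data[key] = value
            self._basis = t
            self._cond.notify_all()

    def query(self, key, basis, timeout=5.0):
        """Answer only once the node has caught up to the requested basis."""
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._basis >= basis, timeout
            )
            if not caught_up:
                raise TimeoutError("node is still behind the requested basis")
            return self._data.get(key)


node = QueryNode()

# Simulate the log entry for transaction 1 arriving a little late.
threading.Timer(0.05, node.apply, args=(1, "x", "hello")).start()

# The node is stale at this point, so the query waits for transaction 1.
result = node.query("x", basis=1)
```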

Datomic is not for Spanner-scale systems, it is an application database
competitive with relational databases.

~~~
Groxx
> _There is a central transactor coordinating time, so inconsistent timestamps
> is not an issue._

This means replicas are synchronized. Or you cannot have replicas. Whether
it's a lightweight "transaction-only" replica or not doesn't really matter
(I'm under the fuzzy impression that postgres's txids serve this same
purpose?).

