

Compilers Are Databases [video] - oelang
https://www.youtube.com/watch?v=48js0H6ooBY

======
Rexxar
I think we can go further:

\- every non-trivial piece of software is a compiler

\- every non-trivial piece of software is a database

Every non-trivial program has to read/parse/manipulate/store a lot of data. I
think it ultimately simplifies things to just admit that we are building a
specialized database/compiler. Whether the input data comes from a file, a
database, the network, the command line, or the code itself, and whether it
goes to files, a database, or the network, is just an implementation detail.

~~~
pjmorris
'Any sufficiently complicated C or Fortran program contains an ad hoc,
informally-specified, bug-ridden, slow implementation of half of Common Lisp.'
\- Greenspun's tenth rule [1]

[1]
[https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule](https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule)

~~~
Rexxar
The real failure of this law is that even if you use Common Lisp (or any other
language) to implement your software, you will still end up with an ad hoc,
informally-specified, bug-ridden, slow implementation of half of Common Lisp
inside your software.

The solution is to design your software as a compiler from the start.

------
chubot
I watched the first part of this, and this is apparently the inventor of Scala
talking about the Scala compiler.

He was talking about how deep the transformation pipeline in scalac is,
compared to javac.

I would say that something close to the opposite statement is justified:
databases are interpreters. SQLite, for example, is certainly an interpreter
with a storage layer.

I didn't watch enough of it to see why he thinks that compilers are databases.
Where's the storage layer? I'm interested in compilers and databases, but TBH
I am not that interested in Scala, so I stopped watching.

~~~
jzelinskie
The storage isn't the important part. I think his point was more about
_functional_ databases than your typical RDBMS in the sense that the
experimental Scala compiler needed to model time (in terms of compiler passes
and phases) in order to be able to answer "What is the value/type of this
thing _at a given time_?". TBH, the talk was very little about Scala and
mostly about compiler architecture.
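The idea of "modeling time" described above can be sketched roughly as follows. This is an illustrative toy, not the real dotc API: the names `Phase`, `Denotation`, and `atPhase` are made up for the example. The point is that a symbol's meaning is a function of the compiler phase, so lookups take a point in time.

```scala
// A phase is a point in compile time.
case class Phase(id: Int, name: String)

// A denotation records what a symbol means at each phase it was (re)defined.
final class Denotation(history: List[(Phase, String)]) {
  // Answer "what is the type of this thing at a given time?":
  // the most recent entry recorded at or before `phase`.
  def atPhase(phase: Phase): Option[String] =
    history.filter(_._1.id <= phase.id).sortBy(_._1.id).lastOption.map(_._2)
}

object Demo {
  val parser  = Phase(1, "parser")
  val typer   = Phase(2, "typer")
  val erasure = Phase(3, "erasure")

  // A List[Int] value: typed by the typer, its type erased in a later phase.
  val denot = new Denotation(List(typer -> "List[Int]", erasure -> "List"))
}
```

Asking `Demo.denot.atPhase(Demo.parser)` yields no answer (the tree is not yet typed), while the same question at the typer and erasure phases yields different, phase-appropriate answers.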

~~~
chubot
What's an example of a "functional database"? Datomic? What's the most
deployed functional database?

It's misleading to say that compilers are databases if they're not anything
like databases people use and know. Probably a more accurate title would have
been "Compilers are like functional databases".

Storage is obviously an important part of database architecture. If a compiler
doesn't deal with storage, there's only so much resemblance that a compiler
and a database can have.

~~~
joostdevries
I think "functional database" can be seen as a wider data pattern than just a
datastore that exclusively implements that pattern. It's a design principle
where a piece of data, once created, is never changed. That leads to event
sourcing or Twitter's "lambda architecture".

The compiler is a database in the sense that it has a memory across
incremental invocations. So the 'storage' happens to be in memory but could be
on disk.
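A minimal sketch of that append-only principle (a hypothetical example, not from the talk): state is never mutated, only derived by folding over an ever-growing event log, which also makes historical queries trivial.

```scala
// Events are immutable facts; the log only ever grows.
sealed trait Event
case class Deposited(amount: Int) extends Event
case class Withdrawn(amount: Int) extends Event

object EventLog {
  // Current state is a pure fold over the log; "updating" means appending.
  def balance(log: List[Event]): Int = log.foldLeft(0) {
    case (acc, Deposited(a)) => acc + a
    case (acc, Withdrawn(a)) => acc - a
  }

  // Because old events never change, "what was the state after the
  // first n events?" comes for free.
  def balanceAt(log: List[Event], n: Int): Int = balance(log.take(n))
}
```

For example, `EventLog.balance(List(Deposited(100), Withdrawn(30)))` is 70, and `balanceAt(log, 1)` recovers the intermediate state of 100.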

------
pjmlp
For those that care about language development on the JVM, the JVM Language
Summit 2015 agenda (this is one of the talks) is here:

[http://openjdk.java.net/projects/mlvm/jvmlangsummit/agenda.h...](http://openjdk.java.net/projects/mlvm/jvmlangsummit/agenda.html)

And there is already a YouTube playlist of the talks:

[https://www.youtube.com/playlist?list=PLX8CzqL3ArzUo2dtMurvp...](https://www.youtube.com/playlist?list=PLX8CzqL3ArzUo2dtMurvpUTAaujPMeuuU)

------
bobm_kite9
Martin Odersky is a clever guy, there's no doubt about that. But at the
beginning of the talk he introduces the Scala compiler as having 40 steps,
and he says that translating from Scala to the JVM is a translation from a
complex input to a complex output. And now, databases. I just don't understand
why he would even want to take on such a difficult task.

It strikes me that bringing functional programming to the JVM can / has been
done simpler: Clojure being the obvious one, but F# on the CLR is another good
template.

Third, Frege is a Haskell implementation for the JVM which to my mind is a
much better long-term bet than Scala: it's already a tried-and-tested
language, and the work involved in the translation has taken an order of
magnitude less time.

It's all very well having some new great language on the JVM, but simplicity
is a huge virtue if you want people to build an ecosystem around it.

~~~
justthistime_
> It strikes me that bringing functional programming to the JVM can / has been
> done simpler: Clojure being the obvious one, but F# on the CLR is another
> good template.

Scala's FP is more expressive than all of them, by a large margin.
Additionally, Scala also supports a useful OO system, unlike, for instance,
what F#/OCaml/... ship.

> It's all very well having some new great language on the JVM, but simplicity
> is a huge virtue if you want people to build an ecosystem around it.

You realize that Scala probably has the largest and most meaningful ecosystem
of all (non-Java) JVM languages, and is driving a lot of the changes in the
Java and JVM space? (Reactive streams, value types, specialized generics, ...)

~~~
bobm_kite9
Really good points.

I expect you're right about the expressiveness, although I don't know exactly
what you mean by this. The "buffet of abstractions" available in Scala
actually put me off it. For me, there are too many options available to expect
readable, maintainable code at the end of it all, especially if you work in a
team that regularly creates almost wilfully hard-to-comprehend Java.

This video is 2 years old now, but I think it marked the point when I stopped
investigating Scala.
[https://www.youtube.com/watch?v=TS1lpKBMkgg&feature=youtube_...](https://www.youtube.com/watch?v=TS1lpKBMkgg&feature=youtube_gdata_player)

~~~
justthistime_
While I agree with many points in the video, it feels kind of disheartening to
me how many people just wave it around as if it could be used as an attack
against the language and its community.

(A general observation, what you say here is totally fair.)

Of course the video is about Scala, but the problems mentioned are widespread
throughout the industry.

What most of those people miss is that Scala is sometimes light years ahead in
figuring out hard problems. The things considered issues in Scala are often
things that haven't even been considered elsewhere.

The video is now 2 years old, and in terms of Scala, that's a lot of time. The
strength of the language and the community is that it can adapt pretty fast,
and many issues have been/are being addressed.

There are plenty of libraries out there that experiment with different
approaches to collections, there is a new, faster and cleaner backend, the new
compiler is shaping up nicely, the new IR will solve a lot of binary
compatibility/cross-compilation/Scala<->Scala.js issues, and scala.meta looks
like it will become one of the best metaprogramming abstractions in any
language.

Things are working really well in Scala, so if you have any question or
concerns, feel free to voice them, and I'll give you my 2 cents about what's
happening in that space! :-)

------
fla
I guess everything becomes a database if you work at Oracle long enough.

------
jules
I don't understand the part about the variance of the tree type. Couldn't he
just use the unit/top type as a type parameter for untyped trees, rather than
the nothing/bottom type? Not only does that seem more intuitively correct, but
it would also give the right subtype relationship.

~~~
joostdevries
Well, some things do have the type unit or Any (top). How would you express
that if that means 'not known' in your compiler?

~~~
jules
I think I don't understand your question. My remark was about his type Tree[T]
of syntax trees annotated with values of type T. For syntax trees annotated
with types he uses Tree[Type], which makes sense. Then for untyped syntax
trees (i.e. trees without any annotation) he uses Tree[Nothing]. Now he needs
unsafe hacks to get the desired subtype relationship Tree[Type] <:
Tree[Nothing]. The compiler understands that Tree is covariant, so he
overrides this with an unsafe annotation to make Tree contravariant. Using
Nothing rather than the top/unit type (not sure what that's called in Scala,
Any?) is what doesn't make sense to me. Intuitively an un-annotated tree is a
Tree annotated with values that carry no information, not a tree annotated
with values of the uninhabited type. If you use that you also get the desired
subtyping relationship Tree[Type] <: Tree[Top] without any need for unsafe
annotations.

~~~
modersky
Yes, Tree[Top] would have been another way to model this. I wanted
Tree[Nothing] to make clear that getting the type of an untyped Tree gives a
Nothing, i.e. a run-time error. Getting a Top is not quite the same, but it is
almost as good, since only a few things can be done with type Any in Scala.

EDIT: There's one thing I did not stress in the talk that makes Nothing a
more attractive choice than Top: we want to do copy on write on these trees.
I.e. if we transform an untyped tree to a typed one using "withType", we want
to be able to _reuse_ the untyped tree and just set the type (unless somebody
else has already added another type). If the tree is of type Tree[Nothing], we
can do that safely, even if there are still shared references to the same node
as an untyped Tree. Getting the type of the shared untyped tree will give a
Nothing, and nothing can be done with it. But if the type were Top, we would
still be able to apply some operations to it, so the copy on write would be
unsafe.
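The two designs under discussion can be sketched in a few lines. This is heavily simplified and the names are illustrative (the real compiler's Tree carries much more than a single attachment); it only shows how each variance choice yields the desired "a typed tree is usable as an untyped tree" subtyping.

```scala
import scala.annotation.unchecked.uncheckedVariance

// Option A (as jules suggests): covariant tree, untyped = Tree[Any].
// Tree[Type] <: Tree[Any] holds for free, with no variance hacks.
final case class TreeA[+T](attachment: T)

// Option B (the talk): contravariant tree, untyped = Tree[Nothing].
// Reading the attachment of a TreeB[Nothing] has static type Nothing, so the
// type system promises nothing useful can be done through a shared untyped
// view -- which is what makes copy-on-write reuse safe. The cost: returning T
// from a contravariant class needs an unchecked variance escape hatch.
final class TreeB[-T](attach: Any) {
  def attachment: T @uncheckedVariance = attach.asInstanceOf[T]
}

object VarianceDemo {
  type Type = String // stand-in for the compiler's real Type

  // Both designs admit "use a typed tree where an untyped one is expected":
  val viaTop: TreeA[Any]        = TreeA[Type]("Int")     // by covariance
  val viaBottom: TreeB[Nothing] = new TreeB[Type]("Int") // by contravariance
}
```

Note that in Option A the untyped view still exposes a usable `Any` attachment, while in Option B the attachment of the untyped view is statically unusable, matching the copy-on-write argument above.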

------
qznc
Looking at how often this compiler must allocate (and garbage collect), no
wonder Scala compilers are slow.

------
andrewchambers
About to watch the video. In the past I've thought about trying to model LLVM
code inside Datomic and doing transforms with the help of Datalog queries.

~~~
jules
People have done something like this for pointer analysis (statically
determining to which memory locations a pointer can point, which is a
prerequisite for many optimizations), e.g.
[http://yanniss.github.io/doop-datalog2.0.pdf](http://yanniss.github.io/doop-datalog2.0.pdf)

------
sklogic
I never watch videos (what's wrong with you, post-literates? Forgot how to
read and write?), so commenting on the title only.

Yes, compilers indeed have a lot in common with databases. Datalog (and
therefore a relational algebra) is a very powerful tool for implementing
compiler passes, for asking questions about the AST or various IRs.

EDIT: I'm talking about things like this:
[http://people.csail.mit.edu/mcarbin/papers/aplas05.pdf](http://people.csail.mit.edu/mcarbin/papers/aplas05.pdf)
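For a rough idea of what this looks like, a Datalog program such as `reaches(X, Y) :- calls(X, Y).` and `reaches(X, Z) :- calls(X, Y), reaches(Y, Z).` is just a fixpoint over relations. A toy bottom-up evaluator over a made-up call graph (illustrative only, nothing from the linked paper):

```scala
object CallGraph {
  type Rel = Set[(String, String)]

  // Base facts: the calls(X, Y) relation of a tiny program.
  val calls: Rel = Set("main" -> "parse", "parse" -> "lex", "main" -> "emit")

  // Naive bottom-up Datalog evaluation: apply the recursive rule
  // reaches(X, Z) :- calls(X, Y), reaches(Y, Z) until nothing new is derived.
  def reaches: Rel = {
    def step(r: Rel): Rel =
      r ++ (for ((x, y) <- calls; (y2, z) <- r if y == y2) yield (x, z))
    var cur: Rel = calls
    var next = step(cur)
    while (next != cur) { cur = next; next = step(cur) }
    cur
  }
}
```

Here the fixpoint derives the indirect fact `main -> lex` from the two direct call edges, which is exactly the kind of "question about the AST or IR" a relational formulation answers declaratively.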

~~~
steego
It really sad Prolog/Datalog based languages never took off. Our industry has
no problem embedding SQL strings and mini relational engines in our imperative
programs. If you're going to encode your problems in terms of relations, why
embed something something more expressive and powerful?

Good paper BTW, though I wish you hadn't prefaced the post by calling people
post-literates. This paper deserves more reads.

~~~
sklogic
Currently Prolog-derived languages are making a sort of a comeback - Kanren is
widely used, just look at the number of ports and variations. My favourite
recent example of cKanren used inside a compiler is Harlan [1] (see the region
inference pass).

P.S. I'm really getting tired of the ever-growing proportion of video vs. text
in otherwise valuable content. I'm genuinely interested in the topic but there
is no way I can watch it. I dream of a service that would transcribe arbitrary
videos for a tiny fee.

[1] [https://github.com/eholk/harlan](https://github.com/eholk/harlan)

