
Turning the IDE Inside Out with Datalog - arjunnarayan
https://petevilter.me/post/datalog-typechecking/
======
brabel
JetBrains IDEs (e.g. IntelliJ) have structural search, which is basically a
query language for code: [https://www.jetbrains.com/help/idea/structural-search-and-re...](https://www.jetbrains.com/help/idea/structural-search-and-replace.html#to_search_structurally)

I know there's more to the post than just queryability, but I thought this
should have been mentioned when discussing what existing IDEs can offer.

~~~
agumonkey
we're not far from coccinelle-ing the codebase

------
sriram_malhar
Niko Matsakis had an insightful observation about this approach in a similar
post (which is referenced by OP):

[http://smallcultfollowing.com/babysteps/blog/2017/01/26/lowe...](http://smallcultfollowing.com/babysteps/blog/2017/01/26/lowering-rust-traits-to-logic/)

The key observation to me was that the traditional Datalog/Prolog way of
unifying is through syntactic equality, which is a bit too simple to express
the kind of equality needed in Rust and elsewhere. You can express it in
Datalog, but the further the encoding gets from the source language, the more
error generation suffers.

~~~
tannhaeuser
"Insightful observation" doesn't quite do it justice ;) There's a whole
discipline in discrete math dealing with the possibilities and limitations of
equational theories and reasoning. And it could be said their limitations gave
rise to the various constraint formalisms that were introduced in the 80s and
90s as Prolog extensions and sometimes (syntactic) generalizations.

~~~
YeGoblynQueenne
Well then, please elaborate because I know nothing of the discipline you
describe and it certainly sounds interesting (though it also sounds a little
fussy).

~~~
tannhaeuser
I'm not going to summarize classic equational reasoning (with its deep
connection to algebraic geometry and whatnot) here in a single HN post ;) I
can point you to some classic works/authors in the field, though (and I'm sure
some fellow HNers can provide some more): Gauss, Knuth, Bendix, Gröbner,
Buchberger, Bachmair, Euclid, Ganzinger, Davis, Putnam, Robinson, and
Colmerauer for Prolog 2. There are also category theory papers relevant to
reasoning about data structures, and of course Damas, Hindley, Milner for type
theories. The way is the goal here.

~~~
YeGoblynQueenne
>> I'm not going to summarize classic equational reasoning (with its deep
connection to algebraic geometry and whatnot) here in a single HN post ;)

Aw :(

Alright, I'll just chase down some of the references you mention. I was going
to check out Prolog 2 anyway, after browsing the Wikipedia article on
Colmerauer a few days ago and seeing a reference to his later work there.

~~~
tom_mellior
To be a bit more concrete than the grandparent's name soup, I liked this
handbook article about unification in theories: Franz Baader and Jörg
Siekmann. Unification Theory. In D.M. Gabbay, C.J. Hogger, and J.A. Robinson,
editors, Handbook of Logic in Artificial Intelligence and Logic Programming,
pages 41-125. Oxford University Press, Oxford, UK, 1994.

This seems to be a newer version with similar contents but different authors:
[http://www.cs.bu.edu/~snyder/publications/UnifChapter.pdf](http://www.cs.bu.edu/~snyder/publications/UnifChapter.pdf)

This is specific to unification and not to broader equational reasoning.

~~~
YeGoblynQueenne
Thanks.

------
kibwen
In the Rust ecosystem, I believe the next-generation IDE support is already
beginning to incorporate some ideas like this, although I believe it uses
Prolog rather than Datalog (I'd be interested to learn more about how they
differ). The logic library is called Chalk ( [https://github.com/rust-
lang/chalk](https://github.com/rust-lang/chalk) ), and it's designed to be a
full trait resolution engine for Rust (trait resolution being a necessary step
to e.g. autocomplete methods on types); rust-analyzer has been using it for
quite a while now, and it's slowly being incorporated into rustc itself. Not
as impressive as basing the entire IDE on queries, of course!

The Rust project is also using the differential-datalog library mentioned in
the OP to underlie their third-generation borrow checker:
[https://github.com/rust-lang/polonius](https://github.com/rust-lang/polonius)

~~~
namibj
It seems polonius is using datafrog instead. Where do you see a differential
engine referenced?

~~~
kibwen
The post linked from the readme mentions it:
[https://smallcultfollowing.com/babysteps/blog/2018/04/27/an-...](https://smallcultfollowing.com/babysteps/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/), although it's two years old
at this point so I wouldn't be surprised if things have changed.

~~~
namibj
It's not the 2 years that caused the change. Feel free to read up on Frank's
blog post about how datafrog came into existence:
[https://github.com/frankmcsherry/blog/blob/master/posts/2018...](https://github.com/frankmcsherry/blog/blob/master/posts/2018-05-19.md#acks)

------
simplify
Fantastic work! Datalog/Prolog are old-yet-futuristic technologies that I
wish every developer knew about, as they solve a lot of problems in very
elegant ways (as attested to by this post).

I noticed you (assuming you're the author) used what looks like a non-
traditional Datalog syntax – which makes sense to me as, IMO, Datalog/Prolog
desperately need first-class support for a record-like syntax to finally break
into the mainstream. Is there any prior work on this syntax, or did you just
develop it as you needed it?

~~~
tannhaeuser
There's a proposal due to the prolog-commons initiative for a dictionary-like
object (that is also implemented in SWI Prolog, I believe).

And in Prolog you can, of course, just trivially use your own term structure
for item-values:

    
    
        p([ item : "value", ... ])
    

You could even use JSON-like terms in Prolog (with maybe a little help by the
op/3 directive to sort out parsing priorities):

    
    
        A_Prolog_Term = {
          x: y,
          z: [a, b, {1, d} ]
        }
    

But the more fundamental approach IMHO would be to use nested knowledge-base
terms:

    
    
        p(
          q(whatever).
          r(X) :- q(X).
        ).
    

This was pioneered in the 1980s for attribute grammars, though using ^^ and
other special graph tokens as a nested "implies" operator (see "definite-clause
translation grammars").

------
pjmlp
It's refreshing to see that many devs are finally looking into these kinds of workflows.

For prior work on the IDE as a database, check out "Energize/Cadillac & Lucid's Demise",
[https://www.dreamsongs.com/Cadillac.html](https://www.dreamsongs.com/Cadillac.html)

You can see it in action on this 1993 marketing video from Lucid,
[https://www.youtube.com/watch?v=pQQTScuApWk](https://www.youtube.com/watch?v=pQQTScuApWk)

IBM also tried the same concept in VisualAge for C++, version 4, which was
one of the very last versions of the product.

[http://www.edm2.com/index.php/VisualAge_C%2B%2B_4.0_Review](http://www.edm2.com/index.php/VisualAge_C%2B%2B_4.0_Review)

Both suffered from hardware requirements heavier than most companies were
willing to pay for; otherwise we could have had Smalltalk-like tooling, with
incremental compilation and reloading, for C++ already in the early '90s.

------
jierenchen
This is very cool! It's really awesome to see this code-as-data concept
gaining a lot of traction recently. Hope to see this project developed
further.

I'm working on a similar project here:
[https://sourcescape.io/](https://sourcescape.io/), but intended for use
outside the IDE on larger collections of code (like the codebase of a large
company.)

Agreed on the Prolog/Datalog approach of expressing a query as a collection of
facts. CodeQL does the same. From one datastore nerd to another, I actually
think this is a relatively unexplored area of querying knowledge graphs (code
being a very complex, dense knowledge graph.)

Very excited to see where you go next with this "percolate search"
functionality in the IDE.

~~~
geordimort
There’s a lot of research in this area. The fact that Semmle was acquired by
Github/Microsoft is a testament to the maturity of the field.

------
sideeffffect
This is a similar idea to that of SemanticDB, which is used in the Scala
community:

[https://scalameta.org/docs/semanticdb/guide.html](https://scalameta.org/docs/semanticdb/guide.html)

It is a queryable database of semantic information about the program which is
generated by the compiler (compiler plugin, to be precise). Once generated,
other tools which need semantic information, like linters or language servers,
can consume it without having to worry about how to actually generate it.

You might enjoy a talk about it: How We Built Tools That Scale to Millions of
Lines of Code by Eugene Burmako

[https://www.youtube.com/watch?v=C770WpI_odM](https://www.youtube.com/watch?v=C770WpI_odM)

Kythe by Google is also a similar thing:
[https://kythe.io/](https://kythe.io/)

------
sandermvanvliet
I would have expected a mention of Roslyn, which powers Visual Studio (not
VS Code). It's sort of similar to the IntelliJ approach, but it is also what
drives the C# compiler; it maintains the code model (actually two of them,
syntactic and semantic), which makes it pretty powerful. It is multi-language
(VB, C# and I think F# now too?), but it's perhaps not as universal as the
Language Server approach.

~~~
mkl
Roslyn is mentioned in the Related Work section.

------
afarrell
One challenge to this is that languages which easily allow metaprogramming can
encourage codebases for which it is hard to write tools that give insight into
their structure.

You can say “don’t do that”, but I didn’t. Past coworkers did.

~~~
geordimort
This is just a prototype to showcase ideas. In real life even parsing C++
or Java is much harder to pull off than one would anticipate.

------
disposedtrolley
Are there any examples of "programs as databases" being applied to structured
text like JSON or YAML? Something like a generic system which takes a set of
rules/facts and the source data, and transforms these into a queryable data
structure.

I'm working with a lot of OpenAPI [1] specifications currently, some of which
span tens of thousands of lines. Heaps of parent, child, sibling type
relationships, and objects which are defined-once and referenced in many
places. It would be nice if I could perform a search like "select all schemas
used in this path" or "select the count of references to this parameter".

[1] [https://github.com/OAI/OpenAPI-Specification/blob/master/ver...](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.1.0.md)

~~~
sjg007
I mean, put them into Lucene, Elasticsearch or maybe MongoDB?

~~~
disposedtrolley
Well yes, but that doesn't solve the parsing problem right?

My point is that some YAML or JSON documents conform to a specification which
can be codified as a set of rules, which combined with the data form facts.
I'm asking if there are existing systems to parse these files given I write
these rules in a particular syntax, spitting out a queryable representation of
the data.
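To make the question concrete, here is a minimal sketch of the "spec as facts" idea. It assumes the spec has already been parsed into plain dicts and lists (e.g. with `json.load`, or PyYAML's `yaml.safe_load` for YAML) and that every reference is a local `#/components/schemas/...` pointer. The function and relation names (`extract_refs`, `path_ref`, `schema_ref`, `schemas_used_by`) are hypothetical. The recursive "used" relation is computed Datalog-style, by iterating until nothing new appears.

```python
def extract_refs(node, out):
    """Collect every "$ref" string found anywhere inside a parsed JSON/YAML value."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                out.add(value)
            else:
                extract_refs(value, out)
    elif isinstance(node, list):
        for item in node:
            extract_refs(item, out)
    return out


def facts_from_spec(spec):
    """Flatten a spec into two relations (sets of pairs):
    path_ref(Path, Target) and schema_ref(Schema, Target)."""
    path_ref = {
        (path, ref)
        for path, item in spec.get("paths", {}).items()
        for ref in extract_refs(item, set())
    }
    schemas = spec.get("components", {}).get("schemas", {})
    schema_ref = {
        ("#/components/schemas/" + name, ref)
        for name, body in schemas.items()
        for ref in extract_refs(body, set())
    }
    return path_ref, schema_ref


def schemas_used_by(path, spec):
    """used(P, S) :- path_ref(P, S).
    used(P, T) :- used(P, S), schema_ref(S, T).
    Evaluated bottom-up: repeat the second rule until no new schema appears."""
    path_ref, schema_ref = facts_from_spec(spec)
    used = {target for (p, target) in path_ref if p == path}
    while True:
        new = {t for (s, t) in schema_ref if s in used} - used
        if not new:
            return used
        used |= new
```

The query "select all schemas used in this path" becomes `schemas_used_by("/pets", spec)`. Counting references to a parameter would need `extract_refs` to keep duplicates, for example by collecting into a list instead of a set.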

------
ilaksh
Reminds me a little of "Intentional Programming" or Structured Editors.

------
nine_k
This is really nice. Key points:

- Parse the program like an IDE would, but expose the data in an open
queryable database format (both like and unlike a language server).

- Use Datalog for storing the facts and _inferring_ new facts about data.

(A fun fact: the Datalog implementation they use is written in Haskell and
generates programs in Rust.)

~~~
kibwen
Here's the link to the Datalog implementation you mention (from VMware, I
wonder what they use it for?): [https://github.com/vmware/differential-datalog](https://github.com/vmware/differential-datalog)

------
habosa
This is extremely cool, great work! You're right, IDEs can do so much magic
but they're limited by the 'spells' their creators have imagined. Many times
I've wanted to do a simple semantic refactor that I could express as logic but
the IDE can't handle it.

(Also hi Pete!)

~~~
duncanawoods
I don’t see what that has to do with the IDE. Just grab a parser/ast library for
your language.

I’ve written a few of my own refactorings for big design changes that might
touch e.g. every view in a huge application in ways no series of regexes could
cover.

It’s hard work. Definitely one of those tricky “cost of automation” decisions.
You can do an awful lot of dumb grunt work in the time it takes to debug a
refactoring.

------
Davidbrcz
This idea of turning a program into a database and using Prolog/Datalog on it
is not new.

The most successful example is Semmle (bought by GitHub), which has been doing
it for years now, with an SQL-like syntax for queries (named ".QL").

------
gavinray
This is how rust-analyzer's IntelliSense works: it uses a query engine called
Salsa that stores the symbols in a database, and the linting and
completion/semantics are entirely query-driven:

[https://github.com/salsa-rs/salsa](https://github.com/salsa-rs/salsa)

Good video describing its use for working with Rust's AST:

[https://youtu.be/i_IhACacPRY?t=1348](https://youtu.be/i_IhACacPRY?t=1348)

------
z3t4
I've written an IDE in JS where all internal structures are available at
runtime and can be hooked into. However, I can't see the usefulness of what
the article describes, so I would like to see some practical examples. If
you could run queries like the ones in the article, what sort of queries would
you run?

------
coderdd
With
[https://github.com/TreeTide/underhood](https://github.com/TreeTide/underhood),
my goal is to provide a read-only view of code, geared for understanding and
debugging. Having to maintain the ability to edit comes with constraints.

------
YeGoblynQueenne
Nice article and great idea, but as is traditional there are some slight
fudges of what Datalog is or what Horn clauses are etc, that I'd like to
unfudge, slightly. It's Sunday! What better than to start our day with a very
quick and almost not completely fudgy intro to logic programming? Let's go!

To begin with, Datalog is not a "cousin" of Prolog as stated in the section
"Interlude: Brief Intro to Datalog". Datalogs (there are many variants!) are
subsets of Prolog. For example, a typical datalog is the language of definite
clauses with no function symbols [¹] and with no negation as failure [²].
Another datalog may allow only the cons function in order to handle lists;
etc.

Otherwise the syntax of datalog is identical to Prolog, but there is a further
difference, in that Prolog is evaluated from the "top down" whereas Datalog is
evaluated from the "bottom up". What that means is that given a "query" (we'll
come to the scare quotes in a moment) Prolog will try to find "rules" whose
heads unify with a literal in the query (A literal is an atom, or the negation
of an atom; "p(χ,α)" is an atom.) whereas datalog will first generate the set
of all ground atoms that are consequences of the program in the context of
which the query was made, then determine whether the atoms in the query are in
the set of consequences of the program [³]. The reason for the different
execution model is that the bottom-up evaluation is guaranteed to terminate
[⁴] whereas Prolog's top-down evaluation can "go infinite" [⁵]. There is of
course another, more subtle difference: Prolog can "go infinite" because of
the Halting problem, from which datalog does not suffer because, unlike
Prolog, it does not have Universal Turing Machine expressivity [⁶].

So in short, datalog is a restricted subset of Prolog that has the advantage
of being decidable, while Prolog in general is not, but is also incomplete
while Prolog is complete [⁷].

Now, the other slight fudge in the article is about "rules", "facts" and
"queries". Although this is established and well-heeled logic programming
terminology, it fudges the er fact that those three things are the _same kind
of thing_ , namely, they are, all three of them, Horn clauses [⁸].

Specifically, Horn clauses are clauses with at most one positive literal.

Crash course in FOL: an atom is a predicate symbol followed by a sequence of
terms in parentheses. Terms are variables, functions or constants (constants are
functions with 0 arity, i.e. 0 arguments). A literal is an atom, or the
negation of an atom. A clause is a disjunction of literals. A clause is Horn
when it has at most 1 positive literal. A Horn clause is a definite clause
when it has exactly 1 positive literal.

The following are Horn clauses:

    
    
      ¬P(χ) ∨ ¬P(υ)
      P(χ) ∨ ¬Q(χ)
      Q(α)
      Q(β)
    

In logic programming tradition, we write clauses as implications (because ¬A ∨
B ≡ A → B) and with the direction of the implication arrow reversed to make it
easier to read long implications with multiple premises. So the four clauses
above are written as:

    
    
      ←P(χ), P(υ) (a)
      P(χ) ← Q(χ) (b)
      Q(α)←       (c)
      Q(β)←       (d)
    

And those are a "query", (a), a "rule", (b) and two "facts", (c) and (d).

Note that (b,c,d) are _definite_ clauses (they have exactly one positive
literal, i.e. their head literal, which is what we call the consequent in the
implication). Facts have only a positive literal; I like to read the dangling
implication symbol as "everything implies ...", but that's a bit
idiosyncratic. The bottom line is that definite clauses with no negative
literals can be thought of as being always true, hence "facts". Queries, i.e.
Horn clauses with no positive literals, are the opposite: "nothing implies"
their body literals (my idiosyncratic reading) so they are "always false".
Queries are also called "goals". Finally, definite clauses with both positive
and negative literals can be thought of as "conditionally true".
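The definitions above can be sketched in a few lines. The clause encoding here is made up for illustration and is not any real system's representation: a clause is a list of `Literal` values, each carrying a sign. The functions only count positive literals, exactly as the definitions do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    predicate: str
    args: tuple            # terms: variables, constants or function terms
    positive: bool = True


def positives(clause):
    """Number of positive literals in a clause (a disjunction of literals)."""
    return sum(1 for lit in clause if lit.positive)


def is_horn(clause):
    return positives(clause) <= 1      # at most one positive literal


def is_definite(clause):
    return positives(clause) == 1      # exactly one positive literal


def kind(clause):
    """Name a clause the way logic programmers usually do."""
    if not is_horn(clause):
        return "not Horn"
    if not is_definite(clause):
        return "query"                 # goal clause: no positive literal
    if len(clause) == 1:
        return "fact"                  # only the positive (head) literal
    return "rule"                      # head plus negative (body) literals


# Clauses (a)-(d) from above, with χ, υ written X, Y and α, β written a, b.
query = [Literal("P", ("X",), False), Literal("P", ("Y",), False)]
rule = [Literal("P", ("X",)), Literal("Q", ("X",), False)]
facts = [[Literal("Q", ("a",))], [Literal("Q", ("b",))]]
```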

Prolog and datalog programs are written as sets of definite clauses, i.e. sets
of "facts" and "rules". So, when we want to reason about the "facts" and
"rules" in the program, we make a "query". Then, the language interpreter,
which is a top-down resolution theorem prover [⁹] in the case of Prolog, or
bottom-up fixpoint calculation in the case of datalog [¹⁰], determines whether
our "query" is true. If the query includes any variables then the interpreter
also returns every set of variable substitutions that make the query true.

In the example above, (a) has two variables, χ and υ. Evaluating (a) in the
context of (b,c,d) returns "true" together with every substitution that makes
it so: since both P(α) and P(β) follow from (b,c,d), χ and υ can each be bound
to α or β, giving {χ/α,υ/α}, {χ/α,υ/β}, {χ/β,υ/α} and {χ/β,υ/β}.
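Query (a) can be answered by brute force. The sketch below stores facts as Python sets of argument tuples, an encoding made up for illustration. It first derives P from Q using rule (b), then enumerates every binding of χ and υ that satisfies (a).

```python
from itertools import product

# Facts (c) and (d): Q(α) and Q(β), each stored as a 1-tuple of arguments.
Q = {("α",), ("β",)}

# Rule (b), P(χ) ← Q(χ): every Q fact yields a P fact.
P = {(x,) for (x,) in Q}

# Query (a), ←P(χ), P(υ): the two body atoms share no variable, so every
# pair of P facts (in both orders, and with repeats) is an answer.
answers = [
    {"χ": chi, "υ": upsilon}
    for (chi,), (upsilon,) in product(sorted(P), repeat=2)
]

for answer in answers:
    print(answer)
```

The two body atoms of (a) share no variable, so the answer set is the cross product of the P facts with themselves: four substitutions here.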

And that's how Horn clauses and definite clauses become "rules", "facts" and
"queries".

Next time: how the leopard got its stripes and the hippopotamus learned to
love the first order predicate calculus.

_________________

[¹] This is my (first) slight fudge because constants are also functions, with
0 arguments. So, to be formal, the typical datalog above has "no functions of
arity more than 0".

[²] Negation-as-failure makes a language non-monotonic, in the sense that
introducing new "facts" can change the meaning of a theory, i.e. a program.

[³] So, its Least Herbrand Model, or its Least Fix-Point (LFP).

[⁴] _Because_ it finds the LFP of the query and the program.

[⁵] Unless evaluated by SLG resolution, a.k.a. tabling, similar to
memoization.

[⁶] _Higher-order_ datalogs, which allow predicate symbols as terms of
literals, _do_ have UTM expressivity; indeed, a UTM can be defined in a
higher-order datalog fragment where clauses have up to two body literals with
at most two arguments:

    
    
      utm(S,S) ← halt(S).
      utm(S,T) ← execute(S,S₁), utm(S₁,T).
      execute(S,T) ← instruction(S,P), P(S,T).
    

Originally in:

Tärnlund, S.-A. (1977). Horn clause computability. BIT Numerical Mathematics,
17(2), 215–226.

[⁷] To be less fudgy: definite programs are refutation complete under SLD
resolution, meaning that any atom that is entailed by a definite program can
be derived by SLD resolution. A definite program is a set of definite clauses,
as explained above.

[⁸] Long time ago, I explained this to a colleague who remarked that all the
nice syntactic elegance in query languages falls apart the moment you try to
make a query, which usually has a very different syntax than the actual rows
of the tables in the database. So I said "that's the point! Queries are also
Horn clauses!" and his immediate remark was "That just blew my mind". It's
been so long and I'm so used to the idea that I haven't a clue whether this is
really mind blowing. Probably, being my usual excited self, I just said it in
a way that it sounded mind blowing (gesticulating widely and jumping up and
down enthusiastically, you can picture the scene) so my colleague was just
being polite. That was over drinks at the pub after work anyway.

[⁹] Resolution is an inference rule that derives new clauses from a set of
clauses. In theorem proving it's used to refute a goal clause by deriving the
empty clause, □. Since a goal is a set of negated literals, refuting the goal
means, basically, that the literals it negates follow from the program. So our
query is true in the context of our program.

[¹⁰] Datalog's bottom-up evaluation uses something called the T_P (immediate
consequence) operator. It basically does what I said above: it starts with the
ground atoms in a program and then derives the set of consequences of the
clauses in the program. In each iteration, the set of consequences is added to
the program and the process is repeated, until no new consequences are derived.
As stated above, the process is guaranteed to terminate. Every definite program
has a least fixpoint, which is also its Least Herbrand Model. Because a datalog
program has no function symbols of arity above 0, only finitely many ground
atoms exist, so that fixpoint is reached after finitely many steps. (We won't
go into Herbrand Models and Herbrand interpretations, but, roughly, an LHM is
the smallest set of ground atoms that makes every clause of a definite program
true.) A more complete introduction to LHMs and LFPs and how they are used in
bottom-up evaluation for datalog can be found here:

[https://www.doc.ic.ac.uk/~mjs/teaching/KnowledgeRep491/Fixpo...](https://www.doc.ic.ac.uk/~mjs/teaching/KnowledgeRep491/Fixpoint_Definite_491-2x1.pdf)
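The bottom-up loop described in this footnote can be sketched as naive fixpoint iteration. This is not how any particular Datalog engine is implemented; the atom encoding is made up for illustration. Each atom is a `(predicate, args)` pair, variables start with an uppercase letter as in Prolog, and rules are assumed to be range-restricted (every head variable also occurs in the body). The step function computes I ∪ T_P(I), which reaches the same least fixpoint when started from the program's ground facts.

```python
def is_var(term):
    """Prolog convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def match(atom, fact, subst):
    """Extend subst so that atom equals the ground fact, or return None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    subst = dict(subst)
    for term, value in zip(args, fargs):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None            # variable already bound to something else
        elif term != value:
            return None                # constant mismatch
    return subst


def body_solutions(body, facts, subst):
    """Yield every substitution that makes all body atoms true in facts."""
    if not body:
        yield subst
        return
    for fact in facts:
        extended = match(body[0], fact, subst)
        if extended is not None:
            yield from body_solutions(body[1:], facts, extended)


def t_p(rules, facts):
    """One step of I ∪ T_P(I): keep current atoms, add each head whose body holds."""
    derived = set(facts)
    for (pred, args), body in rules:
        for s in body_solutions(body, facts, {}):
            derived.add((pred, tuple(s[a] if is_var(a) else a for a in args)))
    return derived


def least_fixpoint(rules, facts):
    """Iterate from the ground facts until no new atoms appear."""
    current = set(facts)
    while True:
        nxt = t_p(rules, current)
        if nxt == current:
            return current
        current = nxt


# path/2 over a cyclic edge/2 relation: still terminates, since the set of
# possible ground atoms is finite.
rules = [
    (("path", ("X", "Y")), [("edge", ("X", "Y"))]),
    (("path", ("X", "Z")), [("path", ("X", "Y")), ("edge", ("Y", "Z"))]),
]
facts = {("edge", ("a", "b")), ("edge", ("b", "c")), ("edge", ("c", "a"))}
model = least_fixpoint(rules, facts)
```

On the three-edge cycle above, the loop stops after deriving a `path` atom for each of the 9 ordered pairs of nodes.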

~~~
tom_mellior
Sorry, but you are also fudging things.

> Datalogs (there are many variants!) are subsets of Prolog.

No. First of all, this is not true syntactically. There are Datalogs that
allow non-Horn clauses with several terms in a goal head:

    
    
        a, b :- c.
    

This is not allowed in Prolog. So (such) Datalogs are not subsets of Prolog
syntactically.

Second, it is not true semantically either, not even for the common syntactic
subset. Consider:

    
    
        ancestor_of(Parent, Child) :-
            child_of(Child, Parent).
        ancestor_of(Ancestor, Person) :-
            ancestor_of(Ancestor, Parent),
            child_of(Person, Parent).
    

This is left-recursive, so typical queries will not terminate in Prolog: the
search loops forever instead of finishing its enumeration of solutions. But as you say, Datalogs are decidable and any query
terminates, so you _will_ get solutions, which is different semantics from
Prolog. So it's not meaningful to say that Datalog is a semantic subset of
Prolog.
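For contrast, here is a sketch of bottom-up evaluation of the `ancestor_of` program above, written directly in Python rather than in any real Datalog system. It uses semi-naive evaluation: each round joins only the facts that were new in the previous round. Semi-naive evaluation is one common example of the optimizations Datalog engines can apply. The left recursion plays no role in this evaluation order, so it terminates even on cyclic `child_of` data.

```python
def ancestors(child_of):
    """Bottom-up evaluation of the left-recursive ancestor_of/2 program above.

    child_of is a set of (child, parent) pairs; the result is the set of
    (ancestor, person) pairs, i.e. the ancestor_of atoms in the least model.
    """
    # Index child_of by parent, so clause 2's join is a dictionary lookup.
    children = {}
    for child, parent in child_of:
        children.setdefault(parent, set()).add(child)

    # Clause 1: ancestor_of(Parent, Child) :- child_of(Child, Parent).
    result = {(parent, child) for (child, parent) in child_of}
    delta = set(result)

    # Clause 2: ancestor_of(A, P) :- ancestor_of(A, Parent), child_of(P, Parent).
    # Semi-naive: only facts derived in the previous round are joined again.
    while delta:
        derived = {
            (anc, person)
            for (anc, parent) in delta
            for person in children.get(parent, ())
        }
        delta = derived - result
        result |= delta
    return result
```

The rule is linear (its body mentions `ancestor_of` once), so joining only the newest facts loses no answers.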

Datalog and Prolog are like C++ and Java: One is an extension of a subset of
the other, or equivalently, there is a non-empty common subset with similar-
ish semantics. This is not a very useful statement? I agree! But it is what it
is. They are different languages.

~~~
YeGoblynQueenne
Yes, you're right and I'm also fudging things- but didn't I say that upfront?
I start my comment by announcing an "almost not completely fudgy intro to
logic programming"!

More seriously, you're right about syntax so thanks for the correction.

But, regarding semantics, the ancestor_of/2 program above _can_ terminate in
Prolog, evaluated by SLG resolution, as per my footnote 5. There are still
situations where Prolog will not terminate when evaluating a normal program
even under SLG resolution, but left recursion is not one of those.

Edit: also, if a Datalog program is also Prolog, and assuming that the program
terminates under Prolog, then Prolog and Datalog will both compute its LHM. So
it makes sense to say that the two languages are semantically at least very
similar and to explain one in terms of the other. They are much closer to each
other than either is to ASP, for example. It really depends on what assumptions
one makes- and that's where the "fudging" comes in.

Anyway, thanks for the correction. I sure could have done a better job of that
comment. Did you find anything else that was very wrong in my comment? I'd
appreciate it if you pointed it out.

~~~
tom_mellior
> Did you find anything else that was very wrong in my comment?

I'm not sure about the statement that "Datalog is evaluated from the "bottom
up". [...] datalog will first generate the set of all ground atoms that are
consequences of the program in the context of which the query was made, then
determine whether the atoms in the query are in the set of consequences of the
program". I think it's true that Datalog _behaves_ as if it were evaluated
like this, but AFAIK Datalog systems can do lots of very aggressive
optimizations that change the _actual_ evaluation. I'm not an expert on
Datalog.

Anyway, I was mostly dissatisfied with the general thrust of the comment,
trying to establish a "subset" relationship. I think the article's "cousins"
comment is fair. Prolog is older and influenced Datalog very much. Prolog is
also Turing complete, so it is strictly more powerful. But Datalog's semantics
allow some optimizations that wouldn't be possible in Prolog, so it can be a
lot faster on appropriate classes of problems.

~~~
YeGoblynQueenne
Thanks for this additional comment.

I thought more about your earlier comment about how "p,q:- r" is Datalog. I
accepted this because I figured you know what you're talking about but, to be
honest, _I_ don't know what you're talking about. My understanding is that
without definite clauses (and "p,q:-r" is not definite) there are no fixpoint
semantics and without fixpoint semantics there is no guarantee of program
termination.

So I have to ask: where does this information come from? Could you point me to
a source? To be honest, I suspect that I am confused because of a lack of an
ISO standard for Datalog. Even Prolog, with an ISO standard, has various
extensions like, if memory serves, B-Prolog (which includes OOP elements). If
Datalog has no commonly recognised standard, then basically anything can be
called "Datalog" as long as someone, somewhere, can recognise it as Datalog.
Is that the case here? If so, that would clear my confusion.

It would also make more sense for me to say that "Datalog is a subset of
Prolog", at least in terms of syntax. In that case I'd have to clarify that
I'm talking about a simple, commonly accepted language fragment of definite
clauses with at most one function symbol, which I think everyone would readily
recognise as "Datalog" without much fuss.

I confess that the extent of my knowledge about Datalog comes from
conversations with (senior) colleagues and not directly from original sources.
Of course, original sources go way, way back (but obviously not as back as
original Prolog sources and I'm familiar with those, so that's not a complete
excuse). In any case, I had a look at, e.g. "What you always wanted to know
about Datalog (and never dared to ask)" by Ceri, Gottlob and Tanca (also in the
bibliography section of the wikipedia article on databases).

The article, which is from 1989 and clearly addressed at the databases
community (rather than the logic programming, or AI community) states that
Datalog is "in many respects a simplified version of general Logic
Programming", referencing J. W. Lloyd for the latter. Since it's 1989, "Logic
Programming" clearly means Prolog (given that it's too early for, e.g. ASP).

Further, the article makes it explicit that "In the formalism of Datalog both
facts and rules are represented as Horn clauses of the general shape ..." and
gives an example of a definite program clause. Later, the article states "From
the syntactic point of view, Datalog is a subset of Prolog" (but then goes on
to point out the difference in semantics).

Finally, the article does agree with you that it is possible to evaluate
Datalog programs in a "top-down" fashion, using the query-subquery algorithm
which is, from what I can tell, backward chaining implemented by breadth-first
search. So the semantics of Datalog _can_ be different from Prolog's. My
mistake, I think, is in always equating Datalog execution with the use of a
T_P operator (which is "bottom-up").

~~~
tom_mellior
> So I have to ask: where does this information come from?

Half-remembered university courses from >= 10 years ago, I'm afraid. I now
think I was probably thinking of ASP, not Datalog. See for example the
"Disjunctive logic programs" section of [https://www.cs.uni-potsdam.de/~torsten/Lehre/ASP/Folien/asp-...](https://www.cs.uni-potsdam.de/~torsten/Lehre/ASP/Folien/asp-handout.pdf). Since the syntax is
very Prolog/Datalog-like, I probably got mixed up. My bad, sorry. The first
few sources on Datalog I looked at only talked about Horn clauses, not ones
with more heads.

> My understanding is that without definite clauses (and "p,q:-r" is not
> definite) there are no fixpoint semantics and without fixpoint semantics
> there is no guarantee of program termination.

Skimming
[https://en.wikipedia.org/wiki/Answer_set_programming](https://en.wikipedia.org/wiki/Answer_set_programming)
and
[https://en.wikipedia.org/wiki/Stable_model_semantics](https://en.wikipedia.org/wiki/Stable_model_semantics)
I get the impression that even disjunctive logic programs always
terminate, though the complexity might be daunting.

~~~
YeGoblynQueenne
Ah, phew, OK. I was really confused by that. No worries, I'm still grateful
for your comments :)

Yes, ASP is not Horn so it has rules with multiple literals in the head.
"Choice rules", written as {s,t}:- p. Also, like you say, I believe it doesn't
suffer from Prolog's non-termination, again because unlike Prolog it's not
Turing-complete. But in any case ASP is based on stable model semantics, not
fixpoint semantics.

However I'm really very far from being anything like an expert in ASP! I
really should learn a bit of it because it's actually necessary in my
research.

------
geordimort
Sorry to disappoint but for people doing programming language semantics
there’s nothing new here. The hard part is always to go from the made-up
language to a set of multiple full-blown languages.

------
tom_mellior
> I want to see how easy it is to add more advanced features to FP, like
> generics

This will be interesting to see, since the straightforward implementation of
generics for functional programming languages uses unification, which is not
available in Datalog. It will probably be possible to encode things for any
given program, since its universe of types should be finite. But it will
involve jumping through hoops to encode something equivalent to unification.
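For reference, the unification this comment refers to, the same "syntactic equality" raised in the Rust traits discussion further up, can be sketched in a few lines. It is textbook Robinson-style unification with an occurs check. The type encoding is made up for illustration and is not how any particular compiler represents types: `Var` stands for a type variable, and a tuple such as `("List", Var("a"))` is a constructor applied to arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str

# A type is a Var, or a tuple (constructor, arg1, arg2, ...), e.g. ("Int",)
# or ("Fun", ("List", Var("a")), Var("a")) for List a -> a.


def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a constructor."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Does variable v occur inside t? Binding v to such a t would be cyclic."""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, arg, subst) for arg in t[1:])


def unify(s, t, subst=None):
    """Return a substitution making s and t syntactically equal, or None."""
    subst = {} if subst is None else subst
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, Var):
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, Var):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):
        return None                    # different constructors or arities
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst


def resolve(t, subst):
    """Apply a substitution all the way down, for readable results."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, subst) for a in t[1:])
    return t
```

For example, applying a generic `List a -> a` to a `List Int` argument means unifying it with `List Int -> r`. That binds `a` and then `r` to `("Int",)`. Terms match only when they are structurally identical after substitution, which is the limitation raised earlier in the thread.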

------
airocker
I think the biggest win here could be transactions. One commit can be worked
upon by more than one person at once.

------
pgt
Ask yourself: what is the difference between a database and a programming
language? The question is worth exploring.

------
LeonB
The .net dependency and code maintenance tool “NDepend” has a feature called
CodeQuery that this reminded me of.

You can write queries such as:

    from m in Application.Methods
    where m.NbLinesOfCode > 30 && m.IsPublic
    select m

------
jtwaleson
I was very confused as I had read the title as "... with Datadog" (the
monitoring tool).

Anyway, I think both are a good idea. Datadog -> integrating monitoring
information back into the IDE. Datalog -> working with the actual semantics of
the code rather than just blobs of text.

