To clarify, if you don't have recursion, you've just got a slightly different skin over relational algebra.
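To make the recursion point concrete, here is a minimal Python sketch (my own illustration, not from any of the linked implementations) of the classic example: transitive closure, which a recursive Datalog rule expresses directly but basic (non-recursive) relational algebra cannot.

```python
# The Datalog program being simulated:
#   path(X, Y) :- edge(X, Y).
#   path(X, Z) :- path(X, Y), edge(Y, Z).

def transitive_closure(edges):
    """Naive bottom-up evaluation: apply the rules until a fixed point."""
    path = set(edges)
    while True:
        # join path with edge on the shared variable Y
        new = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        if new <= path:          # nothing new derived: fixed point reached
            return path
        path |= new

print(sorted(transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})))
```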
The initial implementation used differential dataflow, but this was just merged in: https://github.com/rust-lang-nursery/polonius/pull/36#issuec...
Very cool stuff!
I don't know about Rust, but the cost of that abstraction elsewhere is pretty reasonable. Datalog will handily beat basic implementations, and the ones that most researchers have time to build. However, very well engineered purpose-built solvers still beat Datalog engines by a large factor (again, no theoretical reason this should be true, but it is), so production compilers still currently eat the complexity of building alias solvers.
It would be nice if one day that was not true.
Sure there is a reason: Many applications, alias analysis in particular, only use a subset of Datalog. Subsetting languages very often makes for easier and more efficient implementation. If you know that there is no negation in your alias problems, your special purpose solver can be much more efficient than a Datalog solver that must be prepared to deal with negation.
Of course you might wish for a generic Datalog solver that you could run in a special mode where you promise that it will not encounter negation in the problem you give it. But that is then no longer a Datalog solver but a "well engineered purpose built solver".
As for the "built in" part, anything that is not built in must cross an API boundary and usually translate between different data representations. That is a theoretical reason for being slower.
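As a rough illustration of why the negation-free subset is simpler: positive Datalog is monotone, facts are only ever added and never retracted, so semi-naive evaluation can safely join only against the facts derived in the previous round. A toy Python sketch (my own, for illustration):

```python
def semi_naive_closure(edges):
    """Semi-naive transitive closure: only the newest facts can produce
    genuinely new facts, so we join just the last round's delta."""
    path = set(edges)
    delta = set(edges)                    # facts derived in the previous round
    while delta:
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path            # keep only genuinely new facts
        path |= delta
    return path
```

With negation in the language, this "only ever add" invariant no longer holds in general, which is exactly the kind of machinery a negation-free special-purpose solver gets to skip.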
I don't really agree. Specialized implementations for instances of a problem that can be solved more efficiently than the general case are a pretty fundamental part of any mature solver (GMP is probably the best-known example). This includes even cases where the user says to try the specialized algorithm and the solver checks that you indeed have an instance of it, rather than the specialized subproblem being automatically detected by the solver (there are many examples, but I always think of optimizations for read-only transactions in databases--generally you have to declare the transaction read-only up front).
> As for the "built in" part, anything that is not built in must cross an API boundary and usually translate between different data representations. That is a theoretical reason for being slower.
The abstractions required for getting solvers to perform well on problems like this are usually sufficiently far removed from the representation you'd have had if you didn't have to solve the problem that (1) transforming it is only a small part of the overall time to solve, and (2) it is often a cost you already have to pay to some extent.
What you've offered are practical reasons it hasn't happened yet, but not theoretical reasons.
I suspect the point of disagreement is
> Of course you might wish for a generic Datalog solver that you could run in a special mode where you promise that it will not encounter negation in the problem you give it. But that is then no longer a Datalog solver but a "well engineered purpose built solver".
which I don't think is correct. A Datalog solver with the property that it can run faster on certain classes of input is great; it is not "no longer a Datalog solver".
Yes. And that is extra work that must be done. Admittedly, it probably isn't the "large factor" mentioned by DannyBee.
> A Datalog solver with the property that it can run faster on certain classes of input is great; it is not "no longer a Datalog solver".
I was talking of a special, user-requested mode in which the user guarantees that the input program they are interested in will not contain certain constructs, and the solver can reject programs that do contain them. A "Datalog" solver that rejects programs containing, say, negation, does not handle the full Datalog language and is therefore no longer a Datalog solver.
As an analogy, Clang is a C++ compiler, but passing it the "-xc" flag causes it to reject valid C++ programs, so then it's no longer a C++ compiler (although it is still the same program!).
As the post mentions, the current ergonomics ("a wall of definitions") are indeed the main shortcoming of the present approach, which is syntactically far removed from Datalog.
My suggestion towards an automated conversion is to consider Prolog: Syntactically, a Datalog clause is a valid Prolog term. In fact, Datalog is a syntactic subset of Prolog. Therefore, it can be easily parsed with Prolog (using the predicate read/1), and then inspected via built-in mechanisms. You may be able to write a Prolog program that converts proper Datalog definitions to such Rust snippets.
it uses the ability to return multiple times from a continuation to allow sets of results (streams or solution trees) to be expressed directly in scheme and intermixed freely with the functional subset.
a 'nice' mixture of declarative and procedural would be really slick. i should go back and look at mozart/oz again.
I'll definitely see if my limited ability to deal with formalisms lets me understand the type system proposed in
W.r.t. Datafun, I learned about it from a talk its creator gave at Strange Loop last year; to me it seemed quite approachable :)
So linking to a text on Prolog is not really relevant to a discussion of the merits of Datalog.
SLG resolution (also called "tabling" in Prolog) prevents both positive and negative loops, and always terminates for programs that have the bounded term-size property, as is the case for all Datalog programs. XSB Prolog is a notable example of a Prolog system that supports this mode, which you can enable with the table/1 directive:
Therefore, you can use a Prolog system such as XSB Prolog also to learn Datalog, and then seamlessly move to Prolog. Conversely, if you know Prolog, then learning Datalog is very easy. Another very nice system for learning Datalog is DES, which you can run in SICStus Prolog and other systems:
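By way of a loose analogy in Python (not actual SLG resolution, just the memoization idea behind it): tabling records answers as they are derived, so a cyclic query terminates where plain top-down recursion would loop forever.

```python
def reachable(edges, start):
    """Reachability with an answer table: a node already in the table is
    never re-explored, so cycles cannot cause non-termination."""
    table = {start}              # answers derived so far
    stack = [start]
    while stack:
        node = stack.pop()
        for (x, y) in edges:
            if x == node and y not in table:
                table.add(y)
                stack.append(y)
    return table

# Cyclic graph a -> b -> a: naive top-down resolution would loop here.
print(sorted(reachable({("a", "b"), ("b", "a")}, "a")))
```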
Various extensions such as aggregate functions and object-oriented programming (which are mentioned in the Wikipedia entry for Datalog) are also applicable to both Prolog and Datalog with identical or closely related semantics.
I also work on an industrial-strength Datalog implementation, and I can tell you unequivocally that there is very little relationship between how a real Datalog engine is implemented and how a Prolog system is implemented.
And no, learning Prolog will not help you learn Datalog. Unless you're going to always use tabling or demand transformations, the difference in evaluation model requires writing in a very different style.
Whether one emphasizes the differences or commonalities of Prolog and Datalog may also depend on commercial interests: For vendors of Datalog systems, it may be advantageous to stress the differences. For Prolog vendors, it may conversely be advantageous to stress their commonalities.
Personally, I learned Prolog before Datalog, and it was easy for me to learn Datalog because I was already familiar with Prolog. The reason may be that I try to write as declaratively as possible when programming in Prolog, and that carried over well to Datalog too. More recently, I have also used tabling in many Prolog programs, and often obtained very general programs with good termination properties, similar to Datalog definitions.
I implemented a Datalog engine in Java a while back. Looking back on it now there are a bunch of things I might have done differently, but I've never tried to implement anything like it before.
I remember that doing the stratified negation was quite complex, and I won't be able to explain it off the top of my head anymore.
Anyway, here's my version: https://github.com/wernsey/Jatalog
BTW the use of functional maps, such as those in , might have advantages over things like StackMap. Of course it would require some restructuring of the code.
In retrospect I think there might be several ways to improve performance by using other data structures to index the facts, but I haven't had the time to work on it lately. There is a pull request from someone that I'm still looking at, which might just spur me on to think about it some more.
Also, the solver here is special-cased to treat only a certain subset of Datalog that is much simpler than the full language.
if you allow something to be true only if something else is not true, your solution space is no longer monotonic, meaning that a result you thought was part of the solution can later be retracted. if there is a cycle (trivially "a if not a"), then you may not ever arrive at an answer. that might seem simple to detect, but the cycle can exist through an arbitrary number of rules and data dependencies, so in the limit it is undecidable.
ideally, as in this solver, you could keep adding terms you can show to be correct from the data and rules at hand until they and all their consequences have been found, at which point you can stop (a fixed point). but negation screws up this plan.
there are various restrictions on negation, solver heuristics, and partial conservative proofs of termination that can be applied. but this is where your beautiful simple declarative world starts getting sad and messy.
[edit: I'd like to leave this because I wish I understood the problem with negation earlier, but I just skimmed and missed stratification, Frank thoughtfully explains below]
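for what it's worth, a small python sketch (my own toy example) of the stratification idea: evaluate the negation-free stratum to a fixed point first, and only negate a relation once it is fully computed.

```python
# The program being simulated, split into strata:
#   reachable(X)   :- start(X).                    (stratum 1)
#   reachable(Y)   :- reachable(X), edge(X, Y).    (stratum 1)
#   unreachable(X) :- node(X), not reachable(X).   (stratum 2)

def stratified_eval(nodes, edges, start):
    # Stratum 1: purely monotone, so iterate to a fixed point.
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for (x, y) in edges:
            if x in reachable and y not in reachable:
                reachable.add(y)
                changed = True
    # Stratum 2: 'reachable' is now complete, so negating it is safe --
    # no fact can be retracted later.
    unreachable = {n for n in nodes if n not in reachable}
    return reachable, unreachable
```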
The engine supports negation in the standard way: stratification. You can call 'from_antijoin' with a static relation for its second argument but not a variable. It is not discussed in the post because it is a bit of a diversion from the intended message (and because antijoin is computationally not much different from a filter).
If you would like a more general compute model with non-monotonic iteration, I recommend differential dataflow!
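To illustrate the "not much different from a filter" point, here is a generic Python sketch (my own, not the engine's actual from_antijoin API): an antijoin keeps the left-hand tuples that have no match on the right, which is why the right side must be a fully computed (static) relation.

```python
def antijoin(left, right_keys):
    """Keep (key, value) tuples from 'left' whose key does NOT appear in
    the completed relation 'right_keys' -- effectively a filter."""
    return [(k, v) for (k, v) in left if k not in right_keys]

rows = [(1, "a"), (2, "b"), (3, "c")]
print(antijoin(rows, {2}))
```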