Logica, a novel open-source logic programming language (googleblog.com)
346 points by layer8 9 months ago | 98 comments

I have yet to check it out, but it's very cool when people try to rethink SQL, and also revive Datalog. I (morally) support this effort.

Myself, I would like to see data query/manipulation language as a total functional language, possibly based on the idea of categorical data transformations: https://www.categoricaldata.net/

Also - bit of a rant - if you're creating a new programming language, consider making syntax and semantics separate in the specification. Lots of people get hung up on arguing about language syntax, but it's really the semantic differences that are important for compatibility. A lot of new languages come up only to fix syntactic problems with existing languages but create small semantic differences in the process, making automated translation from and to existing languages difficult. I wish we could move to a world where syntax and semantics in programming languages are discussed separately from each other.

> Also - bit of a rant - if you're creating a new programming language, consider making syntax and semantics separate in the specification.

I agree. Implementations should accept a stable, machine-friendly format (doesn't matter which; JSON, s-expressions, or even XML would do). If they also accept a human-friendly format, there should be a standard/built-in translation from human format -> machine format (optionally the other way too).

This way, we can always convert random real-world code (scraped from GitHub, or whatever) into a language-agnostic format (yes Python has an `ast` module; that doesn't help a Python linter written in something else, like Go); tools can manipulate this format without having to care about the surface syntax (e.g. linting/doc-gen/static-analysis/versioning/diffing/refactoring/macros/etc.); the output of such tools can always be fed back into the main implementation to compile/run/type-check/syntax-check/etc.
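To make that concrete, here is a minimal sketch of dumping Python source into a plain JSON tree using only the standard `ast` and `json` modules. The `to_plain` helper and the `_type` key are names made up for this illustration, not any standard interchange format; the point is only that a tool written in another language could consume the JSON without knowing Python's surface syntax.

```python
import ast
import json


def to_plain(node):
    """Recursively convert an ast node into dicts, lists and primitives."""
    if isinstance(node, ast.AST):
        # Record the node type, then convert every field the node declares.
        out = {"_type": type(node).__name__}
        for name, value in ast.iter_fields(node):
            out[name] = to_plain(value)
        return out
    if isinstance(node, list):
        return [to_plain(item) for item in node]
    # Strings, numbers and None pass through unchanged.
    return node


source = "total = price * 2"
tree = to_plain(ast.parse(source))

# The machine-friendly form: plain JSON that any language can read.
as_json = json.dumps(tree, indent=2)
print(as_json)
```

Going the other way (JSON back to source) would need a matching loader, which is left out here.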

I agree with the GP. It frustrates me when the surface syntax and the semantics are not split apart. The constraints of old compiler technology (pre-3rd-millennium) continue to dictate the overall architecture of compilers. LLVM is a tiny baby step in the right direction.

Note, I'm not advocating for a single solution, here. I'm advocating that language authors should always think in terms of a front, middle, and back-end: front is surface syntax (their preferred one?); middle is the semantics, with a prescribed API; and, the backend is the implementation side — nicely abstracted by a 2nd API.

That way it gives those of us stuck in not-your-language a fighting chance to integrate Your Cool Thing™.

Good point about separating syntax and semantics. Maybe new languages should use something like lisp syntax to nail down the semantics, and then people could create their own syntaxes from there.

IIRC, Ohm (successor to OMeta) separates syntax from semantics!

> Myself, I would like to see data query/manipulation language as a total functional language

Erm, like PromQL?

Nice to see Datalog being validated by a big name, though I don't see what's modern about Logica in particular, or why one should use it over plain Datalog (as syntactical Prolog subset) when the available backends are restricted to SQL rewriters or locked-in to BigQuery. Will have to look into the model for aggregate queries which I guess is a selling point for Logica (as is modularization/composition with optimization), and a weak point since typically neither in Datalog (the decidable logic fragment) nor portable.

Edit: also I find the title a bit grandiose since this isn't about Logic Programming in general, but only database querying

> ...or why one should use it over plain Datalog...

I have been looking for examples of how to use a practical implementation of Datalog for years and the closest I've come to is actually miniKanren instead. Could you point me to codebases that productively use Datalog internally?

I feel your parent post was probably looking for software that successfully uses datalog to fetch/query its data, not one that provides data through datalog. User, not provider.

Yep, thanks!

I'm well aware of datomic family databases, but it's the part about solving interesting problems with them that interests me, not that someone has implemented another one ;)

Have you seen this presentation about NuBank's usage of datomic https://youtu.be/7lm3K8zVOdY ?

What example with miniKanren have you found? This is an area I have a passing interest in but never find the time to delve deep enough to find anything shiny enough

The most interesting open source example I have found is a hy-lang minikanren that implements a tree-rewriter which lets you encode equivalent implementations of code snippets. It's a code-linter that will automatically simplify code you write via higher-level rules written in mk


I've heard informally there are many companies relying on clojure/core.logic to rewrite complicated business rules for the sake of filtering and constraint solving problems in applications, but I do not know of any open source examples to reference.

edit: i accidentally linked to a dependency of the project i meant to link to. originally linked to the mk implementation: https://github.com/algernon/adderall


It seems that it's open source (Apache 2.0) and can generate SQL for PostgreSQL and SQLite in addition to BigQuery.

Yeah, but that reduces Logica outside Google to SQL rewriting, when the weak point of SQL isn't so much the syntax but the scalability and expressiveness limitations for big data and document data (fixation on strong ACID/CAP guarantees, schemas). SQL syntax has strong points, too: one being that it's not just a query language but also an update/batch language with ACID semantics; another being that it's standardized with a range of mature options available.

Consider also the practical side: using Datalog as merely "prettier SQL" still doesn't allow you to dynamically define data properties or go schema-less as in RDF or other logic/deductive graph databases. Whenever you want a new column, you must execute DDLs (ALTER TABLE ADD COLUMN) also leading to forced commits, overly broad permissions, chaotic backup procedures and/or code artefacts containing the dreaded SELECT * syntax. Also, parsing Datalog queries, reformulating into SQL, then re-parsing SQL in the DB engine isn't the most efficient thing.

Basically, the workflows and use cases for SQL RDBMSs and Datalog/graph databases are not the same, and if you're using one on top of the other, you're getting the intersection of possibilities but the union of problems, as is well known from O/R mappers );

>> Whenever you want a new column, you must execute DDLs (ALTER TABLE ADD COLUMN) also leading to forced commits, overly broad permissions, chaotic backup procedures and/or code artefacts containing the dreaded SELECT * syntax.

I don't understand what you mean here. With datalog, if you have a predicate person(Name, Age, Height) and you want to add an argument (a "column") for income, you can simply create a new predicate person(Name, Age, Height, Income).

Or, if you want to avoid duplication, you can write a rule to combine the information in two (or more) predicates:

  person(Name, Age, Height, Income):-
    person(Name, Age, Height)
   ,person(Name, Income).

You don't need to remove the old predicate. That's actually one case where Datalog works better than SQL, that only allows "rows" i.e. "facts" (in Datalog parlance) but not "rules" that establish relations _between tables_.

That's true for Datalog, based on what I know about Prolog (not a Datalog expert!). I don't know how it works in Logica, but from reading the article above I think the semantics would be similar.

>> Basically, the workflows and use cases for SQL RDBMSs and Datalog/graph databases are not the same, and if you're using one on top of the other, you're getting the intersection of possibilities but the union of problems, as is well known from O/R mappers );

That's funny. But I don't think it applies here. SQL and datalog are both relational. The difference is that Datalog lets you define relations over tables ("rules"), not just relations over data ("facts"/"rows"). Essentially, SQL is one half of datalog's relational semantics - only information without reasoning. Datalog adds reasoning on top, but the reasoning is still, well, relational (facts, rules and queries are all relations). There's no impedance mismatch here, as in trying to fit relational data into a non-relational program.

I'd still be very concerned about putting something as complicated as this on top of SQL, because despite what it says on the tin, SQL is not a declarative language. Any serious database-using application will still be chugging through synonymous queries to get to the one that works right, manually annotating the databases with the indexes it needs, figuring out when to manifest tables or views or any number of other optimizations the engine can't or won't do on its own, and doing a lot of other operations that are hard enough when using SQL directly, but will be made even more so by trying to operate through such an opinionated interface. SQL is already handicapped by trying to be declarative but in practice being a language where a lot of nominally-equivalent things will result in different queries under the hood.

I loooooooooove the ideas being expressed in the post. I am firmly in the camp that SQL needs a deep rethink because of its many and manifold software engineering flaws, and this is exactly the sort of thing I'm thinking of, not just a slight gloss on SQL, but a complete rethink. I'm just not sure this is going to be practical sitting on top of SQL. Make this a native query language for Postgres or something and we'd be talking. One step at a time, though. I'm very positive on this step being taken.

At this point, extracting the industry from its path dependence history [1] of SQL is a Google-sized problem. The engineering itself isn't necessarily a Google-sized problem but the rest of it is.

[1]: https://en.wikipedia.org/wiki/Path_dependence - that is, if databases were all separately evolving over the years and only this year were they all going to get together and produce a standard to unify themselves, it would not look like SQL. It would quite likely look a lot more like this. SQL has too many glaring flaws, not least of which is its total composability fail.

From what I understand Logica compiles to SQL so it can run on BigQuery. I don't think that's putting it "on top of SQL".

I think it's a bit confusing that datalog is always discussed in the context of databases and as a "query language" etc. In fact it's a subset of Prolog, so it really belongs to the subject of logic programming. It doesn't help that Prolog programs themselves are implemented as databases and that Prolog programming uses terms such as "query" that blur the waters about exactly what one is doing.

I confess I don't have a background in databases and so I only understand the very basics about SQL's semantics, which is the Relational Calculus, but as far as I understand it, RC is a subset of predicate logic (a.k.a. first-order logic). Prolog is itself a different subset of predicate logic, Horn clause logic; and Datalog is a subset of Prolog and equivalent to SQL in expressive power.

Very briefly, every expression in Prolog is a Horn clause. A clause is a disjunction of literals. A literal is an atom, or the negation of an atom. An atom is an atomic formula, a predicate symbol followed by a number of terms in parentheses where the number is the "arity" of the predicate. Terms are variables, functions or constants.

For example, father(john, bob) is an atom of the predicate father/2, where "father" is the symbol and "2" is the arity.

An example of a clause is grandfather(x,y) ∨ ¬father(x,z) ∨ ¬parent(z,y). This is a disjunction of one positive literal, grandfather(x,y) and two negative literals, ¬father(x,z) and ¬parent(z,y). By the rules of logical connectives, the same disjunction can be written as an implication: father(x,z) ∧ parent(z,y) → grandfather(x,y). By Prolog convention also observed in Datalog, implications are written with the positive literal first: grandfather(x,y)← father(x,z), parent(z,y). The left-facing implication arrow is rendered as ":-" in an ASCII-friendly manner, conjunctions are represented by the comma, ",", and variables are represented by upper-case letters, yielding the standard Prolog -and Datalog- notation:

  grandfather(X,Y):- father(X,Z), parent(Z,Y).

The above clause is a Horn clause. A clause is Horn when it has at most one positive literal. A Horn clause is definite when it has exactly one positive literal. Horn clauses with 0 positive literals are called "goals", Horn clauses with exactly one positive and 0 negative literals are often called "unit clauses" and Horn clauses with one positive and any number of negative literals are usually called "definite clauses" (confusingly). A definite clause is datalog if it has no functions of arity more than 0 (constants are functions with arity 0) as arguments to a literal. For example, in the following, [1] is Datalog, [2] is not (but is Prolog):

  s(0).           % [1]
  s(N):- s(s(N)). % [2]

Where s(N) is a function (possible to determine syntactically because it's an argument to a literal). In Prolog parlance, definite clauses are also called "rules", unit clauses are also called "facts" and goal clauses are also called "queries".
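The step from the disjunction to the implication in the grandfather clause above can be checked mechanically. Here is a minimal brute-force sketch in Python that treats each ground atom as a boolean; the names `g`, `f` and `p` are just shorthand for grandfather(x,y), father(x,z) and parent(z,y) in this illustration.

```python
from itertools import product


def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b


# Try all 8 truth assignments and compare the two forms of the clause.
equivalent = all(
    (g or not f or not p) == implies(f and p, g)
    for g, f, p in product([False, True], repeat=3)
)
print(equivalent)
```

This only covers the propositional shape of the clause; the quantifiers over x, y and z are not modelled.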

Now, s(0) is a Prolog and Datalog fact and is just as fine a SQL table, called "s" and with a single row with one value, "0". Here's a fuller example:

  father(bob, john).
  father(john, alex).
  grandfather(X,Y):- father(X,Z), parent(Z,Y).

That's a Prolog and Datalog program with two "facts" and a "rule". The following are two queries and their results:

  ?- father(X,Y).
  X = bob, Y = john ;
  X = john, Y = alex.

  ?- grandfather(X,Y).
  X = bob, Y = alex ;
  false.

Each query starts with "?-" at the command-line and ends with a "." as all Prolog clauses. Below the query are its results: the instantiations of the variables X and Y in the query that make the query true. The ";" means there may be further results. And "false" means there are no more results.

Now, I leave it as an exercise to the reader (you) to figure out how the above works out with SQL. Keep in mind that father/2 has a clean translation to a SQL table named "father" with two columns, for example named "father" and "child". The "rule" for grandfather/2 is probably best represented as a join.
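For anyone who wants to check their answer, here is one possible sketch using Python's built-in `sqlite3`. The rows come from the father/2 query results above; the `parent` view is an assumption (every father counts as a parent), since the program as shown does not define parent/2, and the column names are only suggestions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE father (father TEXT, child TEXT);
INSERT INTO father VALUES ('bob', 'john'), ('john', 'alex');

-- Assumption for this sketch: parent/2 is derived from father/2.
CREATE VIEW parent AS
  SELECT father AS parent, child FROM father;

-- grandfather(X,Y) :- father(X,Z), parent(Z,Y).  The shared Z is the join key.
CREATE VIEW grandfather AS
  SELECT f.father AS grandfather, p.child AS grandchild
  FROM father AS f
  JOIN parent AS p ON p.parent = f.child;
""")

father_rows = conn.execute(
    "SELECT father, child FROM father ORDER BY father"
).fetchall()
grandfather_rows = conn.execute(
    "SELECT grandfather, grandchild FROM grandfather"
).fetchall()
print(father_rows)
print(grandfather_rows)
```

The shared variable Z in the rule body turns into the join condition, which is the whole trick.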

In any case, as you can probably see, we have here a very different language than SQL, but with semantics that can be seen as, in a sense, being equivalent to the semantics of SQL. Except, where SQL makes a distinction between "data" and "queries over data", Datalog only has facts, rules and queries, that are all Horn clauses and that are all part of the "program database".

So it's not a complicated machinery on top of SQL at all. The only thing I'm concerned about is the naturalness of the SQL queries generated by the "compiler" (some kind of transducer, probably). On the other hand, I reckon SQL is only meant to work as a kind of "relational assembly" and will not have to be seen by any human eyes except in rare cases. Or that's hopefully the plan.

Edit: note there are many, Many, MANY variants of Datalog with confusingly subtly different semantics. See the book I recommended in my comment to juki, below. Personally, I get lost in the variations pretty quickly...

"I think it's a bit confusing that datalog is always discussed in the context of databases and as a "query language" etc."

It is a reflection of the way that SQL is so ensconced in the developer's gestalt as "The Way To Query Data", such that Querying Data means SQL and SQL means Querying Data, that most people are not capable of coming at something like a logic-based database layer as a first-order element on its own, but can only conceive of it as an SQL layer.

Further observe how there's a category of databases called "NoSQL"... when you have a category of something defined by its not being in some other category, that really shows just how large that category looms in the developer's mindset. NoSQL is slowly cracking the SQL consensus, but it's a long and slow process. You'll know it has really made it when we give a positive name to that category, or perhaps more likely, 3 or 4 names to the several types of databases within. "Document store" is getting close to being the name of one of the styles.

My particular reason for speaking this way though is that this specific technology manifests that way. You'll note I called for this to be made a native query layer, because I'm personally pretty much over SQL and ready for the next thing to come out. I'm tired of it being the 1970s again every time I speak to a database. Unfortunately, it's such a task that it has killed everyone who has tried it so far. AIUI FoundationDB got the closest from anyone I've seen. I'm not sure how they're doing; a cursory web search suggests perhaps they aren't as dead as I thought.

>> My particular reason for speaking this way though is that this specific technology manifests that way. You'll note I called for this to be made a native query layer, because I'm personally pretty much over SQL and ready for the next thing to come out.

I agree with that, although since I don't write any SQL anymore, I don't really mind it, as such. But I think the reason Logica is compiled to SQL must be the pervasive association of "database" with "SQL" that you point out. Datalog in fact has its own execution model that doesn't really need SQL. Perhaps the people behind Logica felt that it would be easier for it to be adopted if it piggy-backed on SQL, the same way that so many languages target the JVM etc. I too think that's a little disappointing. But I'm heavily invested in logic programming so I'm glad to see _some kind_ of logic programming language at least created at Google (no idea how much it's used though).

> That's actually one case where Datalog works better than SQL, that only allows "rows" i.e. "facts" (in Datalog parlance) but not "rules" that establish relations _between tables_.

What is the difference between Datalog rules and SQL views?

It's been a while since I used SQL and I'm a bit rusty in it, but views would probably be the equivalent of Datalog rules, yes. The difference, as in my other comment to OP, is that Datalog rules are part of the Datalog program, which also stores the actual "tables" i.e. the facts. Whereas in SQL, views are only sort of ... virtual? Like I say, I'm a bit rusty, but from my understanding, SQL views don't live in the same space as tables.
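To make the comparison concrete, here is a rough sketch of the person/4 rule from upthread written as a SQL view, run through Python's built-in `sqlite3`. SQL cannot overload one table name by arity the way Prolog does, so the table names `person_basic` and `person_income` and the sample rows are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_basic (name TEXT, age INTEGER, height INTEGER);
CREATE TABLE person_income (name TEXT, income INTEGER);
INSERT INTO person_basic VALUES ('ann', 30, 170), ('raj', 41, 182);
INSERT INTO person_income VALUES ('ann', 50000);

-- person(Name, Age, Height, Income) :- person(Name, Age, Height), person(Name, Income).
CREATE VIEW person AS
  SELECT b.name, b.age, b.height, i.income
  FROM person_basic AS b
  JOIN person_income AS i ON i.name = b.name;
""")

QUERY = "SELECT name, age, height, income FROM person ORDER BY name"
before = conn.execute(QUERY).fetchall()

# Like a rule, the view is re-derived from the underlying facts on every query.
conn.execute("INSERT INTO person_income VALUES ('raj', 62000)")
after = conn.execute(QUERY).fetchall()
print(before)
print(after)
```

The view is queried with the same SELECT syntax as a table, and its contents follow the underlying rows without any extra bookkeeping.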

Funny thing. It used to be my day to day work was 80% SQL. Nowadays it's 99% Prolog maybe with a little bit of bash and powershell scripting (gotta automate those experiments!). I kiiind of miss SQL? But not quite. Personally I don't 100% get the grumbling about SQL's syntax. It's unintuitive and it works very hard to hide the actual semantics behind it, but, eh, at least it has clean semantics.

I recently found this free book on databases that goes over both SQL and Datalog. It's a bit thick with obtuse terminology but it actually goes in depth over many useful topics:


I also recommend that to OP, if they're reading.

Mind sharing what you use Prolog for, and its strengths?

I use Prolog for my research. I study Inductive Logic Programming (ILP) for my PhD. ILP is a field in the intersection of machine learning and logic programming, that studies approaches to learning logic programs from examples, background knowledge and language bias (it helps to think of background knowledge as a library of sub-routines from which a program is to be composed and to think of language bias as constraints on the structure of learned programs).

Obviously Prolog is well-suited for this task, but there's a reason why you don't often hear of "Inductive Python Programming" or "Inductive Java Programming", say. The reason is that imperative languages tend to have lots of specialised syntax, for example for class declarations, loops, variable assignment etc. Whereas Prolog syntax consists entirely of one kind of expression, the Horn clause. So for instance, to learn a program with a "loop" in Prolog you "only" need to add a recursive clause to the program, where a recursive clause is simply an ordinary Horn clause with the same predicate symbol in a head literal and one or more body literals. To learn a program with a loop in Python you have to add the loop to the program as a specialised structure with its own peculiar syntax.

Also, because in Prolog everything is a Horn clause, examples, background knowledge and language bias can be (and often are) represented as Prolog programs themselves, so it's possible to learn new background knowledge, new language bias and even new examples. That'd be tricky to do in Python where examples, say, would be not programs, but the inputs of and outputs to programs.

The sister field to ILP, Inductive Functional Programming, exploits the homoiconicity of functional languages in similar ways.

Finally, Prolog is a language with a deductive inference algorithm as an interpreter and it turns out deduction can be sort of inverted into induction. Which is to say, we can go from reasoning to learning, with but a tiny little hop. Well, ish.

If you're interested in more details about my work, there's links in my profile.

Although I'm not really familiar enough to comment on much of this, I will point out that ORMs are still very popular and valuable despite their problems. If this is anything like ORMs, I would expect it to be very useful to many people despite theoretical problems that tend to be fairly manageable in practice.

Oh that's good, there was no mention of anything other than BigQuery in the front-matter but looks like it's in there:


My hope would be mainly that this can get datalog into mainstream use, and soon get more (and more mature) libraries created by the community. That is in itself very exciting to me though.

Would be pretty awesome if we could have logica (or something similar) for dataframes (including pandas), and so could build pipelines of transformations-via-queries on those.

(If there is anything like this already implemented, I'm all ears!).

Not a great introduction to the language, IMHO. There is not a clear use of logic to automatically reason about anything, just query composition. It seems the language is much more powerful than what this introduction makes it to be!

> English words (...) often capitalized to keep the old-fashioned COBOL spirit of the 70s alive!

I like logic programming a lot, but a convention not technologically enforced is a poor reason to argue for a language change. When arguing about SQL's limited abstraction capabilities, that space would have been better spent talking about CTE limitations, for example.


> To make things worse, SQL code is rarely tested, because “testing SQL queries” sounds rather esoteric to most engineers, at best

So nonexisting best practices require a language change, apparently. It was also not showcased how this can be done in Logica, beyond the table mocking that could be done with a "with xxx as (values a, b, c) select (query to be tested)" approach in sql.
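For reference, a minimal sketch of that CTE-based mocking approach, using Python's built-in `sqlite3`. The query under test, the helper name `run_with_mock` and the sample rows are all made up for this illustration; the idea is only that a CTE named `comments` stands in for the real table.

```python
import sqlite3

# The query we want to test; in production it would read a real comments table.
QUERY_UNDER_TEST = """
SELECT user_id, COUNT(*) AS n_comments
FROM comments
GROUP BY user_id
ORDER BY user_id
"""

# A CTE named like the table supplies the mock rows.
MOCK_COMMENTS = """
WITH comments(user_id, comment) AS (
    VALUES (1, 'hello'), (1, 'logic'), (2, 'programming')
)
"""


def run_with_mock(conn, mock_cte, query):
    """Prepend the mock CTE to the query and return all result rows."""
    return conn.execute(mock_cte + query).fetchall()


conn = sqlite3.connect(":memory:")
result = run_with_mock(conn, MOCK_COMMENTS, QUERY_UNDER_TEST)
print(result)
```

This only works when the query under test has no WITH clause of its own; otherwise the two would need to be merged.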

> So nonexisting best practices require a language change, apparently. It was also not showcased how this can be done in Logica.

Look for the section containing the text "As a final example, let us mock the comments table, in a unittest of a query." That demonstrates mocking and is explicitly pointing towards testing. The article is only a high-level intro document.

Sorry, I was editing my post in parallel. I see what you are pointing to. I'm sure the language has been thought out for much more time than my reading of the post - I'm just complaining about this "presentation" post. It's not clear from the example what the language provides over just redefining the table with mock values for testing. I'm sure Logica has more than what's stated in here, it's just not a good example (IMHO).

I have a hard time understanding what problem this solves versus SQL?

This really should have demonstration of some of the actual use cases where this shines vs SQL.

Also: SQL-92 has a VALUES clause, which makes the example provided a little bit silly; you could just use

    values (2),(3),(5)

Another example they gave is a 5-line (excluding imported code) mocking code with a comment "compare that to what you would have to do to achieve the same using bare SQL". Okay..

    select * from (values (1, 'hello'), (2, 'logic'), (3, 'programming')) as mocktable(user_id, comment);

I guess the appeal of Datalog is the recursion that can be expressed in rules:

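    has_descendant(?ancestor, ?descendant) :-
        has_child(?ancestor, ?descendant).
    has_descendant(?ancestor, ?descendant) :-
        has_child(?ancestor, ?child),
        has_descendant(?child, ?descendant).

Here, the assumption is that you have explicit `has_child` facts (expressing edges in a graph, essentially), and the above rules give you paths of arbitrary length: the first rule is the base case, the second the recursive step.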

In SQL, given a table has_child(parent, child), it is not clear to me how you can get all descendants of a given person, or all ancestors.

Other people talk about recursive extensions to SQL, maybe that provides a way.

Depends on the specific SQL flavor, e.g. https://www.postgresql.org/docs/9.1/queries-with.html

Logica instead seems to create tables and drop tables?


It looks like Logica code isn't actually translated to a (large) SQL query, but Logica code is dynamically interpreted by some interpreter that calls into a SQL database as the virtual machine.

Not sure what Logica really provides here. Datalog usually comes with elaborate techniques to make sure only what's really needed is calculated instead of just generating all values every fact generation "iteration".
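For readers who haven't seen it, one standard technique of this kind is semi-naive evaluation: each round joins only against the facts derived in the previous round instead of re-deriving everything. Below is a minimal Python sketch for a descendant relation. The function name and the sample facts are made up for illustration, and it says nothing about what Logica actually does.

```python
def descendants_seminaive(has_child):
    """Compute the transitive closure of has_child with semi-naive iteration."""
    edges = set(has_child)
    total = set(edges)  # base case: every child is a descendant
    delta = set(edges)  # facts that were new in the previous round
    while delta:
        # Recursive rule, joined only against last round's new facts:
        # has_descendant(A, D) :- has_child(A, C), has_descendant(C, D).
        derived = {
            (a, d)
            for (a, c) in edges
            for (c2, d) in delta
            if c == c2
        }
        delta = derived - total  # keep only facts not seen before
        total |= delta
    return total


facts = [("a", "b"), ("b", "c"), ("c", "d")]
closure = descendants_seminaive(facts)
print(sorted(closure))
```

The loop stops once a round derives nothing new, which is the fixed point; a naive evaluator would instead re-join against `total` every round.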

> It looks like Logica code isn't actually translated to a (large) SQL query, but Logica code is dynamically interpreted by some interpreter that calls into a SQL database as the virtual machine.

Actually, I might have jumped to a wrong conclusion. I really don't know if that's what happens here, but if not then it would imply that recursion isn't actually driven by the evaluation mechanism, but by Logica creating enough tables for a given program to ensure that iterative evaluation of rules ends up with a fixed point result set. Not quite sure. None of the other examples show recursion in the SQL output itself either.

Well, at any rate I'd like to learn more about the actual mechanism here but they are a bit light on documentation.

The normal SQL way is with a recursive CTE and help from the query optimizer. Differs more in feeling (constructing relations with joins vs. defining predicates with logic) than in substance from the Datalog way.

Sqlite official examples: https://sqlite.org/lang_with.html

MySql official examples: https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/with.html
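A minimal sketch of the recursive-CTE approach for the has_child(parent, child) question upthread, run through Python's built-in `sqlite3`. The sample rows and the `descendants` helper are made up for this illustration; the CTE mirrors the Datalog rule, with a base case added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE has_child (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO has_child VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("erin", "frank")],
)

DESCENDANTS = """
WITH RECURSIVE has_descendant(ancestor, descendant) AS (
    -- Base case: every child is a descendant.
    SELECT parent, child FROM has_child
    UNION
    -- Recursive step: a descendant of my child is my descendant.
    SELECT c.parent, d.descendant
    FROM has_child AS c
    JOIN has_descendant AS d ON d.ancestor = c.child
)
SELECT descendant FROM has_descendant
WHERE ancestor = ?
ORDER BY descendant
"""


def descendants(conn, person):
    """Return all descendants of person, in alphabetical order."""
    return [row[0] for row in conn.execute(DESCENDANTS, (person,))]


print(descendants(conn, "alice"))
print(descendants(conn, "erin"))
```

UNION (rather than UNION ALL) drops duplicate rows, so the recursion also terminates if the data contains a cycle. Swapping the final filter to `WHERE descendant = ?` and selecting `ancestor` gives all ancestors instead.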

The problem Datalog tries to solve is complexity: SQL "pulls" data (what's a query after all) to a calling application. Datalog builds up data relationships through declarations. That means that: a) entities can be inferred from these relationships as opposed to large complex queries, b) some of these relationships can be built up by code/robots as opposed to humans declaring them.

The end result is (you hope) a very complex database where the smaller blocks/relationships can be audited and verified quickly, and where parallelization more or less comes for free.

The reality is that Datalog systems end up being massive hairballs of declarations that are hard to unravel for mere humans (well, regular developers) and that query-based solutions are 10x faster to develop for 80% of the application use cases.

The closest parallel is functional-vs-procedural programming (don't flame me); it's a niche solution for niche problems.

Source: former Datalog developer for ERP systems.

I actually mostly agree with you, except for the fact that in reality SQL is not a language, but a family of languages, some of which don't support the syntax[1], including Google's own BigQuery. Whether or not this is a reason to create a completely new unrelated language is still up for a debate.

Tangentially related, but does anybody know of a program or a library that takes standard SQL queries as input and outputs one or multiple equivalent queries using the SQL dialects of a set of DBMSs? That is, compiles a standard SQL query into a PostgreSQL one, an SQLite one, etc.

[1]: https://modern-sql.com/feature/values#compatibility

> does anybody know of a program or a library that takes standard SQL queries as input and outputs one or multiple equivalent queries using the SQL dialects of a set of DBMSs?

There are a number of tools that translate queries between SQL dialects. Google for “sql dialect translator”.

The thing is, most of them translate between dialects of SQL excluding Standard SQL, which is both more and less than what I would like to have. I don't want the ability to turn PostgreSQL SQL into SQLite SQL, but I do want the ability to turn Standard SQL into both of these. I guess, that will remain an idea for a hobby project, heh.

That's explained carefully in the first 5 paragraphs, especially paras 1, 3, 4, and 5.

Yes I read it. Lots of abstract talk about modularity and how SQL is so bad because people tend to use all caps without demonstrating how and what this whole new programming language can do better.

Note that I'm not claiming it can't do better, but I would be interested for the authors to point out what the actual benefits are.

All I see from their examples is that clauses in Logica are much longer than their SQL counterparts and that they are importing modules which (I assume) re-define already defined schemas, which brings all kinds of dependency problems that I'm not going to go into here.

The document assumes you have used SQL in anger before and run into its lack of modularity. If you haven't run into those things then you'll miss the point. But if you have then this looks really interesting.

Ignore the upper caps thing. Have you not run into the problem that SQL cannot be written with reusable units of code? Do you not find reusable units of code to be the basis of programming?

Yeah, the quip about many people preferring to write SQL keywords in all caps as if that were somehow a defect of SQL struck me as being astonishingly asinine.

SQL queries aren't just esoteric, they have highly opaque performance implications. Two ways of doing an SQL query that might look mathematically equivalent to a human could result in orders of magnitude speed difference due to one of them using the proper index and the other having to do a sequential scan, or various other performance issues like creating temporary data.
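A small illustration of that point, using Python's built-in `sqlite3` and SQLite's EXPLAIN QUERY PLAN. The `users` table, the index name and the `plan` helper are made up for this sketch, and the exact plan text differs between SQLite versions, so treat the printed output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT);
CREATE INDEX idx_users_email ON users (email);
""")


def plan(conn, sql):
    """Return the planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


# Same rows come back from both queries for text values,
# but only the first lets the planner use the index.
indexed = plan(conn, "SELECT name FROM users WHERE email = 'a@example.com'")
defeated = plan(conn, "SELECT name FROM users WHERE email || '' = 'a@example.com'")
print(indexed)
print(defeated)
```

Appending an empty string leaves a text value unchanged, yet the planner no longer sees a bare indexed column and falls back to scanning the table.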

So this is going to run into the same issues as any SQL code generator (compare Hibernate for example): you need to know what query it will output. And you need DBA skills to know what that query means in terms of performance. Neither of those steps can be skipped.

Nor is unit testing necessarily helpful when using small n. Issues of poor scaling don't show in tests unless the data is large.

Compared to Prolog, which is sensitive to declaration/search order and can assert enough new facts to not terminate, the risk of inefficient but always correct queries is a marked improvement.

This is just pure speculation on my part, but:

What about optimizations? It seems like it should be possible to construct a SQL query that doesn't hit the pain points (e.g. avoids queries that do not use indexes). Although from my experiences with other ORM frameworks, that probably isn't an easy problem.

Even then though, since it looks like it somewhat aims to replace SQL even in the database-construction step, that might help in this regard, by constructing a more optimal representation of the data (which doesn't seem to be tabular)?

Unit testing I am similarly skeptical about though. The article does mention it being "rather esoteric [sounding] at best", and I would actually agree with that expression, haha. I don't think I've ever written or even seen, in my 8 years as a developer, a 100-line SQL query that was not at least partly generated (and hence required testing as a unit, and not just the code around it). I suppose Google operates at a different scale, but still.

There's Datomic[1], though it's proprietary. Regarding Prolog, I hope they will take a look at the newly emerging "GHC of Prolog" in Rust - Scryer[2].

[1] https://www.datomic.com/

[2] https://github.com/mthom/scryer-prolog

What do you mean by "GHC of Prolog"? I don't know a lot about the Haskell ecosystem so I don't know what that implies.

Edit: Never mind, the Scryer Prolog github page states it as follows:

    Scryer Prolog aims to become to ISO Prolog what GHC is to Haskell: an open source industrial strength production environment that is also a testbed for bleeding edge research in logic and constraint programming, which is itself written in a high-level language.

Isn't SWI the GHC of Prolog already?

SWI Prolog is a very nice Prolog but it is not standard anymore

What do you mean by not standard anymore?

I recently gave Rego a shot but had a difficult time grokking it. It's also inspired by Datalog. How would you say Logica compares to Rego?

> It supports modules and imports, it can be used from an interactive Python notebook and it even makes testing your queries natural and easy.

I don't see any examples of how to do tests in the announcement. Consider adding some.

Nice to see some new initiatives in this domain (or old ideas resurface), but there is a long way towards mainstream adoption IMHO:

* My business clients and I speak SQL together. I don't see them learning a new language. I don't have the authority nor any will to force them to.

* I can spin up a container for testing business rules logic (and often share the results back to the client: here is what the impact of updating rule A is, rows of type W will be affected in this way).

Even though SQL has ceremony/verbosity, I'd rather see the standard be evolved. My clients and I could pick it up more easily.


That's great for BigQuery though. You can't spin up a BigQuery docker container anyway, and testing with another schema/project is risky while you have interns around.

Logica compiles to SQL. It is semantically equivalent but not syntactically, so the gain has to come from differences in the grammar.

I see that one can create predicates (functions) with parameters as a means for code re-use. I could also see that having implications for testability. That's interesting.

I could see how one could build a DSL with Logica to make fairly tricky queries easier. That's interesting.

Has anyone used this? If so, could you explain how SQL functions are called? Do they have to specifically exist in Logica or are they just assumed to exist in SQL? (I'm thinking about geographic functions in particular, for example.) Are window functions also possible in Logica?

Fantastic to see a logic programming/datalog/prolog-like from a big actor like Google. Perhaps this can make more such tools become mainstream.

Only I wish we had such a language for a more generic streaming / data processing framework, such as materialize [1].

I was very optimistic about that for some time, as the guy behind the technology, Frank McSherry, wrote some datalog tooling as well [2].

[1] https://materialize.com/

[2] https://github.com/vmware/differential-datalog

It astonishes me that it took so long for Datalog to be legitimized as a query language.

It's almost as if people saw OWL 2 DL, didn't believe what it had accomplished, and didn't try to make anything better.

Related question: Is there a good resource to learn logic programming (using Prolog or something like that) for an experienced programmer?

The Reasoned Schemer is a highly praised book for that. It teaches miniKanren and builds it from scratch. I personally did not much like its "Socratic" style.

The best way to learn it, in my opinion, is to implement microkanren, which is micro by design for teaching purposes. It is small enough to fit in your head, understand what's unification, and play with it. Then you can jump into other implementations.

If you like clojure, you can use core.logic, although documentation is not abundant.

More prolog-related, The Power of Prolog https://www.metalevel.at/prolog has been praised here several times.

Without any hesitation I would recommend the Ivan Bratko book, it is so densely filled with knowledge. Prolog is vastly different from other programming systems and some of the concepts take a bit of exposition before they sink in, and this book very much strives to explain quite a lot of mysterious things.

Prolog Programming for Artificial Intelligence by Ivan Bratko.

Just want to put a very strong second on this recommendation. Three chapters in I knew enough to prototype something for work that reduced a mess of C++ to a couple pages of rules.

In general, I tend to feel that HN likes to recommend fad/quirky books more than those that take an encyclopedic approach.


My recommendation would be to learn the real thing, not an almost, sort-of Prolog that's actually a LISP dialect in disguise.

I recommend taking a look at 'Learn Datalog Today'[0] first. It's just Datalog (in S-expression form), which is a subset of Prolog, but it lets people get the gist very quickly.

For Prolog, I too am wondering whether there's a great source. But I have read 'The Reasoned Schemer'; it uses a simple Scheme-based logic programming language for teaching purposes, and it's very educational and entertaining.

[0] http://www.learndatalogtoday.org/

http://amzi.com/AdventureInProlog/index.php is, I feel, the best for learning to actually write something in Prolog. Though it's maybe not so great for logic programming as a paradigm.

Also it's got some small incompatibilities with SWI-Prolog, and I don't know how well Amzi works under Wine, so it can be frustrating if you're on Linux.

Maybe I'm missing something, but why don't query languages allow for sum types and pattern matching? Why is it so hard to express in SQL a table that can contain either this schema or that schema?

The relational model describes a relational algebra, which in turn specifies a "relation" (often called a table) to be a set of tuples (rows), and "relational operators" that accept and return relations. And the relational algebra is complete in the sense that it can express all queries expressible by predicate logic + types and changes through a small set of operations.

The relational model adds constraints, state, and a mechanism for first-class derived relations (updateable views).

And while you can stick anything with a well-defined equality in a relation, including other relations, the point of the relational algebra is to describe structure using relations, thus making it all accessible to relational operators. In a properly normalized database, all structure can be manipulated through a common set of operations.

So you could, e.g. create a relation with a single JSON attribute and call it a day. But now, in addition to the relational operators, you need a whole mess of JSON operators to query it.

Thus, while you could have a sum type, you don't need this because you can put the various summands into separate relations. For instance, the simple case of booleans:

    Persons(key id: int, name: str, is_tall: bool)
    ... normalizes to ...
    Persons(key id: int, name: str)
    TallPersons(key id: int)

Or for an Either:

    Persons(key id: int, name: str, zing: either<int, str>)
    ... normalizes to ...
    Persons(key id: int, name: str)
    LeftPersons(key id: int, zing: int)
    RightPersons(key id: int, zing: str)
    AssertEmpty: LeftPersons{key} & RightPersons{key}

What you really want your database to do is to let you enter that first "Persons" table with the sum type. That should logically be a derived table that is backed by the normalized tables.

Then, you'd get the simplicity of entering Persons.insert(key=5, name='bob', zing=Left(5)), but that's simply an updateable view. It will really update the base tables Persons / LeftPersons with simple atomic values.
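As a rough illustration of that idea, here is a sketch in Python with the built-in sqlite3 module. The table, view, and trigger names are invented for the example. SQLite views are read-only, so an INSTEAD OF trigger stands in for the updateable view, and the sum type is flattened into a (tag, zing) pair because SQLite has no Either type. The AssertEmpty constraint is not enforced here beyond the trigger routing each insert to exactly one summand table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized base tables: one relation per summand.
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE left_persons (
        id INTEGER PRIMARY KEY REFERENCES persons(id), zing INTEGER NOT NULL);
    CREATE TABLE right_persons (
        id INTEGER PRIMARY KEY REFERENCES persons(id), zing TEXT NOT NULL);

    -- Derived relation that reassembles the sum type as (tag, zing).
    CREATE VIEW persons_with_zing AS
        SELECT p.id AS id, p.name AS name, 'left' AS tag, l.zing AS zing
          FROM persons p JOIN left_persons l ON l.id = p.id
        UNION ALL
        SELECT p.id, p.name, 'right', r.zing
          FROM persons p JOIN right_persons r ON r.id = p.id;

    -- Route each insert on the view to the matching base tables.
    CREATE TRIGGER persons_with_zing_insert
    INSTEAD OF INSERT ON persons_with_zing
    BEGIN
        INSERT INTO persons (id, name) VALUES (NEW.id, NEW.name);
        INSERT INTO left_persons (id, zing)
            SELECT NEW.id, NEW.zing WHERE NEW.tag = 'left';
        INSERT INTO right_persons (id, zing)
            SELECT NEW.id, NEW.zing WHERE NEW.tag = 'right';
    END;
""")

insert = "INSERT INTO persons_with_zing (id, name, tag, zing) VALUES (?, ?, ?, ?)"
conn.execute(insert, (5, "bob", "left", 5))
conn.execute(insert, (6, "amy", "right", "hi"))

view_rows = conn.execute(
    "SELECT id, name, tag, zing FROM persons_with_zing ORDER BY id").fetchall()
left_rows = conn.execute("SELECT id, zing FROM left_persons").fetchall()
right_rows = conn.execute("SELECT id, zing FROM right_persons").fetchall()
print(view_rows, left_rows, right_rows)
```

The client only ever talks to `persons_with_zing`; the base tables hold nothing but atomic values.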

That's a very good explanation, thanks. However I'd argue that just because you can express something in a language, doesn't mean it's the optimal way to express it. Any Turing-complete language can express sum types, but there's something to be said for having explicit features within the language for sum types.

The SQL vs NoSQL debate reminds me of the static types vs dynamic types debate. Static type systems offer a more robust mental model and verification, but at the cost of rigidity and forcing the programmer to think in the type system's mental model. Dynamic types are therefore appealing because they can adapt to the programmer's mental model, even if that model is seriously flawed. One change in philosophy that has helped static type systems is the realization that you can design static type systems that, while not conceptually pure, get closer to the dynamic mental models. These systems, such as TypeScript or Go, make static types easier to swallow.

I wonder if the same could be said for SQL/relational systems. If you could design a relational system that's a touch more ergonomic, that doesn't require learning a mental model that feels a little foreign, maybe the NoSQL options will be a lot less appealing.

> However I'd argue that just because you can express something in a language, doesn't mean it's the optimal way to express it.

I completely agree, and I may not have driven this point home, but because a relational system lets you have different views of the same data, you can have sum types in one view, while breaking them into relational structures in another view of the same data. We don't get that in SQL DBMSs because SQL isn't really relational.

> If you could design a relational system that's a touch more ergonomic, that doesn't require learning a mental model that feels a little foreign, maybe the NoSQL options will be a lot less appealing.

What could be ergonomic about a strongly typed system is that it can guarantee that if I pull a record from the database, it has exactly the types it says it does, so my code knows what it's dealing with. Even if I'm writing something in Python, I don't actually want to do a mess of isinstance checks.

I think what we want is to properly wire a DBMS into build tools. The production system should have everything strictly typed, presenting a clean API to consumers.

Meanwhile, a development branch should let you put whatever you want in there while you're experimenting, and then you lock it down before you promote your code to prod.

I've skimmed https://en.wikipedia.org/wiki/Database_normalization before and I'm trying to reconcile your clear explanation with what is there.

  > Informally, a relational database relation is often described as "normalized" if it meets third normal form.
So then I look at https://en.wikipedia.org/wiki/Third_normal_form , which has this fun:

  > An approximation of Codd's definition of 3NF, paralleling the traditional pledge to give true evidence in a court of law, was given by Bill Kent: "[every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key".[7] A common variation supplements this definition with the oath "so help me Codd".
I don't see how that relates to your normalization. It's possible the simple case of booleans was just for illustration, but if not, then it suggests to me that there should never be any boolean column in the normalized schema, since you can have an additional table containing the keys corresponding to e.g. `true` values.

Could anyone clarify?

So 2NF and 3NF are rules that aim to eliminate data redundancies. That's Kent's point: everything is a fact about a key.

I'm talking about a more basic idea of expressing structures that are accessible through relational operations, so joins, intersections, unions, etc. That's known as 1NF[1] and, as you might expect, it's a pre-requisite to 2NF and 3NF.

[1]: https://en.wikipedia.org/wiki/First_normal_form

Thanks. And so in 1NF, would you ever have a column of type boolean?

Nope. I believe that's why ANSI SQL doesn't specify a boolean column type.

What prevents this same line of reasoning from being used to justify that SQL does not need a DATETIME primitive type?

Just as you showed how a sum type could be represented with component relations, a similar example could be constructed showing how any needed attribute of a datetime could be normalized to various tables combining the features of dates and times.

The problem is that you will want to query all persons with (say) a particular name regardless of the zing, and maybe join it with some related table (say, Address), but still want (as the client application) to receive the zing values. To represent the result with a single relation, you need (in the result) nullable fields for left zing and right zing, and then it makes more sense to have those in Person in the first place (plus a constraint that they are exclusive). The only alternative would be that the client receives several on-the-fly created relations as the result (i.e. LeftPersonsWithAddress and RightPersonsWithAddress), but then what about a sorted result? E.g. the client may want all resulting PersonsWithAddress sorted by ZIP code.

I’d also really like to understand this better. I did some (casual) research and couldn’t even find any decent papers addressing this.

It seems misleading to call this "Datalog". The GitHub repo even says "among database theoreticians Datalog and SQL are known to be equivalent", which is absolutely wrong without qualification. Some flavors of SQL will have recursive extensions so that they could be considered equivalent, but that is not true in general.

I can't find any mention of recursion on the original blog post or the GitHub page. Without recursion it isn't Datalog.

A major difference between Datalog and SQL is that Datalog uses set semantics, whereas SQL uses bag semantics. For those not aware, that means facts in Datalog are unique. SQL's equivalent to facts (records) are not unique, and relations can contain duplicates.
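A small sketch of the bag-versus-set difference, using Python's built-in sqlite3 module. The `parent_of` table and its contents are made up for illustration. The same fact is inserted twice, and plain SQL keeps both copies unless asked for DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent_of (child TEXT, parent TEXT)")

# Insert the same fact twice; with no key constraint, SQL stores both rows.
conn.executemany("INSERT INTO parent_of VALUES (?, ?)",
                 [("bob", "alice"), ("bob", "alice")])

# Bag semantics: the duplicate survives a plain SELECT.
bag = conn.execute("SELECT child, parent FROM parent_of").fetchall()
# Set semantics has to be requested explicitly.
as_set = conn.execute("SELECT DISTINCT child, parent FROM parent_of").fetchall()

print(len(bag), len(as_set))
```

In a Datalog system, asserting the same fact twice would leave a single fact, which corresponds to the DISTINCT result.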

SQL 99 has recursion via recursive CTEs, so the claim is probably valid.

Fair enough, but to a database theoretician SQL means "non-recursive conjunctive queries". And I skimmed the tutorial and there is not a single example of using Logica to compute something recursively. Transitive closure is the canonical example for showing off Datalog.
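For reference, here is roughly what the transitive closure example looks like with a recursive CTE, run through Python's built-in sqlite3 module. The `edge` table and its three rows are invented for illustration, and the Datalog rules in the comment are only there for comparison; this is not Logica syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO edge VALUES (?, ?)", [(1, 2), (2, 3), (3, 4)])

# Datalog equivalent, for comparison:
#   reach(X, Y) :- edge(X, Y).
#   reach(X, Z) :- reach(X, Y), edge(Y, Z).
closure = conn.execute("""
    WITH RECURSIVE reach(src, dst) AS (
        SELECT src, dst FROM edge
        UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
        SELECT reach.src, edge.dst
          FROM reach JOIN edge ON reach.dst = edge.src
    )
    SELECT src, dst FROM reach ORDER BY src, dst
""").fetchall()

print(closure)
```

The recursive step refers to `reach` only once, so this stays within what a linear-recursion-only engine accepts.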

SQL 99 only requires support for linear recursion, i.e. "each FROM has at most one reference to a recursively-defined relation". Datalog doesn't have such a restriction

The C-like syntax is a bit depressing.

There is a syntax debate I respect. While I prefer austere syntax, deeper thinkers like Bill Joy note that programmer productivity increases with more information on screen at once. Syntax that improves both code density and clarity is a good thing. I love Haskell and Ruby in actual use, even if I want to prefer Lisp without parentheses (an easy preprocessor if one thinks it through).

I cannot respect perpetuating C syntax just to attract users who would otherwise be challenged (that Apollo 13 astronaut who "never trained in the LEM"). Rob Pike once gave the only justification I can understand: Code used to need to survive communicating through channels that mangled whitespace.

That is no longer the case, and modern editors all support syntax highlighting. (We've reached the point where one should develop an editor language server in parallel with any new language.)

If your editor can figure out your language's grammar, and then you can with the editor's help, then one achieves greater code density and clarity at once. Some people do love terminals, but most people use graphical user interfaces. Why is language design stuck in terminal pre-history? There is no excuse in 2021 for lots of stray punctuation that's just ground glass in programmers' eyes.

> There is a syntax debate I respect. While I prefer austere syntax, deeper thinkers like Bill Joy note that programmer productivity increases with more information on screen at once. Syntax that improves both code density and clarity is a good thing. I love Haskell and Ruby in actual use, even if I want to prefer Lisp without parentheses (an easy preprocessor if one thinks it through).

I strongly prefer verbose type systems. In particular, some lightweight type inference is good as long as it's not full-fledged HM type inference (like Haskell, Rust etc.). The problem with HM type inference is that although it's extremely powerful and makes the code look cleaner, it hides important information from the programmer, which ultimately causes two problems:

* Variables being inferred to have types slightly different from what the programmer expects. E.g. I expected foo to be A(B(C)) -> D(C), but it turns out it's actually A(B(X)) -> D(C), which also type-checks.

* Errors can be harder to read.

Yes. My Haskell code includes all signatures, even for functions defined inside functions.

This seems to mostly be Prolog syntax, a thing which I can assure you has never been chosen to attract users.

Datomic-flavoured Datalog: http://www.learndatalogtoday.org/

I'm surprised there are no trademark issues with the name. I worked for Logica, the company, for many years. At some point later on they got bought by CGI but you'd have thought they still held the rights to the name. CGI is a stupid name - a three-letter acronym that conflicts with more than one unrelated IT term. Seems logica.com still redirects.

I don't believe you can trademark the name of a computer language

I am trying to find concrete examples of where this is superior to mainstream SQL. Right now the documentation and examples are targeted at higher-level users than myself and it is not clear (to me, that is) what it is trying to solve. Anyone have an ELI5-type link?

In the examples I saw, I never saw one that did a "join". I guess I'm surprised that they positioned this as a replacement for SQL but gave examples like detecting if something is prime. Might be useful, but I didn't get a good sense for it by reading the article or nearby docs.

The linked page has an example of a join between the `comments` table and the MagicNumber relation.

A join of A and B looks like

  C(x,z) :-

Joins show up in the tutorial.

> The main flaw of SQL, however, lies in its very limited support for abstraction.

This is objectively bullshit.

What SQL database engine doesn't provide views, stored procedures and/or user-defined function support?

Failure to construct higher-order abstractions in SQL is a failure of the engineer to understand the problem domain, not a failure of the tool.

SQL is capable of operating at any level of abstraction you wish for it to. It is all engineering from there.

None of those things (views, stored procs, functions) are part of the query language. They all involve making somewhat permanent alterations to the structure of the database, and are entirely capable of conflicting with other objects in the database. So if your query needs these things, you're now into "release management" territory.

In terms of lack of linguistic abstraction, try writing a query that abstracts over what table it's to be run against, allowing choice of source data to be parameterized, without either resorting to shenanigans like passing strings of SQL fragments around to be eval'd during query execution, or writing the query in some other system such that it can then be compiled into SQL for execution.
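A short sketch of that limitation, using Python's built-in sqlite3 module. The two `orders_*` tables and the `total` helper are hypothetical. Bound parameters can stand in for values but not for table names, so the parameterization ends up living in the host language as string building.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_2020 (amount INTEGER);
    CREATE TABLE orders_2021 (amount INTEGER);
    INSERT INTO orders_2020 VALUES (10), (20);
    INSERT INTO orders_2021 VALUES (5);
""")

# A placeholder in table position is rejected when the statement is parsed.
try:
    conn.execute("SELECT sum(amount) FROM ?", ("orders_2020",))
    placeholder_error = None
except sqlite3.OperationalError as exc:
    placeholder_error = str(exc)

def total(table):
    """Workaround: splice the table name into the SQL text from outside."""
    if table not in {"orders_2020", "orders_2021"}:  # allow-list against injection
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f"SELECT sum(amount) FROM {table}").fetchone()[0]

print(placeholder_error)
print(total("orders_2020"), total("orders_2021"))
```

The reusable unit here is a Python function that generates SQL, not anything expressed in SQL itself.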

To give a concrete example of what we're talking about, I wrote this a couple of months ago in a similar discussion: https://news.ycombinator.com/item?id=26320573

The problem is that the "Compose" function is essentially impossible to write. I say essentially because it's probably something that could theoretically be written, but not with any reasonable amount of effort. It would literally be on par with the difficulty of implementing a DB from scratch that implements the Logica language as its base query language.
