
Dhall 1.0.1: a total programming language specialized to configuration files - sndean
https://github.com/Gabriel439/Haskell-Dhall-Library
======
dozzie
Reaction of a sysadmin: Nice. Let's not.

Configuration files should be passive data, not a programming language, and as
easy to process in a variety of languages as possible.

~~~
wintermute42
It's not a programming language: it's not Turing complete, and it is
guaranteed to terminate. So what's the problem?

~~~
shakna
> Evaluation always terminates, no exceptions

How do you know for certain?

Where's the proof for the Halting Problem in this case?

Claiming safety through guaranteed termination is nice, but termination is not
a simple thing to prove.

~~~
aseipp
The language isn't Turing complete, and it offers no way to encode general
recursion in any form (and it isn't offered by the 'compiler' as a primitive).
The language is "strongly normalizing", meaning every term and program can
always be reduced to a final "normal form", where no more reductions are
possible (like how '2' is the normal form of '1+1') and this normal form
exists for every program which can type-check. Therefore, the 'compiler'
always terminates as well, when it is asked to evaluate the program. You don't
even need to get to the point of talking about the halting problem, because
the tool isn't _asking_ if any particular program terminates. It does not need
to. All programs that can be expressed in the language, by design, terminate.

But the details are more subtle. In particular, you can write a configuration
expression that takes 10 years to compute, though it's unlikely you'll do this
by accident. So you also need a way of asking "Is this in normal form?", for
things like explicitly untrusted inputs. But that's pretty doable, since you
can traverse the input and easily see if any terms haven't been reduced.
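As a toy sketch of that guarantee (a hand-rolled term shape for illustration, not Dhall's actual AST): a normalizer for a language with no recursion can only ever walk down the structure of its input, so it must terminate, and every result is a normal form.

```python
# Toy sketch, NOT Dhall's real AST: a term is an int or an ("add", a, b)
# tuple. normalize() only ever recurses on strict subterms, so it
# terminates on every input -- the analogue of strong normalization.
def normalize(term):
    if isinstance(term, int):              # a literal is already normal
        return term
    op, a, b = term
    a, b = normalize(a), normalize(b)
    if op == "add" and isinstance(a, int) and isinstance(b, int):
        return a + b                       # the reduction step: 1 + 1 ~> 2
    return (op, a, b)                      # nothing left to reduce
```

Here `normalize(("add", 1, 1))` is the normal form `2`, just like '2' is the normal form of '1+1' above.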

~~~
shakna
> it offers no way to encode general recursion in any form

I'd need to look into that a bit more, but at the moment I don't believe that
is actually true.

Dhall provides anonymous functions, or the basics of lambda calculus.

This _probably_ means that I can encode the Y-Combinator in it, resulting in
anonymous recursion, though unless functions are first class, it'd be 500+
LOC.

Again, I haven't reviewed the internals enough to see, but if you have
functions, that's normally enough to build infinite anonymous recursion.

Even if it isn't possible for some reason I haven't seen, claiming safety in
that fashion probably isn't appropriate.

I'm also not certain that it _isn't_ Turing Complete.

The C Pre-Processor doesn't supply a way to encode general recursion, but as
we can see here [0], it is entirely possible, though to a limited depth.
Leading to some debate as to whether or not it is Turing Complete.

If a configuration program created by Dhall is not guaranteed to terminate _in
a timely manner_, then you completely lose the safety there.

"Reduces to a normal form" is a completely useless guarantee if it takes 10
years, and is one of the reasons that simple markup/notation forms like YAML
and JSON that are readable in O(1)/O(n) time end up preferred, rather than
something that can explode into O(n^2) time.

[0]
[https://news.ycombinator.com/item?id=4795542](https://news.ycombinator.com/item?id=4795542)

~~~
aseipp
> This probably means that I can encode the Y-Combinator in it, resulting in
> anonymous recursion, though unless functions are first class, it'd be 500+
> LOC.

Try writing that function. It has a type like '(a -> a) -> a', which you need
to implement. You should try it, but I'll spoil it for you: Dhall has no
recursive types, nor does it have recursive values. The recursive Y-combinator
cannot be written in Dhall itself, it must be provided by the implementation.
You can write this in Haskell using open recursion/recursive data types, but
only because Haskell allows recursive types, and its type system is already
inconsistent as a logic anyway. Dhall has neither of these properties: no form
of recursion exists at all.
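For contrast, here is the untyped version of that combinator in Python (a sketch; `fix` and `factorial` are illustrative names, and this is the call-by-value Z-combinator rather than the lazy Y). It works precisely because Python never asks for a type for the self-application `x(x)` -- the very term that System F, and therefore Dhall, rejects.

```python
# Z-combinator: a fixed-point combinator with no named recursion at all.
# The inner `lambda v` delays evaluation so it works under strict evaluation.
def fix(f):
    return (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# General recursion then follows from anonymous functions alone:
factorial = fix(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
```

In a typed language this only becomes writable once you add a recursive type to roll up `x(x)`, which is exactly the feature Dhall omits.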

This is a property of System F, the core itself, which is quite well
established.

That said, some weird bug in the implementation itself may cause some weird
behavior (like most software). But System F itself? It's not Turing complete,
and this has been well known for a very long time as a mathematical result. And
having written my _own_ implementation of the Calculus of Constructions: it's
quite easy to get the core logic correctly implemented. In fact, with a few
abstractions, you can correctly implement the Calculus of Constructions --
which is even _more powerful than Dhall_ -- in about ~50 lines of Haskell
code.

> Again, I haven't reviewed the internals enough to see, but if you have
> functions, that's normally enough to build infinite anonymous recursion.

No, it isn't. Functions have nothing to do with it, at all. In the world of
lambda calculus/natural deduction/logic, the relevant bit is not functions, it
is the question: _can I construct any possible value, out of thin air_? Or, in
other words, can I create a value of type 'forall a. a', somehow? Can I create
a paradox -- a logical inconsistency? In the unified world of programming
languages & logic, "general recursion" is actually a form of logical
inconsistency inside the system, which allows you to prove anything -- of the
kind Gödel described.

This is, roughly speaking, the equivalent process to determining some logical
system like ZFC is actually inconsistent: such as proving `True = False`
inside ZFC with nothing else. If you can create that proof out of nothing but
the basics, you've created something that shouldn't exist. It is a
contradiction. If you can do this, you have proven inconsistency in the
underlying theory -- because you can prove _anything_ if you already have
proven 'True = False'
[https://en.wikipedia.org/wiki/Principle_of_explosion](https://en.wikipedia.org/wiki/Principle_of_explosion)

And, really, it is not "roughly" the same thing -- it is the _exact same
thing_. To find a value of type 'forall a. a', in fact, is the exact same
principle as proving or showing a paradox/inconsistency in an ordinary logic.

If what you said were true, that "functions lead to Turing completeness", then
the _existence of modus ponens_ -- modus ponens being the logical equivalent
of "function type" in the lambda calculus -- would imply that systems like ZFC
are inconsistent. But modus ponens and inconsistency have nothing to do with
each other, and the fact a theory contains Modus Ponens as a valid judgement
rule does NOT mean it is inconsistent.

For example, the simply typed lambda calculus has functions. But it does not
have any form of polymorphism, or type variables. Thus, the general Y
combinator is untypeable in this language and cannot be written, and the STLC
is not Turing complete: you cannot write a function of type 'forall a. (a ->
a) -> a' inside the language. It's not even possible to _abstract_ over types,
because there are no type variables! Picking a specific type 'a' means nothing
at all: '(Bool -> Bool) -> Bool' is extremely easy to implement and not
inconsistent in any way ('fix k = k True', done, and perfectly consistent).

It's the "for every 'a'..." part, along with the _ability to deduce_ 'forall
a' (somehow, someway), that leads to general recursion. If you want the STLC
to have recursion, your compiler must implement a magic version of 'fix' on
its own.

> I'm also not certain that it isn't Turing Complete.

You should be, because it isn't. There's ample research on the design of typed
programming languages, the Lambda Cube, System F, and Turing completeness.

A good book to start with, just for the design of these systems, is "Types and
Programming Languages" by Benjamin Pierce.

> The C Pre-Processor doesn't supply a way to encode general recursion, but as
> we can see here [0], it is entirely possible, though to a limited depth.
> Leading to some debate as to whether or not it is Turing Complete.

No, it doesn't lead to anything. The C preprocessor isn't Turing complete.
That's it. There is no scientific debate to be had about this, any more than
there is a scientific "debate" about what temperature water boils at. The
replies to the post you quoted illustrate this directly and express the
difference. Whether or not "it can recurse 10,000,000 times if I allow it to
do so by passing a flag, and 10,000,001 times if I changed the number
10,000,000 to 10,000,001" is rather beside the point of whether it's Turing
complete.

> "Reduces to a normal form" is a completely useless guarantee if it takes
> 10 years, and is one of the reasons that simple markup/notation forms like
> YAML and JSON that are readable in O(1)/O(n) time end up preferred, rather
> than something that can explode into O(n^2) time.

Correct. This is why being able to ask if any configuration is normalized is
important (and asking this can be done quickly by simply seeing if there are
any unreduced terms, rather than trying to actively reduce them). But let's
not pretend YAML and JSON don't have their deficiencies in similar realms --
O(1) is nice and all, yet YAML is actually relatively complex to the point of
introducing security vulnerabilities, and I've encountered my fair share of
deficiencies/inconsistencies in JSON/YAML parsers.
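A sketch of what that check can look like (an invented term representation, not Dhall's actual one): detecting an unreduced term is just looking for a beta-redex, one linear pass with no evaluation performed.

```python
# Terms: ("var", name) | ("lam", name, body) | ("app", fn, arg).
# A term is in normal form when no application's head is a lambda,
# i.e. the term contains no beta-redex. One traversal, no reduction.
def is_normal_form(term):
    tag = term[0]
    if tag == "var":
        return True
    if tag == "lam":
        return is_normal_form(term[2])
    fn, arg = term[1], term[2]
    if fn[0] == "lam":                    # a beta-redex: still reducible
        return False
    return is_normal_form(fn) and is_normal_form(arg)
```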

You must also consider the usage scenarios. Purely for configuration, this is
less likely to matter: you're unlikely to write a purposefully
inconsistent/bizarre YAML file that fucks up your parser, much like you're
unlikely to write a Dhall configuration which takes 10 years to reduce when
evaluated. But when you are accepting arbitrary, untrusted inputs - YAML
parsing security matters, just like Dhall evaluation time, and the ability to
ask "is this normal form", does.

In most cases I don't see the differences or threat vectors being truly,
meaningfully distinct, at all. If you trust your input sources, given the
general way configuration files are used -- the evaluation time isn't really a
problem. YAML O(1) or whatever doesn't really matter much either; it's
generally a fixed cost anyway. But if you _don't_ trust your input sources --
YAML should scare you _just as much_ as Dhall.

That said, I don't think Dhall will offer a substantial benefit for most
people who are already using YAML. So it's not like you should switch. To me
it's sorta neat, but probably something I'd never use extensively just due to
adoption factors.

But I'm afraid you're deeply mistaken on the design of the language, and don't
seem to know much about programming language theory, at a glance. I don't
blame you, since it's a rather dry field. But, to put it in perspective:
"System F and the Lambda Cube alone are not Turing complete, and always
normalize" is so trivially obvious in the PLT world that if you asked someone
to "prove it" to you, they would probably say "It's true by definition of the
system itself, what are you even asking" before staring at you blankly (or,
they realize you don't know the field, and that they're going to have to start
from square 1, as far as computability/language theory goes, to explain why
this is the case.) It's an old and well accepted result.

TAPL, the book I mentioned earlier is a good book to introduce yourself to
many concepts like this and should help clear up any confusion you might have.

~~~
shakna
> This is a property of System F, the core itself, which is quite well
> established.

So when I repeatedly asked for the aspect of the language design that
guaranteed this safety, and you responded with "the design guarantees it", you
meant the polymorphic lambda calculus, which, as it lacks infinite terms,
prevents the creation of recursion through unfolding.

Clearly stating the reliance on System F would go some way towards generating
trust in the definition of the language.

> There is no scientific debate to be had about this, anymore than there is a
> scientific "debate" about what temperature water boils at.

A language is considered to be Turing complete if it can calculate any
computable value that a Turing machine can.

Brainfuck was proven to be Turing complete by Muller, and languages such as
Piet have proven their own completeness by emulating or translating arbitrary
Brainfuck programs.

So I'm afraid the only question with regard to the CPP is: is limited
recursion acceptable for Turing completeness?

The Church-Turing thesis specifically states that it should ignore resource
limitations, which leads to the question of whether the limited recursion is a
result of available resources or of the language at hand.

There are active debates on this topic, and it cannot be so readily dismissed.

However, the point I raised with this is the claimed safety of Dhall, in that
it doesn't guarantee timely normalisation of its terms.

Which I believe you may have also noticed. [0]

> much like you're unlikely to write a Dhall configuration which takes 10
> years to reduce when evaluated.

True; however, the guarantee of evaluation safety is false.

There is no safety there when you can run out the available resource, such as
time.

And people do insane, unlikely things with a language, such as breaching the
type safety of Scala; in response, the DOT calculus was developed so that
those guarantees could be given.

Trusting a programmer not to do something stupid, is often the wrong thing to
do. [1]

There is a line which must be drawn, but you can't make guarantees that
mislead the programmer to feel safe.

[0] [https://github.com/Gabriel439/Haskell-Dhall-Library/issues/6](https://github.com/Gabriel439/Haskell-Dhall-Library/issues/6)

[1] [https://github.com/kanaka/mal/tree/master/awk](https://github.com/kanaka/mal/tree/master/awk)

~~~
Gabriel439
Author of Dhall here: while Dhall can't guarantee timely normalization, Dhall
can guarantee timely checking of whether a term is normalized, and you can
write a program using the Dhall library (or add a flag to the default Dhall
compiler) that rejects imports that aren't in normal form. Checking for normal
form is linear in the size of the source code.

As others have mentioned, Dhall is based on System F, which guarantees that
all programs halt. You can't write the Y combinator or any other infinitely
looping lambda calculus term in Dhall because it won't type check. The typical
example of an infinitely looping lambda term is the following untyped lambda
calculus term:

    (λx → x x) (λx → x x)

... which won't type-check in Dhall. It's actually not even legal Dhall code,
because you have to declare the type of all bound variables, and there is no
type you can assign to x that will make it type-check. Even in a language with
type inference it still won't type-check, as you will get a unification error.
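Transcribed into Python (which, being untyped, happily accepts the term), the same expression immediately recurses without bound; Python's stack limit turns the infinite loop into a `RecursionError`. A minimal sketch, with `omega` and `run_omega` as illustrative names:

```python
# (λx → x x) applied to itself. Reducing it only ever yields itself again,
# so evaluation never produces a value; Python aborts with RecursionError
# rather than looping forever.
omega = lambda x: x(x)

def run_omega():
    return omega(omega)   # would reduce to itself indefinitely
```

Calling `run_omega()` raises `RecursionError` -- the strict-language shadow of the non-termination that Dhall's type-checker rules out up front.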

The theory behind Dhall's termination guarantees is sound. If a program loops
indefinitely in Dhall it can only be because of an implementation bug causing
Dhall not to match the underlying theoretical foundation. The theory itself is
unbreakable.

