
Kotlingrad: A shape-safe DSL for differentiable programming - bmc7505
https://github.com/breandan/kotlingrad
======
bmc7505
Hello HN, author here! Been working on this DSL for differentiable programming
for a while now. The API is starting to stabilize and it would be great to
collect feedback from real users. You can find some toy examples, including
the core of our implementation here:

[https://github.com/breandan/kotlingrad/tree/master/src/main/...](https://github.com/breandan/kotlingrad/tree/master/src/main/kotlin/edu/umontreal/kotlingrad/samples)

Please feel free to reach out if you have any questions or comments. Thanks so
much for your feedback!

------
jillesvangurp
The math is a bit beyond my skills but I love the creative use of Kotlin
language and compiler features on display here. The How? section was very
enlightening. However, the abuse of backticks makes for some unreadable code.
I was thrown off by `1` a bit until I realised that that actually referred to a
sealed class with that as the name. I'm still a bit puzzled by how shape
inference is supposed to work. So, a bit too much type voodoo here for my
taste but definitely very cool.

~~~
bmc7505
Author here, thank you for your kind remarks! You’re right, the backticks
surrounding our type-level integers are too distracting. The Kotlin grammar
[1] requires that class names either start with a letter or be enclosed in
backticks. Unless the Kotlin language designers reconsider the identifier
scheme [2] (?), we would be better off replacing these with something like T1,
T2, ..., TN for readability and to minimize confusion.
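
To make the naming issue concrete, here is a minimal sketch. The declarations are hypothetical and only illustrate the quoting rule; they are not Kotlingrad's actual type-level integers.

```kotlin
// Hypothetical type-level integers; not Kotlingrad's actual declarations.

// A name that starts with a digit is only legal inside backticks,
// and the backticks are needed again at every use site.
object `2` { const val size = 2 }

// The letter-leading alternative proposed above needs no quoting.
object T2 { const val size = 2 }

fun sizes(): List<Int> = listOf(`2`.size, T2.size)
```

Both spellings behave identically; the difference is purely how the name reads and types.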

Re: Shape Inference. To properly understand how this works, the best place to
start is by looking at ToyVectorExample.kt [3]. Once you’re comfortable with
that, ToyMatrixExample.kt [4] is a fairly direct generalization. The best way to
explain our type system is that we stick to equality checking and perform
manual type conversion for tensor contractions and expansions. We haven’t
figured out a good way to do concatenation, slicing, or convolution, all of
which require type-level arithmetic. This may be feasible for a fixed set of
predefined shapes, but seems to be generally intractable in the Java/Kotlin
type system without significant overhead. But if someone has an idea, I
would be very happy to be proven wrong about this!
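
As a rough illustration of shape checking by type equality, the sketch below tags a vector with its length as a type parameter. `Nat`, `T2`, `T3`, `Vec`, `vec2` and `vec3` are all hypothetical names chosen for this example; the real implementation is in the linked sample files and may differ.

```kotlin
// Hypothetical type-level lengths; these are not Kotlingrad's actual declarations.
interface Nat { val size: Int }
object T2 : Nat { override val size = 2 }
object T3 : Nat { override val size = 3 }

// A vector whose length is carried in its type parameter N.
class Vec<N : Nat>(val length: N, val values: List<Double>) {
    init { require(values.size == length.size) { "expected ${length.size} values" } }

    // Both operands must have the same N, so a length mismatch is a type error.
    operator fun plus(other: Vec<N>): Vec<N> =
        Vec(length, values.zip(other.values) { a, b -> a + b })
}

fun vec2(a: Double, b: Double) = Vec(T2, listOf(a, b))
fun vec3(a: Double, b: Double, c: Double) = Vec(T3, listOf(a, b, c))

val sum = vec2(1.0, 2.0) + vec2(3.0, 4.0)          // compiles: Vec<T2> + Vec<T2>
// val bad = vec2(1.0, 2.0) + vec3(1.0, 2.0, 3.0)  // rejected: Vec<T3> is not Vec<T2>
```

Nothing here computes a new length from existing ones, which is why operations such as concatenation (length M + N) fall outside what plain equality checking can express.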

[1]:
[https://kotlinlang.org/docs/reference/grammar.html](https://kotlinlang.org/docs/reference/grammar.html)

[2]:
[https://github.com/Kotlin/kotlin-spec/blob/master/grammar/sr...](https://github.com/Kotlin/kotlin-spec/blob/master/grammar/src/main/antlr/UnicodeClasses.g4)

[3]:
[https://github.com/breandan/kotlingrad/blob/master/src/main/...](https://github.com/breandan/kotlingrad/blob/master/src/main/kotlin/edu/umontreal/kotlingrad/samples/ToyVectorExample.kt)

[4]:
[https://github.com/breandan/kotlingrad/blob/master/src/main/...](https://github.com/breandan/kotlingrad/blob/master/src/main/kotlin/edu/umontreal/kotlingrad/samples/ToyMatrixExample.kt)

~~~
jillesvangurp
Regarding the naming, the backtick notation is useful sometimes to allow you
to do stuff that goes against normal conventions in Kotlin for identifiers.
But generally sticking with naming conventions (that you are also dodging with
various annotations to opt out of warnings for non-ASCII characters) is
probably helpful from a code readability and autocomplete predictability point
of view, which I would argue is important in an API and DSL.

I'd suggest just spelling out what is what instead of relying on names that by
themselves are meaningless. So e.g. Dimension1, Dimension2x2, etc. make it
clear that 1) it's a class 2) it's intended as syntactic sugar for something
else. Could type aliases work here? In any case, there are only so many vector
and matrix dimensions you can predefine. Your approach kind of breaks down for
larger vectors/matrices and you lose compile time checks for those.

Nevertheless, kudos for pushing the type system this far.

~~~
bmc7505
You raise some good points about the API readability and discoverability. For
identifiers that the user is likely to write often, ASCII and standard naming
conventions are usually preferable. While auto-completion is capable of
handling Unicode symbols, it can easily go overboard and harm usability. I
agree that typealias could help; it's something we should explore. Thanks for
the suggestions!
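
One way the typealias idea might look, as a sketch only: the backticked classes, the `Dimension2`/`Dimension3` aliases and the `Vec` type are hypothetical and not taken from the library.

```kotlin
// Hypothetical type-level integers with digit names (must be backtick-quoted).
class `2`
class `3`

// Aliases give the same types readable ASCII names.
typealias Dimension2 = `2`
typealias Dimension3 = `3`

// Hypothetical vector type tagged with its length.
class Vec<N>(val values: List<Double>)

// An alias is the same type as its target, so the two spellings interoperate.
fun dot(a: Vec<`2`>, b: Vec<Dimension2>): Double =
    a.values.zip(b.values) { x, y -> x * y }.sum()
```

Since an alias introduces no new type, existing code written against the backticked names would keep compiling alongside the readable ones.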

Indeed you are correct about the finite dimension constraint, a problem we
discussed with a recent reviewer [1] in further depth. As you correctly noted,
since we have a fixed upper bound for type-level integers, when checking large
tensors (i.e. whose width exceeds 1000 in any dimension), we fall back to
dynamic shape checking, prior to numerical evaluation. Smaller tensors can be
shape checked at compile time.

I, too, am wary of the "type voodoo" associated with type-level programming,
but would argue that the added complexity from our approach (sans type-level
arithmetic) comes at very little cost to the end user due to type inference.
In most cases we can check arrays at compile time. In the worst case, or if
the user chooses, we use runtime shape checking or unsafe execution, which is
the same as ordinary array programming.
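
The runtime fallback can be sketched as an ordinary precondition that runs before any arithmetic. `DynVec` and `addsCleanly` are hypothetical names for illustration, not part of the library's API.

```kotlin
// Hypothetical dynamically shaped vector: its length is known only at run time.
class DynVec(val values: List<Double>) {
    operator fun plus(other: DynVec): DynVec {
        // Dynamic shape check, performed before any numerical work.
        require(values.size == other.values.size) {
            "shape mismatch: ${values.size} vs ${other.values.size}"
        }
        return DynVec(values.zip(other.values) { a, b -> a + b })
    }
}

// Reports whether two vectors can be added without a shape error.
fun addsCleanly(a: DynVec, b: DynVec): Boolean =
    try {
        a + b
        true
    } catch (e: IllegalArgumentException) {
        false
    }
```

The mismatch is still caught, just later: as an exception at the call rather than as a compile error.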

Thanks again for your feedback. Please don't hesitate to reach out if you have
any further questions or comments!

[1]:
[https://openreview.net/forum?id=SkluMSZ08H&noteId=BJeZ3FkuOr](https://openreview.net/forum?id=SkluMSZ08H&noteId=BJeZ3FkuOr)

------
Enginerrrd
Man I'm having a hell of a time imagining how one would write a differentiable
program.

I guess realistically it would have to be functional programming, where each
function is itself differentiable, and then you'd have to be very careful with
your function composition.

I'm incredibly intrigued, but also pretty skeptical. Like it seems like it
would be so hard to ensure that you don't leave the domain where your program
is differentiable.

~~~
nullc
There are automatic differentiation libraries for many languages, including
C++.

The result is generally only piecewise differentiable because, e.g. you
obviously cannot differentiate across the hard branch of an if-statement.

Usually for optimization you want reverse-mode differentiation, where you
learn how the function output-- some optimization objective-- depends on the
input variables. The automatic differentiation library records all the
operations you perform on the input variables and then composes them using the
chain rule, working backwards.
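
A minimal sketch of that record-then-replay idea, supporting only `+` and `*`. `Tape`, `Var` and `gradient` are hypothetical names; real AD libraries are considerably more elaborate.

```kotlin
// Minimal tape-based reverse-mode AD. All names here are hypothetical.
class Tape {
    // Each recorded step pushes one node's gradient back to its inputs.
    val backwardSteps = mutableListOf<() -> Unit>()

    fun variable(value: Double) = Var(this, value)
}

class Var(val tape: Tape, val value: Double) {
    var grad = 0.0

    operator fun plus(other: Var): Var {
        val out = Var(tape, value + other.value)
        tape.backwardSteps.add {
            grad += out.grad
            other.grad += out.grad
        }
        return out
    }

    operator fun times(other: Var): Var {
        val out = Var(tape, value * other.value)
        tape.backwardSteps.add {
            grad += other.value * out.grad
            other.grad += value * out.grad
        }
        return out
    }
}

// Seeds the output with gradient 1 and replays the tape in reverse order,
// which applies the chain rule from the output back to the inputs.
fun gradient(output: Var) {
    output.grad = 1.0
    output.tape.backwardSteps.asReversed().forEach { it() }
}
```

For f(x, y) = x * y + x, calling `gradient` on the result leaves y + 1 in `x.grad` and x in `y.grad`, from a single backward pass regardless of how many inputs there are.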

For some kinds of codebases it's actually pretty natural... e.g. I've taken a
big body of C digital signal processing code and in a couple hours gotten
gradients for it just by replacing all the floating point variables with
special differentiable types from an AD library. These libraries also include
collections of standard math lib functions whose derivatives they know.
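
The "swap the number type" approach can be sketched as follows. Note that this uses forward-mode dual numbers for brevity, not the reverse-mode recording described above, and `Dual`, `sinDual`, `signal` and `derivativeAt` are hypothetical names.

```kotlin
import kotlin.math.cos
import kotlin.math.sin

// Hypothetical dual number: a value together with its derivative.
data class Dual(val value: Double, val deriv: Double) {
    operator fun plus(o: Dual) = Dual(value + o.value, deriv + o.deriv)

    // Product rule.
    operator fun times(o: Dual) =
        Dual(value * o.value, deriv * o.value + value * o.deriv)
}

// Library-style math function that knows its own derivative (chain rule).
fun sinDual(x: Dual) = Dual(sin(x.value), cos(x.value) * x.deriv)

// Existing numeric code, with Double replaced by Dual: x * sin(x) + x.
fun signal(x: Dual): Dual = x * sinDual(x) + x

// Seeding deriv = 1 yields d(signal)/dx at x0.
fun derivativeAt(x0: Double): Double = signal(Dual(x0, 1.0)).deriv
```

The body of `signal` reads the same as it would with plain `Double`s; only the types and the math function changed.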

For the purpose of optimization the fact that your program isn't
differentiable everywhere isn't necessarily a problem. You can use
gradient-free methods for searches across those boundaries, and gradient methods to
refine your results within them.

If you're optimizing over problems that have thousands of input variables,
having derivatives is critical to making machine optimization tractable--
unless your search space is particularly trivial and can be well approximated
by the simple differentiable models that gradient-free local optimization
tools use internally.

Of course, you could also attempt to directly program your derivatives-- it
would almost certainly be much faster to evaluate than transcript-based
reverse-mode AD. But it may be impossible to construct analytical derivatives
that cover your whole problem domain, it's often unreasonably hard even when
possible, and even if you have them, keeping the functions in sync is difficult
and a source of devious bugs.

------
tjchear
In case anyone doesn't know what differentiable programming is, here's what
Wikipedia says about it [0]:

> Differentiable programming, or ∂P, is a programming paradigm in which the
> programs can be differentiated throughout, usually via automatic
> differentiation. This allows for gradient based optimization of
> parameters in the program, often via gradient descent. Differentiable
> programming has found use in a wide variety of areas, particularly
> scientific computing and artificial intelligence.

[0]
[https://en.m.wikipedia.org/wiki/Differentiable_programming](https://en.m.wikipedia.org/wiki/Differentiable_programming)
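
As a small illustration of "gradient based optimization of parameters", here is plain gradient descent on a one-parameter objective. The objective, the hand-written derivative and the function names are all made up for this sketch; an AD tool would supply the derivative automatically.

```kotlin
// Objective with a single parameter; its minimum is at x = 3.
fun loss(x: Double): Double = (x - 3.0) * (x - 3.0)

// Hand-written derivative of loss; AD would generate this for us.
fun lossGrad(x: Double): Double = 2.0 * (x - 3.0)

// Repeatedly steps against the gradient, scaled by the learning rate.
fun gradientDescent(start: Double, rate: Double, steps: Int): Double {
    var x = start
    repeat(steps) { x -= rate * lossGrad(x) }
    return x
}
```

With a rate of 0.1, each step shrinks the distance to the minimum by a factor of 0.8, so a hundred steps from 0.0 land very close to 3.0.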

------
xiaodai
Julia's compiler lets you hook into it and make something mind blowing like
Zygote.jl
[https://github.com/FluxML/Zygote.jl](https://github.com/FluxML/Zygote.jl)

------
mkl
What's up with the bizarre, untypeable and unpronounceable[1] orthography? I
couldn't see an explanation skimming through. "Kotlin𝛁" seems likely to
violate trademark rules, as it's almost identical to "Kotlin", and the special
character is likely to be stripped by forums etc., leading to more confusion.

[1] Yes, I know the repo name uses "grad"; that spelling actually makes
sense.

~~~
bmc7505
This is good to know. We had not previously considered the trademark aspect,
but will investigate further. While the name is shared with a geographic
location and various other projects use a similar naming convention (e.g.
Stalin∇, ∇SLAM), "Kotlingrad" may indeed be more appropriate and SEO-friendly.
Thanks for your feedback.

~~~
mkl
There's another issue you may not be aware of, which is what I was describing,
based on how my phone displayed it. The ∇ (U+2207) you just used for Stalin∇
and ∇SLAM is a different character to the one on the GitHub page, 𝛁 (U+1D6C1)!
The latter character is outside the BMP, so will lead to plenty of
compatibility and font issues. For me on Firefox on Android, it shows up as a
wide, blank grey rectangular block (on Chrome on Kubuntu it's fine). The
symbol, when it displays properly, is pronounceable if you know the maths, but
will probably hurt SEO etc. If you must use one of these, definitely use ∇, as
that should be well supported (just hard to type and search for).
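
The difference between the two characters can be checked directly on the JVM. This sketch builds both strings from their code points; `describe` is a hypothetical helper, while the `Character` and `String` calls are standard.

```kotlin
// U+2207 NABLA, the symbol used for Stalin∇ and ∇SLAM above.
val nabla = Character.toChars(0x2207).concatToString()

// U+1D6C1 MATHEMATICAL BOLD NABLA, the symbol reported on the GitHub page.
val boldNabla = Character.toChars(0x1D6C1).concatToString()

// Summarises the first code point: its number, whether it lies in the
// Basic Multilingual Plane, and how many UTF-16 units the string occupies.
fun describe(s: String): String {
    val cp = s.codePointAt(0)
    return "U+%04X bmp=%b utf16Units=%d".format(cp, Character.isBmpCodePoint(cp), s.length)
}
```

`describe(nabla)` gives `U+2207 bmp=true utf16Units=1`, while `describe(boldNabla)` gives `U+1D6C1 bmp=false utf16Units=2`: the bold variant needs a surrogate pair, which is where software that assumes one UTF-16 unit per character tends to go wrong.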

~~~
bmc7505
Re: Typography. Good to know. We might drop the nabla and stick to ASCII in
future references.

Re: Trademark. Might be an issue. I’ve checked with the Kotlin team and they
are looking into it.

------
gtt
So, can we expect KotlinTorch soon?

~~~
bmc7505
Possibly! In the meantime, there are already some Torch bindings for
Kotlin/Native:

[https://github.com/JetBrains/kotlin-native/tree/master/sampl...](https://github.com/JetBrains/kotlin-native/tree/master/samples/torch)

If you’re more interested in type safety, you might also check out
Hasktorch:

[https://github.com/hasktorch/hasktorch](https://github.com/hasktorch/hasktorch)

