
The Spiral Language - Nelkins
https://github.com/mrakgr/The-Spiral-Language
======
PeCaN
In case anyone is wondering about why this project puts so much importance on
things being inlineable, GPUs often don't support function calls or have an
extremely shallow return stack.

In this way it reminds me of [http://futhark-lang.org/](http://futhark-lang.org/) somewhat, though Futhark is explicitly targeted at producing extremely efficient GPU code.

~~~
ygra
As also described by the fourth to sixth paragraphs in the readme:

 _Abstraction by heap allocation is a dead end. It works moderately well on
the current generation of computers where CPU is still the dominant driver of
computation._

 _It cannot work for devices like GPUs and the rest coming down the line. Many
of the most important computational devices of the future won't support heap
allocation, so an alternative is needed to draw out their full power. It is of
absolute importance that a language for that task have excellent control over
inlining. Inlining therefore must come as a guarantee in the language and be a
part of the type system._

 _Inlining is a trade-off that expresses the exchange of memory for
computation. It should be the default instead of heap allocating._

I guess whoever _is_ wondering about the why hasn't read the introduction.

~~~
xiaq
Function calls use stacks. It is not the same thing as heap allocation.

~~~
crististm
Function calls don't necessarily need to use stacks. There is nothing
preventing frame allocation on the heap.

~~~
wtetzner
In fact, many Scheme implementations allocate their frames on the heap,
because it makes it easier to support continuations.
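For a flavor of why heap-allocated frames make continuations easy, here is a minimal sketch in Python rather than Scheme; the continuation-passing style and the `square_then` helper are purely illustrative, not taken from any real implementation:

```python
# Illustrative sketch: in continuation-passing style every "frame" is a
# closure living on the heap, so "the rest of the computation" is an
# ordinary value that can be saved and re-entered later -- the property
# that Scheme-style continuations rely on.

captured = {}

def square_then(x, k):
    # Save the continuation so it can be re-entered later, the way a
    # heap-allocated frame outlives the call that created it.
    captured['k'] = k
    return k(x * x)

result = square_then(3, lambda r: r + 1)   # 3*3, then +1 -> 10
again = captured['k'](100)                 # re-enter the saved frame -> 101
print(result, again)
```

A stack-allocated frame would have been popped after `square_then` returned, making the second call impossible.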

------
kgbier
The commit messages across the entire project are... alarmingly insightful.
There's some detailed stream of consciousness stuff in there.

~~~
jpwagner
[https://github.com/mrakgr/The-Spiral-Language/commit/51fe8b7...](https://github.com/mrakgr/The-Spiral-Language/commit/51fe8b7f286590a83db5aa47d1da7fb11a96af56)

~~~
jessep
If you haven't clicked through to read the above commit message, please do.
It is amazing. This is a glimpse into the mind of a mad scientist (and I
respect and love mad scientists more than anything, and aspire to be one).

~~~
Razengan
> _It is a tragedy, but I cannot imagine a language better than Spiral. Not
> within real world constraints facing me. A language better than it would
> pretty much have to be magic._

> _So, in this respect, my programming skill has reached its limit and the
> most I will be able to do is refine it._

> _And it is embarrassing to me because I want to do better. I am ashamed, but
> the one to come will have to follow up on what I could not manage with my
> meager human abilities._

This reminds me of the fictional inspirational process of dwarves in the
Drizzt Do'Urden series of books [0] by R. A. Salvatore [1]. In that setting a
dwarf craftsman gets only one moment of exceptional divine inspiration in his
life to forge a legendary artifact. It is both a blessing and a curse for
them, because they realize that they will never be able to surpass that work
for the remainder of their long lives.

I wonder if other people here have had that moment in their life as well?

[0]
[https://en.wikipedia.org/wiki/The_Icewind_Dale_Trilogy](https://en.wikipedia.org/wiki/The_Icewind_Dale_Trilogy)

[1]
[https://en.wikipedia.org/wiki/R._A._Salvatore](https://en.wikipedia.org/wiki/R._A._Salvatore)

~~~
zwkrt
The dwarves should be thankful. Many people would have this moment and not
realize it has passed them by.

------
brianberns
I think it's worth pointing out that the Spiral compiler seems to be written
in F#, which is itself a terrific functional programming language. (And it
looks like the Spiral compiler actually emits F#, which is then compiled?)

The author states that Spiral "was made for the sake of making a deep learning
library which was too difficult to do in F# itself." I wonder what he thinks
about DeepML ([http://www.deepml.net/](http://www.deepml.net/)), which is a
deep learning library written in F#.

~~~
abstractcontrol
Before I started work on Spiral I spent way too long, about a year, working
on an ML library before throwing in the towel. I think I skimmed through the
source of DeepML over a year ago and my impression was that the author was
struggling against the limitations of F#'s type system. There were a bunch of
places where he was doing the equivalent of telling the compiler to fuck
itself by downcasting to `System.Object`.

The way these projects go is that you start off with an AD library, and then
when the type system and the language start getting in the way you either do
a lot, and I mean a LOT, of engineering with inferior tools or you make
something better.

The 'make something better' option is always picked and can take various
forms. Generally, people would not decide to work on a language for an ML
library for almost a year before writing down its first line. They instead
take the middle road of starting with an AD library and gradually extending
it.

First they realize that they cannot really express tensors properly within
the confines of the language, so they make everything symbolic. Then they
build engines to run such symbolic ASTs, then they realize that they need
JITs and other optimizers to make everything run fast. Tensorflow and PyTorch
are currently at this stage. There are more stages after this.
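As a hedged sketch of what "make everything symbolic" means here (hypothetical names, not TensorFlow's or PyTorch's actual APIs): operators build an AST instead of computing, and a separate engine walks it.

```python
# Hypothetical mini-framework: arithmetic on Node objects builds a
# symbolic AST; nothing is computed until an engine evaluates it.

class Node:
    def __add__(self, other): return Op('add', self, other)
    def __mul__(self, other): return Op('mul', self, other)

class Const(Node):
    def __init__(self, v): self.v = v

class Op(Node):
    def __init__(self, kind, a, b): self.kind, self.a, self.b = kind, a, b

def evaluate(n):
    # The "engine" that runs the symbolic AST; a JIT would compile it
    # to native or GPU code instead of interpreting it like this.
    if isinstance(n, Const):
        return n.v
    a, b = evaluate(n.a), evaluate(n.b)
    return a + b if n.kind == 'add' else a * b

expr = Const(2) * Const(3) + Const(4)   # builds an AST, computes nothing yet
print(evaluate(expr))                   # 10
```

Because the AST is data, the same graph can later be differentiated, optimized, or compiled, which is the point of the symbolic stage.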

Of course the task of working with ASTs is what a compiler does. I think it is
a great pity that Tensorflow and PyTorch are written in C++ under the hood.
C++ is probably the worst language I can imagine for working on compilers, so
I applaud the authors of DeepML for picking F# instead. Had the TF team picked
a statically typed functional language for this they could have cut the size
of TF by over 10x and saved themselves over a million lines of code.

[ML](https://en.wikipedia.org/wiki/Standard_ML) derivatives like it were made
for that sort of thing and I cannot imagine doing Spiral in a non-ML styled
language.

~~~
brianberns
Thanks for the response. I always thought that the point of "making everything
symbolic" is that you can differentiate a symbolic expression and thus
implement backpropagation automatically. No? How do you handle this in Spiral?

~~~
abstractcontrol
Yes, it is, but there exists an alternative approach called automatic
differentiation which is more imperative and flexible in nature than symbolic
differentiation.

The idea is to keep a tape (you can think of it as a list) and then record all
the operations on it as you step forward through the program. Then at the end
you execute the operations backwards from the tape.

I take this approach in Spiral's ML library.
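A minimal Python sketch of the tape idea (names are illustrative, not Spiral's actual library): each forward operation appends its backward rule to the tape, and the backward pass executes the tape in reverse.

```python
# Tape-based reverse-mode AD: the forward pass records one backward
# closure per operation; replaying them in reverse accumulates gradients.

tape = []

class Var:
    def __init__(self, value):
        self.value, self.grad = value, 0.0
    def __mul__(self, other):
        out = Var(self.value * other.value)
        def backward():
            # product rule: push out.grad back to both inputs
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad
        tape.append(backward)
        return out
    def __add__(self, other):
        out = Var(self.value + other.value)
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        tape.append(backward)
        return out

x, y = Var(3.0), Var(4.0)
z = x * y + x            # forward pass records two ops on the tape
z.grad = 1.0
for op in reversed(tape):    # backward pass: run the tape in reverse
    op()
print(x.grad, y.grad)    # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

This is the same record-then-replay structure that PyTorch-style define-by-run AD uses, just stripped to its skeleton.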

------
dejawu
As someone not particularly familiar with F#, could someone go into what the
author means by "first-class types"? Do they mean that data types themselves
can be treated as values (first-class citizens)?

~~~
abstractcontrol
Indeed. But F# cannot actually do this, and neither can even dependently
typed languages to the degree Spiral can. In Spiral, types act much like
values, meaning you can bind them to variables and reflect on them without
cost, just like with values, though if you tried something like `(type 1) +
2` you'd just get an error during the last phase of compilation. It is rather
useful, and without it type inference would be impossible in Spiral.
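A loose runtime analogy in Python, whose types are first-class values at run time where Spiral reflects on them during compilation (the `describe` helper is hypothetical):

```python
# Types bound to variables and inspected like ordinary values.

t = int                      # bind a type to a variable
print(isinstance(3, t))      # True: the variable works wherever a type does

def describe(ty):
    # "Reflecting" on a type passed as an ordinary argument.
    return ty.__name__

print(describe(float))       # float

try:
    t + 2                    # roughly analogous to `(type 1) + 2`
except TypeError:
    print('error at the point of use')
```

The analogy is loose because Python pays a runtime cost for this, whereas Spiral's point is that the reflection happens at compile time for free.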

------
ghotli
Probably the most mind-bending idea I've had the pleasure of being exposed to
via wayward link clicking in quite some time. I didn't burn through the
entire thing, but I went pretty far, and the syntax primitives and the
inlining-by-default type guarantees seem like a natural extension of language
design in the history of programming languages.

My apologies for getting a bit philosophical here at the end, but this kind of
treatment is among my favorite things to encounter having had the privilege of
being born at the cusp of the information age. During the first 50 years of
the internet we get to casually encounter foundational new ways of expressing
software, with increasingly more guarantees and expressiveness from the
languages themselves, and reap the benefits of the plumbing and
infrastructure that inevitably arise from them. Rust's compile-time memory
tracking borrow system comes to mind as another such set of compiler-based
guarantees that we have now that we just didn't have when I was a kid.

tldr: Exciting times to be alive and clicking links on the internet.

~~~
ConcernedCoder
I'm with you, this should be my entry in: [Ask HN: 2018 Summer Reading List?](https://news.ycombinator.com/item?id=17513576)

------
akkartik
My ears pricked up at "The culprit for this is the heap allocation by default
dogma.."

[https://mastodon.social/@akkartik/100357841646414633](https://mastodon.social/@akkartik/100357841646414633)
(Mastodon doesn't have in-page anchors, but search for 'memory allocator'.)

~~~
chrismorgan

      > document.querySelectorAll('[id]').length
      0
    

That caught me by surprise. Especially when it has its `entry-center` class to
indicate “this is the actively focused message”, not loading to that point in
the page, or at least not having an ID to jump to there, seems surprising.

(A pure CSS solution for centering it on the page with an anchor: make the
.entry-center `position: relative`, and give it a new child <div id=jump> with
CSS `position: absolute; top: calc(50% - 50vh)`. The downside is that if the
entire .entry-center is taller than the viewport, it will still be centered on
the page, rather than sitting at the top. I suspect that defect could be
solved if you used two nodes instead of just one, combining margin and
padding, since padding can’t be negative.)

------
weathererp
It is fascinating how much a single person can achieve. Since the author seems
to be aware of this thread, Marko, do you mind telling a little bit about your
background? This project does not seem like something that was done on the
side, and also not something that a non-expert in programming languages just
decides to have a go at.

------
slifin
I wish there was a language that had the syntax of ReasonML, built on top of
F# because the ecosystem of F# is much better

It's very easy to get lost in syntax when transferring from C-like languages
like JavaScript to F#; some concessions (like Reason has) to semicolons and
braces might help those programmers (myself included) feel at home.

~~~
weberc2
This. I really like functional programming, and FP proponents keep telling me
syntax isn't a big deal, but I'm constantly stubbing my toe on it. I guess my
deferred-gratification mechanism isn't that strong, because I get bored
quickly of battling syntax when I could do something interesting in an
imperative language _now_.

------
moomin
Well, this is pretty interesting. I’ve been wondering when someone would
finally put optimisation semantics into the code.

~~~
weberc2
I think it's more interesting that they're in the type system. C has had
inline pragmas for decades now, and one might also consider memory allocation
semantics to be optimization semantics. Or more generally, one might consider
any low-level semantics to be optimization semantics. The neat thing about
Spiral isn't that it has 'optimization semantics', but rather that they
integrate neatly into the type system.

~~~
moomin
“Inline”, though, is a request, like a hint. It doesn’t _enforce_ it. And this
is my problem with most optimisations: you can’t guarantee they’ve happened.
Putting them in the type system is definitely the right way of doing this.

constexpr, however, does seem to be enforced.

------
blt
show a code sample before your whole manifesto

~~~
PeCaN
Frankly, to me the specifics of the syntax are not nearly as important as what
the language does.

This:

> Abstraction by heap allocation is a dead end. It works moderately well on
> the current generation of computers where CPU is still the dominant driver
> of computation.

> It cannot work for devices like GPUs and the rest coming down the line.

tells me way more about the language (and gets me much more interested) than a
"hello world" would.

I think syntax is by far the easiest part of a new language to learn, so it's
ultimately not that important.

EDIT: Regardless, it turns out it actually gives code samples like a page or
two down.

~~~
garmaine
I don't trust that it actually achieves those goals without seeing code. Show
an illustrative example of what you are trying to do and how it is
accomplished, then wax philosophic.

~~~
coldtea
What he's trying to do is not conveyed in example code.

That said, he gives plenty of examples.

------
Ono-Sendai
If you're not allowed function calls on a GPU, why not just inline everything?
Why introduce some funky keywords?

~~~
setzer22
You're allowed _some_ function calls. It just has a very short call stack.

------
ConcernedCoder
I think the real meat is here: [https://github.com/mrakgr/The-Spiral-Language#5-tensors-and-...](https://github.com/mrakgr/The-Spiral-Language#5-tensors-and-structural-reflection)

~~~
pimlottc
That link doesn’t work on mobile since the readme is abbreviated. Here’s a
working one:

[https://github.com/mrakgr/The-Spiral-Language/blob/master/re...](https://github.com/mrakgr/The-Spiral-Language/blob/master/readme.md#5-tensors-and-structural-reflection)

------
eigenspace
> There is no need to consider how tensors are made in a language made for
> numeric computation like Julia.

Does the author know anything about julia or how cleanly it allows one to
implement things like tensors?

~~~
abstractcontrol
When I looked at Julia last time I do not think it had the capability to
partially apply tensors. In Spiral, indexing into a 3d tensor would give you
back a 2d tensor. It might be possible to do this with functions in Julia,
but that would depend on its inlining capabilities, and unlike Spiral it does
not provide guarantees with regards to that.
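The "do this with functions" route mentioned here can be sketched in Python: represent a tensor by its index function, so fixing the leading index is literally partial application (the names and the index formula are illustrative):

```python
from functools import partial

def tensor3(i, j, k):
    # a hypothetical 2x3x4 tensor defined by its index function
    # (row-major: i strides by 12, j by 4, k by 1)
    return i * 12 + j * 4 + k

plane = partial(tensor3, 1)   # "index into" the 3d tensor: a 2d tensor
print(plane(2, 3))            # tensor3(1, 2, 3) = 12 + 8 + 3 = 23
```

Whether this is zero-cost then depends entirely on whether the compiler inlines the partial application away, which is the guarantee being debated.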

Its type system is definitely less powerful than Spiral's - some of the things
you could express naturally in Spiral would require macros in Julia.

All in all, the language strikes me as a better Matlab which I think it
succeeds at, but Spiral is made to be a better F#. Some of the tradeoffs I had
to make in language design means that I could not succeed at this across the
board.

The fact that it has a more powerful type system makes it take an immediate
hit in terms of ergonomics, and the language is less pleasant to code in than
F# because the IDE is not providing immediate and constant feedback.

On the other hand, for systems programming, I can't imagine what C (and even
C++) could possibly do better than it except maybe compile times. Despite
that, Spiral is also nothing like Rust - it does not try to check that
pointers are safe. It is rather an extremely expressive static language that
has no type annotations anywhere and looks like a dynamically typed
functional language.

~~~
StefanKarpinski
> When I looked at Julia last time I do not think it had the capability to
> partially apply tensors. In Spiral indexing into a 3d tensor would give you
> back a 2d tensor. It might be possible to do this with functions in Julia,
> but that would depend on its inlining capabilities and unlike Spiral it
> does not provide guarantees with regards to that.

That's exactly how slicing a tensor works in Julia:

    
    
        julia> T = copy(reshape(1:2*3*4, 2, 3, 4));
    
        julia> summary(T)
        "2×3×4 Array{Int64,3}"
    
        julia> T[:,2,:]
        2×4 Array{Int64,2}:
         3   9  15  21
         4  10  16  22
    

Inlining seems irrelevant since the slicing is done eagerly. Unless you're
talking about lazy slicing, in which case you could do `f(T) = T[:,2,:]` and
your inlining/function comment makes more sense. If you use StaticArrays.jl
(the closest thing in Julia to Spiral's tensors) then the machine code for
this operation on a 2×3×4 Float64 tensor is just:

    
    
        vmovups	16(%rsi), %xmm0
        vmovups	64(%rsi), %xmm1
        vmovups	112(%rsi), %xmm2
        vmovups	160(%rsi), %xmm3
        vmovups	%xmm0, (%rdi)
        vmovups	%xmm1, 16(%rdi)
        vmovups	%xmm2, 32(%rdi)
        vmovups	%xmm3, 48(%rdi)
        movq	%rdi, %rax
        retq
    

> Its type system is definitely less powerful than Spiral's - some of the
> things you could express naturally in Spiral would require macros in Julia.

The macro comment is a bit perplexing since macros are about syntax and are
about as unrelated to the type system as it's possible to get. I'm unaware of
any way that the presence or absence of macros in a language can affect the
expressiveness of its type system.

I'd be very curious to hear what can be expressed in Spiral's type system
that cannot be in Julia's. Julia's type system is generally more expressive
than state-of-the-art dependent static type systems since it punts on type
checking entirely. This allows it to have, among other things, parametric
types where the parameters can be other types, integers, floats, symbols and
tuples of the same (recursively). Parametric types even support fancy
features like using earlier type parameters to constrain later ones, e.g.
`X{T<:Number, S<:Vector{<:T}}` - i.e. `T` is a possibly abstract number type
and `S` is a vector whose element type is a subtype of `T`.

The only major type system feature I'm aware of Julia lacking (aside from the
obvious omission of static type checking) is intersection types. There's been
some thought of adding those in order to support multiple inheritance among
other things, but it hasn't proven necessary yet.

~~~
abstractcontrol
I recall an example where Julia had to use macros in order to achieve loop
unrolling. I'll dig it up if you want. This is something that can be done
quite naturally in Spiral due to its type system.

I find that people often say macros are about syntax, but what they really
are is some combination of a parser and a partial evaluator - in other words,
their main use seems to be compile-time function evaluation.

Spiral does not have arbitrary CTFE, but it can perform any pure,
non-side-effecting computation in its type system, plus memoization of
function calls.

One thing of practical interest that makes me doubt Julia's inlining
capabilities would be monads.

[https://github.com/pao/Monads.jl/blob/master/doc/index.rst](https://github.com/pao/Monads.jl/blob/master/doc/index.rst)

"Monads.jl provides a powerful, if relatively slow"

The key word here is slow. I haven't seen a single language as of yet apart
from Spiral that can inline all their overheads away.

They are pervasive in Haskell, and Haskell cannot do it. Scala wants to do
it, but it can't. F# and OCaml cannot optimize their overheads away. If Julia
could, then it would advertise it. This is something that is the subject of
academic research in the field of partial evaluation.

Speaking as a functional programmer, as a language Julia feels barely
functional to me. It has first-class functions, but it does not have
pervasive partial application or the syntax that makes such a style
comfortable, like functional languages tend to have. It feels more like an
imperative language.

And speaking as a systems programmer, I know that without inlining guarantees
you cannot have optimized functional constructs. It is not just monads. By
tracking all the functions and their bodies to exactness and forgoing all
heap allocation unless explicitly stated, it becomes much easier to do
language interop. It is possible to make functions that capture variables
from lexical scope and then pass them as arguments to a function that passes
them through to the GPU. Julia talks about speed, but barely talks about
inlining, so I am dubious about its claims.

I could go on, but I think this is a large enough sample of my views on Julia.
If Julia wants me to use it, then it needs to speak more to my values as a
programmer.

~~~
eigenspace
I think that when you say

>There is no need to consider how tensors are made in a language made for
numeric computation like Julia.

and then proceed to compare to PyTorch instead of Julia, it is incredibly
misleading and unfair to Julia. I fail to see how anything you've brought up
here excuses that.

~~~
abstractcontrol
I might be persuaded to see things from your point of view if you can show
that Julia's tensors meet my criteria. I might be wrong about this - I do not
know much about Julia, and it might very well be capable of this. But these
are the kinds of things that, if it were capable of them, it would advertise.
If it can do it I'll apologize and change the line to show Julia in a more
positive light.

The reason why I started talking about inlining of monads here is because
ultimately they are just functions, and Spiral's tensors are not part of the
language, but just functions as well. So if it cannot do monads then it
probably cannot do tensors to satisfaction either.

It was pointed out to me that tensor slicing could serve as a substitute for
partial application. What I am wondering next is the degree of flexibility
Julia's tensors have in the context of GPU kernels. Let me illustrate this
with a short example. The following is how the inplace map Cuda kernel is
made in Spiral. It seems simple, but I'll go at it from the top to show the
characteristics and flexibility that Spiral's tensors have.

    
    
        met map' w f in out = 
            inl in, out = zip in, zip out
            assert (in.dim = out.dim) "The input and output dimensions must be equal."
            inl in = flatten in |> to_dev_tensor
            inl out = flatten out |> to_dev_tensor
            inl in_a :: () = in.dim
            
            inl blockDim = 128
            inl gridDim = min 64 (divup (s in_a) blockDim)
    
            w.run {
                blockDim gridDim
                kernel = cuda // Lexical scoping rocks.
                    grid_for {blockDim gridDim} .x in_a {body=inl {i} ->
                        inl out = out i
                        inl in = in i
                        out .set (f in.get out.get)
                        }
                }
    

The first line is `inl in, out = zip in, zip out`. Since the inputs to the
map function can be arbitrary data structures, the `zip` iterates over them
and zips the tensors inside into a single one.

In Spiral the tensors have a tuple-of-arrays layout by default, and it is
absolutely trivial to switch between AOT and TOA representations. In fact,
you can mix and match tensors that have both layouts internally. Zipping
tensors in Spiral just merges them into one with the TOA layout, though the
subtensors could be AOT.

Dealing with these layout transformations is a significant issue in numerical
computation that Spiral solves completely. The reason I have the impression
that Julia cannot do this is not that I know for sure. Rather, I do not think
this can be done within the confines of a parametric type system; I think it
would require an intensional type system like the one Spiral has.
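The two layouts under discussion can be sketched in Python, with copying conversions for clarity (real implementations avoid the copies, and `to_toa`/`to_aot` are hypothetical helpers, not Spiral's API):

```python
# "Array of tuples" (AOT) vs "tuple of arrays" (TOA), and the mechanical
# conversion between them. zip(*...) transposes the structure.

aot = [(1, 10.0), (2, 20.0), (3, 30.0)]      # one array of tuple elements

def to_toa(rows):
    # tuple of arrays: one column per field
    return tuple(list(col) for col in zip(*rows))

def to_aot(cols):
    return list(zip(*cols))

toa = to_toa(aot)
print(toa)                  # ([1, 2, 3], [10.0, 20.0, 30.0])
print(to_aot(toa) == aot)   # True: the layouts carry the same data
```

The hard part the comment alludes to is not the transposition itself but making it free and composable at the type level, so zipped tensors of mixed layout still index efficiently.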

    
    
        inl in = flatten in |> to_dev_tensor
        inl out = flatten out |> to_dev_tensor
    

`flatten` checks that the tensor is contiguous and flattens it into a 1d
tensor. The `to_dev_tensor` is just some formality that I could not get rid
of. It takes the pointers out from the references holding them which allows
the tensors to be captured lexically before being passed into the kernel.
Without this a type error would occur.

    
    
        kernel = cuda // Lexical scoping rocks.
            grid_for {blockDim gridDim} .x in_a {body=inl {i} ->
                inl out = out i
                inl in = in i
                out .set (f in.get out.get)
                }
    

The actual kernel itself is quite small and trivial. `cuda` is just a parser
abbreviation for `inl {blockDim gridDim} ->`. In other words it is not a
special language feature, but a regular function.

When I was doing this in F#, I had to pass in every single argument of the
function by hand - meaning the tensor pointers, their offsets, their sizes,
their dimension ranges for each and every tensor - and I also had to make
wrapper classes on top of that. If a map operation needed more arguments,
then I would have needed to copy-paste a kernel, adjust it and the wrapper,
and repeat the testing process all over again, all this with very little
type safety.

Spiral is not quite on the level of Futhark in terms of ergonomics, but it is
a vast improvement over what came before. Note that there is no need for type
annotations anywhere - the function is as generic as it could possibly be in
a statically typed language. Because of inlining guarantees, all of this has
zero overhead.

Compared to Spiral, where do Julia and its tensors stand on the axis of
generality? Would it be possible to write a simple map as simply as this?

~~~
KenoFischer
If you're interested in advanced layout transformations in Julia, I talked
about this a bit here:
[https://www.youtube.com/watch?v=uecdcADM3hY&t=1658s](https://www.youtube.com/watch?v=uecdcADM3hY&t=1658s)

As for your in place map example, here's how that's written in julia:
[https://github.com/JuliaGPU/CuArrays.jl/blob/98e32e055e96fe7...](https://github.com/JuliaGPU/CuArrays.jl/blob/98e32e055e96fe73f35b7c1acfa63859d549163b/src/utils.jl#L55-L58)

Of course you may say that's cheating because it shifts the problem to the
much more general broadcast machinery, but the implementation of that is not
particularly tricky either.
[https://github.com/JuliaGPU/CuArrays.jl/blob/23c4c737a31aff9...](https://github.com/JuliaGPU/CuArrays.jl/blob/23c4c737a31aff91b297f2994990b7ea01665069/src/broadcast.jl#L4-L24)

This code works with any multi-dimensional array, independent of layout
considerations, though you may of course want to specialize based on layout
for performance. Note also that this code is slightly more general than the
one in your example, because it handles arrays that do not have efficient
linear indexing.

~~~
abstractcontrol
Sorry about not seeing your reply yesterday; HN is not as helpful as Reddit,
so I have to scan for replies manually, which can be error-prone. I just went
through your talk. It was definitely interesting. Despite my confidence in
the high-performance abstractions that Spiral allows, I do not have much
experience in doing high-performance optimizations myself, so such field
reports motivate my efforts.

So Julia can definitely do flexible data structure layouts, but looking at
your code I am starting to understand why it took an expert such as yourself
to show me this example. Had you not told me that the link is to a map kernel
I would have difficulty figuring it out on my own.

This shifts my objections from 'Julia cannot do it' to 'macro-heavy code is
quite hard to grasp'. In Spiral you could have written all of this without
the need for macros. Spiral does have (text) macros, but their purpose is
language interop, not abstraction.

The use of macros is a typical anti-pattern in languages whose type systems
are too weak for the problem at hand - I am decently sure that with sufficient
macro magic any language that has them would have been capable of performing
these same optimizations.

One immediate benefit of a powerful type system that I'd like to point out is
that Spiral allows reflection over tuples, which is a much more elegant
solution to the problem of a function having variable arguments than the
C-style varargs mechanism that I see Julia using in the examples you've shown
me. Also, in Spiral reflection is done using pattern matching rather than if
statements.

Another benefit is that Spiral has no need for separate static array
machinery. Literals can be a part of a variable's type, and even when they
are not, unless explicitly prevented they tend to be propagated through
function call boundaries (join points). Assuming they are known at compile
time, the inlining of tensor dimensions in loops is actually the default
behavior in Spiral. I've yet to do benchmarking to see how helpful that is,
but I am sure it would help reduce register pressure.

Your examples are not quite enough to get me to apologize for my rude
behavior, but are enough to get me to shift my views a little so I'll change
the offending sentence so it reflects reality more accurately.

~~~
simondanisch
On Julia 0.7, the broadcast kernel will look much more like this (and actually
the version used in CuArrays/CLArrays currently also doesn't use those
macros):

[https://github.com/jrevels/MixedModeBroadcastAD.jl/blob/mast...](https://github.com/jrevels/MixedModeBroadcastAD.jl/blob/master/src/MixedModeBroadcastAD.jl#L80)

    
    
        let I = @cuda_index(shape)
            @inbounds dest[I] = bc[I]
        end
    

There are several things to point out here:

1) The macro @cuda_index doesn't really need to be a macro, but like this it's
the shortest way to check the index and return when out of range (in
CUDA/OpenCL you often need to schedule a loop with more iterations than you
need).

2) The `dest[I] = bc[I]` does all the unzipping of arguments and dispatches
to the correct fast getindex for a certain layout of dest/bc via the type
system plus tuples of the arguments (not using C-style varargs). This simple
function makes all the advantages of Julia's lazy fused broadcasting work.
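The lazy-broadcast mechanism described in point 2 can be sketched in Python (an illustrative `Broadcasted` class, not Julia's actual one): `bc` holds an unevaluated elementwise expression, and indexing it fuses the whole computation at one index.

```python
# Lazy fused broadcasting in miniature: building bc computes nothing;
# each bc[i] evaluates the entire fused expression at index i, so the
# kernel makes one pass with no temporaries.

class Broadcasted:
    def __init__(self, f, *args):
        self.f, self.args = f, args
    def __getitem__(self, i):
        # evaluate the fused expression at a single index
        return self.f(*(a[i] for a in self.args))

a = [1, 2, 3]
b = [10, 20, 30]
bc = Broadcasted(lambda x, y: x * y + 1, a, b)   # nothing computed yet

dest = [0, 0, 0]
for i in range(len(dest)):    # the "kernel": dest[I] = bc[I]
    dest[i] = bc[i]
print(dest)                   # [11, 41, 91]
```

On a GPU the loop body runs once per thread, which is why the whole kernel reduces to the single line `dest[I] = bc[I]`.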

Feel free to check out this great blog article to see what's possible with
the new lazy broadcasting:

[https://julialang.org/blog/2018/05/extensible-broadcast-fusi...](https://julialang.org/blog/2018/05/extensible-broadcast-fusion)

You can also look directly at the broadcasting code in Julia Base, which is
already a bit bloated to deal with lots of edge cases, but is still pretty
readable in parts:

[https://github.com/JuliaLang/julia/blob/master/base/broadcas...](https://github.com/JuliaLang/julia/blob/master/base/broadcast.jl#L539)

~~~
abstractcontrol
I like how the broadcasting comes out in the end in Julia. Besides that,
Julia has quite a few niceties that I could only wish Spiral had, and I do
think that they matter significantly even if they are not vital. This is
definitely praiseworthy work.

Since I haven't done so and I really should have, if you or anyone else
still looking at this thread is curious what GPU programming without macros
looks like, then take a peek at this:

[https://github.com/mrakgr/The-Spiral-Language/blob/3a94ca644...](https://github.com/mrakgr/The-Spiral-Language/blob/3a94ca644160310cb494c92ebf8c6a5ed9836f5f/Learning/CudaModule.fs#L1289)

Comparing the Spiral and Julia code that has been linked so far, I feel that
no one would ever guess, without knowing ahead of time, that Spiral is a
static language and Julia a dynamic one based on these examples.

As an example of this in action, here is how layer normalization is
implemented using the 'seq_broadcast' kernel which does a sequence of
reductions and broadcasts in registers. It is also used to implement softmax
for example.

[https://github.com/mrakgr/The-Spiral-Language/blob/3a94ca644...](https://github.com/mrakgr/The-Spiral-Language/blob/3a94ca644160310cb494c92ebf8c6a5ed9836f5f/Learning/LearningModule.fs#L171)

~~~
simondanisch
>GPU programming without macros looks like then take a peek at this:

I'm not sure if you missed my examples, or if there is some other
misunderstanding going on ;)

GPU code in Julia works 100% without macros, but you are free to use them, so
people do ;) Also, you can pretty much write the GPU kernels like pseudocode;
not sure how much simpler it can get.

With your linked Spiral code, I can't even really tell where the algorithm
starts and where the setup code ends - which of course might easily be
attributed to my unfamiliarity with Spiral ;)

I wrote an article some time ago about generic programming with Julia's GPU
infrastructure:

[https://techburst.io/writing-extendable-and-hardware-
agnosti...](https://techburst.io/writing-extendable-and-hardware-agnostic-gpu-
libraries-b21c145a8dad)

If you can give concrete examples of how things could be simpler than this, I'd
be delighted to hear them :)

CUDA only (for no reason, actually) and with lots of macros, here is another
article about GPU programming in Julia:
[https://mikeinnes.github.io/2017/08/24/cudanative.html](https://mikeinnes.github.io/2017/08/24/cudanative.html)

Do you have any benchmarks for the softmax kernel? If that kernel has optimal
performance, it would be quite interesting. If it's subpar, it looks much
longer than a simple version.

~~~
abstractcontrol
> With your linked spiral code, I can't even really tell where the algorithm
> starts and where the setup code ends - which of course might easily be
> attributed to my unfamiliarity with Spiral ;)

Heh, I've noticed that the setup code tends to come out longer than the actual
kernels. It is inside `kernel = cuda`. Spiral is indentation-sensitive like
Python and F#.

The stuff inside {} is just module creation; think of it like tuples with
named fields.

> Do you have any benchmarks for the softmax kernel? If that kernel has
> optimal performance, it would be quite interesting. If it's subpar, it
> looks much longer than a simple version.

No, I've yet to actually benchmark it. It really depends on how good a job
NVCC does with the generic sequential reduce kernel. I'll do an in-depth
analysis once I am done with all the neural network work that I am doing
currently, which might take a while.

[https://github.com/mrakgr/The-Spiral-
Language/blob/7ecd30bdf...](https://github.com/mrakgr/The-Spiral-
Language/blob/7ecd30bdf36e8f59dd89917c64a05794e2b5def1/Learning/LearningTests.fs#L431)

You can see how it is implemented here for the forward and backward parts.

[https://github.com/mrakgr/The-Spiral-
Language/blob/7ecd30bdf...](https://github.com/mrakgr/The-Spiral-
Language/blob/7ecd30bdf36e8f59dd89917c64a05794e2b5def1/Learning/LearningModule.fs#L441)

In the actual cost function, it is a bit different since I fuse the forward
and the backward parts.

It is a bit ugly because of all the type checking for whether the argument is
a dual, but that is all done at compile time.

> [https://mikeinnes.github.io/2017/08/24/cudanative.html](https://mikeinnes.github.io/2017/08/24/cudanative.html)

This is actually the loop unrolling example I had in mind when I said that
Julia needs macros.

[https://github.com/mrakgr/The-Spiral-Language#3-loops-and-
ar...](https://github.com/mrakgr/The-Spiral-Language#3-loops-and-arrays)

You can in fact get loop unrolling with just standard functions in Spiral; I
go into it in the context of this chapter. It is achieved by pattern matching
over tuples and recursion.
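Python won't actually unroll anything, but the shape of the technique can be sketched: recursion over a structure whose length is known statically, so each "iteration" becomes a separate call that a compiler with inlining guarantees can flatten (the function name here is made up for illustration):

```python
def tuple_sum(t):
    # Recursion over a tuple: each level peels off one element.
    # In a language where the tuple's length is part of its type
    # (as in Spiral, or Julia's tuple recursion), the compiler can
    # inline this into a flat expression like t[0] + t[1] + t[2] + 0,
    # i.e. an unrolled loop with no runtime recursion at all.
    if t == ():
        return 0
    head, *tail = t
    return head + tuple_sum(tuple(tail))

print(tuple_sum((1, 2, 3, 4)))  # 10
```

The base case on the empty tuple plays the role of the pattern match on `()`; the head/tail split plays the role of the cons pattern.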

> If you can make concrete examples of how things can be simpler than this,
> I'd be delighted to hear them :)

Yes, by replacing metaprogramming with intensional polymorphism and
inlining guarantees. Also, reflection should be done using pattern matching.
I think this last one could be taken entirely seriously, as it would not
involve a replacement of the entire type system.

All the claims in the article about inlining and specialization that make it
sound like magic are what in general make me dubious about languages
pretending to be speed kings. Yes, I am aware that GPU kernels do not require
optimizers capable of having monads for breakfast, and that in the context of
GPU programming where those claims were made they are probably true, but
inlining is the sort of thing that matters more the more high-level a language
is. For very high-level languages that want speed, there isn't much choice but
to make inlining guarantees a part of the language semantics, despite the
added burden that puts on the user.

Since we are still at it, I have a question I need to ask.

Recently I've been informed that Julia is capable of GCing GPU memory. If this
is fully integrated, that would be a major feature which is not possible in,
say, .NET or Racket. I really wanted this in Spiral and could not get it in .NET.

By fully integrated, I mean much like for regular memory, where the GC takes
note of the state of the system to decide when to do collection and
defragmentation. If it can only make a thin wrapper with a finalizer (much
like in .NET) around an unmanaged resource, then it is not a big deal.

Is it fully integrated or is it a wrapper style memory management?

If it is the latter, then that is too bad, but I'd suggest to the Julia devs
that they work on making it fully integrated, as it would be a really good
feature. Obviously, I can't do it in Spiral, as I would need to write my own
VM and I have only so many years in my life.

~~~
simondanisch
>You can in fact get loop unrolling with just standard functions in Spiral.

We can do this too, actually - and Spiral's approach sounds very similar to
what you'd do in Julia :)

[https://gist.github.com/SimonDanisch/812e01d7183681ed414932c...](https://gist.github.com/SimonDanisch/812e01d7183681ed414932c8f7b533d0#file-
unroll-jl)

As a matter of fact, it's what the compiler devs keep recommending. As a
developer I must say that it's pretty relaxing to take a metaprogramming
shortcut from time to time ;) In general, we plan to offer a toolbox that
employs these kinds of patterns for loop unrolling, tiling, etc., to make it
easier to write GPU code without metaprogramming and to perform certain
optimizations/scheduling patterns based on a more trait-like system.

> Also reflection should be done using pattern matching

I feel like what Julia does is just more low-level right now - and you get
pretty far with just multiple dispatch! If we needed more than that, we could
put the effort into extending the language in that direction. But I think
your use cases must be pretty advanced until you need something more
complex. Can you recommend a nice article that shows off beautiful pattern
matching for reflection? I see how it sounds nice, but I can't really imagine
right now how it would improve my life.

Concerning inlining, I do think we actually force codegen to always inline
when compiling for the GPU. I don't remember 100% anymore, but there might
have been some problems with that. It's definitely the goal, though, to have a
flexible compiler that you can fine-tune to specific domains like GPU
programming. And if we don't get it as part of the compiler, we can definitely
do things like that with
[https://github.com/jrevels/Cassette.jl/](https://github.com/jrevels/Cassette.jl/)

Apart from that, I'm pretty happy that our GPU compilation infrastructure
isn't making Julia less useful as a general-purpose language ;)

>Is it fully integrated or is it a wrapper style memory management?

It's wrapper style. We're playing around with different kinds of hacks to make
it less bad, but those are hacks you could do in any GC language. The good
news is that the Julia compiler devs take GPU computing seriously and have
promised to work on a full integration that is aware of the GPU hardware.
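To make the distinction concrete, the wrapper style under discussion looks roughly like this hypothetical Python sketch, where `device_free` and the handle counter stand in for a real driver API such as `cudaFree`/`cudaMalloc`. The weakness is visible in the structure itself: the host GC only sees the tiny wrapper object, not the size of the device allocation behind it, so it feels no pressure to collect promptly.

```python
import weakref

FREED = []  # records released handles, standing in for the driver's free list

def device_free(handle):
    # Stand-in for cudaFree: runs when the wrapper is collected.
    FREED.append(handle)

class GpuBuffer:
    """Thin wrapper with a finalizer around an unmanaged device allocation.

    The host GC tracks only this small object; the (potentially huge)
    device allocation is invisible to its heuristics, which is exactly
    why wrapper-style management is 'not a big deal' compared to full
    GC integration.
    """
    _next_handle = 0

    def __init__(self, nbytes):
        GpuBuffer._next_handle += 1
        self.handle = GpuBuffer._next_handle  # stand-in for cudaMalloc
        self.nbytes = nbytes
        # weakref.finalize keeps itself alive until the wrapper dies,
        # then calls device_free exactly once.
        weakref.finalize(self, device_free, self.handle)

buf = GpuBuffer(1 << 30)  # "1 GiB" of device memory the host GC can't see
del buf                   # finalizer releases the device allocation
```

A fully integrated GC would instead fold device-memory pressure into its collection decisions; no amount of finalizer plumbing recovers that.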

~~~
abstractcontrol
> As a matter of fact, it's what the compiler devs keep recommending.

I definitely agree with them on this matter. Macros might have a role in
language development, but they should not be a stand-in for compiler
optimizations. For safety and speed, the type system should be there.

[https://github.com/mrakgr/The-Spiral-
Language/blob/0741959ac...](https://github.com/mrakgr/The-Spiral-
Language/blob/0741959acdb9c97ad8c0da2e9712e44b8a26c34b/Testing/run.fs#L17)

Here is how the example you've shown could be done in Spiral. Maybe I should
add explicit support for literal testing in pattern matching, but it is not a
pattern that comes up too often by itself.

> I feel like what Julia does is just more low level right now - And you get
> pretty far with just multiple dispatch!

What you say is exactly right: pattern matching compiles down to those
low-level operations, so there is no reason at all why those low-level
operations should be done by hand. Pattern matching does not depend on a
particular type system or on whether the language is dynamic or static. There
are only so many good ideas in programming languages, and this is one of them.

Though there is some overlap between pattern matching and multiple dispatch,
their roles are different. The purpose of multiple dispatch is extensibility;
the purpose of pattern matching is destructuring.

[https://stackoverflow.com/questions/2502354/what-is-
pattern-...](https://stackoverflow.com/questions/2502354/what-is-pattern-
matching-in-functional-languages)

Note the great disparity between the F# examples that do pattern matching and
the C# examples that do manual reflection.

[https://github.com/mrakgr/The-Spiral-
Language/blob/0741959ac...](https://github.com/mrakgr/The-Spiral-
Language/blob/0741959acdb9c97ad8c0da2e9712e44b8a26c34b/The%20Spiral%20Language/SpiralTests.fs#L542)

Here are a few very simple examples of it in action in Spiral. I use them as
compiler tests.

[https://github.com/mrakgr/The-Spiral-
Language/blob/0741959ac...](https://github.com/mrakgr/The-Spiral-
Language/blob/0741959acdb9c97ad8c0da2e9712e44b8a26c34b/Learning/CudaModule.fs#L1754)

Here is quite a complex example of it in action. I won't go into detail here,
but you can see how I repeatedly match on the contents of a module at
different times in order to get more generic functionality for the kernel.

The particular kernel shown here is the most complex one in the library right
now - I have yet to get to things like generic matrix multiplication and
convolution, but I'll get there eventually.

> I see how it sounds nice, but I can't really imagine right now how it would
> improve my life.

Let me just say that it is really difficult to know ahead of time how a
particular language feature will affect your programming life. I could have
said the same thing about first-class functions back in 2015. I am sure in the
future there will be such features I can't even imagine right now.

> And if we don't get it as part of the compiler, we can definitely do things
> like that with
> [https://github.com/jrevels/Cassette.jl/](https://github.com/jrevels/Cassette.jl/)

I'll have to watch the talk on this. Thanks for the link.

~~~
simondanisch
>The purpose of multiple dispatch is extensibility, but the purpose of pattern
matching is destructuring.

Agreed! ;) Just wanted to point out that I was able to use multiple dispatch
elegantly in the unroll example, where you are using pattern matching.

>pattern matching compiles down to those low level operations

So I guess that means Julia could indeed satisfy you if we gave those low-level
operations some more syntactic sugar? Have you seen
[http://kmsquire.github.io/Match.jl/latest/](http://kmsquire.github.io/Match.jl/latest/)?

>Let me just say that it is really difficult to know ahead of time how a
particular language feature would affect your programming life

True story. I'll try to be more observant when writing code and see if I
could actually solve things more elegantly with better pattern matching.

Btw, I'm writing an article about GPU programming in Julia - do you have a
feature you'd like to have explained, or a killer feature you believe we can't
do and that would impress you if we did?

------
rurban

        inl add a b = a + b
        inl f = add 1
        f 2, f 3
    

is just awesome

~~~
skrebbel
That's just normal partial function application, right? Or am I missing
something special?

~~~
rurban
Yes, but I adore the simple syntax.

    
    
        inl add a b = a + b
    

is basically a macro #define add(a,b) a+b

    
    
        inl f = add 1
    

is a macro #define f add(1)

    
    
        f 2, f 3
    

but e.g.

    
    
        inl f = add c
    

would create a closure to capture c. inl would guarantee that the capturing is
lexical, not dynamic.

There's no need for the comma operator in args, comma is to create lists.
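For comparison, getting the same effect in Python takes `functools.partial` or an explicit lambda, where Spiral's curried functions give partial application for free (this sketch just mirrors the `inl` snippet above; it is not Spiral semantics):

```python
from functools import partial

def add(a, b):
    return a + b

# f is add with its first argument fixed to 1, a closure over
# that argument rather than a textual macro expansion.
f = partial(add, 1)

print(f(2), f(3))  # prints: 3 4
```

The closure point holds here too: `partial(add, c)` captures the current value of `c` lexically, which is the behavior `inl` guarantees.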

~~~
rurban
oops: inl f(x) = add(1,x) of course

------
arminiusreturns
I'll admit this caused the first non-gaming-related twinge of nostalgia for
the Windows ecosystem I've had since I went GNU/Linux only a long time ago.
Good on the author.

------
childintime
Isn't this language a VHDL, or what a VHDL should/could be?

If so, please, someone present the language from this perspective, preferably
with an example.

------
dangoljames
Forth reborn?

~~~
whitexn--g28h
It really reminds me of Forth. The only Forth program I've interacted with is
the BIOS prompt for the OLPC laptop, which used Forth to bootstrap the CPU
from the GPU.

~~~
jacobush
When he said shallow call stack, I immediately thought of Forth.

------
whatshisface
Will there be any chance of a SPIRV backend becoming available any time soon?

~~~
abstractcontrol
Author here. The language was made for Cuda programming and until something
comes along to displace it, I have no plans of moving from it.

That said, the language is quite small and it would be possible to create a
new backend in about 500 lines of code, so if you really want it, you could
probably do it yourself. I'd help you integrate it into the language. I know
nothing about SPIRV at the moment, though.

