
Ask HN: How to design an array lang (ex: APL or J) LLVM compiler? - throwaway7645
Are the standard compiler books all still relevant here? Anything else one should know?
======
mamcx
I'm in a similar path, but with a relational language. However I have find
have a lot in common with arrays, in fact, I think make sense, similar to
kdb+, to model relations as columnar tables.

You can read about J here:

[https://sblom.github.io/openj-core/ioj.htm](https://sblom.github.io/openj-
core/ioj.htm)

(From my long list of resources:
[https://gist.githubusercontent.com/mamcx/e1743571b9a1ea163a7...](https://gist.githubusercontent.com/mamcx/e1743571b9a1ea163a7f9f07aa6a8ae2/raw/cf89dd40a6d98c185cf6896600d8b1f77a9acb76/Lang.mkdown))

The main gist is understand that you can generaliza arrays and scalars in one
single structure (aka a array!)

A scalar is a array of 1. Others list are of N. You tag the structure with the
type alike:

    
    
       [Type; N, N2, N3...]
    

and then you can do things like:

    
    
       for i in 1:
    

because is totally legal here!

I expand the idea to say that a relation is a multiple-array (aka: Columns),
and array is a relation of 1 column, a scalar is a relation of 1 column and 1
value.

This mean is potentially easy to exploit SIMD and stuff like that, but I
haven't explored that.

~~~
throwaway7645
Thanks for the help! Does your project have a public presence?

~~~
mamcx
No yet.

If interested, we can share ideas?

My web with contact details:
[http://elmalabarista.com](http://elmalabarista.com)

------
mhh__
The standard books (e.g. Engineering a compiler, or Appel's stuff) would be
fine.

I've never implemented a array language myself, but one way of doing this
while minimizing cognitive overhead would be to lower it to more C-like IRs
(Similar to how Haskell is compiled by GHC). The advantage here, would be that
by the time the AST has been lowered into these IRs translating directly to
LLVM IR would be mentally trivial. Whether this is worth doing or not, I have
no idea.

I'd have a look at some LLVM APL compilers and see how they do it.

~~~
throwaway7645
I don't think there are any LLVM APL compilers or I would definitely look. I
think both Dyalog, J, and K use regular C as the VM. Arthur Whitney is said to
have written the first K compiler in only a few (like 3-5) pages of C. Do you
think it'd be easier to start with using a custom C VM over LLVM? I have a
little C experience and no LLVM, but I see it being used for a lot these days
(Julia, D, Rust, Swift, Crystal...etc).

~~~
mhh__
Are you implying writing one's own JIT? This would involve either writing a
new LLVM backend or implementing the same optimizations. Easier to start with,
yes: This doesn't really justify the design decision. (In short: Easier !=
Better)

You could compile to C, then use Clang or GCC to compile to a final
executable: This should, if possible, be avoided as it limits what you can (C
is not a good IR for a compiler). I suspect this is what you meant by "use
regular C as the VM".

As these things go, LLVM is fairly sane/clean. The point of using LLVM would
be to produce an optimized executable, rather than bytecode to be run on a VM.

Also, LLVM is C++ (Complex C++, in fact). If you don't already, I thoroughly
recommend reading about optimizing compilers work (It's either very useful or
fascinating, depending on whether you're the kind of person to find this stuff
interesting). Using LLVM is, after getting over the C++ build times, actually
quite simple (Take a look at
[http://llvm.org/docs/tutorial/](http://llvm.org/docs/tutorial/))

~~~
throwaway7645
Thanks for the explanations and advice. APL isn't as fast as I'd like, so I
thought LLVM would be nice, but then you lose REPL functionality I believe
which is vital for this kind of language. Julia JITs every expression to LLVM
(I think)which is how it has a REPL, so that seems like a good choice to get
said REPL. So maybe just the Julia route would be smart. It appears to be
taking off in the areas that APL, J, Matlab, Numpy, Octave, Mathematica have
typically reigned supreme in.

~~~
mhh__
If the compiler is fast enough, then one could make a REPL out of it (by
modifying it slightly). LLVM also has interpreters. The trick would be finding
a consistent way of representing high level state in the REPL (You can't just
lob bits of LLVM IR together), unless you can manage without (of course).

------
tlack
Even though you'll be outputting LLVM as the intermediate representation, I
think you'll still have to think a lot about the design of the internal
structures and the operation of the "virtual machine" so to speak (although
there isn't one in the traditional sense). You'll still need to think about
how functions interact, what anonymous functions/lambdas boil down to, etc.

To that end, it may be interesting to study the source of J, Klong, Kona, etc.
There are a lot of older papers about representation and performance in APL as
well.

~~~
throwaway7645
Good idea, I started reading Hui's book on J's internals before. Maybe it's
time to take another look.

------
geocar
Why LLVM?

Aaron's Co-dfns[1] is an APL compiler that compiles to C++ and there's some
advantage to doing this.

[1]: [https://github.com/arcfide/Co-dfns](https://github.com/arcfide/Co-dfns)

~~~
throwaway7645
I think Hsu's Co-dfn compiler compiles to GPU for running very high
performance code if you use the parallel co-dfn paradigm. I'm not sure if you
have to just use GPUs or if it gives you a native binary you can use for other
purposes. The other problem is you have to have Dyalog APL, which ain't cheap.
I think a new array language that has a REPL that JITs to LLVM (good for
prototyping, debugging, and testing) and AOT compilation to LLVM for scripts
would be amazing for sharing code and deployment.

~~~
geocar
> I think Hsu's Co-dfn compiler compiles to GPU ... [I'm not sure] if it gives
> you a native binary you can use for other purposes.

It compiles to C++.

> The other problem is you have to have Dyalog APL, which ain't cheap.

Good isn't.

However, you can download Dyalog APL for free and use it for noncommercial
purposes, including _trying it out_ and _personally_ learning how to build an
APL compiler.

I think this is pretty clear from the website.

> I think a new array language that has a REPL that JITs to LLVM (good for
> prototyping, debugging, and testing) and AOT compilation to LLVM for scripts
> would be amazing for sharing code and deployment.

Why do you think that?

I don't see how LLVM helps with any of that. The hard part of writing an APL
compiler is parsing and analysis, and that means finding an algebraic morphism
between APL and CPUs (or GPUs) that you can do meaningful things with. It
isn't making executables.

------
sitkack
Lots of good advice, I'd start small and work on end-to-end examples.

[http://www.compileroptimizations.com/category/loop_fusion.ht...](http://www.compileroptimizations.com/category/loop_fusion.htm)

[https://www.google.com/search?q=array+language+loop+fusion+f...](https://www.google.com/search?q=array+language+loop+fusion+filetype:pdf)

~~~
throwaway7645
Thank you! Will do!

~~~
sitkack
I should have mentioned the Programming Language Zoo [1]. Beware of the LLVM
bog of dead compiler writers.

[http://plzoo.andrej.com/](http://plzoo.andrej.com/)

