
AMA: Explaining my 750 line compiler+runtime designed to GPU self-host APL - arcfide
https://www.youtube.com/watch?v=gcUWTa16Jc0
======
dang
This discussion originated here:
[https://news.ycombinator.com/item?id=13565743](https://news.ycombinator.com/item?id=13565743)

...and the live stream offered there happened here:
[https://news.ycombinator.com/item?id=13638086](https://news.ycombinator.com/item?id=13638086).

We asked the author to do one more thread about it because the latter didn't
get much attention, and we thought you might find it interesting. It's rare
for work this technically sophisticated to also be this deeply
counterintuitive.

~~~
arcfide
Thanks for giving me a chance to chat a bit more!

~~~
FullyFunctional
Thanks for sharing all these insights, it's definitely an eye opener, both
regarding APL as well as conventional wisdom.

------
losvedir
Heh, as I tweeted a couple years ago[0]:

> Ah, APL/J/K. Time for my annual crisis of thinking everything I've ever
> learned about programming is wrong...

Seriously, as I'm watching this video it's equal parts fascinating, confusing,
and frightening.

The world of these array languages is so foreign to my day-to-day web
development work that I'm not sure what to make of it. I'm impressed that you
can accomplish so much in so little space, but I have a few questions as it
relates to how I work:

1\. How understandable is the code if you come back to it months later? Is it
"write once" or are you able to pick it up again easily?

2\. Have you worked on this or other code with collaborators? How well can
someone else pick up the code and start contributing?

3\. It seems APL is well-suited for GPU programming like you're doing here,
but do you think it's more broadly applicable? Do you think _all_ software
should be written with it, including say web development and the like?

[0][https://twitter.com/losvedir/status/636034419359289344](https://twitter.com/losvedir/status/636034419359289344)

~~~
arcfide
Thanks for the questions!

1\. I've addressed this a bit more in the previous threads on this video and
the discussion that led up to it. It's actually less "write once" than other
code. To go with that, though, it's also more disposable. That means,
basically, that I have a low tolerance for code that stops making sense to me.
If I come back to a piece of code and it doesn't make sense, I just throw it
out, because it was clearly wrong if I can't make sense of it.

I find it's quite natural to read through the code at this point. Even if I
haven't thought about a particular compiler pass for a while, I just start at
the beginning, and in a few seconds it's easy to see the structure and overall
approach, then I go deeper and see what's actually being done. It's
exceptionally easy to work with. Part of this is also because I don't have to
jump all over the place to know what the code is doing. If I forgot what a
helper does, I just glance a half page up and read the definition of the
helper, and I know what's going on then. I don't have to keep a documentation
page open to remind myself of the APIs, or the intentions or the like, because
all call sites are visible, and the definition of the code is all there for me
to see. A big part of this is relying on idioms rather than black box
abstractions, patterns instead of information hiding.

2\. Yes, I work with a number of other people who have varying levels of
responsibility for the code. A good deal of them only care about the
high-level concerns, and so they either use the compiler and don't read the
code, or they make little tweaks here and there to suit their needs. They
don't require a full understanding. A few others do work through the entire
code base, and we use the codebase as our single source of information when
discussing design decisions and everything else. We don't need supporting
docs, because working with the code directly is so easy that separate design
docs wouldn't add anything when discussing architectural elements.

As for picking it up and contributing...that depends. There's a lot of
background information you need to know. That's just the reality. The code
itself isn't hard, but it is relying on a host of shared understanding. This
includes mainly the Nanopass style of compiler design and construction, as
well as a firm understanding of array programming, together with a firm
understanding of the fundamental idiomatic algorithms being used in the
compiler, which are the main "new thing" for people to understand after they
understand how to build a compiler and how to write array programs.

Unfortunately, most schools teach neither Nanopass-style compiler design nor
array programming. So, the completely uninitiated have a few things to learn.
The code itself, though, is no worse than expecting someone to understand C
semantics and OOP tree-oriented programming patterns in a monolithic compiler
style when working with something like Clang, or tree traversal in C over C
when working in a simple compiler like PCC.

3\. Actually, looks can be deceiving. I'm doing GPU programming with APL, but
over a domain that is considered far outside of the standard APL comfort zone.
In fact, a key research contribution of this compiler is that I have developed
algorithms and a holistic set of techniques for doing tree/compiler
transformations on a GPU, which was previously a very open problem. This
compiler/domain would have been considered one of the quintessential examples
of a domain that is not well suited to APL. In fact, that's not true, it's
just not suited to APL and the traditional way of thinking about this domain.

As for things like web development and the like, you can see MiServer, which
is Dyalog's Open Source Web framework. There are some Dyalog User Meeting
discussions about the framework, and tutorials, documentation, and the like to
get you through it.

APL isn't a perfect language, but there are very few languages pushing the
boundaries of human-computer interaction with language in the way that APL
does, and I think this is sad.

I think in some ways we might be _forced_ to write programs more like APL in
the future, because parallelism is simply that important to the future of
computing. I'm also doing research into the early stages of learning computer
science, and some initial research suggests that it may in fact be better to
teach students parallel programming in an APL style first, and later
transition to the more traditional stuff, rather than doing it the opposite
way, which so far has just convinced people that they shouldn't write
parallel programs at all. In contrast, APLers prefer parallel programming and
regularly transform code from sequential to parallel and back again. They
tend to be
comfortable with both modes of thinking, which cannot be said for most other
languages in common use. Agent/concurrency heavy languages might be an
exception to this general rule, but they also get a bad rap.

Finally, I'd like to make a small note about bringing collaborators onboard.
Many collaborators that I've had have been either inexperienced and learning
for one reason or another, or experienced and had one
specific thing that they wanted to fix. What often happens is that the most
inexperienced spend their time, as they should, learning important basics, and
don't ever work on the compiler proper. I think that's the way it should be.
As for the more experienced people, I often find that I fix the problem they
wanted fixed in the compiler as I'm explaining the compiler to them.
Because of the way the compiler is designed, I usually find myself saying,
"So, to do this, you would have to make this change here [edit] and this here
[edit] and ...oh wait, that's it, okay, it's done for you. Enjoy the patch!"

~~~
joe_the_user
Thank you very much for answering questions here.

I have a question about your "key operator" \- it sounds similar to an idea
I've played with - a way to run heterogeneous operations on GPUs by running a
very small bytecode interpreter on each kernel (something similar to what's
outlined here[1]).

Does that seem at all related to you?

[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.321...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.321.6255&rep=rep1&type=pdf)

~~~
arcfide
That's an interesting idea, and I can see there being some overlap, but the
Key operator is actually not my invention, but is a member of a class of
operators sometimes called classifiers. You can think about the GROUP BY
operator in SQL. Or Henglein's work on discriminators. Key is an operator that
is relatively new in Dyalog that came from J and some other APLs, I think. I
have to go back to my paper to remember.

[http://dl.acm.org/citation.cfm?id=2935331](http://dl.acm.org/citation.cfm?id=2935331)
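
To give a flavor of what Key does (this is just the textbook counting idiom,
not code from the compiler): the operand function is called once per unique
key, with the key as ⍺ and the positions of its occurrences as ⍵, which is
essentially a GROUP BY:

    
    
          {⍺,≢⍵}⌸'mississippi'
        m 1
        i 4
        s 4
        p 2
    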

~~~
joe_the_user
I'm scanning the documents.

Basically, group nodes and map/operate on them.

It seems like that would be fairly simple if you are dynamically creating
kernels.

\-- Edit: Which implies you're using a pretty "bare metal" interface to the
GPU which still lets you do that, and there's an interesting story there, if
I'm not totally off-base.

~~~
arcfide
I try to avoid going bare metal as much as I can because of the difficulty
involved with working with GPUs at that layer. I try to focus on higher level
concerns, and use the excellent low level work of people such as the ArrayFire
developers to handle the low level stuff. I still have to understand it and
work with it at some level sometimes, but for the most part I stick to my core
abstractions.

~~~
joe_the_user
So ArrayFire allows the creation of kernels/the calling of functions in
parallel/ on the fly?

~~~
arcfide
In some limited, restricted cases, yes.

------
jstimpfle
The conciseness is fascinating. One of the common criticisms of combinator
style programming is the difficulty of making good error messages, or
generally making small fixes in the pipeline that can't be expressed using the
abstraction (i.e. the current compositional context). A concrete example I
have is in Haskell when you develop pure code and you can't easily debug with
a few print statements (more so since it's a lazy language).

I think this criticism definitely applies in the real world of ever-changing
programs without a very clear scope. I assume your focus is more on the
academic / "poetic" side of things, but nevertheless: How did such concerns
influence the development of your compiler? How usable is the compiler from a
dfns programmer's perspective?

And since APL is a very special beast: How transferrable is your style to
writing a compiler for a more conventional language (for instance,
procedural), especially ones with irregularities and such?

(I apologize if these questions are covered in the video. Haven't finished it
yet)

~~~
arcfide
I cover how I debug my program in the video. I use two utilities, pp and px,
which are identity functions that print their result as a side effect. I'm
happy that I'm working in a non-lazy language.
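
To give a flavor of the idea, here is a minimal sketch of such a utility (not
the actual Co-dfns definitions, which may differ):

    
    
          pp←{⍵⊣⎕←⍵}   ⍝ identity function that prints its argument as a side effect
          ⌽ pp ⍳5      ⍝ pp prints the intermediate value; the reverse proceeds as usual
        1 2 3 4 5
        5 4 3 2 1
    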

This compiler is poetically driven, but it is still a commercially funded and
commercially marketed product. It's just that in the APL community, you can
pursue both and have them be compatible with one another.

In particular, the desire to make the code as "disposable" as possible has
led me to use a strictly combinatory/points-free style of programming in the
core compiler. It makes it very easy to change things, because it's easy to
see the whole picture and easier for me to delete any given piece of code.
Many "fixes" in my code are single-character APL fixes, or maybe two-character
APL fixes.

One reason for not being able to stick to the same core abstraction throughout
your whole code is that you may have an inadequate core abstraction. Part of
the design goal in the compiler, which I am trying to point out in the video,
is that the design is meant to keep you in the same "context" or abstraction
level throughout, so making a change will always fit the current compositional
context. I do this by focusing on simplification as much as possible. I also
keep the code as stupid as I can get away with.

This whole discussion got started because of the amount of change that has
happened with this code base (hundreds of thousands of lines). You can read
more about that in the other historical threads referenced by dang. I've
rewritten the compiler at least five times.

This need to make lots of changes all the time and adapt means that I actually
push more towards the points-free, not away from it. By sort of diving head
first into it, I can gain the benefits, but lose the disadvantages that happen
in the "middle ground" where you are jumping back and forth between points-
free and not-points free.

The complexity of this compiler is about the same as what you would normally
encounter if you were going to write a core compiler for Scheme (not the
syntax expander). It's perfectly suitable for procedural programming
languages, or any other, but obviously you'd need to tweak things. One of the
general strategies in the compiler is to eliminate as much irregularity as
early as you can get away with. A lot of it depends on how neatly your
underlying IR/ASM language that you're targeting matches the semantic needs of
the language you're compiling. For example, I don't demonstrate how to do tail
call optimization here, but if you are familiar with a Kent Dybvig style tail
call optimization pass such as found in Nanopass compiler examples, you could
adapt those ideas to this data-parallel style.

Basically, if you already know how to write a Nanopass compiler, the
transition is easier. If you are used to writing monolithic compilers such as
the clang front end or PCC, it will be harder, since you have to absorb the
Nanopass style of thinking.

As for how usable the compiler is, that depends on what you mean. If you mean,
is the compiler usable right now as a product, then it covers most of the
bases for functional array programming, but if you see the LIMITATIONS file,
you'll note a few things that it's missing. Ironically, missing things like
if statements and recursion matter, but much less to a dfns programmer than
having the right primitives already available. As a dfns
programmer myself, I almost never use recursion, for instance, though it's
good to have it when you want it.

The next few releases of the compiler are focusing on eliminating some of
these last issues of usability, but for the most part the compiler is already
completely functional.

As for whether a dfns programmer could readily get working on the code base,
the answer is yes and no. They would have no problem reading the code, as it's
dead simple to a dfns programmer. The overall structure would also be easy to
see. For them, the trick would be understanding the "big picture" of how a
compiler works. Most dfns programmers are not also compiler writers.
Understanding the Nanopass strategy of compilation, and then understanding why
a given pass is there in the compiler, and mapping a given set of array
operations (which they would easily understand) to the domain space of
"compiler operations" would be the struggle. Sort of the reverse. The compiler
writer would struggle to understand how compiler operations map to array
operations, and the array programmer would struggle to perceive the larger
meaning of the array operations in terms of traditional compiler concepts.
Printing out the AST throughout the whole compiler (easy to do by editing tt)
goes a long way towards aiding that intuition.

Once the core idioms used in the compiler are understood, the rest is pretty
straightforward.

~~~
tigershark
I wish that your posts could be as concise as your code and your code as
expressive as your posts... that would open up a universe for me and probably
for a lot of people.

~~~
arcfide
Personally, I wouldn't want to edit my posts on a regular basis. I consider
them too verbose, but I struggle to condense them.

------
arcfide
Thanks everyone for the great discussions and interest. I'm going to get back
to this thread later, but won't be actively commenting for the rest of the
day. Please feel free to continue discussion and continue asking questions and
I'll get to them when/if I can.

Also, feel free to contact me directly to discuss anything at a higher
bandwidth level. And as always, see the GitHub repository for more information
on the project and ways that you can contribute (help out, fund the project,
contribute code, bug reports, or patches).

[https://github.com/arcfide/Co-dfns](https://github.com/arcfide/Co-dfns)

If I can "shill" a little bit, you can fund the Open Source/Open Work aspects
of this project through micro-funding at Gratipay, which helps to support and
keep the Open Source and community aspects of the project going. Dyalog is a
great company, but ultimately, as some have mentioned, it's the community that
helps to keep these ideas moving forward and improving.

~~~
javajosh
Does working on your project, or running it, require a Dyalog license?

~~~
arcfide
You'll need to get at least a free copy of Dyalog to use the compiler. They
have a number of different free options. If you intend to use the compiler, it
is available for use by all under an AGPL license. However, the generated code
also uses Co-dfns code and is likewise under an AGPL license. If that license
is unsuitable for you, then you'll need to arrange for an alternative license
through Dyalog. Normally this will only be required for people interested in
using Co-dfns commercially in their closed source projects.

The Co-dfns work is run like an Open Company however. You are welcome to
submit contributions (see the README for more information) as pull requests
and can onboard yourself at your leisure. I'm working to encourage more people
to support the funding of Open Work on Co-dfns, but that's a long, slow
process.

So, yes, generally speaking the compiler is designed to be used in conjunction
with the host of additional tool support you get from Dyalog APL (the
interpreter). However, Dyalog makes it as easy as possible to get a copy of
the interpreter.

~~~
javajosh
If I understand correctly, you can use the free, non-commercial-use Dyalog to
run the AGPL Co-dfns compiler to produce programs. And also, the Co-dfns
output is not independent of the compiler itself, meaning that the output must
also be AGPL.

I'm speculating, but I suspect that anyone using Co-dfns output will be
running it on a private bank of GPUs over a large private data-set, so I
suppose the AGPL won't ever matter for them, in practice.

~~~
arcfide
If that works for them, more power to them! They'll still likely need a
commercial Co-dfns license if they are doing that, not because of the produced
code, but because of the programming environment in which they do the
development, unless they want to slice Co-dfns out of its "host" environment.

You are right, however, that you can use the free, non-commercial Dyalog to
run the AGPL Co-dfns. Co-dfns is both a commercial and a research enterprise,
and so it is important to me that the ideas in the compiler are manifest
openly to the public, hence the open-source nature of the project. However, it
also needs to fund that research somehow (have you ever tried to get a grant
to write APL? Hahahaha!), and so commercial uses need to be managed in a
reasonable way, namely, something mutually equitable.

Some interested parties are working to use Co-dfns in public facing services,
and not just private data sets. One group is very slow moving, but we're
exploring the possibility of a rewrite of the cryptographic stack.

------
jarpineh
I had some exposure to an APL program for statistics a few years back, but
never really dived in. The largest problem was that the tools I encountered
then didn't lend themselves well to intermittent exploration (indeed, Dyalog
APL had a Linux version that required Wine, which was very buggy). Could
somebody give pointers on how to start with APL in 2017? I mean things
pertaining to developer experience: which APL implementation is easier to
start with on MacOS/Linux, how to find libraries and frameworks, good example
apps on GitHub, etc.

Though I don't want to hijack this thread from questions specifically about the
compiler + runtime. On the earlier thread
([https://news.ycombinator.com/item?id=13565743](https://news.ycombinator.com/item?id=13565743))
you mentioned two things I'd like to know more about:

\- function trains: any examples? could one do those in Lisp such as Scheme?

\- what is Modern dfns based APL? (something about better APL for functional
programming)

Thank you for your comments so far. I tried to watch the live stream, but
there were some audio skips now and then. I'll have to try again on the next
long train ride.

~~~
arcfide
In 2017 I can mostly only recommend Dyalog APL. Keep in mind that I am highly
biased because my compiler is written in Dyalog APL, and they sell my compiler
commercially. However, IMO, Dyalog APL is _the_ place to start with APL in
2017.

They now have Mac/Linux versions that include the "RIDE" (remote IDE) which
allows you to work with the code similarly to the IDE/Interface you see me
using in this video. I don't use any of the IDE features, such as the code
editor, debugger, explorer, or the like, but for a new user it will be quite
useful, and it will allow you to "point and click" your way through, including
the entering of the APL symbols.

Dyalog APL for Linux also has a slick, but old school console IDE, which
includes a built-in full-screen text editor, debugger, and the like, which is
built around console interactions. If you're a hard core CLI guy, then this
may appeal to you.

There is an excellent book called "Mastering Dyalog APL" which covers all the
major features, built-in frameworks, and the like, for working with Dyalog
APL. Dyalog comes with built-in interfaces for things like R, .NET/Mono,
ODBC/Relational databases, sockets and networking, cryptography, &c. It also
ships with things like vecdb, a sophisticated web framework, and a GUI
framework (windows only).

Additionally APLers should all have a copy of the FinnAPL Idiom Library at the
ready. It serves as the "core library" for a lot of common tasks which takes
the place of "example app" and "library calls" in many cases. Additionally,
Dyalog has a large suite of dfns to use that they document at dfns.dyalog.com.
Dyalog also comes with document processing and graphing libraries that can be
used interactively or programmatically to produce all sorts of documents and
graphs.

My general recommendation would be to stick to using dfns as much as you can,
programming without trying to use external libraries. It's a common "mistake"
for people learning APL to try to "find a library function to do X" when it's
often a "half a line" APL program that would be considered "idiomatic" in APL
and thus not something APLers would have a "library function" for. It's better
to master the core language, because you'll find your need for libraries
diminishes quite a bit once you become comfortable with the language. Not that
there aren't libraries, but if you really need a library function, you'll know
it, and then you'll know if you can find it.

The best example apps for writing dfns would be the dfns library by Dyalog at
dfns.dyalog.com which comes shipped as a dfns workspace for use by Dyalog APL
programmers. These are well commented, and cover a vast array (snicker) of
topics.

Function trains are one of the neatest features of the Dyalog APL language,
and represent an extension of traditional APL 2 into the "Rationalized APL"
space from which come the likes of J. An example is in the above video: the
90-line compiler component that I discuss there is written almost exclusively
as a function train. I believe I dedicate some time to talking about function
trains in the video.
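
For a quick taste outside the compiler (the standard textbook fork, not
something from the compiler itself): a 3-train (f g h) applied to ⍵ evaluates
as (f ⍵) g (h ⍵), so "sum divided by tally" gives the mean:

    
    
          mean←+⌿÷≢
          mean 1 2 3 4
        2.5
    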

You cannot do these things in Scheme, as they are fundamentally a syntax that
is incompatible with S-expressions. You could do a form of points-free
programming in Scheme, but the nature of S-expressions means that the
structure of the point-free code is explicit, and thus, if you wanted
function-trains, you would have to implement a macro to construct them for
you. That could be done, but at that point you would be well on your way to
implementing APL in Scheme, rather than using Function-trains in Scheme.

Function trains also don't work well for functions whose arity exceeds 2.

dfns are a syntax invented by John Scholes for implementing the APL notation
with a lexically scoped syntax for function definition and control flow. They
replace the more traditional flat-namespace, dynamically scoped style of
programming, or the OOP programming model, both of which are still supported
and widely used in Dyalog APL. However, to a Scheme programmer, dfns are a
godsend, because they closely mirror the core constructs and concepts of a
Scheme programmer, namely: lambda, recursion, and if statements/exception
guarding.

When I don't use function trains in my compiler discussed and demonstrated in
the linked presentation, I'm using the more explicit dfns syntax.

~~~
jarpineh
Thank you for the comprehensive comment. Hopefully I can understand enough of
the video to get the gist: the idea of Scheme-style functional features with
APL's power tools. If you get around to writing more about these things, I'll
definitely be reading.

Since most of the Python and JavaScript I do at work consists of searching
for and using examples and libraries from wherever, having a full complement
already built into the system feels impossible. On the other hand, APL
implementations have had some time to make things right. I'll take a look at
that big APL book and see what I can gather (my favorite way to learn).

BTW: I got my first taste of APL from one of the FinnAPL fellows. I saw some
incredible things by him I've since tried to replicate after a fashion.

~~~
arcfide
The Mastering Dyalog APL book is available as a free PDF, too. Though the
paper version is more enjoyable, and serves as an excellent monitor stand in
need, as well.

As a Scheme programmer, one of the first things I did when learning APL was to
create a new REPL/environment in Chez Scheme called Sapling that replicated
APL in Scheme using Scheme implementations of the primitives.

I quickly discovered that at least for my meager mind, there was no way to
combine the two as one. I could use one within the other without trouble, but
I couldn't mix the two easily.

If you're interested, the Co-dfns compiler is designed to integrate with
existing languages. If your language has a C FFI, then you'll be able to
integrate with Co-dfns compiled code. The workflow is basically for you to
produce a namespace of the functions you want to use, and then you can compile
them and link against that in your Python or C or C++ code. You can then just
call those functions with the appropriate arrays and things are a go. The
main thing you need to do is write functions to convert between Co-dfns
arrays and the data types you are using in your program.

~~~
jarpineh
This got me a bit confused. Is the purpose of Co-dfns to compile apps that
run on a GPU? Or does it only itself compile on the GPU, for apps that can
run on any target architecture?

Other than that, the idea of using my domain-specific Python stuff (and a few
general libraries) with these data processing tools is intriguing.

~~~
arcfide
These two goals are not mutually exclusive. The Co-dfns compiler is a compiler
that is designed to self-host on the GPU, but it is a compiler for the Co-dfns
language, which is a lexically scoped syntax in Dyalog APL. The compiler
compiles dfns programs and supports Mac, Windows, and Linux, targeting CUDA,
OpenCL, and Intel CPUs. The compiler itself is written as a dfns program,
hence the idea of self-hosting on the GPU.

Target applications for the compiler include document processors,
cryptographic libraries, neural networks programming, data analytics, high-
performance web applications, vector databases, financial algorithms,
bioinformatics, and HPC/Scientific computing tasks.

So, in short, yes, it compiles apps to run on the GPU, or the CPU, whichever
you prefer. And yes, it is meant to compile itself on the GPU (so, you compile
a GPU program on the GPU).

------
arcfide
This is an AMA for the linked compiler presentation by request. It is not live
video, but I will be answering your questions directly here on Hacker News.
This was requested due to some logistical issues and so forth during the live
presentation. The content is currently being edited to start at the
appropriate place, but at the moment you may have to skip ahead 17 minutes to
catch the start of the presentation.

------
bakul
This is fantastic! I can sort of grok your code at a high level but only
enough to appreciate it; not to do much with it as that would require
investing a bunch of time.

Questions:

1\. Did you find anything really surprising? Some new algorithm or idiom or
something that was surprisingly easy?

2\. Is it worth applying some of the same tricks in compilers for other
languages?

3\. Have you considered using the same "frontend" to create an APL->Scheme
translator? That might have more pedagogical value. Scheme has a great macro
system and it is a HLL, so it can be a good IL at the very least -- I can
then translate the compiler itself to Scheme and study it :-)

~~~
arcfide
1\. There was a lot that was surprising throughout this whole process. For a
while I didn't think it could be done. That it could be done was surprising
in itself.
That using trains actually improved my life was also surprising. There are
some "research class" algorithms that I have published as a result of this
work, so, there are some new idioms and algorithms that came out of this work.
Additionally, this whole compiler is essentially a new treatise on the
construction of compilers in general, providing a set of idiomatic methods for
accomplishing tasks that would have been traditionally accomplished with other
algorithms more commonly taught (such as the way you would normally do a
depth-first tree traversal vs. the use of a path matrix to compute parent-
child relationships and using key to transform the tree). Some compiler passes
turned out to be really surprisingly easy, such as function inlining, which is
a recent addition not shown here. After being familiar with this code, it took
very little time to explore, write, and implement. Certain things that are
multi-part algorithms in other languages turn out to be a single, well-known
APL primitive, such as Key (for tree transformation) or Tally for AST/tree
"count".

2\. It depends on how you write your compiler and what your core abstraction
level is. The compiler design here is language agnostic, and you could
implement a compiler for any dynamically typed, lexically scoped programming
language using this compiler as a template. Adding a type-checker on top would
not be difficult, but I haven't established what I consider to be "idiomatic"
type-checking algorithms yet, though I did have a version of type inference
that was okay that was only recently removed.

However, chasing concision or using micro-level tricks without understanding
the macro-level issues being addressed by the micro-level tricks will probably
lead to unreadable code. The main point of the compiler design here is the
importance of prioritizing macro-level thinking and optimizing the
micro-level to support better macro-level reasoning. How you can apply this
to your
own compilers depends on how much you're willing to alter the core
"aesthetic/HCI philosophy" of your code base.

3\. It would be almost trivial to do so, and would have probably zero
pedagogical value in the way you're thinking. To do so you would change the
"gc" function to spit out Scheme instead of C code, which would be trivial,
and it would require changing a bit of the syntax of the outputted code on the
following pages. If you wanted to reimplement the runtime as a Scheme library,
that could also be done. You'd lose parallelism, of course. The C code and the
Scheme code would probably be equivalently clear.

The reason it wouldn't be much use is that there would be nothing idiomatic
about the Scheme code upon which you could "rest your hat." Reading APL
written with S-expressions rather than APL notation is harder, not easier. It
looks no more intuitive to a Scheme programmer (I was a Scheme programmer
before this [see R7RS Small]) than the APL.

Keep in mind that the core compiler is literally just a huge string of
function calls. It's literally nothing but function composition.

(2=+⌿0=X∘.|X)/X←⍳20

vs.

(let ([X (apl-iota 20)]) (apl-filter X (apl-equal 2 (apl-reduce 1 apl-plus
(apl-equal 0 (apl-outer-product apl-modulus X X))))))

Now, if you're really familiar with Scheme, then the terms iota, filter, and
reduce should suddenly pop out at you, but that's dangerous, because Scheme's
implementations of these don't at all behave the way APL's do, though they
are in the same general algorithmic family. So, this might help your study for
about 1 hour, before you had memorized all the most common operators for which
Scheme has a similar primitive that kind of does something a little like it.
However, we just about covered all of them in the above code.

Given the above code, what does it do? Was the Scheme significantly easier to
understand? Now what if you had 90 such lines? The benefit of such translation
starts to die out about after the first line of APL code that you read.

------
gwu78
This is a real gem of a comment:
[https://news.ycombinator.com/item?id=13571160](https://news.ycombinator.com/item?id=13571160)

"petty abstractions" to assist reusability

Layer upon layer upon layer of indirection.

I want to be liberated from this mindlessness.

It is everywhere in the programming world.

Large sums of money are spent to maintain and propagate this way of thinking.

Large companies have been built upon it.

Is it possible to escape this?

These contributions from arcfide are heartening.

~~~
ycmbntrthrwaway
The [http://suckless.org/](http://suckless.org/) guys decided to give up on
building libraries and abstractions.

------
nodesocket
I'm a complete noob to APL (I come from web development and ops), but reading
the Wikipedia article there is a section that says:

The following expression finds all prime numbers from 1 to R. In both time and
space, the calculation complexity is O(R^2).

    
    
        (~R∊R∘.×R)/R←1↓ιR
    

How? What? I feel stupid now. ;-)

~~~
Avshalom
ιR creates an array 1...R

1↓ drops the first element of whatever is to the right (1...R)

R← just assigns the thing to the right to R, so now R is a variable containing
the integers 2...R

R∘.×R creates a 2d matrix of every element in R multiplied by every other
element in R

R∊ takes whatever's to the right (that matrix) and checks, for each value in
R, whether it is in that matrix. It spits out an array that is like
1 0 0 1 0 1 1 0, where the ones correspond to elements in R that are also in
the matrix

~ just negates that list (logical NOT), so now we have an array that's like
0 1 1 0 1 0 0 1, which corresponds to elements in R that can't be created by
multiplying any two elements of R together

/ goes through and only returns values of R that correspond to those 1's

the flow is like this:

    
    
      R←5
      ιR
        1 2 3 4 5
      1↓1 2 3 4 5
        2 3 4 5
      R←2 3 4 5
      R∘.×R
         4  6  8 10 
         6  9 12 15 
         8 12 16 20
        10 15 20 25
      R∊R∘.×R
        0 0 1 0
      ~R∊R∘.×R
        1 1 0 1
      1 1 0 1 / 2 3 4 5
        2 3 5

~~~
arcfide
Thanks!

------
gwu78
From reading your blog, I detect that you like working in MS Windows.

Can your compiler be ported to BSD?

Here is the rationale for why this can be useful: BSD can in turn be ported to
new hardware with reduced amount of effort, sometimes a project that is
manageable by one person.

(Compare this to what is required for porting Windows, Linux, etc. to new
hardware).

Imagine an APL computer with an open source kernel that is easily ported to
new hardware.

~~~
arcfide
Actually, I'm an old school OpenBSD user who still programs with ed(1). Check
misc@ out back a number of years. I've had to move away because OpenBSD isn't
the greatest platform for doing high performance GPU research. :-(

The compiler is already portable to BSD. However, the underlying library I
use to do GPU programming (ArrayFire) is not ported to BSD, though it is for
Linux. The Dyalog interpreter is available for AIX as well as Linux, and
that's what you need to host the compiler right now. If Dyalog received the
interest, it would be pretty straightforward to port Dyalog APL to BSD.

If one wanted to provide a self-hosting Co-dfns compiler, you could implement
a minimal dfns interpreter and then run the Co-dfns compiler over that. To my
knowledge no such interpreter exists yet. However, porting either ArrayFire or
the Co-dfns runtime would go a long way to doing this, because you could then
share the runtime library between the interpreter and the compiler.

So, yes, this is quite possible, and probably not even that much work. I'd
welcome such a contribution. :-)

~~~
kovek
How do you manage to still work with ed? Where can we read about that? I'm
curious to see how useful it can be

~~~
arcfide
I don't have it anymore, but I did a live coding session in ed(1) where I
edited thousands of lines of XML code as a demonstration of using ed. It's a
very powerful editor, almost too powerful for my "simple tools" aesthetic.

Just a few things you can do with Ed:

Structured/Outlining

Refactoring

Rapid code catenation and splitting

It's one of my favorite editors. I don't think I have a video of it, but prior
versions of the code were optimized for ed(1) hacking, allowing me to do
structured/outline editing easily with ed(1).

------
breck
Really different and fascinating video, thanks!

Two surface-level questions: 1) The "trains" design pattern you mention, do
you have a link (to Wikipedia or something) with more info? I might know the
pattern by a different name. 2) I liked the tree printout seen at 1:42. Is
there a library or algo you used for that?

~~~
arcfide
1\.
[http://www.jsoftware.com/papers/fork.htm](http://www.jsoftware.com/papers/fork.htm)
See Hook and Fork. This was the first introduction of Trains programming in
APL, I think. You can see help.dyalog.com for more information on the current
implementation in Dyalog.

2\. I think the tree printout of trains is a built-in library feature of
Dyalog APL. The dfns library (dfns.dyalog.com) has a tree-printing dfn that
may be of use in implementing it, and I think you can read the actual source
code for that in the user command for display or the like.

I may implement a version of this for my ASTs to augment the Xml and px
utilities, as it is a nice format and one that might be of use for printing
out the AST of the compiler. I have not had need of it yet, though, so I'll
probably get to it the next time I have to do a fancy demo for which I can
prepare more adequately.

~~~
breck
Thanks! Also found this link which had more info on Trains:
[http://help.dyalog.com/14.1/Content/Language/Introduction/Tr...](http://help.dyalog.com/14.1/Content/Language/Introduction/Trains.htm)

~~~
arcfide
Ah, there it is, thanks!

------
psandersen
This looks interesting, but getting a feel for the implications is pretty
difficult without spending a lot of time.

A discussion of the implications in terms of speed, robustness,
maintainability and future hardware would be really useful.

E.g. if some team did the crazy undertaking of re-writing all the software
(kernel included) in a base Ubuntu install using APL on the GPU, what would
the positives vs negatives be? Is this even possible/desirable, or does it
require
research breakthroughs? Is this more suited for DSP functions, or even
extendable to probabilistic programming?

What is the best hardware match? Does this offer advantages for pushing
synthesized programs to FPGAs, or some other hardware (can't remember the
name; many small dumb cores with limited memory, maybe a bit like the Cell
processor)?

~~~
arcfide
Robustness and maintainability are as much a product of the minds working on
the code as of the code itself. However, I can say that for me, this was not
even a conceivable enterprise, much less one that could be done by one person
or, really, a few people, without the approach I have taken.

I am interested in this partly because I feel that future hardware will
increasingly reward code written in this style over other styles, and I want
to make it possible for people, especially non-computer scientists, to learn
and
produce good code and "think" about problems more rigorously, which, as
problem sizes increase, makes this sort of approach important for scaling.

As for speed, one nice, but currently unexplored feature of the compiler as it
is written above is that the entire complexity (algorithmic complexity, Big O)
is computable, even trivially derived from the code itself, because the
compiler is just a string of function compositions on primitives with well
known algorithmic complexity. A future feature plan for the compiler is to
produce Big-O complexity analysis for code of this sort to help developers
spot performance regression points earlier in their code, since the compiler
can now assist you in doing so symbolically, and not just with test cases.

I have thought a bit about how easy it would be to compile code like this to
the FPGA, and I think there are a lot of possibilities here, but I've not been
able to make time to study it completely. My current scope is limited to GPU's
and CPU's of the more standard ilk.

Rewriting the kernel of an operating system would be very challenging if you
expected good performance on the GPU. GPU's are not designed to make context
switching of this sort (as found in Linux) easy, or fast. It's highly likely
that you will run into divergence problems, at the very least.

However, people have explored the implementation of operating systems on top
of these concepts (see kOS). I expect those to be very practical, and it would
be an interesting future project to bootstrap Co-dfns programs to run natively
on bare metal.

A more interesting initial idea would be to explore rewriting the base
userland in OpenBSD or Ubuntu in something like Co-dfns. This would open up
interesting challenges relating to file systems and interacting with GPU's and
the file system, but it is at least conceivable that many userland functions
would become much easier to write and maintain (maybe) by playing with this
approach. However, it would require additional infrastructure support and the
like which Co-dfns currently does not have. I imagine the first step would be
to identify the core file system APIs that would need to be provided in some
reasonable analogue in Co-dfns, and then try implementing the code on the CPU
first. I would expect that the code would be simpler, and could potentially
better utilize the CPU, but I am not sure you would see fundamental
improvements in performance without going way up in scale and increasing
computational needs to the point that the speed of the filesystem is not the
limiting factor.

~~~
psandersen
Thanks heaps for the detailed reply! The BigO complexity analysis 'for free'
sounds pretty interesting.

I'm wondering if Co-dfns would be a natural fit for an algorithmic design of
programs through deep learning. We're already seeing some attempts at using
an AlphaGo-like architecture (marrying traditional AI such as Monte Carlo
tree search with modern deep learning to learn useful heuristics; or, in the
case of a recent paper on program design I can't find right now, combining
the network with an SMT solver).

If the compiler naturally gives the complexity of a program, creating a
feedback loop with a speed prior (
[http://people.idsia.ch/~juergen/speedprior.html](http://people.idsia.ch/~juergen/speedprior.html)
) might work very well for AGI.

Sorry if this far out speculation is off topic, I'm just really keen on
exploring tech that could be valuable in AI development.

~~~
arcfide
If that's something of interest to you, get in touch with me via email and we
can see if we can't get Co-dfns supporting what you need to make it a reality!
There are a number of things that could make this nice in Co-dfns, including
the possibility of using idioms instead of library calls, potentially making
adaptation of algorithms easier.

------
hzhou321
I understand that speed may not be an issue here, but I'd still like to ask:
won't so many nano-passes affect speed?

~~~
arcfide
See Andy Keep's dissertation on using Nanopass to reimplement the Chez Scheme
compiler. The short answer is, not really. Nanopass can add some constant
factors, but is essentially a linear/constant addition, whereas the majority
of compilers are quadratic or greater in complexity. Chez Scheme was/is N log
N in most cases, and even then Nanopass did not significantly harm
performance.

It actually has benefits on the GPU beyond just the CPU, because the passes
tend to be smaller, and thus more readily executed efficiently on the GPU,
meaning that you would likely see a large net gain in performance by using
Nanopass. Since I believe this might be the only compiler of its kind, it's
hard to say for sure on that mark, though.

~~~
hzhou321
Does N here refer to the length of the source code or the number of passes?
When the source code is big, even the identity pass takes time, right? I
don't understand how it becomes a constant addition.

Could you quickly comment on the claim that the majority of compilers are
quadratic or greater?

Regarding the benefits on the GPU, does that refer to the size of the code
for the passes or the size of the source code? I assume the passes are run
serially, right?

[I'll read Andy Keep's paper.]

~~~
arcfide
In compilers the N value is commonly considered to be the number of nodes in
an AST. Compiler performance is generally evaluated based on its performance
over the code, which is generally bound by the number of nodes in the AST.

The identity pass is linear in the size of the AST in almost all functional
compiler designs. Adequate optimizations would then make this constant time
for most functional languages. The identity pass in Co-dfns
would be the single character ⊢. The Co-dfns compiler would produce code for
this pass which is constant time.
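
(For readers who don't know the glyph: ⊢ is the monadic "same"/identity
primitive, so a pass consisting of just ⊢ hands its argument through
untouched.)

    
    
          ⊢ 'the whole AST, unchanged'
        the whole AST, unchanged
    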

Likewise, with a bit of work, this would also be true for a Nanopass compiler
function, unless you insisted on traversing the AST and rebuilding it
yourself, in which case the framework could not optimize this away.

The majority of compilers use algorithms that are quadratic in the size of the
AST (number of nodes) for a number of passes. Many compilers use a variety of
graph coloring register allocation, which is NP-complete. However, better
register allocation algorithms exist that are quadratic or quadratic log in
the number of nodes in the AST. The Nanopass research has demonstrated that
for real life compilers, the cost of certain compiler passes generally
overwhelms most of the cost of the rest of the compiler, and so adding another
linear factor by splitting a pass into multiple passes, generally does not
affect the big-O complexity of the entire compiler, and thus, the overall
performance at scale, though it can affect the constant factors by a small
amount, depending on how good your allocation engine is for your host
language.

This Nanopass research has only been done on compilers implemented on the CPU.
My research is exploring whole compilers on the GPU, which is an area that
shows promise, since even memory-bound computations, such as most core
compiler transformations, can see significant performance speedups, even if
they move from linear to log-linear complexity.

The passes are not run serially. Each pass is data parallel, and while there
is a serial dependency in that each pass is dependent on the results of the
previous pass, nothing in the way the compiler is written forces a hard
barrier (such as a join in fork-join programming) at the compiler pass
boundaries. It is conceivable that optimizations techniques for stencils such
as Trapezoidal decomposition could be applied to compilers written in this
fashion to allow intra-pass optimization to occur, but this is future work.

The passes themselves execute in parallel on the GPU, or at least, that is
their design. They are expressed as natively data parallel, or vectorized,
algorithms. If run on the CPU through the interpreter, then they will
execute using the Intel vectorized code units via the optimized primitives in
the Dyalog APL interpreter.

~~~
hzhou321
Thanks. Your work is inspiring.

------
panic
How much abstraction overhead does the ArrayFire library introduce? Do you
think you could get better performance and simplify the overall system by
generating CUDA or shader code yourself?

~~~
arcfide
I would consider ArrayFire a "hard abstraction" that is basically the
bottoming out of the system. Previous to this I was using OpenACC. See my
talk at the Dyalog User Meeting last year for some of the struggles there
with Scan.

In short, ArrayFire is handling a lot of the core GPU algorithms for me, and
that's its main contribution. This prevents me from needing to implement these
myself, because this is itself a full time job. Much better to put a hard
cutoff point and work on the high-level questions, rather than spending time
trying to support three different architectures by myself. There is basically
no good way of working at that level that isn't ugly and very very painful. I
am glad to have someone else do that for me.

The benefit is that I can take better advantage of their platform independent
and performance tweaks for individual GPUs (which is still a lot of work with
the state of GPU programming today). This leaves me to focus on higher level
concerns that are more relevant to the APL programmer.

------
pavanky
Hi Aaron, I saw the link earlier but did not get the time to check it out.
Someone from arrayfire pointed out that you were using arrayfire, and now I
recognize your name / id from the arrayfire forums :)

Anyway, I am giving it a listen now.

If I had to ask a question: what is something that frustrated you about
arrayfire, and what do you think needs fixing?

~~~
arcfide
ArrayFire was designed primarily as a user library for scientists, it seems,
and so there is a clear bias there. My co-opting it for use as an IR language
for a compiler front-end is perhaps outside of the original scope of
ArrayFire. The good thing is that ArrayFire manages to capture a few key ideas
that are often absent, but critical to APL, so that was good.

On the other hand, it is non-trivial to work with more advanced concepts and
workflows/algorithms in a way that allows me enough control over my data to
get the performance I want when dealing with nesting or other nuanced
execution patterns. A trivial example would be not having more control over
how Scan works. ArrayFire is unfortunately not unique in its very limited
amount of support for scan operations that will be executed in parallel. I
have worked around this, but it's not ideal.

Additionally, the rank limitation of 4 in ArrayFire is infuriating. Given
that we're working with C++, it seems like there should be no reason that I
shouldn't be able to have arbitrarily many dimensions.

The design of arrayfire also makes certain code compositions difficult (this
goes back to the issue with nested execution) meaning that I'm spending a lot
of effort on working around this limitation in the compiler, having to add new
compiler passes and optimizations to massage the output code in such a way as
to avoid the syntactic and semantic limitations of the arrayfire primitives.
I'm not sure ArrayFire can do better in this regard, and this is the sort of
optimization that a compiler front-end could make plenty of use of, but it
would have been a nice touch to have some built-in support for mixed CPU/GPU
statement overlap to avoid me needing to separate out certain codes and
possibly introduce various stages to make sure that all the computation can be
properly fused and executed. This limitation is especially felt with GFOR.

So, if I had my way, ArrayFire would be less user friendly to the scientists
(that's why Co-dfns is here! ;-) ) and more friendly as an IR for implementing
APL runtimes. :-)

~~~
pavanky
> On the other hand, it is non-trivial to work with more advanced concepts and
> workflows/algorithms in a way that allows me enough control over my data to
> get the performance I want when dealing with nesting or other nuanced
> execution patterns. As a trivial example, it would be not having more
> control over how Scan works. ArrayFire is unfortunately not unique in the
> very limited amount of support for scan operations that will be executed in
> parallel. I have worked around this, but it's not ideal.

Can you give me an example? If you think this is not the forum to discuss
this, you can always email me (check my profile) or ping me on gitter
([https://gitter.im/arrayfire/arrayfire](https://gitter.im/arrayfire/arrayfire)).

> Additionally, the rank limitation of 4 in ArrayFire is infuriating. Given
> that we're working with C++, it seems like there should be no reason that I
> should be able to have arbitrarily many dimensions.

I want to solve this as well, but it requires a lot of tedious work so it
keeps getting postponed. But rest assured this will be fixed!

> So, if I had my way, ArrayFire would be less user friendly to the scientists
> (that's why Co-dfns is here! ;-) ) and more friendly as an IR for
> implementing APL runtimes. :-)

ArrayFire development has been request driven. So the features requested /
used a lot got implemented first. But I don't see why the features you
requested can not be implemented in arrayfire!

~~~
arcfide
I'll try to contact you out of stream. Thanks for getting involved here!

------
jonahx
arcfide,

What are your thoughts on J vs APL? Also, is APL your preferred language for
most kinds of problems? When it is not, why not?

~~~
arcfide
J's ideas are terrific. Dyalog is actively incorporating the ideas in J back
into APL. J's choice to use ASCII characters instead of Unicode is a
historically understandable design decision that I feel should no longer
constrain the APL community. Otherwise, Dyalog APL and J are my two favorite
APLs.

If I can't write it in APL, I would probably write it in Scheme. I haven't
found anything yet that wasn't addressable in APL, and the one thing that
supposedly wasn't (writing compilers) I figured out how to do, obviously.

However, if you want to create a DSL or otherwise play with notational
elements in a DSL style for which macros are the "right choice" then APL is
unlikely to be your friend, because I have yet to find a way to create good
macro systems on top of APL. Scheme simply wins here. However, vocabulary
based DSLs (combinatory DSLs or the like) are very well suited to APL.

Outside of these, I'm actively exploring the idea of theorem proving based
programming, and currently things like Agda, Idris, ACL2, or Isabelle are
going to be better for this right now. Not inherently, but because of tool
support. I hope to change this in the future.

------
throwaway7645
Is parallel programming really that easy in Dyalog APL? Could someone provide
a trivial example?

~~~
arcfide
Currently Dyalog APL provides green-threaded concurrent programming using a
traditional "fork" model as well as a CSP-style distributed computing model
called "Isolates." In addition, Dyalog will utilize vectorized operations on
the CPU for a number of primitives and code patterns. Co-dfns provides for
additional functionality by providing vector level parallelism on the GPU/CPU
and a true threading "fork" (this feature is only part of the language
extension, and has not yet been implemented). So, if you write your code in a
Co-dfns compatible manner (closed single Namespace of dfns, and some
limitations in the current version which are being lifted rapidly), then you
will be able to execute it on the GPU, CPU, and SMP machines.

As a trivial example, say you wanted to run the classic quadratic-time
primes algorithm in parallel:

    
    
        Primes←{(2=+⌿0=X∘.|X)/X←⍳⍵}
    

You would simply compile this function inside of a namespace using Co-dfns and
you could then run this function on the GPU to improve performance. The
compiled code would run on either the CPU, GPU, or SMP machines.

~~~
throwaway7645
It just does it automatically?

~~~
arcfide
Yes.

------
Yahivin
I love the fearlessness. You're now one of my heroes.

~~~
arcfide
Acquire the APL, and then be the hero in your own code every day.

------
erikpukinskis
Look at this guy's code and you start to understand how we'll get to the
Matrix situation of being able to watch scrolls of random-looking symbols and
picture people and food and cities in our head.

~~~
arcfide
And here's a slightly more sophisticated version that mimics the animation you
see in The Matrix:

    
    
          T←41 80⍴' '
          )ed T
          {⎕DL÷16⊣T∘←⎕AV[156+?80⍴54]⍪1↓(-0=?80⍴5)⊖T}⍣≡⊢0
    

In your Dyalog APL session, type the first two lines, put the opened edit
window somewhere you can see it, then go back to the session REPL and enter
the third line. In the Linux terminal, you use the APL+Tab key to switch
between windows on the console.

For best effect, change your edit window theme to use green on black. :-)

------
zzzcpan
Is there a transcript?

~~~
arcfide
Unfortunately, no. There's a lot of code demonstration, some REPL work, and
interaction with the source, so I'm not sure a transcript would communicate
things adequately without significant editing and adaptation. I would like to
turn this into some sort of guided documentation at some point in the future,
however.

------
beagle3
Haven't had time to watch the YouTube video yet, but here's a question
related to APLing in general:

Do you find that the GPU angle makes people more receptive to the ideas behind
APL? In my own experience, APL/K/J quickly get an SEP field applied to them
(as is also visible in the discussions linked by dang): some people dismiss
them outright, and some say "well, it might have been easier to grasp if the
syntax weren't so obtuse" (which is demonstrably wrong). I've given up on
evangelising APL/K, but being such a perfect match for GPU semantics, maybe
there's hope for more acceptance? What's your take?

~~~
coldtea
> _and some say "well, it might have been easier to grasp if the syntax wasn't
> so obtuse" (which is demonstrably wrong)._

Can we have that demonstration then?

~~~
beagle3
One example I like for K is: Maximum subarray sum[0]. The canonical K solution
(sorry, my APL is too rusty to translate) is

    
    
        |/0(0|+)\
    

Yes, this is the complete function, and it is the efficient O(n) solution; a K
programmer is likely to recognize it immediately (using eyes for pattern
recognition). Now, if I were to rewrite it in infix/paren syntax, it would
become:

    
    
       mss(x):=fold(max,0,scan(lambda(a,b) max(0,a+b), 0, x))
    

This is not at all more readable (arguably less readable) than the K version,
and that's inherent, in the sense that the K uses your organic pattern matcher
(supplied by eyes and brain) much better than a Haskell/Lisp version ever
will.

Similarly,

    
    
        ,//
    

is flatten; if you were to write it verbosely, it would be

    
    
        flatten(x):=converge(fold(concatenate,x))
    

While this may be more readable, the K syntax (like the APL one) gives you a
concise, well-defined, pattern-matchable version. (Which doesn't actually
need documentation: the implementation is 3 characters, so if you want to know
how it behaves on an empty list, just read the implementation!)

[0]
[https://en.wikipedia.org/wiki/Maximum_subarray_problem](https://en.wikipedia.org/wiki/Maximum_subarray_problem)

~~~
arcfide
I believe the translation of |/0(0|+)\ is ⌈/0,(0⌈+)\
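
For the flatten example, an APL analogue of ,// can be written the same way,
as a catenate-reduce iterated to a fixed point (∊, enlist, is the idiomatic
primitive, but this keeps the structural correspondence visible):

    
    
          flatten←{⊃,/⍵}⍣≡
          flatten (1 2)(3 (4 5) 6)
        1 2 3 4 5 6
    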

~~~
Avshalom
K does a thing (like J, I think?) where its scan/reduce returns each
subsequence/intermediate value as it goes, which is what makes that problem so
trivial in K.

~~~
RodgerTheGreat
The K "over" adverb (/) is like a conventional reduce/accumulate/fold. The K
"scan" adverb (\\) produces a list of the intermediate results. Having both is
extremely handy.

If the verb argument of over/scan takes a single argument (rather than two),
the adverbs will iterate to a fixed point. As before, "/" will return the
final result and "\" returns all the intermediates.

For example:

    
    
          +/1 2 3
        6
        
          +\1 2 3
        1 3 6
        
          *:/,,,0
        0
        
          *:\,,,0
        (,,,0
         ,,0
         ,0
         0)

~~~
arcfide
Oh wait, so you're not preserving the subsequence, you're just preserving the
sum. So it's a standard scan, and my original translation is correct; my
subsequence-preserving version does a bit more than this by preserving the
subarray as well as the sum.

------
BuuQu9hu
I don't know any languages in the APL family. Which should I learn and why?
Which should I definitely avoid and why?

~~~
arcfide
Among the APL family of languages, I would recommend Dyalog APL and J. J is
great for some of the ideas it has implemented and for giving you a flavor of
Iverson's vision of APL as it evolved. Dyalog APL is, IMO, more practical, and
we are quickly integrating as much of J into APL as is reasonable.

Personally, I am highly biased towards using the APL symbols, and against
getting "sucked in" to the idea that using ASCII symbols is easier. I think
the APL symbols provide important visual and mnemonic information that
actually makes learning better over time than trying to use ASCII. There is
built-in APL keyboard support on Linux, and Dyalog APL comes with a good IME
for Windows and Mac, so there's no reason not to use the symbols anymore.

I don't know that you should avoid any of them, but I would encourage you to
use dfns if you can. Using a lexically scoped syntax is going to give you a
better idea of how "good, modern" APL is written. A lot of APLs don't include
the J innovations, or they don't have structured programming features, or they
lack lexical scope, which is a bummer.

So, overall, go Dyalog. I'm highly biased.

~~~
jxy
I would be sold if Dyalog had a graphical environment equivalent to J's jqt,
something in which you can at least plot data. Everything I can find for
Dyalog is .NET and Windows. RIDE seems to be the right step forward, but it is
still much inferior to jhs.

On the other hand, Dyalog can't beat J being open source, but I guess that's
something much more difficult to change.

~~~
arcfide
One problem with cross-platform graphical environments, IMO, is that they fail
the integration standards I've come to expect from GUI apps. I want Mac apps
to feel like Mac apps, Linux apps like Linux apps, and Windows apps like
Windows apps. That means making heavy use of each platform's specific
interaction paradigms to achieve the best level of integration, which isn't
possible with shared graphical environments.

~~~
bjterry
I think this expectation has mostly dropped away in the last couple years. A
lot of apps now have interfaces inspired by apps (e.g. Airmail on Mac OS X
acts like an iPad app) or web apps (all the Electron apps). Obviously this
isn't saying that a personal dislike for it is wrong, I just think that it's
becoming less important in the market.

~~~
arcfide
This might be true for standard consumer applications that can fit within the
web paradigm. However, you then fail to be able to leverage vendor specific
interaction paradigms, which is something I want to be able to do, such as the
Mac touch strip, the Windows Surface Dial and Pen, or the Gnome title
bar/notifications/messaging interface. I like to interact with voice services,
so you need something for this as well, or handling the various HiDPI features
of each system.

