
Ruins of forgotten empires: APL Languages (2013) - mml
https://scottlocklin.wordpress.com/2013/07/28/ruins-of-forgotten-empires-apl-languages/
======
rebootthesystem
I used APL professionally for about ten years. I even have a picture with Ken
Iverson, the language's inventor, when I was 19. While I still programmed in
assembly, Forth and C, I was one of the "APL guys" at work.

The language changes the way you solve problems computationally. It shifts
your focus away from the mechanics of programming and immerses your mind in
the problem domain. It is hard to understand this without working with APL for
a year or more.

There's another element here that is often ignored: Notation.

APL uses a unique set of symbols, and it is these symbols that give you
expressive power not found in text-based approaches. A paper by Ken Iverson
himself, titled "Notation as a Tool of Thought", covers this:

[http://www.eecg.toronto.edu/~jzhu/csc326/readings/iverson.pdf](http://www.eecg.toronto.edu/~jzhu/csc326/readings/iverson.pdf)
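
To give a flavor of what the notation buys you (my own toy illustration, not
an example from the paper): APL writes an average in a few glyphs, while the
imperative version buries the idea in bookkeeping.

    # APL expresses a mean as the fork  (+/÷≢)  -- "sum divided by tally".
    # The same idea spelled out imperatively (Python here, for contrast):
    def avg(xs):
        total = 0
        for x in xs:
            total += x
        return total / len(xs)

    # avg([1, 2, 3, 4])  ->  2.5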

The value of notation has been largely forgotten today because APL suffered a
popularity gap of probably twenty years, the language being way ahead of the
technology of its time. The language didn't fail. The hardware of the time
did.

This, BTW, is the reason I think J is an absolute abomination. Yes, Ken
Iverson was behind J. The motivation for going with ASCII and losing the very
symbols that made APL notation so powerful was to deal with the limitations of
the computers of the time. For example, on some systems you had to manually
reprogram the graphics card's character ROM just to be able to display the
symbols.

Iverson sought to make the APL concept more accessible by transliterating to
ASCII when creating J. In doing so he utterly destroyed the very notation he
understood to be critically valuable as a tool for thought.

For more than one reason it would be wrong to attempt to bring APL back today.
But the concepts, the approach, and a form of notation for expressing
computational ideas could and should come back.

~~~
bane
There's lots of good reasons for using text for programming. But I've long
thought that, as a medium for expression, it's pretty terrible.

I'm reminded of this constructive review of TempleOS [1] and thinking _YES_
I'd like to have some of those code-writing features.

I'm also reminded of Perl, a language that grows multi-character operator
glyphs at a rate only slightly slower than teenagers invent emoticons. Imagine
if Perl could just be programmed with proper alternative symbol support like
APL.

Having proper symbols might also end the minor endless war over using "=" for
both assignment and equality.

1 -
[https://news.ycombinator.com/item?id=9681501](https://news.ycombinator.com/item?id=9681501)

------
white-flame
Much of his thrust seems to boil down to being really offended by the cost of
the safety that allows concurrent writes while reads are going on.

When dealing with analysis workloads, his line of thinking can yield fast
runtimes and doesn't bring that much of a programmer burden. However, it's
killingly complex when it comes to live mutating business databases that need
to scale. Even many job-based analysis workloads have a lot of concurrent
mutation in trying to efficiently figure out what's been done and what needs
to be done next when the actual workflow is not a straight pipeline.

I do agree that 'for' loops should be replaced by broader declarations of what
needs to be done, instead of these explicit instructions on how to iterate
through the data.

~~~
wyager
Could you expand on what you mean about for loops?

In the functional world, there are all sorts of precise constructs to operate
on lists (maps, folds, unfolds, scans, etc.), and many versions of these that
support concurrency. That's what I imagine when you say "declarations of what
needs to be done".
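
For concreteness, here is what those constructs look like (a sketch in Python,
my example, rather than a functional language):

    from functools import reduce
    from itertools import accumulate

    xs = [3, 1, 4, 1, 5]

    squares = list(map(lambda x: x * x, xs))    # map: apply elementwise
    total = reduce(lambda a, x: a + x, xs, 0)   # fold: collapse to one value
    running = list(accumulate(xs))              # scan: fold, keeping intermediates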

~~~
white-flame
The standard "for" structure is necessarily ordered. Either an index is
incremented/decremented, or an iterator object's next() comes in. Since there
are side effects that can affect either the iterator or the index, and
dependent side effects between loop body runs, obeying a "for" loop means
strictly ordered (unless the iterator interface abstracts something else),
serialized execution of the loop body.

Even maps, folds, etc. are often defined in terms of the order in which they
process data. That is useful for early exit and for mutable state carried
between iterations, but again, it demands serialized, specifically ordered
processing of elements.

Lists themselves are ordered structures. Their generalized interface does not
assume it's just a bag of items easily thrown around in sub-parts, especially
if the linked list nodes are to be modified.

Of course, the above applies to impure functional languages. Purely functional
code has far more freedom, but even the Lisps and such ultimately demand
sequential processing for most of their fundamental list operations.
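
To make the contrast concrete (a sketch in Python, names my own): a left fold
promises an evaluation order, while an associative reduction over chunks does
not, which is exactly what lets it be spread across workers.

    from functools import reduce
    from multiprocessing import Pool

    xs = list(range(100_000))

    # Strictly ordered: each step may depend on the previous accumulator,
    # so the runtime must walk the list left to right.
    ordered = reduce(lambda acc, x: acc + x, xs, 0)

    # Order-free: because + is associative, chunks can be summed in any
    # order, on any worker, and combined at the end.
    def chunk_sum(chunk):
        return sum(chunk)

    if __name__ == "__main__":
        chunks = [xs[i:i + 10_000] for i in range(0, len(xs), 10_000)]
        with Pool() as pool:
            parallel = sum(pool.map(chunk_sum, chunks))
        assert ordered == parallel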

~~~
taeric
Much of the functional world is concerned with defining structures that are
heavily linked to themselves. Tagged data is essentially the norm.

So, it isn't just that maps are defined in terms of car/cdr; it is that the
list itself is defined in terms of car/cdr. The "aha" moment is when you see
that an array of items in C can efficiently represent a tree structure where
the parent/child relationships are implied, not tagged. The performance gains
from this are impressive.
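
A minimal sketch of that implied-relationship trick (the classic implicit
binary-tree layout; my example, not one the parent comment specifically
cites):

    # A complete binary tree stored in a flat list: no pointers, no tags.
    # Parent/child relationships are implied by index arithmetic alone.
    tree = [50, 30, 40, 10, 20, 35, 38]

    def parent(i): return (i - 1) // 2
    def left(i):   return 2 * i + 1
    def right(i):  return 2 * i + 2

    # Node 1 (value 30) has children at indices 3 and 4 (values 10, 20).
    assert tree[left(1)] == 10 and tree[right(1)] == 20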

~~~
reagency
Functional languages have arrays.

~~~
taeric
I did not mean to claim that they do not. My claim is that "much of the
functional..." So, referring to things like cons lists is easy, though that
one is old. Look at the multitude of pointers in any of the purely functional
data structures, though. There is a lot of overhead in building up these
structures.

Look into the Scala (and Clojure?) vector sometime. It is literally a 32-ary
tree. That makes the algorithm somewhat pleasant to look at, but are we really
shocked when something using a simple array can beat it in performance?
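
The lookup in such a vector is a handful of shifts and masks per level (a
sketch of the radix-32 indexing scheme, not Scala's or Clojure's actual code):

    BITS = 5                  # 2**5 = 32 children per node
    MASK = (1 << BITS) - 1    # 0b11111

    def trie_get(root, index, depth):
        # Walk a 32-ary trie of nested lists down to the leaf slot for `index`.
        node = root
        for level in range(depth, 0, -1):
            node = node[(index >> (BITS * level)) & MASK]
        return node[index & MASK]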

Now, it is a perfectly fair and valid point that optimizing development time
may be preferred. We are allowed contradictory yet valid points in situations
defined by trade-offs.

~~~
justthistime_
I think no-one will dispute that an imperative, transient data structure will
be faster than a functional, persistent data structure if you use it
imperatively.

If you use persistent data-structures the way they are meant to be used, they
can and will be faster than a simple array.
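
For instance (a toy Python sketch of structural sharing, my example):
"updating" a persistent list means prepending and sharing the old tail, so no
copying happens at all.

    # A persistent singly-linked list built from immutable pairs.
    def cons(head, tail):
        return (head, tail)

    shared = cons(2, cons(3, None))
    a = cons(1, shared)
    b = cons(0, shared)   # a and b share `shared`: O(1) "update", zero copying
    assert a[1] is b[1]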

Both have valid use-cases, but they are often distinct.

~~~
taeric
You would be surprised at the arguments you see.

Even the one you are making is... tough. Persistent data structures are not a
surefire win against mutable versions. Again I find myself referring to the
DLX algorithm, the entire point of which is that it is easier and faster to
permute through different configurations of a data structure than it is to
have all instances instantiated in a persistent fashion.
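
The heart of DLX is exactly that in-place permutation. A sketch of the
unlink/relink step at the core of dancing links (array-based links, naming my
own):

    # A circular doubly-linked list as parallel arrays: L[i], R[i] are i's neighbors.
    L = [3, 0, 1, 2]
    R = [1, 2, 3, 0]

    def unlink(x):
        # Splice x out; x keeps its own links, remembering where it was.
        R[L[x]] = R[x]
        L[R[x]] = L[x]

    def relink(x):
        # Undo the splice in O(1): the "dancing links" trick.
        R[L[x]] = x
        L[R[x]] = x

    unlink(2); relink(2)
    assert L == [3, 0, 1, 2] and R == [1, 2, 3, 0]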

Does this mean that persistent data structures are without use? Of course not!
Again, you may not be optimizing speed of execution. Which is perfectly
valid!!

------
leoc
I'm just going to leave this here: [http://www-dev.ccs.neu.edu/home/pete/pub/esop-2014.pdf](http://www-dev.ccs.neu.edu/home/pete/pub/esop-2014.pdf)

------
Detrus
Would be interesting to see a more detailed explanation of how Q compares to
Redis at least: the low-level bits, performance, RAM usage, a usage comparison
on the same tasks, etc.

~~~
geocar
How detailed do you want to get?

If you're using Python or C or something else to talk to redis as a server,
it's probably just as fast as Q/KDB, and some informal benchmarking supports
this:

    redis-server &
    redis-benchmark -t set -n 1000000 -c 1 -P 1000000
    0.00% <= 1368 milliseconds

v.

    q)h:neg hopen`:localhost:1234
    q)\t 1000000 h/"a:69"
    1013

However, I don't think this is necessarily a good way to build your system:
if you're going to do a million of (read something write) every second, and
_then_ do a million of (read store), you might as well write it in KDB, make
it (read something store), and save yourself 40% on your heating bill.

~~~
joebo
It seems KDB excels at queries, joins, and aggregates on large datasets. I
have limited experience with redis, but if that type of code would need to be
written in C/Python alongside redis, then KDB may offer a performance and
productivity lift. I would like to see an example like that.

Another benchmark is here:
[http://kparc.com/q4/readme.txt](http://kparc.com/q4/readme.txt)

~~~
geocar
Like what?

A program in KDB instead of a program in Python and Redis?

Here's an implementation of a multiuser group chat in KDB:

[https://github.com/srpeck/kchat/](https://github.com/srpeck/kchat/)

and here's one in Python and Redis:

[http://programeveryday.com/post/create-a-simple-chat-room-with-redis-pubsub/](http://programeveryday.com/post/create-a-simple-chat-room-with-redis-pubsub/)

------
ah-
The great thing about K/Q is its simplicity and minimalism. It's fast because
it doesn't do very much and doesn't have layers upon layers of abstractions.

I would love to see a derivative of it with proper async support and more
useful error messages.

------
brudgers
PLaSM is an APL-like language for describing building architecture (based on
IBM's FL, a predecessor to J).

[http://www.plasm.net/docs/](http://www.plasm.net/docs/)

------
panjaro
>> " Modern code monkeys don’t even recognize mastery; mastery is measured in
dollars or number of users, which is a poor substitute for distinguishing
between what is good and what is dumb. Lady Gaga made more money than
Beethoven, but, like, so what? " Spot On !

~~~
pjmlp
Hence why valley hipsters jump into things like Go.

[http://cowlark.com/2009-11-15-go/](http://cowlark.com/2009-11-15-go/)

------
transfire
OTOH,
[http://cs.ecs.baylor.edu/~maurer/SieveE/apl.htm](http://cs.ecs.baylor.edu/~maurer/SieveE/apl.htm)

------
dkhenry
I think he came to the wrong conclusion. We shouldn't be looking at APL or J
to see what they did right (surprisingly little); we should be focusing on how
they failed and making sure we don't fail the same way. Yes, an APL master
might be able to crunch numbers much better than the average Java programmer,
but in reality I can get more number crunching implemented by spinning up an
R-based development team than I would ever get done trying to find, train, and
pay an APL-based team.

~~~
eggy
Your statement validates his. Just because R-based development teams are
plentiful does not mean quality, and vice versa: APL or J programmers may be
fewer in number, but you may be getting better quality. The catapult was
invented, then forgotten, and had to be re-invented. In today's age, with the
internet and the relatively short history of computer science, I must say I
agree that the short-sightedness seems worse. I play with APL/J and I like
Julia and Haskell. To me, J (and APL) come close to the succinctness and
expressiveness of mathematical formulas, and they work naturally with arrays,
no batteries needed.

~~~
dkhenry
It sounds like you both have this ideal of quality, as if theoretical quality
counts for something. Even expressed quality doesn't really matter that much
if you can't bring _quantity_ to bear.

------
trhway
>Comparing, say, Kx systems Q/KDB (80s technology which still sells for
upwards of $100k a CPU, and is worth every penny) to Hive or Reddis is an
exercise in high comedy. Q does what Hive does. It does what Reddis does. It
does both, several other impressive things modern “big data” types haven’t
thought of yet, and it does them better, using only a few pages of tight C
code, and a few more pages of tight K code.

It is exactly an exercise in comedy. Hive and Redis are free. They do for
regular people and tasks what only the $100K/CPU systems of yesterday could
do. It is like comparing the high-end $100K film cameras of yesterday with
today's consumer 5-10 megapixel cameras: a $100K thing I would never have had
access to vs. a real cheap/free gigantic expansion of my capabilities. The
choice is obvious to me.

~~~
branchless
Banks have moved to Linux from paid OSes, yet they continue to shell out for
KDB (and the devs are not cheap either).

Why did they move a whole OS platform but not away from KDB?

~~~
joebo
I assume to keep licensing current to support legacy code. I would be
surprised if much revenue comes from new installs.

~~~
gd1
KDB is the tool of choice on new projects. Also, $12.5k a core these days
(last I checked).

------
nickpsecurity
Another person realizing my meme that IT regularly forgets and (when lucky)
re-invents the past, often in an inferior way. K and J were great tools for
data mining, and well-implemented, too. Hadoop-style efforts could've saved a
lot of time and gained plenty of efficiency by starting out with such a
design. Any suggestions on the best way to solve this? A Wikibooks-like,
categorized, searchable site of all kinds of solutions to various problems,
perhaps?

Fortunately we have nice counter-examples, with groups such as FoundationDB
doing Hadoop scale with strong consistency. Building on the right, proven
principles can get one ahead.

~~~
rbanffy
What I find slightly disturbing is that people remember history but forget we
got to where we are precisely because of it.

64-bit processors and more than 4 gigs of RAM are relatively new things. Mmap
is useless for datasets larger than the disk you can attach to the CPU
mmapping them, or larger than its virtual address space. In 1991 it would have
been OK to call a couple billion entries "big data" (although we had yet to
invent the term), but not anymore.

Having said that, the heat dissipation of instantiating objects from their
on-disk representation is probably a relevant part of the refrigeration costs
of any data center. We just have unimaginable computing power available and,
sometimes, we do things the easy way, not the most efficient one.

~~~
nickpsecurity
I agree it's important to consider refrigeration costs. Yet, the rest seems
hypothetical given their site says it scales to petabytes across many
machines, from clusters to grid computing. Their benchmarks are on a diverse
array of machines with some being flash storage arrays and others compute
nodes with 16+ cores & 512GB memory. I have a hard time believing this thing
is going to choke on most datasets.

Licensing and hardware will probably be the biggest costs here.

~~~
rbanffy
> I have a hard time believing this thing is going to choke on most datasets.

It won't, but 512GB is nowhere near "big data". In fact, most datasets we see
should not be called that. My personal rule of thumb is that nothing that can
fit in less than a dozen machines should be called "big data". The whole idea
began when we started manipulating datasets so huge that moving the data from
where it was stored to where it would be processed was no longer practical.

~~~
nickpsecurity
My comment said "petabytes" of data across many machines. The 512GB RAM was
one node they tested it on. As in, the 512GB is around how much can be in one
node at once rather than what the database can process. If petabytes of OLAP
don't count, I'll concede that this database isn't for "big data."

------
GFK_of_xmaspast
Never read the comments.

("Computer science worked better historically in part because humorless
totalitarian nincompoopery hadn’t been invented yet. People were more
concerned with solving actual problems than paying attention to idiots who
feel a need to police productive people’s language for feminist ideological
correctness.

You may now go fuck yourself with a carrot scraper in whatever gender-free
orifice you have available. Use a for loop while you’re at it. ")

~~~
rbanffy
There can be hope.

[https://github.com/filipekiss/sanity](https://github.com/filipekiss/sanity)

------
frozenport
>>Why have multiple threads tripping all over each other when a query is
inherently one job?

Because these systems are limited by IO latency; this way the threads respond
immediately when data is available. You don't have the 'stepping on each
other' problem unless the threads are actually running. Further, threads on
the same CPU socket share resources like memory access, so even then they can
step on each other. You shouldn't be worried about matching threads to CPUs if
your computation takes 10ms but data access takes 150ms.

~~~
vezzy-fnord
Or you can multiplex on a file descriptor to asynchronously handle events? Not
sure what exactly you intended to say, or if there are any corner cases in job
processing.
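
(For reference, a minimal sketch of that multiplexing, my example: one thread,
many sockets, no stepping on each other. The port is an arbitrary choice.)

    import selectors, socket

    sel = selectors.DefaultSelector()
    server = socket.socket()
    server.bind(("localhost", 12345))   # arbitrary example port
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

    while True:
        # One thread sleeps until any registered fd is ready, then handles each.
        for key, _ in sel.select():
            if key.fileobj is server:
                conn, _ = server.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = key.fileobj.recv(4096)
                if data:
                    key.fileobj.sendall(data)   # echo back
                else:
                    sel.unregister(key.fileobj)
                    key.fileobj.close()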

------
woah
Someone should tell this guy about JS's Array.map()

~~~
dang
This is the kind of shallow, dismissive comment we need less of on HN. Please
don't post them.

~~~
pjonesdotca
This.

