
K Language (2011) - lelf
http://www.math.bas.bg/bantchev/place/k.html
======
invalidOrTaken
A story, since in retrospect I think it's worth telling.

Some years ago I was at the SF Clojure meetup. Arthur Whitney's daughter
worked at the sponsoring company, and he agreed to come tell us about K.

In retrospect, I don't think we gave him the welcome he deserved. No one was
rude or anything, but it seemed there was a disconnect: Arthur was keen to
show off how _fast_ K (and Kdb) was, over zillions and zillions of rows.

But the thing that the Clojurists were all searching for (that got them into
Clojure in the first place) was _expressivity_. Is Clojure fast? I don't know,
generally the problems I face come down to avoiding balls of mud rather than
performance bottlenecks. And I think that was true of most there.

So Arthur got something of an underwhelming reception. I remember someone
asking "Does K have the ability to self-modify, a la Lisp macros?" When Arthur
said no, you could see most people in the room just sort of mentally shrug and
move on.

And this was too bad. Because recently I've been playing around with J
(another APL descendant) and been very impressed by some
expressivity/readability benefits. Some small things that have very big
effects on the codebase you actually end up with.

The first thing is the avoidance of abstraction. To use a Twitterism:

Broke: Just write your code and don't divide it into functions, creating one
long main method

Woke: Divide your code up, naming parts that get reused

Bespoke: If your code is made up of _really really short_ things, it ends up
being shorter than _the names you would use_, so you can just _write the
thing_ rather than your name for it. An analogy would be: there is no human-
comprehensible way to communicate the idea of "picosecond" in less time than
an actual picosecond.

The other thing I didn't expect was the benefit of multiple dispatch being
baked into e v e r y t h i n g. In Clojure I might write (map + a b) to add
each index together; in J I could just write a+b.
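For readers without J at hand, the difference can be sketched in Python (the `add` helper is a hypothetical stand-in for J's pervasive `+`, which needs no helper at all):

```python
# Elementwise arithmetic as the default, sketched with plain Python lists.
# In J you write a+b; in Clojure, (map + a b); in bare Python you'd need:
def add(a, b):
    """Hypothetical pervasive +: pairwise sum of two same-length sequences."""
    return [x + y for x, y in zip(a, b)]

print(add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

The point of the comment is that in J the operator itself carries this behavior, so no wrapper or explicit map ever appears in the source.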

This is neat stuff! Best practices for keeping complexity down in APLs tend
to be the _opposite_ of what they are in other languages. Aaron Hsu gave a
talk about this:
[https://www.youtube.com/watch?v=v7Mt0GYHU9A](https://www.youtube.com/watch?v=v7Mt0GYHU9A)

It's too bad! Arthur came to tell us about speed---there's a reason it's used
on giant datasets in _finance_, where performance translates directly into
cash---but I wish we'd had the presence of mind to ask more about the
_experience_ of writing K.

So, Arthur, if you're reading this: Sorry everyone seemed kinda bored in SF a
few years ago when you kindly came to present. We missed out!

~~~
kazinator
> _so you can just write the thing rather than your name for it_

I do not find this convincing. Names, especially good names, can convey
additional meaning, which is helpful even if the names are longer than what
they stand for in terms of raw character count.

One obvious example is that I would often rather see a seven-letter manifest
constant than the two-digit decimal literal it represents.

In a similar vein, I would rather have a structure with named fields, than
array indices or offsets. employee.department is objectively better than
e[13]; a compiler can generate the equivalent of the latter from the former.

In calculations that have many nodes in the syntax tree, it's helpful to break
up the tree and give intermediate values names, which remains true even if the
names are longer than the intermediate calculations.

A name may be _longer_ than what it denotes in raw character count, but in
spite of that it may be less complex due to its flat structure, which only
indicates "same or not". If a 13-character identifier stands for some
9-character syntax that denotes a tree with 7 nodes, the identifier is
certainly simpler.

Lastly, different expressions that occur in a program can sometimes be very
similar to each other. If we do not use names, then these differences can be
hard to find. It might seem that identifiers which are longer than what they
denote make the problem worse, but that is not necessarily so. Identifiers
which are clearly different from each other can be selected for denoting forms
that are very similar to each other. If instead of ((<]):) and ((<[):) I write
foolyfoo and barlybar, that will be less error prone.

~~~
smabie
I used to think the exact same things, but after using kdb+/q for AoC2019, I
felt like I had a revelation. When you get rid of all abstraction, your code
size is reduced so dramatically that abstraction is no longer needed. My
solutions were maybe 10-100x shorter than everyone else's (except some guy's k
code I found) and to this day, I can still mostly remember them, character for
character (much like how an equation is easier to remember than a traditional
line of code).

Being able to fit all of your code in your head at once is a truly magical
feeling.

~~~
kazinator
You should give definitions a chance, because they point to a way to make your
code even smaller. Consider that dictionary compression, like LZ77 and LZSS,
works by encoding redundancy using symbols that index into a table.

Even APL/J/K programs with repeated patterns in them are further compressible.
Occurrences of any repeated string more than two bytes long can be replaced by
a 16 bit symbol.

Speaking of which, why don't you just learn the data structures and algorithms
of a compression program like gzip inside out? Then you can write compressed
code in any language. Others can easily work with your code by uncompressing
it and then compressing it again when their changes are applied. Or _vice
versa_; you can work with other people's code by compressing it first, then
doing your changes in the more convenient, ergonomic compressed format. Maybe
some checkout/commit hooks can do that transparently.

Compression will ferret out gratuitous frivolities like 17 copies of
_remoteDatabaseConnectionHandle_, and turn them into a convenient, tiny code.

Think of how much smaller it can be if you go beyond reducing tokens to
characters. You've now got Huffman and dictionary coding at your disposal, and
possibly more.
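Satire aside, the dictionary-coding claim is easy to check with Python's standard zlib module (a sketch; the identifier is the one from the comment above):

```python
import zlib

# 17 copies of a long identifier compress to little more than one copy:
# LZ77-style dictionary coding replaces each repeat with a short back-reference.
source = ("remoteDatabaseConnectionHandle.close()\n" * 17).encode()
packed = zlib.compress(source)
print(len(source), len(packed))  # the compressed form is far smaller
```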

------
aasasd
Not sure I actually understand why K is pushed here almost daily. Is it just
the esoterics? Alright, APL and K are expressive for array computation. Do
many people here deal with array computation? K is also supposedly fast—not
once have I seen an explanation of what makes it fast and why those techniques
aren't looted for open-source languages.

I'm told that the interpreter is so small that it fits in the cache. Yeah, so
how much time does a non-K program normally spend loading the code? How does
this free K from loading and writing _data_?

Is K in fact a thin layer on top of assembler (specifically SIMD and other
extensions), for number crunching in the vein of shaders/CUDA?

~~~
kick
_Not sure I actually understand why K is pushed here almost daily._

Most programmers find k interesting. k is fast. k is small. k & APL are great
as a tool of thought. It has a "lost art" feel. It's an intellectually
satisfying endeavor.

 _Do many people here deal with array computation?_

Like three Dyalog employees occasionally contribute to discussion, fintech
employees make up maybe 10% of the people regularly writing comments on the
site, and array programming is incredibly popular when in disguise and made
less efficient as Python's numpy, is pretty much essential for multiple
subfields of computer science, and so forth.

 _K is also supposedly fast—not once have I seen an explanation of what makes
it fast_

People talk about it all the time here, though. 'geocar mentions why in almost
every thread, and he should know: he helped write an operating system in the
language.

 _and why those techniques aren't looted for open-source languages._

Because they don't want to believe it to be true! When told you're messing up
something simple, you have to question your assumptions a bit. People hate
doing this, especially when they've invested time in seeing them through.

 _Is K in fact a thin layer on top of assembler (specifically SIMD and other
extensions), for number crunching in the vein of shaders/CUDA?_

k is higher-level than C, Haskell, Rust, etc.

~~~
aparashk
There are open source implementations of Kx, e.g.

[https://github.com/kevinlawler/kona](https://github.com/kevinlawler/kona)

I do not know how good/performant they are.

~~~
kick
I think everyone in this thread probably knows about kona. It's neat, but it's
also trying to be an implementation for a very old version of k.

Here's a list of a few different k implementations. Most aren't aiming for
performance.

[https://ngn.bitbucket.io/k.html](https://ngn.bitbucket.io/k.html)

------
alexeiz
K/KDB is a pile of shite. It's loved by quants and managers in financial
firms. But it's hated by programmers who actually have to write code in it and
deal with shitty code written by quants. K is full of arbitrary restrictions
(too many global/local variables - what is it, 1970s?) and crazy quirks (a
function's closing brace has to be indented, otherwise you'll get a weird
error). All in the name of performance, of course. The error reporting is
insanely cryptic. It doesn't tell you what caused an error or even where it
happened. It just tells you something like "'rank" - and that's it. Once your
code grows beyond 10 lines, you can kiss your sanity goodbye.

And let's talk about performance. K is inherently single threaded. If you have
a powerful 96-core machine, it's of no use to you because your K code will be
using only a single core. I've seen numpy easily beating K when numpy uses MKL
library with automatic parallelization.

There is a reason K/KDB is not very popular outside of a certain domain. It's
not because it's proprietary and it costs a fortune. It's just bad.

~~~
rajinl
Having worked a bit with KDB, my experience is totally different. The
investment bank I worked at had a large and thriving community of KDB
developers. In one particular project, we were able to replace a system
consisting of tens of thousands of lines of fairly good but heavily abstracted
Java code with a few hundred lines of Q. The Java application took 7 hours to
run its worst-case query; the Q code ran the same query in less than a second.
The Q code solved the business case better, and it was more readable, mainly
because there were no unnecessary abstractions. There was still an OOP middle
tier, but it was mainly pass-through. Calling any technology a "pile of shite"
should be done with great care. Most solutions solve a use case, and KDB/K
solves its use case exceptionally well. To be honest, understanding what the
'Rank' error means is not that hard. For the project I referenced earlier, I
personally found it much easier to debug Q than to work out what an
AbstractCalculationFactoryBuilder was doing. There are very few poor
languages, just poor developers.

~~~
de_Selby
This comment reads like someone who worked with the language very briefly. The
gripes are all things beginners might say and aren't real issues.

This isn't a critique that's really worth giving much thought to. "A pile of
shite"

My guess is that they were a grad with FD for a short period and had a bad
experience.

------
lokedhs
I'm currently working on my own APL dialect. Like the author of K, I decided
to not be bound to the restrictions of APL, and want it to be more functional.
As I am also doing a lot of Lisp, a lot of ideas come from there as well.

The biggest difference in my language (which as of yet does not have a name)
is that it is immutable, and function results are lazy. This means that it can
be highly parallel, and the laziness means that there is much less
synchronisation needed when parallelising computations.

I'm also making some of the syntax more traditional, in order to make large
programs more manageable.

It's implemented as a Kotlin multiplatform project, which means that I can
build the interpreter for the JVM, native and Javascript.

Why am I talking about this? Well, it's because my biggest issue with APL is
that it's not really suited for large programs (where large means on the order
of 100 lines of code or more). Dyalog provides some extensions that make
imperative programming easier, but the syntax is really ugly.

Thus, I'm trying to merge the idea of concise APL syntax together with typical
functional programming. So you'll be able to write code like this:

    
    
        if (foo = bar) {
            a←⍉b
        } else {
            print "debug message"
            a←x∘.+y
        }
    

Even though the core syntax is still APL, the structure of the program is much
easier to understand, compared to the way you'd write it in APL.

I still don't know if this experiment will actually lead to something good,
but at least it's a fun project where I can play with Kotlin multiplatform
support.

The code is on my Github, but I'm not linking it here since it's quite early,
and it's not really usable yet (there are a lot of functions missing). I'm
expecting anyone who actually looks it up anyway to understand that it's not
ready yet.

~~~
Athas
> The biggest difference in my language (which as of yet does not have a name)
> is that it is immutable, and function results are lazy. This means that it
> can be highly parallel, and the laziness means that there is much less
> synchronisation needed when parallelising computations.

In my experience, it is the other way around. Laziness as normally implemented
is essentially state plus control flow, which is death to
parallelism. Parallel Haskell is obscure for this reason, and typical
programming patterns use some form of "deepseq" construct whenever thread
boundaries are crossed (explicitly or implicitly). Actually taking advantage
of laziness in a parallel setting is very tricky. Offhand, I cannot recall
anyone who has done so convincingly.

~~~
lokedhs
Yes, that is true. However, it's not the function evaluation that is lazy;
it's the computation that is lazy.

Consider the case where you have very large arrays a, b and c (let's say they
contain a few hundred million elements each). You then evaluate the following
in APL:

    
    
        a+b+c
    

What interpreters such as GNU APL do is first evaluate b+c, resulting in
a new array, and then add that result to a.

In my case, b+c returns a deferred computation. This is then added to a which
returns yet another deferred computation.

The actual values are not computed until I read the results out of the
deferred computation. This means that there does not need to be a
synchronisation point between the two addition operations.
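A minimal sketch of that deferral in Python (the `Deferred` class and its per-index thunks are hypothetical illustration; the real interpreter defers whole-array operations, not single elements):

```python
class Deferred:
    """A lazily evaluated elementwise sum over same-length arrays."""
    def __init__(self, get, n):
        self.get, self.n = get, n          # get(i) -> value at index i
    def __add__(self, other):
        # adding two deferred arrays just builds another deferred array
        return Deferred(lambda i: self.get(i) + other.get(i), self.n)
    def force(self):
        # only here is any arithmetic actually performed
        return [self.get(i) for i in range(self.n)]

def lift(xs):
    return Deferred(lambda i: xs[i], len(xs))

a, b, c = lift([1, 2]), lift([10, 20]), lift([100, 200])
result = a + b + c          # no additions have happened yet
print(result.force())       # [111, 222]
```

Because nothing is materialized between the two additions, there is no synchronisation point between them, which is the property described above.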

There is also the benefit that if not all results from a computation are
needed, they might not even have to be computed in the first place.

However, the functions themselves are still evaluated eagerly, meaning that
side effects happen when you'd expect them to.

Well, there is an exception. Consider the following:

    
    
        {print ⍵ ◊ ⍵+1}¨foo
    

This also returns a deferred result, and that will indeed cause print to be
called at an unexpected time. You can force evaluation here, but I haven't
decided what to do about this case yet.

~~~
ssfrr
> What interpreters such as GNU APL do is first evaluate b+c, resulting
> in a new array, and then add that result to a.

Is this really what most implementations would do? I'd assume it would parse
the expression `a+b+c` into a tree like

    
    
         +
        / \
       a   +
          / \
         b   c
    

But then it would run a pass over the tree to identify expressions like this
that could be fused together without creating intermediate results. I know
that most APL implementations have more efficient implementations of certain
idioms (commonly-used sequences of operators/functions), so I figured they
must be doing something like this to be able to identify those idioms.

This is just conjecture though, I'd love to know more about how this actually
works in practice.

~~~
i_don_t_know
Some implementations use reference counting. So in your example, for + the
output requires as much memory as the inputs, and if at least one of b or c
has a ref count of 1, then you can reuse one of the inputs for the output.
Otherwise you have to allocate memory to hold the sum of b and c. But that
intermediate sum has a ref count of 1 and can be reused for the final output.

The key is to code your primitives in such a way that you can reuse one of the
inputs as the output.
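A toy model of the reuse rule, assuming a hypothetical `add_arrays` helper and an externally supplied refcount table (real interpreters track refcounts on their own array headers):

```python
def add_arrays(b, c, refcount):
    """Elementwise sum that reuses a dying operand's buffer for the output."""
    if refcount.get(id(b), 0) == 1:
        out = b                      # b's buffer is about to be freed: reuse it
    elif refcount.get(id(c), 0) == 1:
        out = c                      # same for c
    else:
        out = [0] * len(b)           # both inputs still live: must allocate
    for i in range(len(b)):
        out[i] = b[i] + c[i]
    return out

b, c = [1, 2], [10, 20]
s = add_arrays(b, c, {id(b): 2, id(c): 1})
print(s is c, s)   # True [11, 22] - c's storage was reused for the result
```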

You can also do clever tricks that describe the array transformations rather
than perform them. It’s similar to what happens during stream fusion in
Haskell. See
[https://news.ycombinator.com/item?id=17506789](https://news.ycombinator.com/item?id=17506789)
for more details.

~~~
beagle3
IIRC numexpr got about x2-x5 faster when they switched from numpy's
implementation (which naively mostly works as you describe, one op at a time,
refcounted, though no good reuse) to a VM that reuses temporary result memory
and works in 8KB pages (thus getting way, way better cache use).

But I'm not familiar with an APL/J/K implementation that does.

~~~
i_don_t_know
I didn’t know about numexpr. That looks really interesting. Thanks.

------
smabie
I love K/q, but there's some trade-offs (probably for speed) that I find
extremely aggravating. The biggest sin of all is the lack of lexical scoping.
If k/q had lexical scoping and macros, it would probably be my ideal
programming language.

Regardless, the philosophy behind the language is so different, and so
incredible, I had the most fun I've had in a long time programming when I
learned it. kdb+/q is a work of art and probably the best engineered piece of
software I've ever come across.

~~~
beagle3
All Ks have basic module level lexical scoping, which does not deliver all the
benefits of lexical scoping, but has none of the pitfalls of dynamic scoping
either.

K2/K3 did have (a version of) nested lexical scoping - an inner function would
close over the values _at the time the definition was encountered_; you can
still simulate this in later Ks by passing what you need to close over and
immediately projecting over the existing values, e.g.

    
    
         make_adder:{[n] {[n;x] n+x}[n;]}
    

Which defines an inner two-parameter function, and then projects it over the
outer n parameter (this is how K2 implemented it IIUC). But that doesn't let
you modify names in an upper scope, so is not really comparable to Lisp's
lexical scoping or Lua's upvars and Python's nonlocals or whatever they are
called these days.

Andrey Zholos' kuc[0] dialect has a JIT and real closures, you might find that
interesting, though it hasn't seen an update in a while.

> probably the best engineered piece of software I've ever come across.

As you can probably tell, I'm a K fan and evangelist myself. Yet personally I
cringe at some of the kludges. I love K. It's awesome, it's amazing, but I
wouldn't use the "best engineered" label myself; "Best set of tradeoffs" would
probably be my description (which is a distinction without difference to some,
I recognize - those who define engineering as the art and craft of the
tradeoff).

[0] [http://althenia.net/kuc](http://althenia.net/kuc)

------
kick
The RPN calculator example makes me really happy.

As for the content of this page, there's a line within that now applies to
itself:

 _Introductory articles – somewhat dated but nevertheless rather informative:_

k4, what it lists as the most recent version, is about 20 years old now, and
it talks about it like it's new. Still informative, though!

EDIT: Giving it another look, it looks like the links have been updated over
time (for example, the reference to oK), though the main content of the post
is about ten years old and the initial list of links is over ten:

[https://web.archive.org/web/20110927003800/http://www.math.b...](https://web.archive.org/web/20110927003800/http://www.math.bas.bg/bantchev/place/k.html)

------
aw1621107
I posted something similar to this on the last k-related thread that came up,
but new responses and perspectives are always nice:

Lots of the discussion about the disadvantages of k/q/other APLs seem to
revolve around its syntax and mental model. If said barriers were removed
(e.g., if the relative obscurity of the syntax and semantics were not an issue
and you could assume that you/your team/anyone who looks at your code is
familiar), when would I _not_ want to use k/q/APL/etc.?

GC'd languages may not be good for latency/memory-sensitive programs. C/C++
may not be good for reliability/security-sensitive programs.
Ruby/Python/Javascript may have performance issues. Haskell may have issues
around predicting resource usage. Lisp may have issues around being too
expressive.

What issues may I run into that would make k/q/etc. not a good choice?

~~~
beagle3
“Too expressive?”

All APL variants I know are GC'd, so what you wrote applies (though they're
refcounted and cycle-free, hence deterministic, with well-defined resource
consumption).

They are mostly functional, so as usual care must be taken with memory usage
when updating data structures.

If your workload does not lend itself to APLization, your program will not
enjoy any of the benefits, of course. But in my experience most projects have
very significant parts that do.

~~~
aw1621107
"Too expressive" is suboptimal wording on my part; the intent was to cover the
criticism that Lisp is flexible enough that Lisp programs tend to evolve their
own potentially programmer-specific DSLs for the given problem domain.

I guess a potentially more interesting question would be how I could identify
workloads that APL-alikes would or would not work well for. In particular, are
there resources where I can learn more about ways to translate problems into a
format that lends itself to APLization? One example is the substring
search problem; I know about naive string search using loops, Boyer-Moore,
etc., but I don't think I would have been able to come up with the APLish
version of match-char-of-needle-in-haystack, rotate, and AND. Is this just
something that gets picked up as one does more work with APL-alikes?
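For what it's worth, the match-shift-AND idea can be sketched in Python (the `find_substring` name and list-of-bool representation are illustrative; an APL would express the same with rotates of bit vectors):

```python
def find_substring(haystack, needle):
    """APL-style search: per needle char, mark matches, shift, AND the vectors."""
    n = len(haystack)
    hits = [True] * n                      # every position starts as a candidate
    for offset, ch in enumerate(needle):
        # boolean vector: does the haystack match ch at position i+offset?
        shifted = [i + offset < n and haystack[i + offset] == ch
                   for i in range(n)]
        hits = [h and s for h, s in zip(hits, shifted)]   # AND with candidates
    return [i for i, h in enumerate(hits) if h]

print(find_substring("abracadabra", "abra"))  # [0, 7]
```

No per-character loop over match states survives: each pass is a whole-vector operation, which is the shape the comment is asking about.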

~~~
beagle3
> Is this just something that gets picked up as one does more work with APL-
> alikes?

I suspect that it is, much like picking up SQLisms - there are many things
you'd do in SQL[0], such as SELECT AVG(x>3) FROM T instead of SELECT
(SELECT COUNT(*) FROM T WHERE x>3) / (SELECT COUNT(*) FROM T) FROM DUAL; the
latter would be the result of translating a Javaism to SQL (though that is
still more SQLish than iterating through a cursor, which would ACTUALLY be
equivalent to doing it in Java with a for loop).

The APLisms that I've picked up, and carried to other languages (which usually
makes my programs much shorter and faster, though less idiomatic), are:

1) Process things in stages - when processing a list, instead of trying to
process each element from beginning to end, try to do one stage on each
element of the list, and only then go to the next stage. This often requires
rethinking e.g. error handling, but on the other hand, stages tend to become
generic, commutable, cachable, etc.

2) Not independently: grouping is an important stage - having a stage that
(solely) decides which element needs to be processed as part of which group,
and labels it accordingly. That allows the next stages to run on different
groups in parallel, for example, and otherwise tends to make resource use much
better and more deterministic.

3) Also not independently: almost everything can be processed using linear
memory access patterns almost exclusively. Gather/scatter is often needed, but
can be put at well-defined points and optimized accordingly (that's probably
the main K speed engine - L1 dominates much better than in most languages,
because everything tends to be sequential, and even gather/scatter are
delimited enough to often prefetch).

I suspect the only way to learn it is to do it.

[0] In this case, the Javaism may be significantly more efficient if the
optimizer is stupid and x has an order-capable index; but the idiomatic,
shorter SQL is the first one, and in most cases the added conditions and real
computation make the Javaistic SQL less efficient.

~~~
aw1621107
> I suspect the only way to learn it is to do it.

Guess it's time for me to go looking for my advent of code repo...

Those APLisms you list there sound really useful. Some questions about them:

1. Is a shell pipeline an appropriate analogy for what you mean by processing
things in stages? And if so, do you have any tips for deciding how big to make
each stage?

2. Is this sort of like submitting jobs to a thread pool in a more controlled
manner? Is this something that APL-alikes tend to handle better than other
languages?

3. What do you mean by gather/scatter here?

~~~
beagle3
> 1. Is a shell pipeline an appropriate analogy for what you mean by
> processing things in stages? And if so, do you have any tips for deciding
> how big to make each stage?

Yes, it's a good analogy and mental model, but not a good experience to draw
on, because the primitives are on such different levels. An example off the
top of my head: checking whether an expression has balanced parentheses. The
way you would do it in (not $APL and not $PARSER_GENERATOR) is to go character
by character, increasing a counter for '(', decreasing it for ')', and bailing
out if you ever go negative. In K you'd do:

    
    
        open: x='('          / vector of 1 when '(', 0 otherwise
        close: x=')'         / vector of 1 when ')', 0 otherwise
        change: open-close   / vector of +1 when '(', -1 when ')', 0 otherwise
        running: +\change    / running sum of change
        balance: *|running   / last element - "first of reverse of", this is an idiom in K
        dipped: &/running    / minimum over running sum of changes
        good: (balance=0)&~(dipped<0) / we balanced at the end, and no close before open at any point.
    

You can't really break things up this finely with pipes, and the "one input,
one output" ergonomics of pipes tend to restrict the data passed between
stages (open and close are two "streams" in the pipe model - you could use
more numeric descriptors, but then you lose the elegant pipe structure, and
people rarely ever do that).

And after you've grokked it, you'd just write it as two expressions:

    
    
        b:*|r:+\(x='(')-(x=')')    / r:running balance; b:balance at end
        g:(b=0)&~(&/r)<0           / good if balanced and no close-before-open dip
    

Which through time, you'd likely prefer to write as the one liner function

    
    
        balanced:{b:*|r:+\(x='(')-x=')';(b=0)&~(&/r)<0}
    

That almost everyone on HN would claim is an unreadable abomination, but it
would still be perfectly readable to you five years down the line after one
minute of reflection. And then you'll see that Arthur Whitney's function
achieving the same is twice as short and five times as fast, but may take you
5 minutes to grok. (Yes, I can write it shorter than this, but I won't).
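For comparison, here is the same staged algorithm transliterated into Python with itertools.accumulate (a sketch to make the stages legible; the K versions above are the originals):

```python
from itertools import accumulate

def balanced(x):
    # one stage at a time, mirroring the K: change, running, balance, dipped
    change = [(c == '(') - (c == ')') for c in x]   # +1, -1, or 0 per char
    running = list(accumulate(change))              # +\change : running sum
    balance = running[-1] if running else 0         # *|running : last element
    dipped = min(running, default=0)                # &/running : minimum
    return balance == 0 and dipped >= 0             # balanced, no early close

print(balanced("(()())"), balanced("())("))  # True False
```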

> 2. Is this sort of like submitting jobs to a thread pool in a more
> controlled manner? Is this something that APL-alikes tend to handle better
> than other languages?

I don't know about Dyalog, but the underlying data model and K implementations
do threads and processes quite easily (K threads have more sharing limitations
than you are used to from other environments, though - but processes are just
as easy)

the = (group) operator returns a dictionary from key to indices; e.g.

    
    
        group:="yabba dabba doo"
        group
    
         |5 11
        a|1 4 7 10
        b|2 3 8 9
        d|6 12
        o|13 14
        y|,0
    

You can now process the letters independently using "each", e.g. "#:'. group",
read "count each value of group":

    
    
        #:'. group
        2 4 4 2 2 1
    

Want to do that in parallel? Switch from each (') to parallel-each aka peach
(':):

    
    
        #:':. group
        2 4 4 2 2 1
    

Except this time it was done on multiple threads (or processes on the same
machine, with some setup, or even on multiple machines). I think this
qualifies as "better than most languages", and is enabled by (a) being mostly
functional, so synchronization is rarely needed, and (b) making all loops
implicit - "each" instead of explicit "for". Order of iteration is NOT
guaranteed, and unhandled exceptions can only abort the entire computation. In
return you get parallelization that actually works well with a single
character change.
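The group/each/peach pattern has a rough Python analogue (a sketch; `group` here is a hypothetical stand-in for K's `=`, and a thread pool stands in for peach):

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def group(xs):
    """K's = (group): map each key to the list of indices where it occurs."""
    g = defaultdict(list)
    for i, x in enumerate(xs):
        g[x].append(i)
    return dict(g)

g = group("yabba dabba doo")
counts = [len(v) for v in g.values()]          # each: #:'. group
with ThreadPoolExecutor() as pool:             # peach: same op, worker threads
    pcounts = list(pool.map(len, g.values()))
print(sorted(counts), sorted(pcounts))
```

The one-token switch from `'` to `':` in K corresponds to swapping the list comprehension for the pool here; the functional, loop-free formulation is what makes that swap safe.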

> 3. What do you mean by gather/scatter here?

Let's say we want to implement "uniq" for letters ourselves (even though it is
built in; written "?" and called "distinct" these days). Remember our "group"
from before? We could take the keys - !group would give that (ordered
lexicographically in this case, but that's not always the case). But let's say
we want it in order of appearance - we need to take the first of each list, so
{x@*:'.=x} - x at the first of each index list of the grouping of x. The "x
at" accesses memory non-sequentially (in ascending order in this case, but
random in general). This is a "gather" operation because we gather data from
many places, and if the indexes read are random, it is the worst case for
cache systems. By making it a separate stage, it is much easier to optimize -
for both the person and the interpreter/compiler.

The "scatter" is the opposite. Want to do counts of each letter?
"c:@[&256;0+"yabba dabba doo";+;1]", read: "in an array of 256 zeros, at each
index you get from converting the string to ASCII, add one". (From which you
can get the lex-sorted range with "`c$&c>0": convert to char those indices
where c is greater than 0.) But the updates are scattered inside the vector.
Do note that grouping allows the "+1" for different groups to happen in
parallel without more help (though you'd have to explicitly parallel-each it
with the current implementation)
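The scatter-counts example has a direct Python transliteration (a sketch; `char_counts` is a hypothetical stand-in for K's amend-at `@[...]`):

```python
def char_counts(s):
    """Scatter: add 1 at each index derived from the data itself."""
    c = [0] * 256                 # &256 : an array of 256 zeros
    for ch in s:                  # indices arrive in data order, not memory order
        c[ord(ch)] += 1           # the writes jump around inside the vector
    return c

c = char_counts("yabba dabba doo")
# `c$&c>0 : chars at the indices where the count is positive (lex-sorted)
distinct = ''.join(chr(i) for i, n in enumerate(c) if n > 0)
print(repr(distinct))   # ' abdoy' - space first, then the letters
```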

Have a look at [https://kcc.kparc.io/](https://kcc.kparc.io/)

~~~
aw1621107
Your explanations really help! I think I can see where k/APL-alike proponents
are coming from; the semantics make sense, and it'd come down to recognizing
what symbols or groups of symbols do. It might take a bit of going back and
forth with a reference, but it isn't impossible.

Yet another question: how would performance analysis of this function work?
Does the interpreter actually materialize arrays for stuff like (x='(')-x=')',
so you'd have several iterations over the entire input array?

Again, thanks for your great explanations! I'll definitely put k on my list of
languages to explore in my free time.

~~~
beagle3
Yes, almost all current interpreters and compilers do materialize the results,
making whole passes through the array in such a case. In practice, since
the primitives (e.g. "=") are implemented in tight, predictable, branch-free,
often SIMD code, it ends up faster than the obvious branching C-like routine
despite the interpreter overhead - especially since any string under 16K is
essentially guaranteed to be in L1 for the 2nd pass (and even if it's not,
it's predictable linear prefetched access even in the first pass, almost as
good).

When was the last time you wrote prefetching branchless SIMD code for any task
other than your inner loop? In APL/K you pay interpreter and GC overhead, but
often get those perks for your whole program “for free”.

Indeed, k/APL makes use of your visual pattern matching in ways that no other
language does (or can, with standard practices, due to code density). Some APL
compilers identify idioms and replace them with a faster implementation (e.g.
sum, which is +/, is recognized by essentially all of them; more complicated
things like "average" by some); but the cognitive load is independent of that:
you don't need a manual, because a semantically equivalent implementation is
right in front of your eyes.
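(To make the "materialized intermediates" point concrete, here is a rough
NumPy analogue of evaluating (x='(')-x=')' - my own sketch of the evaluation
model, not k's actual implementation. Each primitive is one vectorized pass
that allocates a fresh array, and the input string is hypothetical.)

    
        import numpy as np
    
        # Hypothetical input: a string of parentheses, viewed as a byte array.
        x = np.frombuffer(b"(()(", dtype=np.uint8)
    
        # Each comparison is a full vectorized pass that materializes a new
        # array, much as the interpreter materializes each primitive's result.
        opens = (x == ord('('))    # first pass over x
        closes = (x == ord(')'))   # second pass over x
        delta = opens.astype(np.int8) - closes.astype(np.int8)  # third pass
    
        print(delta.tolist())  # [1, 1, -1, 1]
    

Three full passes, three temporary arrays - yet each pass is tight, branch-free
vector code over contiguous memory, which is the trade-off described above.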

~~~
aw1621107
The closest thing I can think of to prefetching branchless SIMD code was some
numpy scripts I did some work with, but I'm not sure it would be quite as
tightly optimized.

It's really quite fascinating, and I'll have to see if I can handle the
pattern matching as well as the more experienced APLers here. Thanks for
taking the time to answer my questions!

~~~
beagle3
You’re welcome.

------
aw1621107
Are there any papers or such that have any details about what allows k/q to be
so fast? Or would those be considered implementation secrets?

It's always nice to be able to read about the technology and design
underpinning something, but in this case at least it'd be understandable if
that were not available.

------
leggomylibro
These languages come up here every now and then; they look very interesting,
but it also seems like they must be very heavily optimized to be as fast as
they are.

So, sorry if this is a dumb question, but does K run on architectures other
than x86/64?

I checked [https://kx.com/](https://kx.com/) but I couldn't find a definite
answer. They mention IoT a lot, which makes me think it might run on embedded
devices too?

~~~
SifJar
kx also have a version for ARM (available here
[https://kx.com/download](https://kx.com/download) ). Note this is the non-
commercial version, not sure if they have a commercial version for ARM.

Their frequent mention of IoT is (I think) more to do with storing data _from_
IoT devices, not directly on the devices themselves.

Shakti (Arthur Whitney's current company) has also mentioned a wider support
of architectures I think, but at the moment their k (k9) is in pretty early
stages, still x86/64 only I think.

There are various other open source implementations of k (posted elsewhere on
this HN thread), some of them might also support other architectures (although
they won't have the performance of Kx/Shakti versions).

~~~
leggomylibro
Ah, that makes sense - thanks. I guess that's just for Cortex-A ARM cores? I
was hoping to try a lean language like this on a fast micro like the new
Teensy boards, but I suppose those kinds of chips probably aren't used very
much in their target applications like HFT.

I do wish that people would stop using "IoT" to mean "sending tons of data to
the cloud", but I guess that's how the buzzword crumbles.

------
aparashk
Good to see a link from the Bulgarian Academy of Sciences! I did my Masters
thesis there.

------
ngcc_hk
Given such a small footprint, can it run at the microcontroller level, or a
bit above that? Or are there FPGA implementations?

~~~
tluyben2
At Shakti they are working on that; there is quite a lot of experimentation
with embedded boards in the (small) community of people who work on the core.
They may provide ARM builds (they already work), so it should run on quite a
range of devices; however, you do need somewhat more memory than, for
instance, I am used to working with (MCUs with 23 KB of usable memory); I
think it needs at least a few hundred KB to do anything and (probably) a few
MB to do actual work with it. It'll improve.

------
Russelfuture
This is an interesting thread - thx for posting. I used to know a bunch of
folks that used both J and K. I still use APL - the old APLWin 3.5 from
APL2000 (Manugistics/STSC).

APL was the first language I learned as a tiny child. I write my pseudo code
in it. I live in a ranch house out on the tundra, with a 50ft transponder
antenna behind the place which links me to the 'net. I've used C, c++, Python
with all the numpy stuff, etc and a few flavours of TensorFlow.

But it's APL that gets it done. I've heard of K, and I knew a guy who worked
for Arthur Whitney. Languages like K and J and APL can be fast, if you use
them right. And it is almost tragic-comical, that the genius of the APL
glyphic character set was given up, just as wonderful devices were created
which could paint any character at all on the screen. My girlfriend uses
Japanese Kanji and Hiragana on her iPad, no problem. In a GUI environment, the
original APL characters are easy and clear - no need for silly ASCII
overloading.

See, I have these simple GUI apps, written many years ago in Windows, using
APLWin 3.5. They still run, and I use them daily. They keep track of financial
time series data, and I can use them in a variety of ways. The APL stuff can
convert tables of various formats into conformable vectors, which can be
transformed into input for GNUplot or TensorFlow or even (my favourite),
Hinton's original Xerion product, which I have running on some Linux boxes.

APL is simply a wonderful, easy, useful language that can allow one single
person to support some mission-critical production-grade software. Example:
This AM, I discovered that all the data tables my data-table downloader had
fetched last night had a new format, with a zero row of data and today's
date.

This meant when I updated the database of time series data, I got tables where
every series had a bogus, zero-valued most recent date.

It was 7:30 AM, and markets would open at 9:30. A bit of code inspection, and
a simple fix: Update the MERGETABS function with two lines of code:

    
        ZEROROWS ← 0 = +/PNEW[;(1↓⍳1↓⍴PNEW)]
        PNEW ← (~ZEROROWS)/[1] PNEW
    

And this solved the problem. Update the runtime version of the workspace by
changing the ⎕LX latent expression to start the GUI when the W/S is loaded,
and roll out the new W/S to a few different boxes. All done before Mkt open.

The code above just takes the new table PNEW, selects the data columns, flags
the rows whose data columns sum to zero, and removes those rows using a
boolean-vector compression along the first axis (the table's rows). Easy
peasy. Done quick, and done right.

With my data updated and correct, I run some stuff against it, and get the
courage to hold my positions - concluding it will be an up day, as the stuff I
run helps me make that decision. I close the day up, just under $40,000 to the
good. (Crazy day...)

And oh, btw, the APL stuff runs on both Linux and Windows boxes - and the
Linux is both 32-bit and modern CentOS 64-bit. I still run the old 32-bit
boxes because they are very fast and very reliable. The APL stuff runs on the
Linux boxes under WINE, which works amazingly well.

It's old school stuff, but it just works. APL is the magic that lets me do the
other magic, that pays the bills. I gather that K can provide something
similar for big corporate environments, when the datasets and timeseries get
crazy large. You can write anything in any language - but one must not lose
sight of the objective, which is to make the business side of things work
well. One can write APL - or any language - in a way that is unclear and
confusing. But you don't have to do so. If you are careful, you can build APL
solutions that are fast, easy to maintain, and smoothly flexible. Maybe this
is possible in K as well?

