
Arthur Whitney's One-page Interpreter (1992) - chrisdew
http://www.jsoftware.com/jwiki/Essays/Incunabulum
======
PhasmaFelis
Interpreter of what? "J", apparently, from context and inference, but that
should have been in the post title.

What's an "interpreter fragment"? Most of the Google results for that term
point to the same story.

I know this is Hacker News and everyone is supposed to know everything about
everything already, but on obscure topics it's nice to take a moment and
explain what you're talking about.

------
chipsy
When code is bulky, you can excuse your ego because your visibility is limited
in a literal sense. But when it's dense like this you actively know, as you
try and fail to skim it, that you can only usefully focus on a small segment
at a time. So you blame the code instead of yourself.

~~~
ejk314
I think it's fair to blame the code in this case. It's not written at all for
readability (by anyone other than the author while he was writing it). Of
course he knew what those 1 or 2 letter variable names meant when he came up
with them, so it's a lot easier to load into memory. But I doubt Whitney could
easily come back to this code after any length of time.

~~~
theoh
He is famous or notorious for rewriting from scratch every few years. So, your
impression sounds plausible.

~~~
avmich
You think Arthur doesn't look back at his code during those few years between
rewrites?

~~~
ksherlock

        BC: Do you ever look at your own code and think, "What the hell was I doing here?"
    
        AW: No, I guess I don't.
    

Of course, maybe he was answering the first 8 words.

[http://queue.acm.org/detail.cfm?id=1531242](http://queue.acm.org/detail.cfm?id=1531242)

~~~
theoh
Logically, he could be answering the entire question truthfully, except that
he doesn't look at old code at all. He's clearly somebody very focused on the
current "working set", with his rewrites and one-page files. As opposed to the
kind of programmer who curates a sprawling historic code base.

------
stiff
I am on vacation and bored, so I decided to try to actually understand this.
I do not know much APL/K/J, but for a start, here is a version with more
traditional line breaking:

[https://gist.github.com/anonymous/f72e5c4a432492abce59](https://gist.github.com/anonymous/f72e5c4a432492abce59)

Some basic observations:

\- He uses all the obscure C features, like the ability to not declare the
type of an argument or return value and let the compiler fill in an int, and
also the old-school style of declaring functions:

    
    
      foo(x,y) int x, y; { return 0; }
    

Instead of:

    
    
      int foo(int x, int y) { return 0; }
    

Since the compiler will infer the int, and since he uses R for return, this
example eventually becomes:

    
    
      foo(x,y){R(0);}
    

\- Variables, or really registers, are only accessible as letters from a to z,
and the "st" array stores the values of all registers. Numbers entered in the
REPL have to be between 0 and 9, and hence he avoids the dirty work of making
a proper lexer. It's also super easy to trigger a segfault, since error
handling is non-existent.

\- DO(n,x) is a C macro that evaluates the given expression "x" once for each
number from 0 up to, but not including, "n"
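The macro can be sketched like so (a close paraphrase of the one-liner in the source, with int standing in for the interpreter's integer type):

```c
#include <assert.h>

/* Sketch of the DO macro: evaluate the expression x once for each
   i in 0..n-1. The loop index i is deliberately visible inside x. */
#define DO(n, x) { int i = 0, _n = (n); for (; i < _n; ++i) { x; } }

int sum_below(int n) {
    int s = 0;
    DO(n, s += i)   /* adds 0 + 1 + ... + (n-1) into s */
    return s;
}
```

Capturing `n` into `_n` once up front means the bound is evaluated a single time even if the caller passes an expression with side effects.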

\- V1 is a C macro that defines unary operators for the interpreted language,
and V2 defines binary operators. In V1 definitions the operand is called "w",
in V2 the operands are called "a" and "w".
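A modernized, self-contained analogue of those two macros (the real ones open old-style K&R definitions and work on the interpreter's array type A; int stands in here so the idea is runnable on its own):

```c
#include <assert.h>

/* Hedged sketch: V1 opens a unary verb definition whose operand is
   named w; V2 opens a binary one with operands a and w. */
#define V1(f) int f(int w)
#define V2(f) int f(int a, int w)

V1(neg)  { return -w; }       /* unary verb: operand is w        */
V2(plus) { return a + w; }    /* binary verb: operands are a, w  */
```

The payoff in the original is brevity: every verb definition starts with two characters instead of a full function header.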

\- For example ",", which calls the cat function, is a binary operator that
creates vectors:

    
    
      1,2,3,4
      4 
      1 2 3 4
    

\- The vt, vd and vm arrays map ascii symbols to the functions defined with V1
and V2. { is the second symbol in vt, so when used as a unary operator it
calls "size" (second non-null element of vm):

    
    
      {5,6,7,8
      
      4
    

and when used as a binary operator it calls "from" (second non-null element of
vd).
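The table scheme described here can be mimicked with a character string and two parallel arrays of function pointers; everything below is an illustrative stand-in, not the original verbs:

```c
#include <assert.h>
#include <string.h>

typedef int (*monad)(int);        /* unary meaning of a symbol  */
typedef int (*dyad)(int, int);    /* binary meaning of a symbol */

/* Toy verbs; the real ones operate on arrays. */
static int m_id(int w)          { return w; }
static int m_size(int w)        { (void)w; return 1; }
static int d_plus(int a, int w) { return a + w; }
static int d_from(int a, int w) { (void)a; return w; }

/* Parallel tables: the position of a character in vt selects the
   same position in vm (unary) and vd (binary). Slot 0 is reserved,
   mirroring "second non-null element" for '{'. */
static char  vt[] = "+{";
static monad vm[] = { 0, m_id,   m_size };
static dyad  vd[] = { 0, d_plus, d_from };

static int verb(char c) {          /* 1-based index, 0 = not a verb */
    char *p = strchr(vt, c);
    return p ? (int)(p - vt) + 1 : 0;
}
```

So `vm[verb('{')]` is the unary meaning of `{` and `vd[verb('{')]` the binary one, which is exactly the dispatch the comment describes.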

\- wd is a parser that goes from the original input string to a weird
intermediate form that is an array of longs. Each input character gets mapped
one-to-one to an item in this intermediate form.

    
    
      If the input character was a number between 0 and 9:
        Value type instance gets allocated
        Intermediate form for this input character consists of the address of the allocated instance
    
      If the character is a letter between "a" and "z":
        Intermediate form consists simply of this character
    
      If the character represents an operator
        Intermediate form consists of the index of the operator in the vt array
    

In other words, the intermediate form is an array where some elements are
ascii characters, others are memory addresses, and yet others are indices into
some array. This part is really something.
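A toy version of that tagging (hypothetical helper names; the real wd routes the allocation through noun/ga, and the pointer-in-a-long trick is the nonportable part):

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static const char ops[] = "+{~<#,";   /* illustrative operator set */

/* Toy wd: digits become addresses of freshly allocated values,
   letters stay as themselves, known operators become their 1-based
   index in the operator table. The result is 0-terminated. */
long *toy_wd(const char *s) {
    size_t n = strlen(s);
    long *e = malloc((n + 1) * sizeof *e);
    for (size_t i = 0; i < n; ++i) {
        char c = s[i];
        if (isdigit((unsigned char)c)) {
            long *v = malloc(sizeof *v);    /* "allocate a Value"   */
            *v = c - '0';
            e[i] = (long)v;                 /* tag: heap address    */
        } else if (islower((unsigned char)c)) {
            e[i] = c;                       /* tag: ascii letter    */
        } else {
            const char *p = strchr(ops, c);
            e[i] = p ? p - ops + 1 : 0;     /* tag: table index     */
        }
    }
    e[n] = 0;
    return e;
}
```

The three tag kinds are disambiguated downstream purely by magnitude: table indices are tiny, letters are below 128, and heap addresses are huge, which only works as long as a long can hold a pointer.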

\- The ex function executes the intermediate form. Since everything in the
input is fixed length, and there is no syntax checking, it just indexes into
the intermediate form assuming everything is well formed. The parser never
actually checked that, so it isn't guaranteed: again a source of easy
segfaults. Execution goes from left to right and consists of looking at the
first position in the intermediate form and then making recursive calls as
necessary (let X be the current item in the intermediate form):

    
    
      If X is a character
        Lookahead one item
        If it is a '=' char
          Assign the result of executing everything after the '=' to the register indicated by X
        Assign to X the value of the register named by X
      If X is not a character and is a small integer
        We are applying a unary operator
        X is the index into the "vm" array
        Fetch the function from "vm", apply it to the result of executing the rest of the intermediate form
      Otherwise X is a value:
        If there is anything remaining after the current item, we are applying a binary operator
        Look up the function for the next item in "vd", apply it to X and to the result of executing everything to the right of the operator
    
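The evaluation order itself is easy to demonstrate in miniature; this toy evaluator shares only the right-to-left, no-precedence control flow with ex, nothing else:

```c
#include <assert.h>

/* Toy right-to-left evaluator over single digits and + - *.
   All operators have equal precedence and group to the right, as in
   the APL family, so "1+2*3" means 1+(2*3). */
int toy_ex(const char *s) {
    int x = s[0] - '0';             /* current item: a digit          */
    if (s[1] == '\0') return x;     /* nothing to the right: done     */
    int rest = toy_ex(s + 2);       /* recurse on everything after op */
    switch (s[1]) {
        case '+': return x + rest;
        case '-': return x - rest;
        case '*': return x * rest;
    }
    return 0;                       /* unknown operator: give up      */
}
```

Note how, exactly as in the pseudocode above, each call peeks at one item, then hands the entire remainder of the input to a recursive call.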

\- I have the biggest problem with understanding the "a" struct, which
represents all values in the interpreted language (they are all arrays). ga
is clearly the basic allocation function for it, and "plus" obviously adds
two arrays, so it's clear the "p" field holds the actual contents, but that's
where things get very shady.

~~~
avmich
Didn't look at this piece, but a more recent J interpreter is here -
[https://github.com/openj/core](https://github.com/openj/core) . I think the
main idea of the "a" struct (it is here -
[https://github.com/openj/core/blob/master/jt.h](https://github.com/openj/core/blob/master/jt.h))
is that it has "rank" \- a single integer which is the number of dimensions,
then an array of dimensions - the "shape" - of length equal to "rank", then a
"data" part whose size equals the product of all elements in "shape".

Edit: the structure is at
[https://github.com/openj/core/blob/master/jtype.h](https://github.com/openj/core/blob/master/jtype.h)
.

~~~
stiff
Thank you. I was wondering about the purpose of the tr function, which
calculates the product of all elements in the vector; now it's clear that it
goes from the array of per-dimension lengths to the total "flat" number of
elements.

So the "r" field in the "a" struct is the number of dimensions of the array
that "a" holds, and the "d" field is the array that holds the length of the
array "a" in each dimension.
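Putting the two comments together, the header and tr can be sketched like this (field names follow the one-pager; ga is simplified, and its real signature also takes a type argument):

```c
#include <assert.h>
#include <stdlib.h>

typedef long I;
/* Header layout as in the incunabulum: t = type, r = rank,
   d = shape (up to 3 dimensions), p = the data, which in the
   original spills past the struct into the same allocation. */
typedef struct a { I t, r, d[3], p[2]; } *A;

/* tr: product of the first r extents = total number of elements. */
I tr(I r, I *d) {
    I z = 1;
    for (I i = 0; i < r; ++i) z *= d[i];
    return z;
}

/* Simplified ga: allocate an array of rank r with shape d. */
A ga(I r, I *d) {
    A z = malloc(sizeof(struct a) + tr(r, d) * sizeof(I));
    z->t = 0; z->r = r;
    for (I i = 0; i < r; ++i) z->d[i] = d[i];
    return z;
}
```

So a 2x3 matrix has r = 2, d = {2, 3}, and tr(2, d) = 6 elements behind p.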

------
nine_k
This is a perfect illustration of why normal people (like me, unlike Arthur)
have problems working with his apparently brilliant creations, such as the k
language. The amount of information required to efficiently read (let alone
write) such texts just does not fit into normal people's working memory. One-
letter names and the absence of comments don't help either.

(WRT weird names: IIRC, some time ago kdb featured a number of functions named
with digits, like '2' being the function to connect to a socket or something.)

~~~
yang_guo
As a guy with a terrible memory, I think of it the opposite way. For Q (and
more so K), the actual number of distinct items to remember is very small, as
basically anything complex is created by composition of the simple
functions/variables. This way you don't need to remember nearly as many
libraries/variables/function names, and also the function itself becomes its
own description.

That being said, for more production-heavy applications and interfacing with
other languages, variable/function naming could be more descriptive :)

~~~
icsa
Agreed. The small(ish) vocabulary of operators makes K easier for me to use.

Would you rather write/read:

DIVIDE X BY 5 GIVING Y

or

y=x/5;

The first is COBOL (designed to make code easier for "normal" people to
read). The second is C (which looks more like the math that we learn in grade
school).

k/APL/J simply moves further in the direction of math.

As far as descriptive names go, in k, names can always be aliased with
longer names (assuming no conflicts). I do this when exposing library k code
as a web interface.

------
golemotron
I want to offer a bounty for a version that compiles with a contemporary C
compiler. There's some magic going on with longs being treated as pointers
here.

~~~
avmich
That's a rather normal situation: when your long is 32 bits and the pointer
is also 32 bits, to satisfy a strict C compiler you just need a cast, which
will be a no-op in machine code. This approach was used quite a bit...
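The idiom, with the explicit cast a strict compiler wants; using C99's intptr_t makes the round-trip well defined rather than merely customary (this is a sketch of the technique, not the original code):

```c
#include <assert.h>
#include <stdint.h>

/* A pointer parked in an integer slot and recovered later. With
   intptr_t this is sanctioned by the standard; with plain long it
   happens to work on ILP32 and LP64 platforms but not on LLP64
   (64-bit Windows). Both casts compile to nothing. */
int roundtrip(void) {
    static int x = 42;
    intptr_t word = (intptr_t)&x;   /* store the pointer as an integer */
    int *p = (int *)word;           /* cast back: no-op at runtime     */
    return *p;
}
```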

~~~
cplease
Still segfaults with gcc -m32. There's some seriously nonportable,
noncompliant type abuse going on in there.

~~~
FullyFunctional
Try 3+5. Worked for me. I think it's very rudimentary and doesn't handle very
much.

I agree, I would very much like an ANSIfied version that doesn't crash on
unsupported features, but is still in this style and in a single file.

~~~
golemotron
3+5?

~~~
FullyFunctional
My point was that very few things work, and that was an example. Multi-digit
numbers will fail. Here are a few examples that don't segfault:

    
    
      3+7
      a=3+7
      a,a,a
      b=a,a,a
      {b
      b+9
      p=2,3,5,7
      2{p
      {p
      ~8
      <~8
      #~8
      4#3
      c=b#2
      c,c
    

------
FullyFunctional
This transcript of Roger Hui's talk casts more light on the general idea:
[http://archive.vector.org.uk/trad/v094/hui094_85.pdf](http://archive.vector.org.uk/trad/v094/hui094_85.pdf)

It's interesting that what he calls "Parsing" is more typically known as term
reduction by pattern matching in the functional world.

------
marktangotango
He "cheats" by putting multiple statements on one line:

    
    
      noun(c){A z;if(c<'0'||c>'9')R 0;z=ga(0,0,0);*z->p=c-'0';R z;}
    

would be

    
    
      noun(c){
       A z;
       if(c<'0'||c>'9')
         R 0;
       z=ga(0,0,0);
       *z->p=c-'0';
       R z;
      }
    

In this form, it's still cryptic, but not quite as inscrutable.

~~~
phpnode
Except this isn't him "cheating", this is his preferred coding style, for
readability!

~~~
avmich
To avoid scrolling, which he says he hates.

------
ha292
Isn't it wonderful that many Wall St. firms use k/q to write some of the most
complex analytics?

Of course, it is understandable. NOT.

~~~
avmich
[http://kaleidic.com/2009/composing-j-in-my-
mind/](http://kaleidic.com/2009/composing-j-in-my-mind/)

"J is conducive to thinking about computation because a lot is expressed by
combining even a few components. With J it is natural to think in code, not
just about code."

------
cplease
Complete shit. C was a mature language in 1989. Even for banging out a quick
hack, and forgiving the gets into a 99 byte stack array, this is crap by any
standard.

I can compile and run pretty much any K&R code from the C Programming Language
in 1978 with no problem. Finding the appropriate compiler, I can do the same
with BCPL, the Language and its Compiler (1981). And the code is actually
understandable.

This, on the other hand, is an obfuscated mess that segfaults on any input, J
or otherwise. I'll be damned if I spend 5 minutes debugging it: ex() segfaults
with infinite recursion. It's not worth the trouble.

Not saying this isn't worth sharing as a curiosity or an artifact, but this
is not a work of genius and it is not defensible or to be emulated. Do your
colleagues a favor and don't write like this.

~~~
avmich
> Do your colleagues a favor and don't write like this.

I'd second this - in any normal situation.

If you're really aspiring to shape the future, strive to be extremely
productive, and demand that from your colleagues - that's another thing.

Sigh - industry as a whole isn't nearly on the level to use those advanced
tools.

~~~
cplease
>Sigh - industry as a whole isn't nearly on the level to use those advanced
tools.

Is this a parody?

Is J a language for writing personal code with personal meaning only that you
can only run in your head?

There is nothing professional about this. Sounds more like a mental illness.

~~~
avmich
I once needed to develop an algorithm for an underspecified task. Something
like "generate a realistic-looking map of a floor - with walls, windows,
doors, corridors...". It took me 40 minutes to try several variants in J,
then some 3 hours more to complete the C# code. I don't know how long it
would have taken if I'd experimented directly in C#. Probably not too long -
but iterations would surely have been more painful; too much boilerplate
code.

In my experience, J is closer to the thought process than many other
languages. Really. You just have to try it - and get used to it a little bit.

~~~
cplease
There's a reason that APL never gained a foothold and J has not been adopted
by software engineers. You can program at the speed of thought in TECO as
well, but the reasons for its limitations and peculiar history vanished long
ago.

There are far better tools available these days for rapid prototyping;
interpreted languages with REPLs and libraries for just about everything--
without resorting to a write-only line-noise language and actually ending up
with a usable product.

For that matter, I suppose it's possible to write understandable, maintainable
TECO, APL and J with some effort if you don't value obscurity and cleverness
over simplicity, correctness, maintainability and understandability.

If your business logic or product is written in line noise written by some
lone J genius, your company is fucked when your guy inevitably gets hit by a
bus, goes nuts, or leaves for greener pastures.

~~~
icsa
The same could be said for writing a document in Chinese, Arabic or Hebrew.
They all look like "line noise" until you make an effort to learn them.

"For that matter, I suppose it's possible to write understandable,
maintainable TECO, APL and J with some effort if you don't value obscurity and
cleverness over simplicity, correctness, maintainability and
understandability."

Yes, I do split long functions in k onto separate lines. However, long
functions in k are rare, and usually an indication that I haven't arrived at
a simple solution.

Writing in J/k/APL allows one to think in chunks, where the chunk is an
algorithm - not all of the tiny steps in an algorithm. It is a higher level
of abstraction - not unlike programming in C versus assembly language or
machine code.

E.g. the sum of a list (of any type) in k (APL is similar):

    
    
      sum:+/x   / read "plus over x" or "reduce x with plus"
    

versus a C fragment, which computes the sum of integers only:

    
    
      int sum=0; for(int i=0; i<n; ++i) { sum = sum + x[i]; }
    

Btw, John von Neumann once chastised a grad student for writing an assembler.
Perhaps real software engineers only write in machine code.

"If your business logic or product is written in line noise written by some
lone J genius, your company is fucked when your guy inevitably gets hit by a
bus, goes nuts, or leaves for greener pastures."

I think the larger issue is documentation. Undocumented code (of any kind) is
a liability. When I write in k, every line is documented (as text past column
80). In addition, there is header documentation that describes the purpose
and approach of the code in the file/module. The number of lines of code in a
file is typically no more than 10.

