
I am on vacation and bored, so I decided to try to actually understand this. I do not know much APL/K/J, but for a start, here is a version with more traditional line breaking:


Some basic observations:

- He uses all the obscure C features, like the ability to not declare the type of an argument or return value and let the compiler fill in, and also the old school style of declaring functions:

  foo(x,y) int x, y; { return 0; }
Instead of:

  int foo(int x, int y) { return 0; }
Since the compiler will infer the int, and since he uses R as a macro for return, this example eventually becomes:

  foo(x,y){ R 0; }

- Variables, or really registers, are only accessible as letters from a to z, and the "st" array stores the values of all registers. Numbers entered in the REPL have to be single digits between 0 and 9, which lets him avoid the dirty work of writing a proper lexer. It's also super easy to trigger a segfault, since there is no error handling at all.

- DO(n,x) is a C macro that evaluates the given expression "x" once for each i from 0 to n-1 (the loop variable is literally named i, so "x" can refer to it).

- V1 is a C macro that defines unary operators for the interpreted language, and V2 defines binary operators. In V1 definitions the operand is called "w", in V2 the operands are called "a" and "w".

- For example ",", which calls the cat function, is a binary operator that creates vectors:

  1 2 3 4
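- The vt, vd and vm arrays map ASCII symbols to the functions defined with V1 and V2: vt holds the symbols, vm the unary functions and vd the binary ones. Operator indices are counted from 1, and element 0 of vm and vd is a null placeholder. { is the second symbol in vt, so it maps to index 2: when used as a unary operator it calls "size" (vm[2], the second non-null element of vm), and when used as a binary operator it calls "from" (vd[2], the second non-null element of vd).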

- wd is a parser that goes from the original input string to a weird intermediate form that is an array of longs. Each input character gets mapped one-to-one to an item in this intermediate form.

  If the input character is a digit between 0 and 9:
    A value instance gets allocated (a scalar holding that digit)
    Intermediate form for this input character consists of the address of the allocated instance

  Else, if the character represents an operator:
    Intermediate form consists of the index of the operator in the vt array, counted from 1

  Otherwise (for example a letter between "a" and "z", or "="):
    Intermediate form consists simply of this character
In other words, the intermediate form is an array where some elements are ASCII characters, others are memory addresses, and yet others are indices into the vt array. This part is really something.

- The ex function executes the intermediate form. Since every token is a single character and there is no syntax checking, it simply indexes into the intermediate form assuming everything is well formed. The parser never checked that, so it's not guaranteed, which is again a source of easy segfaults. Execution looks at the first item in the intermediate form and makes recursive calls on the rest, so although the scan starts at the left, the right-hand side is always evaluated first and expressions group from right to left, as in APL (let X be the current item in the intermediate form):

  If X is a character (a register letter):
    Look ahead one item
    If it is a '=' char
      Assign the result of executing everything after the '=' to the register indicated by X, and return it
    Otherwise replace X with the value of the register named by X
  If X is a small integer (below 'a', i.e. an operator index)
    We are applying a unary operator
    X is the index into the "vm" array
    Fetch the function from "vm" and apply it to the result of executing the rest of the intermediate form
  Otherwise X is a value; if there is another item after it, we are applying a binary operator
    That next item is the index into the "vd" array
    Fetch the function from "vd" and apply it to X (the left operand) and to the result of executing everything after the operator (the right operand)
  Otherwise return X itself
- What I have the most trouble understanding is the "a" struct, which represents all values in the interpreted language (they are all arrays). ga is clearly the basic allocation function for it, and "plus" obviously adds two arrays, so the "p" field clearly holds the actual contents, but that's where things get very shady.

"He uses all the obscure C features, like the ability to not declare the type of an argument or return value and let the compiler fill in"

This isn't an obscure feature: in C89 an omitted type defaults to "int". Leaving out "int" has long been considered bad form, and since C99 omitting the return type has not been allowed (though it is generally still supported in non-strict modes).

I didn't look at this piece, but a more recent J interpreter is here - https://github.com/openj/core . I think the main idea of the "a" struct (it is here - https://github.com/openj/core/blob/master/jt.h) is that it has a "rank", a single integer that is the number of dimensions, then an array of dimensions, the "shape", of length equal to "rank", then a "data" part whose size is the product of all elements in "shape".

Edit: the structure is at https://github.com/openj/core/blob/master/jtype.h .

Thank you. I was wondering about the purpose of the tr function, which calculates the product of all elements in a vector; now it's clear that it goes from the array of lengths in each dimension to the total "flat" number of elements.

So the "r" field in the "a" struct is the number of dimensions of the array that "a" holds, and the "d" field is the array that holds the length of the array "a" in each dimension.

Uuh. This is ancient K&R C. Here's a version that is somewhat closer to today's C. https://gist.github.com/jpfr/560c861cd3eb76700a54

Can we replace some of the one-letter codes with better names?

It still doesn't compile, though, does it? When I try, the expression pr(w->p[i]) doesn't work because the argument is a long and pr is declared to take an A *.

Sure it compiles.
