Hacker News
J one-page interpreter fragment (1992) (jsoftware.com)
61 points by Tomte on Jan 25, 2021 | 43 comments



Its density is many times higher than that of most C programs, but that's no big obstacle to understanding if you don't attempt to "skim" it; you need to read it character-by-character from top to bottom. It starts off defining some basic types, C for Character and I for Integer, and then the all-important Array. This is followed by some more shorthand for printf, return, and functions of one and two arguments, all of the array type. The DO macro is used to make iteration more concise.

Then the function definitions begin. ma allocates an array of n integers (this code is hardcoded for a 32-bit system), mv is basically memcpy, tr (Total Rank?) is used to compute the total number of elements, and ga (Get/Generate Array) allocates an array.

This is followed by the definitions of all the primitive operations (interestingly, find is empty), a few more globals, and then the main evaluator body. Lastly, main contains the REPL.

While I don't think this style is suitable for most programmers, it's unfortunate that the industry seems to have gone towards the other extreme.


Btw, this is considered more or less canonical k style (with q being a slightly more readable version of this).

It takes some time getting used to it, but, once you get the hang of it, going back to e.g. Python/Pandas is painful.


While this example is pretty dense, a "one page programming language interpreter" doesn't need to be so impenetrable. Here's a one-page (67 line) interpreter for a subset of the Jsonnet programming language, implemented in Scala:

- https://github.com/handsonscala/handsonscala/blob/v1/example...

From the top down, that snippet contains an AST definition, a parser, an object model, an evaluator, and an output serializer. This 67-line programming language interpreter is one of the projects in my book https://www.handsonscala.com/ (chapter 20), and while this example is minimal, the same techniques were used for my production-ready Sjsonnet interpreter.


My favourite one page implementation of a programming language is 'a micro manual for lisp, not the whole truth' by John McCarthy.

https://www.uraimo.com/files/MicroManual-LISP.pdf


Some time ago I made a humble decoding of this wonderful code snippet: https://zserge.com/posts/j/



Thanks for this!


I don't get why the 5 is in '5 + tr(...)'. t+r+d[3] is 5, I guess? The verb function should return an I and not an A. e->tokens in the wd function.


Indeed - I was confused about which elements of the struct were on line 2.

Thanks for explaining


I want to point out that a large part of the "unreadability" feeling here comes from the fact that the code is pre-ANSI C89. So tr(r,d)I*d;{I z=1;DO(r,z=z*d[i]);R z;} is just a function definition.

I would say that structurally this is extremely easy to read, considering there are zero comments and the way it is presented. One can make this readability obvious by just expanding the macros, changing the indentation and adding line breaks, or you can just spend several days to "get used" to it like Roger Hui (and Arthur, who just uses this kind of style for life).


The repository[0] for J is just as unreadable (albeit with slightly more white space). I don’t know how a project is managed like this. Even the file names look obfuscated.

[0]: https://github.com/tavmem/j


> The repository[0] for J is just as unreadable

People who can read this find it readable.

I think it's a mistake to think that because you can't read the code, that it is the code that is somehow unreadable, instead of a language you simply haven't learned to read.

> I don’t know how a project is managed like this.

It's not dissimilar to working with any proprietary programming language: You have to learn how to read and write this language.

The fact that we can use an existing language's compiler can be confusing to people who don't know that language very well, but if you approach it from the perspective of a new (proprietary) language, using an existing language's compiler can be a great way to leverage the benefits of that existing language.


> I think it's a mistake to think that because you can't read the code, that it is the code that is somehow unreadable, instead of a language you simply haven't learned to read.

This argument is significantly weakened when simply removing the meaningless macros and adding whitespace improves readability.

It's not like it is complicated because it is written in a structurally different language, nor will you actually leverage any benefit from learning this language. It's just obfuscated under the guise of abbreviation.


> This argument is significantly weakened when simply removing the meaningless macros and adding whitespace improves readability.

I disagree wholeheartedly. Whitespace tends to move code further away from the code that uses it; scrolling and tab-flipping requires the developer hold that code in their head where it is most likely to degrade. It is much much better to make them remember as little as possible.

It also helps reuse: If you don't scroll, you can see more code, so you can see if something is a good candidate for abstracting, just because you can see them doing the same thing.

Macros like this help with that, and so they aren't "meaningless". Less whitespace helps with that, and so it actually improves readability (to those who have to read it!).

The trade-off is that you can't hire someone with "C" on their CV and expect them to be productive on their first day, but in a complex codebase, this might not be possible for other reasons.


I have a hard time believing that increasing the size of your terminal "helps reuse".

First, I do not agree that working memory is any significant limit when analyzing code, especially because one of the first steps is going to create the mental abstraction that allows you to, precisely, understand the code. The density of that abstraction is definitely uncorrelated with the amount of whitespace. Thus, scrolling is only going to be an issue for the first couple of reads.

Second, say your patented steganography mechanism manages to fit 3x the amount of "code" in the same terminal size (and I am being generous). Is this going to increase "code reuse" by any significant amount?


> one of the first steps is going to create the mental abstraction that allows you to, precisely, understand the code.

Precisely.

Now a short program is "short enough" that you can convince yourself it is correct; that is to say, I'm sure you can imagine writing "hello world" without making a mistake, and that there is some threshold of program length where your confidence in error-free programming will be lost. For every seeing-programmer I have ever met, and I suspect strongly all seeing-programmers, that length is measured in "source-code pixels". Not lines, or characters, but literal field of view. Smaller screen? More bugs.

Where you are forced to deal with your application in terms of the mental abstraction, rather than what the code actually says it does, it is simply because that code is off-screen, and that mental abstraction is a sieve: If you had any true confidence in it, you would not believe that program length correlates with bugs.

> scrolling is only going to be an issue for the first couple of reads.

I've worked on codebases large enough that they've taken a few years to read fully, and codebases changing so quickly that there's no point to learn everything. Sometimes you can read a program, and sometimes you can't, but when you can't, the damage that scrolling does seems infinitely worse.

> Is this going to increase "code reuse" by any significant amount?

Yes, and usually by a factor of a thousand or more.


> For every seeing-programmer I have ever met, and I suspect strongly all seeing-programmers, that length is measured in "source-code pixels". Not lines, or characters, but literal field of view.

By the same logic: font size affects number of bugs.

I still doubt it. First, the size of the mental model is definitely not related to physical source code length, but rather an abstract, hard to define "operation" concept. Therefore "hello world" is the same size, no matter how large your font size is nor how much whitespace there is between the prologue and the first statement/expression.

In fact, I would even argue, one's mental abstraction is farther from the actual on-screen code the more abbreviated your code is. If it reads like this:

     MC(AV(z),AV(w),m*k);                 /* copy old contents      */
     if(b){ra(z); fa(w);}                 /* 1=b iff w is permanent */
     *AS(z)=m1=AM(z)/k; AN(z)=m1*c;       /* "optimal" use of space */
It doesn't matter how much space it occupies on screen. The simple mapping of names to identities is going to fill the entirety of your working memory. And I wouldn't believe you can "learn" this mapping. Our memory works in terms of concepts, not letters; that is the reason a 7-word passphrase is almost as easy to remember as a 7-character password. The identifiers here do not follow any discernible pattern (sometimes it's memset, other times it's MC instead of memcpy), and I really doubt any structure can be followed at two chars per identifier. People already have problems remembering the much shorter and much more descriptive set of POSIX system calls.

> Sometimes you can read a program, and sometimes you can't, but when you can't, the damage that scrolling does seems infinitely worse.

I've worked for companies that used to remote into old X11 servers for viewing the code. Latency was measured in seconds. The impact of scrolling would have been huge. It was definitely not the biggest impact on productivity. In my experience, branchy code flow was still the biggest hindrance.

> Yes, and usually by a factor of a thousand or more.

This would imply a "power law" of code reuse, where the code you are likely to need is closer to the point where you need it. The only way I would believe such a rule is, precisely, if your code base doesn't reuse any code at all and people just copy code "close to point of use" due to some arcane coding style.

My impression: I'm assuming people are cargo culting here.


> By the same logic: font size affects number of bugs.

Yes it does.

> I still doubt it.

https://www.ets.org/Media/Research/pdf/RR-01-23-Bridgeman.pd...

> Our memory works in terms of concepts, not letters; the reason a 7 word passphrase is almost as easy to remember as a 7 character password

And yet we write things down because our memory is limited, and the notation we choose strikes a balance between packing meaning into a glyph, and the speed at which your thoughts can translate into the mark, so you really have this exactly backwards: you have to memorize less if you can see more of it.

> It was definitely not the biggest impact to productivity

"The quality of the software is rarely the biggest impact on a business", is perhaps the most depressing thing I've ever heard anyone say about their job.

I'd like to think that my work is a bit more important than that, and worth any expense to make me more productive at it.

> And I wouldn't believe you can "learn" this mapping.

It sounds like ¯\_(ツ)_/¯ you wouldn't believe a lot.


> This argument is significantly weakened when simply removing the meaningless macros and adding whitespace improves readability.

Yes, but in this case adding whitespace actually decreases readability. At least if you know Whitney's hatred of scrolling.


>People who can read this find it readable.

> I think it's a mistake to think that because you can't read the code, that it is the code that is somehow unreadable, instead of a language you simply haven't learned to read.

The same could be said about Brainfuck. With enough effort people can learn to read almost anything; that doesn’t mean that everything is equally readable.


Brainfuck is so simple that it is trivial to read. The problem there is a semantic one rather than a syntactic one: you cannot build any abstractions. J-style C has all the semantic expressiveness of C, and chooses a terse, but fairly uniform style.


That is pretty readable C... for developers from that particular community.

The ngn/k implementation of the k language [1] is even more terse.

[1] https://git.sr.ht/~ngn/k/tree


I think this is the main J repo [0]. (Though I’m not sure if it’s any more readable.) Possibly interesting that Tavmem is the maintainer of Kona (open source K3/4). I guess he’s comfortable with all APL/J/K languages.

[0] https://github.com/jsoftware/jsource


What people often miss with J is how good it is for exploratory programming, due to its terseness and, most importantly, its integrated columnar store. It replaced R for data wrangling in my stats programs, and relegated it to just running the models.

Once you get used to it, it's really a breeze and far easier to modify than 500+ sloc R scripts.

Of course, that does not work for everyone, and especially not for big teams, but still!


> It replaced R for data wrangling in my stats programs

Do you find it faster?

I have good experiences with J due to its terseness, but I'm curious why scientists still use R/Python - is it just inertia? Libraries/FFI?


Faster to program yes, massively so. Faster to execute, it depends on what I'm doing, but it's never a problem for me as my data is usually pretty small and quite messy. It doesn't feel slow at all, at least. And obviously if you're doing arrays you'll certainly not run faster in Python/R.

As for R/Python it's mostly familiarity with the notation (especially for python) and established popularity, with a large ecosystem as a consequence.

I mean, as a beginner in Python it just works (slowly). As a beginner in J, you cry... The interesting distinction is that Python/R APIs can be quite convoluted and the rug may be pulled from under you without warning, while in J you learn the primitives and you're off to the races. Also, J is much faster for its use case and avoids the need to write C in most cases where using it is relevant.


Here is a recent C program in a similar compressed style, a Scheme compiler:

https://c9x.me/qscm/data/qscm.c

via this page which was recently linked on HN:

https://c9x.me/qscm/


Somehow still more readable than actual J code


Unlikely. The point of writing C in this manner was to bring C closer to APL (or J, which was an evolution of APL at the time), so that the J interpreter would be written in a form of the J language. J itself doesn't also need to satisfy C syntax, so J code is somewhat easier.


If curious see also

2017 https://news.ycombinator.com/item?id=14463874

2016 (one comment): https://news.ycombinator.com/item?id=12654054

2014 https://news.ycombinator.com/item?id=8533843

Related from last year: https://news.ycombinator.com/item?id=22831931

I'm pretty sure there have been other threads about this, probably via different articles.


I actually used this as inspiration for a little self-challenge to write a programming language in 24hrs not too long ago [1]. I didn't get quite as far as I'd've liked to, but nonetheless if you have some time to kill, it was an extremely fun task to tackle.

[1] https://github.com/block8437/afternoon-lang


A good walk-through of this code is here: https://rickyhan.com/jekyll/update/2020/01/16/j-incunabulum-...


Question: where/how are the `mv` and `pr` functions defined? And how does this even compile when it's using `gets` without `#include <stdio.h>` ?

I do see it's using the `implicit int` rule though.


`mv` is defined on the same line as `ma` is, right after the five `#define`s.

`pr` is on the last line before the only line break.

It's a pretty dense program to walk through, but after taking some time and making notes as I went, it's not as completely impenetrable as I first thought.


This is pre-C89, K&R C.


Is there any Lisp implementation in the J language? I have a hard time searching for anything named with only one letter.


Anyways, info about J can pretty much be found only on its website (jsoftware.com), so don't really bother looking elsewhere. The only other good place is Rosettacode.

What do you mean by "Lisp interpretation on J language"?

There's an APL in Common Lisp called April. Otherwise, there's an unfinished J interpreter on the Racket package manager.


Sorry for the imprecision; I want to look at an interpreter for any dialect of Lisp, written in J, if one exists. I found J's approach to syntax interesting to explore.


Here you are:

https://code.jsoftware.com/wiki/Scripts/Scheme

http://tangentstorm.github.io/apljk/bjonas-scheme.ijs.html

However, in my understanding this code is clear but is just an exercise in using J. A real Scheme parser in J would go through ;:, a primitive that defines a state machine. Given the appropriate state table, it would yield a Scheme tokenizer (with the whole thing amounting to state_table ;: string). Here be dragons, though! You can find a J parser in J (not up to date with current J) using it here:

https://www.jsoftware.com/help/dictionary/d332.htm


Writing a lisp interpreter in J won't really do more to help explore J's syntax than writing anything else in J. It might be a good way to explore how J handles tree-like data, but that's not really something J is designed to do well (if you really want to use an array-oriented language for tree processing, Aaron Hsu's dissertation covers a range of techniques that are more scalable than the naïve nest-of-short-boxed-vectors approach typically used in J).


It leaks memory on every line it evaluates.


I guess freeing all those pointers returned by ma() would be too verbose.

Seriously though, this must have been a deliberate design decision? Arthur Whitney is no amateur.


I'm sure memory management was left to deal with later. In Roger Hui's variant, memory leaks were eliminated early on, at a systemic level.



