Its density is many times higher than that of most C programs, but that's no great obstacle to understanding if you don't attempt to "skim" it; you need to read it character by character, from top to bottom. It starts off defining some basic types, C for Character and I for Integer, and then the all-important Array. This is followed by some more shorthand for printf, return, and functions of one and two arguments, all of the array type. The DO macro makes iteration more concise.
Then the function definitions begin. ma allocates an array of n integers (this code is hardcoded for a 32-bit system), mv is basically memcpy, tr (Total Rank?) is used to compute the total number of elements, and ga (Get/Generate Array) allocates an array.
This is followed by the definitions of all the primitive operations (interestingly, find is empty), a few more globals, and then the main evaluator body. Lastly, main contains the REPL.
While I don't think this style is suitable for most programmers, it's unfortunate that the industry seems to have gone towards the other extreme.
While this example is pretty dense, a "one page programming language interpreter" doesn't need to be so impenetrable. Here's a one-page (67 line) interpreter for a subset of the Jsonnet programming language, implemented in Scala:
From the top down, that snippet contains an AST definition, a parser, an object model, an evaluator, and an output serializer. This 67-line interpreter is one of the projects in my book https://www.handsonscala.com/ (chapter 20), and while the example is minimal, the same techniques were used in my production-ready Sjsonnet interpreter.
I want to point out that a large part of the "unreadability" feeling here comes from the fact that the code predates ANSI C89, so
tr(r,d)I *d;{I z=1;DO(r,z=z*d[i]);R z;}
is just a function definition.
I would say that, structurally, this is extremely easy to read, considering there are zero comments and the way it is presented. One can make the readability obvious just by expanding the macros, changing the indentation, and adding line breaks; or you can spend several days "getting used" to it, like Roger Hui did (and Arthur Whitney, who has written in this style his whole life).
The repository[0] for J is just as unreadable (albeit with slightly more white space). I don't know how a project like this is managed. Even the file names look obfuscated.
I think it's a mistake to think that because you can't read the code, that it is the code that is somehow unreadable, instead of a language you simply haven't learned to read.
> I don't know how a project like this is managed.
It's not dissimilar to working with any proprietary programming language: You have to learn how to read and write this language.
The fact that we can use an existing language's compiler can be confusing to people who don't know that language very well, but if you approach it from the perspective of a new (proprietary) language, using an existing language's compiler can be a great way to leverage the benefits of that existing language.
> I think it's a mistake to think that because you can't read the code, that it is the code that is somehow unreadable, instead of a language you simply haven't learned to read.
This argument is significantly weakened when simply removing the meaningless macros and adding whitespace improves readability.
It's not like it is complicated because it is written in a structurally different language, nor will you actually leverage any benefit from learning this language. It's just obfuscated under the guise of abbreviation.
> This argument is significantly weakened when simply removing the meaningless macros and adding whitespace improves readability.
I disagree wholeheartedly. Whitespace tends to push code further away from the code that uses it; scrolling and tab-flipping require the developer to hold that code in their head, where it is most likely to degrade. It is much, much better to make them remember as little as possible.
It also helps reuse: if you don't scroll, you can see more code at once, so you can tell whether something is a good candidate for abstraction simply because you can see two places doing the same thing.
Macros like this help with that, and so they aren't "meaningless". Less whitespace helps with that, and so it actually improves readability (to those who have to read it!).
The trade-off is that you can't hire someone with "C" on their CV and expect them to be productive on their first day, but in a complex codebase, this might not be possible for other reasons.
I have a hard time believing that increasing the size of your terminal "helps reuse".
First, I do not agree that working memory is any significant limit when analyzing code, especially because one of the first steps is to build the mental abstraction that allows you to, precisely, understand the code. The density of that abstraction is definitely uncorrelated with the amount of whitespace. Thus, scrolling is only going to be an issue for the first couple of reads.
Second, say your patented steganography mechanism manages to fit 3x the amount of "code" into the same terminal size (and I am being generous). Is this going to increase "code reuse" by any significant amount?
> one of the first steps is to build the mental abstraction that allows you to, precisely, understand the code.
Precisely.
Now, a short program is "short enough" that you can convince yourself it is correct. That is to say, I'm sure you can imagine writing "hello world" without making a mistake, and that there is some threshold of program length beyond which your confidence in error-free programming is lost. For every seeing-programmer I have ever met, and I suspect strongly for all seeing-programmers, that length is measured in "source-code pixels": not lines, or characters, but literal field of view. Smaller screen? More bugs.
Where you are forced to deal with your application in terms of the mental abstraction, rather than what the code actually says it does, it is simply because that code is off-screen, and that mental abstraction is a sieve: If you had any true confidence in it, you would not believe that program length correlates with bugs.
> scrolling is only going to be an issue for the first couple of reads.
I've worked on codebases large enough that they've taken a few years to read fully, and codebases changing so quickly that there's no point to learn everything. Sometimes you can read a program, and sometimes you can't, but when you can't, the damage that scrolling does seems infinitely worse.
> Is this going to increase "code reuse" by any significant amount?
Yes, and usually by a factor of a thousand or more.
> For every seeing-programmer I have ever met, and I suspect strongly all seeing-programmers, that length is measured in "source-code pixels". Not lines, or characters, but literal field of view.
By the same logic: font size affects number of bugs.
I still doubt it. First, the size of the mental model is definitely not related to the physical length of the source code, but to an abstract, hard-to-define notion of "operations". Therefore "hello world" is the same size no matter how large your font is or how much whitespace sits between the prologue and the first statement/expression.
In fact, I would even argue that one's mental abstraction drifts farther from the actual on-screen code the more abbreviated that code is. If it reads like this:
MC(AV(z),AV(w),m*k); /* copy old contents */
if(b){ra(z); fa(w);} /* 1=b iff w is permanent */
*AS(z)=m1=AM(z)/k; AN(z)=m1*c; /* "optimal" use of space */
It doesn't matter how much space it occupies on screen. The simple mapping of names to identities is going to fill the entirety of your working memory. And I wouldn't believe you can "learn" this mapping. Our memory works in terms of concepts, not letters; that is why a 7-word passphrase is almost as easy to remember as a 7-character password. The identifiers here do not follow any discernible pattern (sometimes it's memset, other times it's MC instead of memcpy), and I really doubt any structure can be followed at two characters per identifier. People already have trouble remembering the much shorter and much more descriptive set of POSIX system calls.
> Sometimes you can read a program, and sometimes you can't, but when you can't, the damage that scrolling does seems infinitely worse.
I've worked for companies that used to remote into old X11 servers to view the code. Latency was measured in seconds, so the impact of scrolling would have been huge. It was still not the biggest drag on productivity; in my experience, branchy control flow was the bigger hindrance.
> Yes, and usually by a factor of a thousand or more.
This would imply a "power law" of code reuse, where the code you are likely to need is closer to the point where you need it. The only way I would believe such a rule is, precisely, if your codebase doesn't reuse any code at all and people just copy code "close to the point of use" due to some arcane coding style.
My impression: people are cargo-culting here.
> Our memory works in terms of concepts, not letters; the reason a 7 word passphrase is almost as easy to remember as a 7 character password
And yet we write things down because our memory is limited, and the notation we choose strikes a balance between packing meaning into each glyph and the speed at which thought translates into the mark. So you really have this exactly backwards: you have to memorize less if you can see more of it.
> It was definitely not the biggest impact to productivity
"The quality of the software is rarely the biggest impact on a business", is perhaps the most depressing thing I've ever heard anyone say about their job.
I'd like to think that my work is a bit more important than that, and worth any expense to make me more productive at it.
> And I wouldn't believe you can "learn" this mapping.
It sounds like ¯\_(ツ)_/¯ you wouldn't believe a lot.
> I think it's a mistake to think that because you can't read the code, that it is the code that is somehow unreadable, instead of a language you simply haven't learned to read.
The same could be said about Brainfuck. With enough effort people can learn to read almost anything; that doesn’t mean that everything is equally readable.
Brainfuck is so simple that it is trivial to read. The problem there is semantic rather than syntactic: you cannot build any abstractions. J-style C has all the semantic expressiveness of C, and chooses a terse but fairly uniform style.
I think this is the main J repo [0]. (Though I'm not sure it's any more readable.) Possibly interesting: Tavmem is the maintainer of Kona (the open-source K3/K4). I guess he's comfortable with all of the APL/J/K languages.
What people often miss about J is how good it is for exploratory programming, due to its terseness and, most importantly, its integrated columnar store. It replaced R for data wrangling in my stats programs, relegating R to just running the models.
Once you get used to it, it's really a breeze and far easier to modify than 500+ sloc R scripts.
Of course, that does not work for everyone, and especially not for big teams, but still!
Faster to program in, yes, massively so. Faster to execute depends on what I'm doing, but it's never a problem for me, as my data is usually pretty small and quite messy; it doesn't feel slow at all. And obviously, if you're working with arrays, you'll certainly not run faster in Python/R.
As for R/Python, it's mostly familiarity with the notation (especially for Python) and established popularity, with a large ecosystem as a consequence.
I mean, as a beginner in Python, it just works (slowly). As a beginner in J, you cry... The interesting distinction is that Python/R APIs can be quite convoluted and the rug may be pulled out from under you without warning, while in J you learn the primitives and you're off to the races. Also, J is much faster for its use case and avoids the need to drop down to C in most cases where that would otherwise be relevant.
Unlikely. The point of writing C in this manner was to bring C closer to APL, or to J, which was an evolution of APL at the time, so that the J interpreter would itself be written in a form of the J language. The J language itself doesn't also need to satisfy C syntax, so J code is somewhat easier.
I actually used this as inspiration for a little self-challenge to write a programming language in 24 hours not too long ago [1]. I didn't get quite as far as I'd have liked, but nonetheless, if you have some time to kill, it was an extremely fun task to tackle.
Anyway, info about J can pretty much be found only on its website (jsoftware.com), so don't bother looking elsewhere. The only other good place is Rosetta Code.
What do you mean by "Lisp interpretation on J language"?
There's an APL in Common Lisp called April. Otherwise, there's an unfinished J interpreter on the Racket package manager.
Sorry for the imprecision: I want to look at an interpreter for any dialect of Lisp, written in J, if one exists. I find J's approach to syntax interesting to explore.
However, in my understanding this code is clear but is just an exercise in using J. A real Scheme parser in J would go through ;:, a primitive that defines a state machine. Given the appropriate state table, it yields a Scheme tokenizer (with the whole thing amounting to state_table ;: string). Here be dragons, though! You can find a J parser written in J (not up to date with current J) using it here:
Writing a lisp interpreter in J won't really do more to help explore J's syntax than writing anything else in J. It might be a good way to explore how J handles tree-like data, but that's not really something J is designed to do well (if you really want to use an array-oriented language for tree processing, Aaron Hsu's dissertation covers a range of techniques that are more scalable than the naïve nest-of-short-boxed-vectors approach typically used in J).