Actually, Hoon syntax reminds me of a CoffeeScript-ish set of M-expressions for a ...

juped · on Nov 5, 2015

Yeah - we are homoiconic (twigs are nouns), but we aren't actually Lisp.

I think that with training wheels it probably pays off to be excessively empirical. Consider the situation in Haskell, where the only reason "map" works over lists rather than over generic Functors is that people were afraid to put the word "Functor" on a newbie's screen - yet many newbies are in fact confused by "map" and "fmap" both existing! So holding one's nose and accepting something inelegant for that reason should probably be done if and only if it actually works. So far, we haven't had any issues with digraph runes in particular, and I think there are a number of reasons why.

I think in practice a lot of what we actually do is aggressively subset the language - which is okay, because a great many of the runes are in fact compiler-level macros. While we can't put this in any documentation, because it would commit us to too much, the characters in runes are actually chosen to follow patterns that are easy to remember. For instance, %-, %+, and %^ are 1-, 2-, and 3-ary "function application", respectively, and %. is %- with the order of its children reversed. Analogously, :-, :+, and :^ build 2-, 3-, and 4-tuples, respectively, and :_ is :- with the order of its children reversed (a minor irregularity, because :. is somewhat awkward, especially when your comments start with ::). Finally, ?. is "unless": it's ?: with the "then" and "else" branches swapped. So a new Hoon programmer might learn |=, along with the fact that | runes make cores, and ?:, along with the fact that ? runes are conditionals, and already be able to sorta-read the decrement gate given above.
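To make those families concrete, here is a sketch of standalone expressions, one per line, written in present-day Hoon syntax (which may differ in places from the 2015 dialect discussed here). The values are arbitrary, and the trailing comments give what each line should evaluate to.

```hoon
::  application: %- takes one argument, %+ two; %. is %- reversed
%-  dec  5                ::  4, same as (dec 5)
%+  add  2  3             ::  5, same as (add 2 3)
%.  5  dec                ::  4, sample first, then gate
::  tuples: :- builds a pair, :+ a triple, :^ a 4-tuple; :_ is :- reversed
:-  1  2                  ::  [1 2]
:+  1  2  3               ::  [1 2 3]
:^  1  2  3  4            ::  [1 2 3 4]
:_  1  2                  ::  [2 1]
::  conditionals: ?. is ?: with its two branches swapped
?:  =(1 1)  %first  %second    ::  %first
?.  =(1 1)  %first  %second    ::  %second
```

%^ follows the same pattern with a gate and three arguments; it is omitted only to keep the example values simple.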

Another sort of training wheel we have is that some of our concepts can masquerade as more familiar ones. For example, many people, including you, read the irregular form of %=, or "evaluate with changes" (seen above as "$(b +(b))"), as "recursion", because most of the time you see it explicitly in simpler code, it actually is being used to recurse. This turns out not to be a big deal, because there's basically no non-pathological case where %=-ing $ is anything other than recursion. ($ is not actually syntax but an empty name; in the code above it refers to the |- trap that the expression sits inside.) That makes the distinction an intermediate-level topic, and the fact that, at the lower level, %= is basically the most fundamental operation we do is a more advanced one still.
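The decrement gate referred to above isn't part of this excerpt, so the following is a hypothetical reconstruction in present-day syntax, a sketch rather than the code from the thread. It counts b up from 0 until +(b) equals a; the last line is the irregular form of %= applied to $, the arm of the |- trap.

```hoon
::  decrement by counting up; loops forever when a is 0
|=  a=@
=/  b  0
|-
?:  =(a +(b))
  b
::  same as  %=  $  b  +(b)  ==  (re-run the trap with b changed)
$(b +(b))
```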

It's not surprising that you could remember %wtcl easily - a lot of people associate verbally. I do, and Curtis has made it clear that he named the punctuation because he does. The weird names make it less of a subconscious thing than it is with, say, "char *argv[]" or "s->len", and also mean that Hoon programmers in the same office can talk quickly and unambiguously (it does feel silly at first, I admit).

The biggest insight of the syntax, though (in my opinion, which is not necessarily shared), is that fixed-arity syntactic forms let us avoid both code that tries to escape off the right edge of the screen and unsightly piles of closing brackets, because most syntax trees are heavily unbalanced. Something really prosaic that feeds into this is simply that all the runes are two characters wide, which gives Hoon its surprisingly easy-to-scan "imperative" feel. You might be able to find four-character keywords (without cheating and using wtcl etc., naturally!) and make this work too, of course.
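A small illustration of the fixed-arity point, using arbitrary arithmetic: the same expression written in wide form, with nested parentheses, and in tall form, where each rune has a known number of children, so nothing needs closing. Both should produce 12; the backstep indentation is a convention rather than part of the syntax.

```hoon
::  wide form: brackets nest and pile up on the right
(add (mul 2 3) (sub 10 4))
::  tall form: %+ always takes exactly a gate and two arguments
%+  add
  %+  mul  2  3
%+  sub  10  4
```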

mst · on Nov 5, 2015

OK, so, the digraph runes are the sticking point for me in trying to read Hoon code so far (caveats very definitely necessary). I love the tall-form indentation system, and I actually do know what $() is from reading the hidduc-posmeg stuff ('recur' is common sugar in Lisps for a continuation-based recurse-into-somewhere, often the current function - please don't mistake that for me thinking it's plain recursion). But, well, other than ?:, which fits the hook-operator thing (and ? and ??, which seem pretty clear), I've not got a good way to hang memories off the rest. Some sort of set of mnemonics, even very silly ones, à la 'perldoc perlvar', might make things a lot easier for me.

Can you invent a just-so story that explains why % is for apply and : is for making tuples? It doesn't have to be particularly real; I could just do with something to hang the liveware memory structures off.

(Note that I am -trying- to get my head around this stuff, but I haven't quite got far enough with reading code for setting up an installation to seem worth the effort ... but I want to, so I'm trying to find constructive ideas that'll help other people trying to learn, and incidentally me as well ;)

juped · on Nov 5, 2015

It's easiest for ? (conditionals) and ! (black magic), but it's not ever really completely arbitrary. (Some of this may just be stuff I made up rather than the actual etymology, but that's what you asked for and it's useful anyway - cf. traditional Chinese character "etymologies", which drive linguists mad but are great memory aids.)

$, buc - construction of tiles (which I think we have since renamed to molds). This is type-system stuff, and I think the $ is supposed to evoke shell/Perl/&c. variable sigils, which are also type-system stuff.

|, bar - construction of cores. I really think this one is just there because of the pseudo-box-drawingness of how a generic multi-armed core looks:

  |_  [a=@ b=*]
  ++  arm  42
  ++  gate-arm
    |=  a=@
    (add a arm)
  --

something with a | at the root, and then ++ arms hanging off it like... arms. Cores really read more as layout than text to me.

:, col - builds tuples. This is a punctuation mark that joins things, standing for some kind of consing operation. It's also seen in the second position in runes like $:, where it also tends to mean a consing operation.

%, cen - evaluation. Everything here is syntactic sugar over the most general operation, %=, which evaluates something from the subject with specified changes. I find this easy to remember because it's the most important one. It's also on Tlon's logo, for what it's worth.
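As a sketch of %= on something other than a trap (the face names point, x and y are made up for the example, and the syntax is present-day Hoon): it takes a wing plus a list of changes and produces the modified value.

```hoon
=/  point  [x=1 y=2]
::  irregular form of  %=  point  y  5  ==
point(y 5)                ::  [x=1 y=5]
```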

., dot - Direct Nock. The vast majority of . runes in use are .=, test for equality, in its irregular form =(a b). Nock is the low-level target machine, and . is a pretty low punctuation mark (_ is probably lower, but never appears in the first position in a digraph rune).
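.= and .+ compile straight down to Nock's equality and increment operators (Nock 5 and Nock 4), which is the sense in which they are "direct Nock". A sketch in current syntax, with arbitrary values:

```hoon
.=  2  (add 1 1)          ::  %.y, regular form of =(2 (add 1 1))
.+  41                    ::  42, regular form of +(41)
```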

#, hax - Pretty printing. I don't think these are ever really seen in their regular form (just in string interpolation syntax, really), but pretty printing involves many hacks, and # is pronounced 'hax'.

^, ket - Type operations, chiefly casts. If you imagine types like superscripts on values, then the ^ character makes sense.
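A minimal cast sketch (current syntax; the value is arbitrary): ^- checks that the value nests under the given type and produces it with that type, failing to compile otherwise.

```hoon
^-  @ud  (add 2 2)        ::  4, typed as @ud; irregular form `@ud`(add 2 2)
```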

~, sig - Runtime hints, either debugging printfs or actual hints to the runtime. Imagine writing notes on some text, preceded with a squiggly line.
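For the debugging-printf case, ~& prints its first child and then evaluates to its second. A sketch with an arbitrary message, assuming present-day syntax:

```hoon
~&  'about to add'  (add 2 2)    ::  prints 'about to add', produces 4
```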

;, sem - Composers. Pretty much the only one you'll see in its regular form is ;~, for monadic composition, though ;:, which promotes a binary gate to take more arguments (say, for adding 3 numbers), might show up in tall form as well. The semicolon is a punctuation mark that joins related but somewhat standalone clauses.
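The adding-three-numbers case for ;: looks roughly like this in current syntax. Because tall-form ;: takes a variable number of children, it is closed with ==.

```hoon
;:  add  1  2  3  ==      ::  6, irregular form :(add 1 2 3)
```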

=, tis - modification of the subject. Hoon is a language where the environment is first-class: other languages have variables, scoping, and the like, while Hoon just has the subject, which contains everything from your function's arguments to the Hoon compiler and kernel. = is a common assignment operator in programming languages, and assignment is a special case of modifying the subject.
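A sketch of the assignment-like case in present-day syntax: =/ pins a new named value (the face a is arbitrary) onto the head of the subject, and the rest of the expression is evaluated against that extended subject.

```hoon
=/  a  5                  ::  subject becomes [a=5 <old subject>]
(add a 1)                 ::  6
```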

?, wut - conditionals. Pretty easy, fortunately!

!, zap - black magic. Black magic deserves the ! - special honors go to !!, or "crash here". !: essentially turns on debugging mode for anything inside it - you'll often find it at the top of code files, given that Urbit is under construction.
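A sketch of both in present-day syntax (the gate itself is just an example): !: at the top enables debugging traces for the code it wraps, and !! marks a branch that should crash.

```hoon
!:                        ::  debugging mode for everything below
|=  a=@
?:  =(0 a)
  !!                      ::  crash here: 0 has no predecessor
(dec a)
```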

Thanks for the feedback - this sort of thing is the main value that we get out of posting things on the net. It's much appreciated.

mst · on Nov 6, 2015

I like most of those but here's some alternative stuff I made up that I think may work better for me (emphasis on may and me) -

~ -> it's a hint, so it's saying "hey, this is 'ish' that" or so (~ always kinda means 'ish' in my head). Or maybe we use the fact that ~ is bitwise complement and say "it's info that complements the thing".

# -> "format it for a comment"

: -> ':' connects chunks of sentence together

; -> semi-:, so does half that and half something else as well, oh and also sem for 'sem'antic composition whereas col just 'col'lects stuff together ... actually that sem/col version really makes me happy.

Glad you're finding this at least vaguely constructive, I'm rather enjoying it too.

juped · on Nov 7, 2015

Someone also pointed out that people have called monads "programmable semicolons".