Hacker News
Show HN: Indentation-based syntax for Clojure (github.com/ilevd)
75 points by ilevd 9 days ago | 50 comments





What the author did is cool in a social sense: Lispers go on and on about homoiconicity, which here means that the original Lisp code is data. The author uses this property by reading this code/data and transforming it into text that represents syntax-full code. It is "cool" because it demonstrates the value of homoiconicity.

However, it's not cool in a technical sense. The result is not homoiconic any more. Code that had a cool property has been turned into code that does not, and the result looks like every other syntax-full blub language. The result destroys value.


If the semantics are exactly the same as clojure, can you give me a specific example of what value is lost? Surely the semantics must be different for there to be a loss of tangible expressivity of the language?

The Lisp text representation is not Lisp data. Nor is it code. It's just a textual representation of the homoiconic abstract code/data.

A different textual encoding does not affect the homoiconicity


(foo (bar (baz 1)))

is equivalent to

foo
  bar
    baz 1
It's still homoiconic.

they can still, and in the case of the linked project, probably are turned into the exact same data structure.
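To make the "same data structure" claim concrete, here is a minimal Python sketch with two toy readers, one for the parenthesised text and one for the indented text. Both function names are made up for illustration, and the indentation rule (every non-blank line is a call, and deeper lines below it become its arguments) is an assumption of this sketch, not necessarily how the linked project parses its syntax.

```python
def read_sexp(text):
    """Parse one parenthesised expression into nested Python lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        if tokens[pos] == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1  # skip the closing ")"
        return tokens[pos], pos + 1  # atoms stay plain strings

    tree, _ = parse(0)
    return tree


def read_indented(text):
    """Parse the indented form: each non-blank line is a call, and
    more-indented lines below it are appended as its arguments."""
    root = []
    stack = [(-1, root)]  # (indent, list) pairs for the forms still open
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        while stack[-1][0] >= indent:
            stack.pop()  # close forms that are not parents of this line
        form = line.split()
        stack[-1][1].append(form)
        stack.append((indent, form))
    return root[0]


a = read_sexp("(foo (bar (baz 1)))")
b = read_indented("foo\n  bar\n    baz 1")
print(a == b)  # True: both texts yield ['foo', ['bar', ['baz', '1']]]
```

Anything that works on the nested list (an evaluator, a macro expander) cannot tell which of the two texts it came from.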

I get your objection insofar as: what's the point? You're transforming a language that does a thing well into a language that does a thing less well than another language. But on the other hand, different approaches make things easier to reason about.

Sometimes the shell approach of baz 1 | bar | foo may make more sense. Sometimes it might be a map, sometimes a for loop.

So in that sense, a DSL for Python-type scripting makes sense.


I'm sorry if I touched you on an ouchie.

> It's still homoiconic

> they can still, and in the case of the linked project, probably are turned into the exact same data structure.

What you are saying means it is not homoiconic any longer. If you print out the parsed representation of lisp code, you get back the same lisp code. If you print out the parsed representation of this whitespace code, you get back the lisp code (data) also. But this is different to what you fed into the parser.

The point of homoiconicity is that the data structure representing the code is the code itself. There's no syntax, it just represents itself. It's super simple. A programming language with syntax cannot be homoiconic. This is complexity.

> I get your objection insofar as whats the point?

"Easy" IS an argument to choose something over something else, but "simple" is a better, stronger argument. Parens syntax is simple. Whitespace syntax is easy, iff you come from, or are familiar with, a Python-style background.

We are debating some obscure thing. The point is the argument :) I hope you have a great day.


> I'm sorry if I touched you on an ouchie.

In case you're not a first-language English speaker: the wording here implies that they're having a childish emotional reaction to what you said, which (if intentional) is an entirely inappropriate method of discourse, especially when all they did was give a very reasonable and level-headed critique of your definition of homoiconicity (that it's a property of the internal representation, not the external syntax).


>What you are saying means it is not homoiconic any longer. If you print out the parsed representation of lisp code, you get back the same lisp code. If you print out the parsed representation of this whitespace code, you get back the lisp code (data) also. But this is different to what you fed into the parser.

I don't know if we're talking past each other here.

Lisp's internal representation of (foo (bar (baz 1))) is not literally just that.

It is a linked list, generally (?).

An indented version of the above:

foo
  bar
    baz 1
is exactly the same, as is the tcl version:

foo{ bar{ baz{ 1 }}}

The structure describes the data.

you can turn that internal representation back to a lisp list, or you could turn it into json or tcl or whatever.

Homoiconicity also means, as I understand it, that foo would appear the same regardless of whether it's built in.

in C for example you have to care whether 'for' is a built in language construct.

you can't do

foo( int i = 1 ; i < x; i++) {}

the language doesn't allow it.

if lisp had:

(for ((= i 1) (< i x) (++ i)) .....)

then that would be homoiconic, and you could make it look like C with a front end, and it would end up as the same internal structure.

It would be more complicated to turn that internal representation into the C like syntax, but it isn't impossible.


> If you print out the parsed representation of this whitespace code, you get back the lisp code (data) also. But this is different to what you fed into the parser.

First, Lisp is not homoiconic by that definition either. If you read "( + 1 2 )" as a Lisp object then print it, you'll get "(+ 1 2)", which is not the same string.

We understand that difference in sexp whitespace to be superficial, and superficial can be technically defined, but it's a difference. If you accept that superficial differences are allowed within the definition of homoiconicity, you can define the difference between indentation syntax and sexp syntax to be superficial too, if you want.

Second, it's not true that what gets printed has to be the Lisp representation. It depends which print function you use. Homoiconicity requires that the reader and printer match. If you write a different reader, to accept a different syntax, then you need to use the corresponding printer.

If you print_with_new_syntax (a function name I've made up), you get back the original code in its indentation-based form, which makes the pair read_with_new_syntax + print_with_new_syntax homoiconic in the classical Lisp sense.

There may be superficial differences, such as if you put in "baz 1 " and get out "baz 1" (missing a space), or "baz" followed by "1" indented on the next line, but as noted above, Lisp sexps have this characteristic too.
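The whitespace point can be checked with a toy reader/printer pair. This is a minimal Python sketch; read_form and print_form are hypothetical names and only cover lists and bare atoms, not a real Lisp reader.

```python
import re


def read_form(text):
    """Read one s-expression into nested lists (atoms stay strings)."""
    stack = [[]]
    for tok in re.findall(r"[()]|[^\s()]+", text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0][0]


def print_form(tree):
    """Print nested lists back as text with canonical single spaces."""
    if isinstance(tree, list):
        return "(" + " ".join(print_form(t) for t in tree) + ")"
    return tree


source = "( + 1 2 )"
printed = print_form(read_form(source))
print(printed)                                   # (+ 1 2)
print(printed == source)                         # False: the text changed
print(read_form(printed) == read_form(source))   # True: the data did not
```

The round trip is stable at the level of the data structure, not at the level of the characters, which is the sense of "superficial" used above.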

(Then consider "(+ 1 #-t 2 3)" which in a Lisp that supports #- evaluates to 4. Read that and print it, you'll get "(+ 1 3)", which isn't a superficial difference any more.)


"Homoiconic" means that a program's definitions are stored in a textual form which can be re-edited at run time. Bash is homoiconic because you can type "set" to see all the function definitions in source code form, and edit them via copy-paste. Line numbered BASIC interpreters are likewise homoiconic. The program is held in memory in lightly tokenized form; any line of it can be recalled and edited.

That's how the originator of the term "homoiconic" defined it.


S-expressions are syntax for linked lists. You can encode a linked list in other ways, and as long as you have a reader and printer for it, you could implement Lisp on top of it, and it would be homoiconic.

As a very simple example: replace the () with []

As a slightly more complex example: json. (With support for symbols if necessary but that’s orthogonal—you get the idea)

And from there we can go to: yaml.

Lo: we have a white space sensitive homoiconic language :)
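As a rough illustration of the JSON step, the Python sketch below uses the standard json module as the reader and printer and a tiny evaluator on top. The evaluate function and the OPS table are made up for this example and handle only binary-or-more arithmetic; symbols are simply JSON strings here.

```python
import json
import operator
from functools import reduce

# Hypothetical operator table for the toy evaluator.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(form):
    """Evaluate a nested list whose first element names an operator."""
    if not isinstance(form, list):
        return form  # numbers evaluate to themselves
    op, *args = form
    values = [evaluate(a) for a in args]
    return reduce(OPS[op], values)  # left fold, so ["-", 14, 3] is 14 - 3


text = '["-", ["+", ["*", 3, 4], ["-", 3, 1]], 3]'
program = json.loads(text)          # the "reader"
print(evaluate(program))            # 11
print(json.dumps(program) == text)  # True: the "printer" gives the text back
```

The program is an ordinary list before it is evaluated, so other code can inspect or rewrite it first, which is the property the comment is pointing at.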


But it's not itself, it's an array of characters. Its AST can be represented as simply as a recursive list data structure (with a head and a tail), but I think this is a bit over-mysticized.

macros still work yo

my text editor represents sexprs as pixels, macros still work when src is viewed in GUI editors too, regardless of font, and even when i have my glasses on


Give it a rest.

Interesting project, showing how "easy" it is to host[1] another language within Clojure. Like others, I admit I see little value for myself or as a selling point for beginners. For experienced engineers, however, as I wrote above, it should serve as a case study in how to hook everything together to produce a working tool. Then it is a matter of seeing if and when there is an opportunity to reuse such techniques to build DSLs that compile to native Clojure.

[1] Clojure is known for being designed to be hosted within another platform, but like all Lisps it is also a valid (and productive) hosting target by itself. Note that many of the APIs in the Clojure ecosystem rely on one of:

- literal data structures
- macros reusing native Clojure patterns / forms such as `let`, `def` ...
- the shape of native (i.e. defined within clojure.core) Clojure APIs, such as `get-in`, `with`...

Some examples: specter, integrant, mount, clojure.core.logic, datalog (datascript, datomic, xtdb), clojure.spec ...


I'd argue that the main thing this illustrates is how silly the argument that s-exps are hard to read is. A very slight syntactic change makes Clojure look just like Python. Anybody who claims they just can't wrap their head around writing Lisp is not being serious.

Yes, exactly.

This is really cool, I like it a lot! I don't think the parens in Lisp are actually the problem, I think it's the embedding. I.e. Lisp programs naturally tend to get very nested very fast, kinda like this:

(-
  (+ (* 3 4)
     (- 3 1))
  3)
It's just a simple sum, but reading it involves a lot of remembering contexts and how things should work in Lisp. The CWP examples use arrow/pipeline style macros a bunch, which are awesome! I kinda feel like they already solve the issues that a lot of people have with parens:

(->> 3
     (* 4)
     (+ (- 3 1))
     (- 3))

There's still a tonne of parens there, but the flow is a lot more readable for me.
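For readers who have not met the arrow macros, here is a rough Python model of what thread-first (->) and thread-last (->>) do. The two helper functions are hypothetical and only mimic the argument placement, not Clojure's macro expansion. One thing the model makes visible: with a non-commutative step such as (- 3), the argument position matters, so the two threadings of this pipeline give different results.

```python
from operator import add, mul, sub


def thread_first(value, *steps):
    """Model of ->: the running value becomes the FIRST argument."""
    for fn, *args in steps:
        value = fn(value, *args)
    return value


def thread_last(value, *steps):
    """Model of ->>: the running value becomes the LAST argument."""
    for fn, *args in steps:
        value = fn(*args, value)
    return value


nested = sub(add(mul(3, 4), sub(3, 1)), 3)  # (- (+ (* 3 4) (- 3 1)) 3)
steps = [(mul, 4), (add, sub(3, 1)), (sub, 3)]

print(nested)                      # 11
print(thread_first(3, *steps))     # 11, the last step is 14 - 3
print(thread_last(3, *steps))      # -11, the last step is 3 - 14
```

So, if this model reads the macros correctly, the pipeline matches the nested sum exactly only when the subtraction is threaded in the first position.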

Edit: sorry for the very difficult to read code formatting - couldn't figure out how to monospace things in hackernews.


The best effort I’ve seen at working around Lisp’s parens notation and making its syntax more declarative-looking is Rhombus, which is based on Racket, and actually on track to replace it.

what it looks like:

https://github.com/racket/rhombus/blob/master/demo.rhm

more info:

https://doi.org/10.1145/3622818

https://github.com/racket/rhombus


The problem is that the Lisp code structure is hierarchical. Transforming it for input/output into a sequential code structure can help, but hinders the learning of the hierarchical mode of working with trees and nested lists. In the long run the mental model of trees or nested lists is more important than the "improvements" of the sequential visible structure. Typically the sequential model is preferred by those who have learned other languages with imperative code structures (historically that was FORTRAN, but nowadays it's often languages like Python and Java). Not unlearning and then learning a different model limits the understanding of the underlying language and its evaluation model.

That's really interesting! I definitely could be sticking to a mental picture without realising. Curious if you're a lisper and have got used to reading that kind of heavily indented code? Or if you have other strategies for minimizing the complexity?

For me containment is important. (a (b)) -> (b) is inside a. Also that the whole form is enclosed in parentheses. If I move the mouse over a form, the whole form can be selected.

    (pipe
      (a 1 2 3)
      (b 4 5 6 7))
If I move over the second form, there is an argument, which is above it.

My mental model is more based on tree evaluation, not so much on sequential evaluation.

    (b (a 1 2 3)
       4 5 6 7)
For me it's more important to see that all arguments to b are enclosed in a single form. If I want to move it around, I can easily select the whole form (with the mouse typically by a double click on the parentheses or a middle click).

The list becomes a physical thing (-> manipulating lists), not so much a syntactical thing (dealing with the syntax of a pipe operator).

But then I'm used to Lisp environments with a higher code complexity (longer code with deeper nesting, lots of macros) and support for that.


> Text after a blank line that is indented by two or more spaces is reproduced verbatim. (This is intended for code.) [0]

So...

  (-
    (+ (* 3 4)
       (- 3 1))
    3)
And...

  (->> 3
       (* 4)
       (+ (- 3 1))
       (- 3))
[0] https://news.ycombinator.com/formatdoc

I often write threaded math but unfortunately it doesn't compose cleanly with division :) (subtraction can be transformed into an addition without too much extra)

If you do thread-last and then need to divide by some value, you will need to write something like (* (/ 1 37)) which is ugly

I think you could actually write a fraction here, but if instead of 37 you have a binding then that doesn't work


Ah nice, thank you!

An excellent point about lisps. I've long thought that one of the core issues with lisps is that logic is tree based, where a large majority of human brains are far more amenable to linear process. I suspect there's un-recognized cognitive overhead in the branching of trees versus linear with early return.

Forth does that without parentheses:

  3
  4 *
  3 1 -   +
  3 -
That’s the ultimate “object verb” way to do it and IMO at least equally readable (ignoring familiarity, of course)
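The stack discipline behind that snippet can be sketched in a few lines of Python. This is a toy evaluator for whitespace-separated integers and three binary operators, not a Forth implementation; eval_rpn and OPS are names made up for the example.

```python
import operator

# Hypothetical word table: each operator pops two values and pushes one.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_rpn(source):
    """Evaluate postfix words left to right and return the final stack."""
    stack = []
    for word in source.split():
        if word in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[word](left, right))
        else:
            stack.append(int(word))
    return stack


print(eval_rpn("3 4 * 3 1 - + 3 -"))  # [11]
```

Line breaks and extra spaces carry no meaning here, so the four-line layout above is purely for the human reader.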

The problem with this is that the parsing depends on the arity of the functions.

Wouldn't that be true of infix syntax as well? For example, if I define an operator, %, which divides one number by a second number and then multiplies it by a third, you have to know its arity to parse:

  70 % 90 100 * 10
Even worse, in this case you're also dealing with operator precedence, which in (Reverse) Polish Notation isn't an issue.

Yes, but infix syntax is usually used only for a few select functions, whereas Forth uses postfix syntax for all functions.

In Forth parsing doesn't even come into play, because an operator just tries to pop as many arguments off the stack as it needs. Annoyingly, that does mean there's no way to tell what the following:

  a b +
Means, as it can be any of these, for example:

  add(b, a)
  add(b(a))
  add(b(a()))
  add(b(), a())
  add(b, a())
  add(b(), a)
In a language where you don't need brackets for function calls it can be reduced to:

  add b, a
  add b a
And even if the add function takes two arguments the second option would be fine because the call:

  b a
May put two arguments on the stack. add(b(), a())

Mathematical notation often gets brought up as an example of there being problems with Lisp syntax. I have problems with that. For one, how often do you actually write...

  foo := 7 * 3  + 4 - 5;
Instead of:

  foo := 20;
Which, in Lisp syntax could be as simple as:

  (set! foo 20)
I could see an exception being made for something like:

  constant MEMORY := 640 * 1024;
But the Lisp equivalent:

  (defconstant! MEMORY
    (* 640 1024))
Isn't _THAT_ much worse.

With more complex examples we still see that the difference isn't too terrible, especially as we try to make code more readable. An example of code whose readability is questionable could be:

  r := (t - l) / (n - 1) * n;
Which, when we choose names with a little more meaning could become:

  readjusted_score := (total_score - lowest_score) / (number_of_students - 1) * number_of_students;
Of course, that line is a bit long (just a bit, but I'm not interested in actually going and typing out a very long line), so it could be reformatted to something like:

  readjusted_score := (total_score - lowest_score)
                      / (number_of_students - 1)
                      * number_of_students;
Which isn't _TOO_ much different from:

  (set! readjusted-score
    (*
       (/
          (- total-score lowest-score)
          (- number-of-students 1))
       number-of-students))
And in fact, the Lisp syntax could make certain errors more visible, such as when we compare the following two snippets:

  readjusted_score := (total_score - lowest_score)
                      / (number_of_students + 1)
                      * number_of_students;

  (set! readjusted-score
    (*
       (/
          (- total-score lowest-score)
          (+ number-of-students 1))
       number-of-students))
And yes, indeed, when we start looking outside of mathematics, sometimes a little bit of sugar can even reduce the amount of brackets needed in Lisp syntax beyond the brackets needed in non-Lisp syntax:

  abstractFactoryInstanceBean := AbstractFactory.getInstance().getBean().initialize();

  (set! abstract-factory-instance-bean
    (-> Abstract-Factory
        get-instance
        get-bean
        initialize))
Speaking purely of mathematics, writing a function which turns infix syntax into Lisp syntax isn't too daunting anyway, it's even an exercise in good old SICP, so with very little effort we could write a macro to let us do the following:

  (set! foo
    (infix bar + baz * (quux - zot)))
Or, for the RPN aficionados:

  (set! foo
    (rpn bar baz quux zot - * +))
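To give a feel for how small such an infix converter can be, here is a Python sketch using precedence climbing. It works on text rather than as a Lisp macro, assumes operators are separated by spaces (so that names like total-score stay intact), and the function names are invented for this example.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def infix_to_prefix(text):
    """Turn an infix expression into nested [op, left, right] lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        tok = take()
        if tok == "(":
            inner = expression(1)
            take()  # consume the closing ")"
            return inner
        return tok

    def expression(min_prec):
        left = operand()
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = take()
            # Requiring a higher precedence on the right keeps
            # operators of equal precedence left-associative.
            right = expression(PRECEDENCE[op] + 1)
            left = [op, left, right]
        return left

    return expression(1)


def to_sexp(tree):
    """Print the nested lists in parenthesised prefix form."""
    if isinstance(tree, list):
        return "(" + " ".join(to_sexp(t) for t in tree) + ")"
    return tree


print(to_sexp(infix_to_prefix("bar + baz * (quux - zot)")))
# (+ bar (* baz (- quux zot)))
print(to_sexp(infix_to_prefix("(t - l) / (n - 1) * n")))
# (* (/ (- t l) (- n 1)) n)
```

The second call reproduces the nesting of the readjusted-score example above, including the left-to-right grouping of the division and multiplication.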
Of course this is very much personal opinion, but I would even go as far to say that infix notation isn't even that popular, as we can see with the following examples:

  new_student := new Student("Name", points);

  new_student := Student new ("Name", points);

  if (length(students) > 30) {
    error("Too many students in class!", length(students));
  } else {
    addStudent(new_student, class);
  }

  (length(students) > 30) if {
    "Too many students in class!" error length(students);
  } else {
    new_student addStudent class;
  }
All of that being said and done, I doubt I will change anyone's mind, but I personally really like Lisp syntax (as in good old, double open brackets on the let, Lisp syntax), and I just hope to see arguments which are a little stronger than the old classic 'we learn 1 + 2 in school, not (+ 1 2)'.

I totally agree with you about infix notation actually! I don't find it particularly bad to read (- 3 4) vs (3 - 4) and the first has the benefit of achieving some clarity of what's happening under the hood (- is a function being called with 3 and 4).

The main example I had in mind was actually data science/analysis, where you often chain together data manipulations. I only picked arithmetic functions because it seemed like it made it really clear which functions were being called, without needing any knowledge of Clojure/Lisp beyond basic s-expr syntax.


The parentheses are a strength that allows for faster editing and easier parsing. Although I respect every endeavour to have fun with code, I hope people don't run with this as a serious option. It would be better to have editor extensions that emulate this with regular Clojure code; that, I could get behind.

I just want a programming language (probably a Lisp) where every thing is on a separate line, every sub-thing is indented and there is no choice (there always is only one right way to write something).

E.g.

    println(greetings(users))
should always be

    println
        greetings
            users

I started my toy language with this idea, but it quickly becomes unreadable, so I chose to support both notations.
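The strict "one thing per line" rule is easy to model, and the model also shows why it gets long quickly. The Python sketch below prints a nested-list form with every head and every argument on its own line; one_per_line is a hypothetical helper and has nothing to do with how the linked language is implemented.

```python
def one_per_line(tree, depth=0, indent="    "):
    """Return the lines of a form with each element on its own line."""
    if isinstance(tree, list):
        head, *args = tree
        lines = [indent * depth + head]
        for arg in args:
            lines.extend(one_per_line(arg, depth + 1, indent))
        return lines
    return [indent * depth + tree]


call = ["println", ["greetings", "users"]]
print("\n".join(one_per_line(call)))
# println
#     greetings
#         users

# A small recursive function in the same strict style:
fibo = ["fn", "fibo", "x",
        ["if", ["<", "x", "2"], "1",
         ["+", ["fibo", ["-", "x", "1"]],
               ["fibo", ["-", "x", "2"]]]]]
print(len(one_per_line(fibo)))  # 17 lines for a one-expression function
```

Seventeen lines for a Fibonacci definition is a fair hint at why a mixed notation ends up more practical.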

https://github.com/jen-Ya/tab

Here is a Fibonacci example:

  fn fibo x
   if (< x 2) 1
    +
     fibo (- x 1)
     fibo (- x 2)

That's XML without the angle brackets

This reminds me of Breck Yunits and his Parsers idea for a programming language: https://scroll.pub/blog/teddyTalk.html

Scroll, a publishing system, seems to be the main focus of his efforts: https://scroll.pub/

He's on HN, but his posts and comments irritate people: https://news.ycombinator.com/user?id=breck


I have more or less exactly this syntax written in a notebook as a transformation.

The one critique I'd have here is I think that name binding should be handled differently and _not_ introduce extra indentation. Naming shouldn't create indentation IMO, and it's one of the cardinal sources of readability issues in lisps.

But for you you could just avoid it entirely (though you might need to be a bit clever about scoping if you care about lifetimes).

Just let me write `a = thingy` and then go to my next line!


Wouldn’t “let x to 1” be much more clear if it were “let x be 1”? How is this typically read in English mathematical contexts?

Or perhaps “as”:

  with-open rdr to :random-value

  binding *file* to .getFile(url):

Yes. This is a dead giveaway that the author is not a native English speaker. And overall it gives the project a mid vibe.

It's a cool thought experiment, but you can also use an editor tool that matches braces and colours them lightly to achieve nearly the same effect while staying more readable.


I found--when working in Clojure for a while back in 2012--that what I really wanted was just a way to collapse deeply-nested parenthetical definitions, and ended up with a simple reader macro that reads one form and then pushes the rest of the current form down into the end of that one (and I added a special case for a let).

    (use 'reader-macros.core)
    
    (defn macro-read-one [reader]
        (clojure.lang.LispReader/read reader true nil true))
    
    (defn macro-read-rest [stop reader]
        (if (skip-whitespace reader)
            (let [data (.read reader)]
                (.unread reader data)
                (condp = (char data)
                    \; (do
                        (macro-read-comment reader (.read reader))
                        (macro-read-rest stop reader))
                    stop ()
                    (cons
                        (macro-read-one reader)
                        (macro-read-rest stop reader))))))
    
    (set-dispatch-macro-character \> (fn [reader quote]
        (let [data (.read reader)]
            (.unread reader data)
            (if (= \( (char data))
                (concat (macro-read-one reader) (macro-read-rest \) reader))
                `(let [ 
                    ~(macro-read-one reader) 
                    ~(macro-read-one reader) 
                ] ~@(macro-read-rest \) reader))))))
I just find that the GC (and being hosted in Java, in particular) causes you to have to have a very large number of (with-open)s around in just about everything you do to deal with the lack of deterministic finalization, and, when combined with the number of recursive let bindings I was accumulating, I felt like I was going insane.

    (defn handle [manager socket] 
        #>(with-open [socket socket])
        #>(with-open [input (new java.io.BufferedInputStream (.getInputStream socket))])
        #>(with-open [output (.getOutputStream socket)])
    
        #>input (atom input)
        #>output (atom output) 
    
        #>(let [sendln (fn [& data]
            #>(let [data (apply str data)]) 
            (println "< " data)
            #>(let [data (.getBytes data "ISO-8859-1")])
            (.write @output data)
            (.write @output crlf)
        )])
    
        ...etc etc...
    )
I am now looking back at some of this code--after not seeing it in a decade--and honestly I'm still very happy with this design choice and still feel like this drastically helps my ability to understand what is going on in some of the more complicated functions, as my level of indentation/nesting is so much lower.
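The rewrite that the #> prefix performs can be modelled on already-parsed data. The Python sketch below is only a model of the idea described above (read one form, then push the rest of the enclosing form into its end, with a special case that wraps a bare name and value in a let); it operates on nested lists, uses a made-up MARK token in place of the reader-macro character, and is not a port of the Clojure code.

```python
MARK = "#>"  # hypothetical stand-in for the reader-macro prefix


def expand(forms):
    """Rewrite sibling forms so each marked form swallows what follows it."""
    out = []
    for i, form in enumerate(forms):
        if form == MARK:
            target = forms[i + 1]
            if isinstance(target, list):
                # #>(f a b) rest...  becomes  (f a b rest...)
                out.append(expand(target) + expand(forms[i + 2:]))
            else:
                # #>name value rest...  becomes  (let [name value] rest...)
                value = forms[i + 2]
                if isinstance(value, list):
                    value = expand(value)
                out.append(["let", [target, value]] + expand(forms[i + 3:]))
            return out  # everything after the mark has been absorbed
        out.append(expand(form) if isinstance(form, list) else form)
    return out


flat = ["defn", "handle", ["manager", "socket"],
        MARK, ["with-open", ["socket", "socket"]],
        MARK, "input", ["atom", "input"],
        ["println", "input"]]

print(expand(flat))
# ['defn', 'handle', ['manager', 'socket'],
#  ['with-open', ['socket', 'socket'],
#   ['let', ['input', ['atom', 'input']],
#    ['println', 'input']]]]
```

The flat input has one level of nesting inside the function body, while the expanded output has three, which is the indentation the prefix saves.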

So like, if you are going to create a new syntax for Clojure--especially one that is going to force indentation for nesting that could have previously been elided by stylistic choices--my maybe-I'm-unique suggestion would be to add a similar feature that lets you have a version of ":" that just dominates until the end of scope and lets you reduce one level of indent.

(Which, I guess, put like that, is a feature I could see also adding to Python. The existence of mutable data--and how the most popular implementation has deterministic finalization via reference counting--means I don't feel quite as visceral of a need for this, but it would still be pleasant at times.)

(Ugh... reading that macro reminds me just how angry I found myself getting every day I used that language how ',' was bound to something entirely useless and so they had to break the beautiful symmetry of ',' with '`' in macro definitions. It is something so small, and yet, and I appreciate this is nigh unto ridiculous for someone who clearly didn't exactly like Lisp either enough to not implement this insane reader macro ;P, kept making me sad about Clojure any time I had to touch a macro, and macros were the only reason I was using it in the first place.)


Maybe try something like Gleam?

But, where are my parens? :D

Joking :)

Looks rather interesting, to be honest. I'll play with it.


You mean: "when kids do reckless stuff like this, you gotta ask: where are the parens in all this?"

I am looking for my grandparents.

At a glance, it looks quite similar to Python.

Reminds me of Apple's Dylan.

There is an enormous difference between the things you can do and the things you should do. This one doesn't belong in the second set.


