Yes I agree... that's why I took the trouble to write a very complete bash parse...

jcrites · on Jan 25, 2017

Just wanted to say that I really like the ideas you've expressed on the blog. I've found myself thinking similar thoughts, such as about combining shell, awk, make into one language. Keep it up!

I'll add that my ideal "command and control language" (as I've been thinking of it) would also be one that's strong at expressing data literals. If there are a number of data types like maps, sets, lists, etc., then it should be easy to initialize them with literals (not true in languages like Java). There are so many configuration file formats that I wonder if they could be subsumed into files with literals expressed in this language (perhaps with a "data mode" to prohibit executable code).

I've been wanting to design a language along these lines as well. I'll start by learning about oil, and reach out if I'm interested and able to contribute!

chubot · on Jan 25, 2017

Thanks! Yes I plan to have data literals -- it's basically going to be JSON, because that is sort of the least common denominator between JS, Python, Ruby, Perl, etc. Shell is a glue language, and JSON is pretty natural at this point.

There is also going to be influence from R and CSV files (which goes with awk).

I have thought about the config problem a lot -- and a configurable sandboxed "data mode" you are talking about is probably what I will go with.

If you squint, the oil shell will look not unlike nginx configs, e.g. a bunch of words with nested {} for blocks. Maybe like https://github.com/vstakhov/libucl or https://github.com/hashicorp/hcl (hm this seems to even have here docs!)

This kind of thing was done a lot with Python at my last job, to various degrees of success (e.g. https://bazel.build/ - the build language is derived from Python). Python sort of has some "data mode" sandboxing features, like being able to set __builtins__ when you exec/eval, but probably not enough. It didn't work well enough to prevent people from writing their own config file parser eventually. The syntax is close but not exactly what you want for some use cases.
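
For anyone who hasn't seen the `__builtins__` trick, here is a minimal Python sketch of that kind of "data mode". The helper names `load_config_eval` and `load_config_literal` are made up for illustration. Neither is a real sandbox; emptying `__builtins__` only hides the obvious names, which is roughly why this approach is "probably not enough".

```python
import ast


def load_config_eval(source):
    """Evaluate a config expression with no builtins visible (hypothetical helper)."""
    # An empty __builtins__ hides names like open() and __import__(), but this
    # is NOT a real sandbox: attribute access on literals can still reach a lot.
    return eval(source, {"__builtins__": {}}, {})


def load_config_literal(source):
    """Accept only literal syntax (dicts, lists, strings, numbers, booleans, None)."""
    # ast.literal_eval refuses function calls, names, and attribute access.
    return ast.literal_eval(source)


config_text = "{'name': 'demo', 'ports': [80, 443], 'debug': False}"
print(load_config_eval(config_text))
print(load_config_literal(config_text))
```

The second function is closer to a true "data mode": it parses the text and rejects anything that is not a literal, instead of evaluating it with a stripped-down environment.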

taeric · on Jan 25, 2017

I'm curious how far from just dropping to sexps you are looking. While I will not claim that lisp is the be all of languages, when you have a literal syntax that is literally made for linked structures, it becomes a lot easier to represent whatever you are wanting to do. And you can keep yourself from having tons of "pseudo languages" that are used for different structures.

chubot · on Jan 26, 2017

I've mentioned this a few places so I should probably write a post about it, but I did experiment with Lisp and shell (using femtolisp which is used to bootstrap Julia). There is an obvious similarity in that they both have prefix rather than infix syntax. Tcl explored this design space pretty fruitfully as well.

The short answer is that the experiment didn't work very well. If you want a shell based around Lisp, the new "oh shell" is closer to that. It has homoiconic syntax, which I don't agree with. I talked about that a bit here:

http://www.oilshell.org/blog/2016/10/28.html

The oh shell is influenced by the es shell, which was almost literally a Lisp. There is also EShell (ELisp) on my Github wiki, and earlier Scheme shell. So without going into too much detail, I think the idea has been tried and it has failed not by accident, but for fundamental reasons.

That's not to say that Lisp isn't hugely influential. Julia and Elixir both have very Lisp-like metaprogramming which I hope to take some inspiration from. Without having tried it in detail yet, I think the lesson is that in 2017 you don't need homoiconicity to have metaprogramming. And another lesson is that people like syntax (me included!)

The shell especially needs syntax because it will often be typed in a terminal as opposed to in an editor with help.

taeric · on Jan 26, 2017

Actually coming back to this. Eshell is an odd example for you to pick. I use eshell daily. Is my only shell, if I can get away with it. However, it is not a "lisp" shell. It is simply a shell buffer in emacs that is heavily integrated with many emacs features. Tramp, in particular, is nicely integrated such that I am beginning to forget scp and related notations. (That is actually a bad side effect.)

Yes, you can inline lisp expressions. But, I constantly paste in standard shell exports and they work exactly as you would expect them to.

Again, we agree that lisp is not necessarily a good shell language. Just, EShell seems a poor example to look at when making this argument.

Edit: Where is the wiki you referred to? In particular, where you discuss eshell.

taeric · on Jan 26, 2017

I fully agree that lisp is not the answer for a shell. My question was only for your data literal syntax.

Specifically, maps as association lists are very natural in lisp. So are many other structures. And JSON is easily seen as a crappy sexp. XML is easily a complicated sexp. Python's pickling is a complicated eval.

And I get it. Some things have a more "natural" syntax that isn't prefix. Even in common lisp, the reader macro is often used for infix things. However, when the examples used are easily translated to sexps, and back, it makes me wonder why we don't think programmers could learn to write them.

chubot · on Jan 26, 2017

The literal syntax is going to look more like JSON, like Python and JavaScript have. I don't see what's better about association lists (assoc (k v) (k2 v2)) vs. {k:v, k2: v2}.

I mentioned in that blog post that Clojure introduced meanings for [] and {} in a Lisp, and I generally view that as a good thing.

Also if that syntax is only used for data literals, and not code, then it loses the whole Lisp paradigm and I wonder why you would want that syntax.

taeric · on Jan 26, 2017

Well, the "assoc" is only the function to traverse an alist. So, that isn't needed there. And typically, for literals you'd just quote the list. So you are comparing ((k v) (k2 v2)) with {k:v, k2:v2}.
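
To make the comparison concrete, here is a rough Python sketch that round-trips a flat map between a dict and the quoted-alist text form. The names `to_alist`, `read_sexp` and `from_alist` are hypothetical, and it assumes keys and values are bare symbols with no spaces, strings, or escapes, so it sidesteps exactly the corner cases being argued about.

```python
def to_alist(mapping):
    """Render a flat dict as alist text, e.g. ((k v) (k2 v2))."""
    return "(" + " ".join(f"({key} {value})" for key, value in mapping.items()) + ")"


def read_sexp(text):
    """Read one s-expression into nested Python lists of symbol strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing paren
            return items
        return token

    return read()


def from_alist(text):
    """Turn alist text back into a dict of symbol strings."""
    return {key: value for key, value in read_sexp(text)}


text = to_alist({"k": "v", "k2": "v2"})
print(text)              # prints ((k v) (k2 v2))
print(from_alist(text))  # prints {'k': 'v', 'k2': 'v2'}
```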

My gripe with the second will be simply that you will retraverse a ton of corner cases and other encoding issues in order to have your literal syntax. More, I will probably be required to have yet another batch of tooling to support programmatically generating these literals.

At the end of the day, probably won't matter. Odds of success here are already low. So, I can see siding with emotional aesthetics for a syntax. I just face palm with all of the tooling that goes out the door simply because people are averse to parens.

chubot · on Jan 26, 2017

I don't follow... {k:v, k2:v2} is not difficult to parse or generate. Based on the ubiquity of JSON parsers, it's been done dozens of times in every language under the sun.

I don't really see a paucity of tools for JSON or Python, unless there is something special you're looking for.

The larger issue of metaprogramming is important, but that would require code to be represented uniformly too, not just data. I'm looking into Julia and Elixir metaprogramming, which take advantage of a uniform Lisp-like representation, but also have rich syntax.

I would say it's 2017 and we should be able to have BOTH syntax and metaprogramming/tools. I have a lot of posts about parsing on my site. If there's a problem with parsing something, I prefer to fix the parsing tools than to mangle the language's syntax to fit an ancient model.

taeric · on Jan 26, 2017

If you pick JSON explicitly, that is fine. But, I will note that {k:v, k2:v2} is already not JSON. It is JSON-like, but needs quotations and whatnot.
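
A quick way to see the difference, sketched with Python's standard `json` module (the helper name `is_valid_json` is just for illustration):

```python
import json


def is_valid_json(text):
    """Return True if the standard-library parser accepts text as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


bare = "{k:v, k2:v2}"                 # JSON-like: unquoted keys and values
quoted = '{"k": "v", "k2": "v2"}'     # strict JSON: keys must be double-quoted strings

print(is_valid_json(bare))    # prints False
print(is_valid_json(quoted))  # prints True
```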

The year is just an appeal to emotion. It would be nice if we could have and eat cake. Empirically, things haven't shaken out in that direction. And it wasn't long ago that there were a host of face palms for JSON parsing out there. Things are getting better, in some regards, but adding another JSON like parser to the fold will only make that worse.

lispm · on Jan 26, 2017

> Clojure introduced meanings for [] and {} in a Lisp

Lisp was using [] and {} before in a lot of software.

For example this is a data structure in Connection Machine Lisp called Xapping, basically a parallel hash table:

    {moe->"Oh, a wise guy, eh?" larry->"Hey, what's the idea?" curly->"Nyuk, nyuk, nyuk!"}

Straight from the 80s.

Scheme was using [] also many years before in the standard, programs and books.

taeric · on Jan 26, 2017

My thought is that the confusion is most people are taught that lisp has no syntax. And that they should avoid the reader macro at all costs. Few, then, actually dive into large programs in lisp to see how real programs actually look.

In this example, though, what was the advantage of the special syntax? (I'm assuming there was one. Just a lot of these parallel hash tables written in literal form?)

lispm · on Jan 26, 2017

> taught that lisp has no syntax

Which is wrong. It's just that Lisp syntax looks and works a bit different. Especially there is a data syntax for s-expressions and the syntax for the programming language Lisp is defined on top of s-expressions.

> And that they should avoid the reader macro at all costs.

Common Lisp itself uses read macros for its implementation. Applications use them in various ways.

> In this example, though, what was the advantage of the special syntax?

Xappings were a central data structure in Connection Machine Lisp, thus it's not unusual that it had a printed representation.

taeric · on Jan 26, 2017

Agreed on your points. Especially that it has syntax. But look at the linked post upthread. I said "no" syntax, but that was me being admittedly lazy. Quote there was "Lisp is a language with little syntax". Which has me wondering why it has that general view with everyone.

For the representation.. So this was not simply a literal syntax, but a serialization one. Which makes sense. And I can see why the serialization syntax is easily usable as a literal one.

kazinator · on Jan 26, 2017

GCC's C parser is a 550 kilobyte file that is almost 19000 lines long. The C++ one is over 1.1 megabytes long, and close to 39,000 lines. Github refuses to display it as code, only raw.

Lisps don't have anything that even comes close.

lispm · on Jan 27, 2017

Since many/most Lisp macros implement syntax, there can be a lot of syntax in Lisp, too. The parsing of the syntax is distributed over the macro definitions, sometimes with some help of the general macro mechanism.

See for example the syntax which is implemented by macros like LOOP or ITERATE. The LOOP syntax is documented in the ANSI CL spec...

http://www.lispworks.com/documentation/lw51/CLHS/Body/m_loop...

kazinator · on Jan 27, 2017

Macros do not have to deal with syntax at the level of "how does this sequence of tokens reshape into a tree". (Not usually; exceptions are easily contrived and exist in the wild.)

Usually, the syntax is already parsed when it comes into a macro.

E.g.:

   (sentence-macro subject object verb)

sentence-macro doesn't have to guess what part of the utterance is the object and which is the verb. The verb is the third argument, and that is that. This is the case whether it is a single word, or a phrase.

Though this is still syntax, it is parsed syntax.

When we say that Lisp has no syntax, it means that it doesn't have that silly, counterproductive stuff naively imitated from natural languages which represents the tree in a way that requires mind-bending work to recover.

taeric · on Jan 27, 2017

You should probably check out the LOOP macro sometime. Examples include:

    (loop for i from 0 to 10 
          for j from 10 downto 0 
          collect (cons i j))

That is a fully legit expression. So, if you consider LOOP contrived, then you have a point. It is a very useful and a very powerful macro, though. Not sure why it wouldn't count.

kazinator · on Jan 27, 2017

LOOP does speak to what you can do with macros. Obviously, a macro can interpret its arguments however it wants. You can implement a language that requires GLR (or worse).

LOOP is also not universally loved in the Lisp world. It's fine if you don't need to extend it. Not only is it inherently inextensible (for no good reason), but it's hostile to wrapping. If you write a MYLOOP macro which passes most of its arguments to LOOP and adds a few clauses of its own (translated to LOOP syntax), you have to parse the LOOP syntax to see where your clauses are and in which order in relation to the standard ones. (Unless you do something hacky or ugly, like recognize your extensions by some delimiters, without regard for the surrounding syntax, so that they don't actually blend in.)

LOOP is the only high level control construct I've ever seen anywhere in which you can achieve nonportable behaviors: because of the construct itself, not because of nonportable expressions plugged into it.

The treatment of clause symbols as character strings, so that CL:FOR and MYPACKAGE:FOR can both head off a for clause is a design smell. LOOP should have been designed to use symbols in the CL package, so that if someone wants to use APPENDING, they nicely import that symbol or pick it up via :use.

LOOP asks you to use different syntax for walking a vector and list, when the rest of Lisp has a sequences library that lets you use common functions for both. Supporting the specialized syntax is fine (just like the library also has list and vector specific things), but the lack of a generic sequence iterating clause shows a discord with the rest of the language. It didn't require much imagination to have, say a FOR x OVER <list-string-or-vec>.

We can have a perfectly good Lisp dialect without any sort of mini-language parsing macro like LOOP. It illustrates what you can do with macros, not what is usually done with macros. Macros usually leverage their tree structure and destructuring not to do any parsing work, and focus on the transformation and its semantics.

lispm · on Jan 28, 2017

> Not only is it inherently inextensible (for no good reason)

Various LOOP implementations are extensible.

> Macros usually leverage their tree structure and destructuring not to do any parsing work, and focus on the transformation and its semantics.

Untrue. Many macros need custom destructuring or even walking the code in some way.

> We can have a perfectly good Lisp dialect without any sort of mini-language parsing macro like LOOP.

Sure: basic Scheme, but then Scheme has its own complex looping constructs like http://wiki.call-cc.org/eggref/4/foof-loop .

Iteration macros like LOOP, ITERATE, FOR and others are very convenient and powerful.

kazinator · on Jan 28, 2017

I mostly agree, except that custom destructuring and code walking do not imply parsing.

Destructuring is just how we access the tree (already parsed object). Sometimes the tree is too complicated for the simple destructuring performed by destructuring macro lambda lists, so we have to do things like walk substructures and apply DESTRUCTURING-BIND or use some pattern matching library or whatever. Example: just because I have some MAPCAR over a list of variable-init pairs (qualifies as "custom destructuring") doesn't mean I'm parsing syntax that hasn't been parsed. I'm just destructuring a syntax tree that hasn't been destructured. Destructuring isn't parsing. Destructuring is not even mentioned in papers and textbooks on compilers; it falls into the bucket of somehow walking the tree, which falls under semantic analysis.

Code walking is ultimately done to the expansion of every macro. COMPILE and EVAL perform code walking. No textbook on compilers will refer to code walks over an AST as part of the parsing stage.

Of course the point is valid that Lisp macros sometimes take a flat list of items and apply recursive phrase structure rules to recover a tree (or small subtree, as the case may be). This is rare, and basically a last resort device; if you're doing that, you're writing some sort of "big deal" macro. Instances of it are rare in Common Lisp, and I don't suspect it is done frequently in Lisp programs. It's great that we can do that. It's also great that because of the way the language works, we can accomplish a lot without having to do that.

Speaking of ITERATE, its parsing of clauses is trivial, because it uses Lispy syntax. Dealing with (for var = expr) versus (for var initially expr then expr) versus (for var first expr then expr) is "quasi parsing". OK, we have a FOR. What is the third item? Switch on a bunch of cases: =, INITIALLY, FIRST, .... If it's unrecognized, then signal an error. In each of these cases, certain things are fixed by position in the syntax already. This is "parsing" only in the sense that Unix people refer to simple, trivial command-line argument processing as "parsing". (Only a few utils do actual parsing of a recursive grammar, examples being find and tcpdump.)
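
Here is roughly what that "switch on the third item" looks like, sketched in Python with the clause already read into a list. This is not ITERATE's actual implementation; `classify_for_clause` and the dictionary it returns are invented for illustration.

```python
def classify_for_clause(clause):
    """Dispatch on the third item of an already-parsed (for var KEYWORD ...) clause."""
    if len(clause) < 4 or clause[0] != "for":
        raise SyntaxError(f"not a FOR clause: {clause!r}")
    _, var, keyword, *rest = clause

    if keyword == "=":
        # (for var = expr): exactly one expression follows.
        if len(rest) != 1:
            raise SyntaxError("expected (for var = expr)")
        return {"var": var, "kind": "=", "expr": rest[0]}

    if keyword in ("initially", "first"):
        # (for var initially expr then expr) / (for var first expr then expr)
        if len(rest) != 3 or rest[1] != "then":
            raise SyntaxError(f"expected (for var {keyword} expr then expr)")
        return {"var": var, "kind": keyword, "init": rest[0], "then": rest[2]}

    raise SyntaxError(f"unrecognized FOR keyword: {keyword!r}")


print(classify_for_clause(["for", "x", "=", ["f", "y"]]))
print(classify_for_clause(["for", "x", "first", 0, "then", ["+", "x", 1]]))
```

Every piece is found by position in a list that the reader already built; nothing here has to recover tree structure from a flat token stream.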

lispm · on Jan 27, 2017

> Macros do not have to deal with syntax at the level of "how does this sequence of tokens reshape into a tree"

That's not syntax. Syntax is concerned whether a sequence of words are valid expressions in a language and determines syntactic categories for these. You can lookup a better definition of syntax, I'm too lazy.

These are all DEFUN forms. Some are valid Lisp, some are not.

   (defun foo () ()) is valid

   (defun () foo ()) is invalid

   (defun () () foo) is invalid

   (defun (foo) () foo) is invalid

   (defun foo () foo) is valid

   (defun () foo foo) is invalid

   (defun foo foo ()) is invalid

   (defun (setf foo) (setf foo) foo) is valid

   (defun foo (setf foo) (setf foo)) is invalid

   (defun (setf foo) foo (setf foo)) is invalid

Just by changing the order of subforms in a macro form we can produce valid Lisp and invalid Lisp forms.

   (defmacro defun (name args &body forms) ...)

Does not tell you that. You need to implement that logic in the macro somewhere.

> sentence-macro doesn't have guess what part of the utterance is the object and which is the verb. The verb is the third argument, and that is that.

Macros implement more complex syntax. Check the ANSI CL specification and its EBNF syntax declarations some time.

> When we say that Lisp has no syntax

Then it's just wrong and misleading.

Take the DEFUN macro:

The EBNF (extended Backus-Naur form) syntax definition for DEFUN is:

    defun function-name lambda-list [[declaration* | documentation]] form*

Is

    (defun (bla blub) (&rest foo &optional bar)
                  (declare fixnum) "oops" ((((fooo))))))

a valid defun expression ????????????????

The macro has to check that. It better rejects invalid expressions. It also has to look at the elements of the form to destructure them in the right way, so that it can process them and create a new valid form.

The Lisp implementation provides for the implementation of DEFUN as much as:

    (defmacro defun (spec args &body body ...) ...)

The macro language does not allow further specifications of the name, the arglist or the body in the macro interface. All it gets are spec, args and body. Now the macro has to implement the syntax for spec, args and body.

Questions the macro has to answer:

* is the function name a symbol or a list of the form (setf foo)
* is the arglist a valid lambda-list? Now check the EBNF syntax for lambda lists with whole/optional/key/rest options with default values and what have you.
* now it has to parse the body:
* is the declaration valid? Now check the EBNF syntax for declarations to see what needs to be done.
* is the documentation a string at the right position?
* is the body a sequence of forms? Now check the EBNF syntax for FORM.

This all has to be baked into the DEFUN macro somehow or checked from there. And not all implementations are good at it.
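
As a rough illustration of the first two questions only, here is a Python sketch that models a DEFUN form as nested lists (symbols as strings). `check_defun` is a hypothetical helper, not how any Lisp implementation does it. It is loudly simplified: `()` is modeled as an empty list rather than the symbol NIL, and the insides of the lambda list, the declarations, the docstring position and the body forms are not checked at all.

```python
def is_symbol(x):
    """In this toy model a symbol is just a Python string."""
    return isinstance(x, str)


def check_defun(form):
    """Return a list of error messages for a (defun name lambda-list body...) form."""
    if not isinstance(form, list) or len(form) < 3 or form[0] != "defun":
        return ["not a defun form with a name and a lambda list"]
    _, name, lambda_list, *body = form

    errors = []
    # Question 1: is the name a symbol or a list of the form (setf symbol)?
    is_setf_name = (
        isinstance(name, list)
        and len(name) == 2
        and name[0] == "setf"
        and is_symbol(name[1])
    )
    if not (is_symbol(name) or is_setf_name):
        errors.append(f"{name!r} is neither a symbol nor a list of the form (setf symbol)")
    # Question 2 (shape only): the lambda list must at least be a list.
    if not isinstance(lambda_list, list):
        errors.append(f"invalid lambda list: {lambda_list!r}")
    return errors


print(check_defun(["defun", "foo", [], []]))                         # prints []
print(check_defun(["defun", ["foo", "bar"], "baz", ["list"]]))       # two errors
print(check_defun(["defun", ["setf", "foo"], ["setf", "foo"], "foo"]))  # prints []
```

Because it stops at the lambda list, this sketch would wrongly accept a form such as (defun foo (setf foo) (setf foo)), whose problem is in the body. Covering the remaining questions means implementing the corresponding EBNF rules as well.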

Various syntax errors:

The name is not valid:

    CL-USER 14 > (defun (foo bar) baz (list))

    Error: (FOO BAR) is neither of type SYMBOL nor a list of the form (SETF SYMBOL).
      1 (abort) Return to level 0.
      2 Return to top loop level 0.

A symbol is not a valid lambda list:

    CL-USER 16 > (defun (setf bar) baz (list))
    (SETF BAR)

    CL-USER 17 > (compile '(setf bar))

    Error: Invalid lambda list: BAZ

Keyword argument is wrong:

    CL-USER 19 > (defun foo (&key ((aa))) (list))
    FOO

    CL-USER 20 > (compile 'foo)

    Error: Malformed (keyword variable) form in &key argument ((AA))

Wrong declaration:

    CL-USER 24 > (defun foo (&key (aa)) (declare inline-function foo))

    Error: Alist element INLINE-FUNCTION is not a cons or NIL

Wrong form:

    CL-USER 26 > (defun foo (&key (aa))
                   (declare (inline foo))
                   (((lambda ()
                       (lamda () ())))))
    FOO



    CL-USER 27 > (compile 'foo)

    Error: Illegal car ((LAMBDA NIL (LAMDA NIL NIL)))
              in compound form (((LAMBDA NIL #))).

And so on.

An INFIX macro:

    (infix 3 + 2 ^ 5)

It has to implement infix syntax.
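
A minimal sketch of what such a macro has to do, written in Python over the already-read argument list. The precedence table and the name `infix_to_prefix` are assumptions for illustration (here `^` binds tightest and is right-associative); a real INFIX macro may choose differently.

```python
# Hypothetical operator table: (precedence, associativity).
PRECEDENCE = {
    "+": (1, "left"),
    "-": (1, "left"),
    "*": (2, "left"),
    "/": (2, "left"),
    "^": (3, "right"),
}


def infix_to_prefix(tokens):
    """Turn a flat infix token list into a nested prefix tree (precedence climbing)."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("expected an operand")
        left = tokens[pos]  # operands are taken as-is (numbers, symbols, sublists)
        pos += 1
        while (
            pos < len(tokens)
            and isinstance(tokens[pos], str)
            and tokens[pos] in PRECEDENCE
        ):
            op = tokens[pos]
            prec, assoc = PRECEDENCE[op]
            if prec < min_prec:
                break
            pos += 1
            # Left-associative operators require a strictly tighter right side.
            right = parse(prec + 1 if assoc == "left" else prec)
            left = [op, left, right]
        return left

    tree = parse(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token: {tokens[pos]!r}")
    return tree


print(infix_to_prefix([3, "+", 2, "^", 5]))  # prints ['+', 3, ['^', 2, 5]]
```

Unlike positional destructuring, this really does recover a tree from a flat sequence, so it counts as parsing in the stricter sense.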

If it is still not clear, below is the syntax for the LOOP macro. The LOOP implementation has to implement the syntax, so that

    (loop for i below 70 do (print i))

is recognized as a valid program and it better detect that

    (loop do (print i) for i below 70 )

is not a valid program, because it violates the syntax below.

    The ``simple'' loop form:

    loop compound-form* => result*

    The ``extended'' loop form:

    loop [name-clause] {variable-clause}* {main-clause}* => result*

    name-clause::= named name 
    variable-clause::= with-clause | initial-final | for-as-clause 
    with-clause::= with var1 [type-spec] [= form1] {and var2 [type-spec] [= form2]}* 
    main-clause::= unconditional | accumulation | conditional | termination-test | initial-final 
    initial-final::= initially compound-form+ | finally compound-form+ 
    unconditional::= {do | doing} compound-form+ | return {form | it} 
    accumulation::= list-accumulation | numeric-accumulation 
    list-accumulation::= {collect | collecting | append | appending | nconc | nconcing} {form | it}  
                         [into simple-var] 
    numeric-accumulation::= {count | counting | sum | summing | 
                             maximize | maximizing | minimize | minimizing} {form | it} 
                            [into simple-var] [type-spec] 
    conditional::= {if | when | unless} form selectable-clause {and selectable-clause}*  
                   [else selectable-clause {and selectable-clause}*]  
                   [end] 
    selectable-clause::= unconditional | accumulation | conditional 
    termination-test::= while form | until form | repeat form | always form | never form | thereis form 
    for-as-clause::= {for | as} for-as-subclause {and for-as-subclause}* 
    for-as-subclause::= for-as-arithmetic | for-as-in-list | for-as-on-list | for-as-equals-then | 
                        for-as-across | for-as-hash | for-as-package 
    for-as-arithmetic::= var [type-spec] for-as-arithmetic-subclause 
    for-as-arithmetic-subclause::= arithmetic-up | arithmetic-downto | arithmetic-downfrom 
    arithmetic-up::= [[{from | upfrom} form1 |   {to | upto | below} form2 |   by form3]]+ 
    arithmetic-downto::= [[{{from form1}}1  |   {{{downto | above} form2}}1  |   by form3]] 
    arithmetic-downfrom::= [[{{downfrom form1}}1  |   {to | downto | above} form2 |   by form3]] 
    for-as-in-list::= var [type-spec] in form1 [by step-fun] 
    for-as-on-list::= var [type-spec] on form1 [by step-fun] 
    for-as-equals-then::= var [type-spec] = form1 [then form2] 
    for-as-across::= var [type-spec] across vector 
    for-as-hash::= var [type-spec] being {each | the}  
                   {{hash-key | hash-keys} {in | of} hash-table  
                    [using (hash-value other-var)] |  
                    {hash-value | hash-values} {in | of} hash-table  
                    [using (hash-key other-var)]} 
    for-as-package::= var [type-spec] being {each | the}  
                      {symbol | symbols | 
                       present-symbol | present-symbols | 
                       external-symbol | external-symbols} 
                      [{in | of} package] 
    type-spec::= simple-type-spec | destructured-type-spec 
    simple-type-spec::= fixnum | float | t | nil 
    destructured-type-spec::= of-type d-type-spec 
    d-type-spec::= type-specifier | (d-type-spec . d-type-spec) 
    var::= d-var-spec 
    var1::= d-var-spec 
    var2::= d-var-spec 
    other-var::= d-var-spec 
    d-var-spec::= simple-var | nil | (d-var-spec . d-var-spec) 

    Arguments and Values:

    compound-form---a compound form.

    name---a symbol.

    simple-var---a symbol (a variable name).

    form, form1, form2, form3---a form.

    step-fun---a form that evaluates to a function of one argument.

    vector---a form that evaluates to a vector.

    hash-table---a form that evaluates to a hash table.

    package---a form that evaluates to a package designator.

    type-specifier---a type specifier. This might be either an atomic type specifier or a compound type specifier, which introduces some additional complications to proper parsing in the face of destructuring; for further information, see Section 6.1.1.7 (Destructuring).

    result---an object.
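
As a rough illustration of just one rule in the grammar above (all variable clauses come before the main clauses), here is a Python sketch over a LOOP body modeled as a flat list, with symbols as strings and compound forms as nested lists. `loop_clause_order_ok` is a hypothetical helper and only a heuristic: it scans for clause keywords and would be fooled by, say, a variable named `count`. A real implementation has to follow the whole grammar.

```python
# Clause-introducing keywords, taken from the grammar above.
VARIABLE_HEADS = {"with", "for", "as"}
MAIN_HEADS = {
    "do", "doing", "return",
    "collect", "collecting", "append", "appending", "nconc", "nconcing",
    "count", "counting", "sum", "summing",
    "maximize", "maximizing", "minimize", "minimizing",
    "if", "when", "unless",
    "while", "until", "repeat", "always", "never", "thereis",
}


def loop_clause_order_ok(body):
    """Return False if a variable clause starts after a main clause has begun."""
    seen_main = False
    for item in body:
        if not isinstance(item, str):
            continue  # compound forms such as (print i) are skipped
        if item in MAIN_HEADS:
            seen_main = True
        elif item in VARIABLE_HEADS and seen_main:
            return False
    return True


good = ["for", "i", "below", 70, "do", ["print", "i"]]
bad = ["do", ["print", "i"], "for", "i", "below", 70]
print(loop_clause_order_ok(good))  # prints True
print(loop_clause_order_ok(bad))   # prints False
```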

chubot · on Jan 27, 2017

If you think that's bad, just representing C++, not even parsing it, in Clang, takes 61K lines of headers!

If you look at these files it's mostly declarations and one-line functions. It's not even doing anything. The lexers and parsers are a whole other story, certainly greater than 19K lines combined.

    ~/src/cfe-3.8.0.src/include/clang/AST$ wc -l *.h
       2742 DeclObjC.h
       2809 RecursiveASTVisitor.h
       2927 DeclTemplate.h
       3198 OpenMPClause.h
       3249 DeclCXX.h
       3800 Decl.h
       4154 ExprCXX.h
       4942 Expr.h
       5723 Type.h
      61525 total

taeric · on Jan 26, 2017

But that is a different claim, isn't it? I would expect that the general parser for a CL implementation is also quite large.

My question is why does the teaching often go that there is practically no syntax for lisp. Now, it is often followed quickly with "in lisp, you can write your own language."

But then, those two thoughts are hard to justify next to each other. Again, just see this thread. The complaint is that there is not enough syntax in lisp. The argument appearing that you have to have another language to express some ideas. And again, I'm not even disagreeing with this claim. Just trying to understand why it has the foothold that it does. That you drop to lisp if you want to use no syntax.

kazinator · on Jan 27, 2017

Just checked an old clisp-2.32 source tree I have laying around on an old hard drive. The raw line count in the src/ directory over all .d and .lisp files is 212K. Compiler, CLOS, all in there. The io.d module (preprocessed C source) which contains the reader and printer, plus some other cruft, is 10K.

taeric · on Jan 27, 2017

Just wow. I would expect it to be smaller. But I am surprised it is that much smaller.

Vendan · on Jan 25, 2017

As an interesting note, Lua actually started out as a data entry language (actually called DEL), that eventually morphed into Lua due to demand for more complex logic and such.

taeric · on Jan 25, 2017

Awesome posts, best of luck in this! Any suggestions/requests for users to help?

And I realize I made an omission in my post. I meant to say not just learnings, but use. I would think the biggest hurdle for you will be building a user base.

chubot · on Jan 25, 2017

Thanks a lot! I'm hoping to set things up for contributions in the next couple months.

I want to publish the test matrix and get some help because filling out the corners takes a lot of time. The general architecture is pretty sound though -- I hit all the "major areas" of shell, although I may need to make another pass at globbing.

If you are adventurous you should be able to clone it, follow the instructions for ./pybuild.sh. Then ./spec.sh install-shells, and then run './spec.sh smoke'. That test should pass, but there are like 15-20 other categories in spec.sh in various states of disarray!

Of course this is in Python, but to me it makes sense to fill out a sketch before porting it to C++ (only the runtime; I'm hoping to bootstrap the parsers in oil itself). Not sure if that will inhibit or help contribution.