Lisp: It's Not About Macros, It's About Read (jlongster.com)
158 points by jlongster on Feb 18, 2012 | 99 comments

It's not really about read either, it's about this:

> Wait a second, if you take a look at Lisp code, it’s really all made up of lists:

Haskell has `read`: most of your data types can just derive Read and Show, and they'll "magically" get a representation allowing you to `read` and `show` them.

But that only works for datatypes, you can't do that with code.

In Lisp you trivially can, because code is represented via basic datatypes (through a reader macro if needed).

It's not macros. It's not read. It's much more basic than that: it's homoiconicity. From that everything falls out, and without that you need some sort of separate, special-purpose preprocessor (whether it's a shitty textual transformer — as in C — or a more advanced and structured one — as with Camlp4 or Template Haskell — does not matter).

And I don't get why the author got the gist (and title) so wrong when he himself notes read is irrelevant and useless in and of itself:

> Think of read like JSON.parse in Javascript. Except since Javascript code isn’t the same as data, you can’t parse any code

`read` does not matter if the language isn't homoiconic, you can `read` all you want it won't give you anything.

Yep, you're absolutely right. I was mainly offering a different viewpoint that might be refreshing to those who have always heard the "code is data and data is code" statement, but never really understood it.

Focusing on `read` was a way to anchor my article, even if it truly isn't about read either. I tried to tie that together at the end.

Isn't every language that takes a string homoiconic? Because code is a string, and a string is also a valid data type in that language? (By the Wikipedia definition.)

Sort of, but only in a trivial and uninteresting sense - similar to how f(x) = 0 is its own derivative, but is much less interesting in that sense than f(x) = e^x.

A string has no additional structure, so if you want to do any transformations beyond simple string/regex substitutions you have to parse it into a more suitable format.

I disagree with this article's perspective, but I understand what the author intended to say and to a certain extent I like how the idea is presented.

Yes, Lisp's power comes from the embodiment of code and data together in one manner, and the ability to treat them this way when writing code is good, but `read` is a coincidence of that power, not a demonstration of how it is used. Macros are the method by which we harness the power of homoiconicity in an efficient, powerful manner.

As masklinn put it, it's really about the homoiconicity of Lisp. But with that said, I'm not convinced at how great that is. While macros in general can be useful, is homoiconicity generally a good thing?

It's rarely the case that I want a single representation for all my data -- and if we treat code as data, do I want a representation that is indistinguishable from all my other data?

For example, the distinction between data that specifies layout (html, xaml, etc...) and that which performs logical computation (javascript, c#, etc...) seems like a useful distinction to have.

While I can appreciate the AST form of s-exprs I also do like the richness of many standard languages -- and the semantic richness of their ASTs.

Lastly, treating code as data (and vice-versa) has been the bane of many programmers of days past. Go back 40 years and you can find many developers who did treat code as data (it was all actually viewed as a sequence of bits by many), and this caused no end of problems. In most modern systems there are often safeguards to specify data and code segments and ensure that you don't treat one as the other. While not completely analogous to Lisp macros, it does show that you tread dangerous ground when you attempt to treat all forms of data as indistinguishable.

Given the special purpose nature of code, I don't mind (and actually appreciate) a well thought through syntax, and a special set of functionality to interact with it -- as I do most special purpose forms of data.

In emacs I represent everything in Lisp and I use separate color schemes for SXML documents and code. This effectively allows me to distinguish between these different classes of data. Furthermore, emacs has different sets of functionality for different types of data, so I feel that I have all the features you mentioned already.

It's unsurprising you mention sxml given that XML and s-exprs have a trivial homomorphism. With that said, I'm also not a big fan of representing relational data in XML (or sexprs).

Having a uniform representation for data isn't a big enough win to trump having data represented in a way that is more natural for me to think about.

With that said, if your brain thinks in XML (or sexprs) maybe Lisp will always work best for you.

Can you provide a link to some Clojure/Scheme/CL code that you've written?

(curious about how you use the language, not interested in scoring points)

Unfortunately, in practice, it's not really the case that "it’s really easy to parse Lisp code". Yes, you can do it in some cases, but to do it correctly in general, at least in CL, is the somewhat infamous "code-walker" problem, which needs to do all sorts of strange things:

Do you handle all varieties of lambda lists? Recognize and descend into all the special forms? What do you do with macros? Expand them (a mess)? Try to walk into special-cased standard ones like 'loop' and ignore user-defined ones?

The closest you can come to doing it sanely is to use a code-walking library like the one in arnesi: http://common-lisp.net/project/bese/docs/arnesi/html/A_0020C...

These are all issues of compilation, not reading. Reading Lisp code into cons-pairs, symbols, numbers, etc. is not that hard at all: you can do it in about 100 lines of code. Lisp can be trivially parsed by a recursive-descent parser. "'Recursive-descent'," as the UNIX-HATERS Handbook says, "is computer science jargon for 'simple enough to write on a liter of Coke.'"
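To make "simple enough" concrete, here's a sketch of such a recursive-descent reader in Python -- a toy, with no reader macros, strings, quote sugar, or error handling:

```python
def tokenize(src):
    # Pad parens with spaces so a plain split() yields tokens.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    # Recursive descent: '(' opens a list, ')' closes it,
    # anything else is an atom (number or symbol).
    tok = tokens.pop(0)
    if tok == "(":
        lst = []
        while tokens[0] != ")":
            lst.append(parse(tokens))
        tokens.pop(0)  # consume the ')'
        return lst
    try:
        return int(tok)
    except ValueError:
        return tok  # a symbol, kept as a plain string

def read(src):
    return parse(tokenize(src))

print(read("(defun add (a b) (+ a b))"))
# ['defun', 'add', ['a', 'b'], ['+', 'a', 'b']]
```

The whole program comes back as ordinary lists and atoms, which is the point: everything after this stage -- macroexpansion, compilation -- is list manipulation.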

No, you don't READ Lisp. You READ S-Expressions. The Lisp syntax is defined on top of that. To parse Lisp you need to understand the complex Lisp syntax - sometimes done with a code-walker. Trivial parsing can be done with primitive Lisp functions, but a complete parser is not that easy.

Usually resolving things like which identifiers refer to which kinds of things is considered part of the parsing step, not compilation; for example, the is-it-a-typedef-or-not resolution problem is frequently cited as the reason C's grammar isn't context-free. And you can't do even that level of parsing---determining which symbols are function calls and which aren't---for Lisp without expanding macros, or special-casing how to descend into known ones.

Macro-expansion is sometimes considered part of compilation, sometimes a separate stage between parsing and compilation. I've never seen it referred to in Lisp as a part of parsing, though I guess it might be possible to do that with C's text-based macros.

`read` does not perform macro-expansion: that would break data reading and the reading of quoted forms. macroexpand expands macros at a later stage. Once expanded, macros either refer to primitive special forms or function calls, and it's trivial to determine which. Primitive special forms can have their components macroexpand'd as appropriate. Function calls can be left as-is to be compiled (well, the arguments can be macroexpand'd). Eventually there will just be primitive special forms and raw function calls left, ready to be handed to the compiler.
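The staging described above can be sketched over plain nested lists (a toy, not CL's macroexpand; the `when` macro here is an illustrative assumption):

```python
# Toy macro table: maps a head symbol to a function that
# rewrites the whole form. 'when' is illustrative, not CL's.
MACROS = {
    # (when test body) => (if test body nil)
    "when": lambda form: ["if", form[1], form[2], "nil"],
}

def macroexpand(form):
    if not isinstance(form, list) or not form:
        return form                       # atom: nothing to expand
    while isinstance(form, list) and form and form[0] in MACROS:
        form = MACROS[form[0]](form)      # expand the head until stable
    if isinstance(form, list) and form and form[0] == "quote":
        return form                       # quoted data is left untouched
    # What's left is a special form or a function call:
    # recurse into the subforms.
    return [macroexpand(f) for f in form]

print(macroexpand(["when", ["hungry?"], ["eat"]]))
# ['if', ['hungry?'], ['eat'], 'nil']
```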

I can buy that in terms of definitions. My objection is more based on the common use-case of "parsing" Lisp in this fashion to do some kind of source-to-source transformation, like the example in this post, which I've also wanted to do. The post saves itself by only replacing the first symbol in a form. But what if you wanted to do something fairly simple like replace every call to foo with some other bit of code?

In idiomatic Lisp code, you either miss lots of the calls, or you have to complicate your code-walker significantly. This is especially the case if you write CL in the (common, but not universal) style that makes significant use of the loop macro, because you either ignore it as a macro, and consider anything inside it opaque until macro-resolution time (because you don't know what it does to its arguments), or you special-case it as a new bit of CL syntax, in which case your parsing is now fancier. Usually you want something like the latter, because source-to-source transformations expect to also replace things inside loops. Same with, say, special-casing setf forms, if you want source-to-source transformations to "do what I mean" in a large number of cases.

It's true that it's very easy to literally get the list representing the code, but there's precious little sensible you can do with that list unless you're willing to descend into some of the more commonly used built-in macros that most CLers treat as de-facto syntax, which requires knowing something about the syntax they in effect define.
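For a concrete picture of why the naive walk goes wrong, here's a toy walker over nested lists (Python lists standing in for s-exprs) that treats every list head as a call:

```python
def replace_calls(form, old, new):
    # Naive walker: rewrites (old ...) into (new ...) everywhere,
    # on the unsafe assumption that every list is a plain call.
    if not isinstance(form, list) or not form:
        return form
    head = new if form[0] == old else form[0]
    return [head] + [replace_calls(f, old, new) for f in form[1:]]

# It happily rewrites quoted *data*, which a correct walker must not touch:
print(replace_calls(["list", ["foo", 1], ["quote", ["foo", 2]]], "foo", "bar"))
# ['list', ['bar', 1], ['quote', ['bar', 2]]]
```

The quoted form is now wrong, and binding forms, loop bodies, and macro arguments fail in analogous ways -- each one needs its own special case in the walker.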

I agree with you and lispm about the complexity of code walking in CL.

For your specific example of replacing calls to foo with another bit of code you may be able to get away with macrolet. (Example: http://letoverlambda.com/index.cl/guest/chap5.html#sec_4 )

That's all non-trivial. Common Lisp tried to have a very small Lisp syntax, but it still has about 25 special operators, function calls, macro calls, symbol macros, lambda expressions, lexical and dynamic binding, ...

A code walker then is mildly complex:


With reader macros, reading can be made arbitrarily complex:

  (read-from-string "#.(print :foo)")

> You can implement a macro system in 30 lines of Lisp. All you need is read, and it’s easy.

The linked pastebin isn't a macro system. It's merely a macroexpansion system, it needs to be evaluated. And it's not as simple as merely wrapping it in 'eval' because of subtleties in getting at the right lexical scope.

More generally, no fair claiming macros are easy because you managed to build them atop a lisp. You're using all the things other comments here refer to; claiming it's all 'read' is disingenuous.

I'm still[1] looking for a straightforward non-lisp implementation of real macros. The clearest I've been able to come up with is an fexpr-based interpreter: http://github.com/akkartik/wart

[1] From nearly 2 years ago: http://news.ycombinator.com/item?id=1468345

The currently front-paged Julia claims to have macros in a non-lisp. https://news.ycombinator.com/item?id=3606380

And there's Dave Moon's PLOT (Programming Language for Old Timers). http://news.ycombinator.com/item?id=537652

With the interesting critique that "objects" are better than s-expressions for representing sourcecode. (BTW, Moon did a lot of work on Lisp.)

I too have thought that s-expressions don't necessarily contain as much information as you'd want. Using Rich Hickey's word from "Simple Made Easy", maybe they're used to "complect" visual presentation and internal representation.

Then again, there's metadata...

Thanks for those links (wes-exp as well). But I meant a non-lisp implementation of lisp macros. Obviously common lisp and racket qualify, but I'd love to see an implementation that's as simple as possible without needing to be production-quality.

Also, is PLOT actually available? I think I must have come across it 3 times and searched for a download link without luck.

Last I heard, it's not — Moon didn't think his work was high enough quality to release, and invited others to do so.

Right now there is a divide among programmers. On one side you have people like the author who crave the power of code-as-data more than they care about nice syntax and therefore love Lisp. On the other side you have people who like conventional syntax more than they care about code-as-data and therefore don't love Lisp.

Neither side can understand the other: one side says "why do you resist ultimate power?" and the other side says "how can you possibly think that your code is readable?"

My belief (and what I am starting to consider my life's work) is that the gap can be bridged. Lisp's power comes from treating code as data. But all code becomes data eventually; turning code into data is exactly what parsers do, and every language has a parser. The author says "it's about read," but "read" (in his example) is just a parser.

The author asks "How would you do that in Python?" The answer is that it would be something like this:

  import ast
  class MyTransformer(ast.NodeTransformer):
    pass  # Implement transformation logic here.
  node = MyTransformer().visit(ast.parse("x = 1"))
  print(ast.dump(node))
This works alright, but what I'm after is a more universal solution. With syntax trees there's a lot of support functionality you frequently want: a way to specify the schema of the tree, convenient serialization/deserialization, and ideally a solution that is not specific to any one programming language.
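For instance, filling in the placeholder with one possible transformation -- renaming every reference to a (hypothetical) name foo into bar:

```python
import ast

class RenameFoo(ast.NodeTransformer):
    # Rewrites every occurrence of the name 'foo' into 'bar'.
    # The names are placeholders for the sake of the example.
    def visit_Name(self, node):
        if node.id == "foo":
            return ast.copy_location(ast.Name(id="bar", ctx=node.ctx), node)
        return node

tree = RenameFoo().visit(ast.parse("x = foo(1) + foo(2)"))
print(ast.unparse(tree))  # ast.unparse needs Python 3.9+
# x = bar(1) + bar(2)
```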

My answer to this question might surprise some people, but after spending a lot of time thinking about this problem, I'm quite convinced of it. The answer is Protocol Buffers.

It's true that Protocol Buffers were originally designed for network messaging, but they turn out to be an incredibly solid foundation on which to build general-purpose solutions for specifying and manipulating trees of strongly-typed data without being tied to any one programming language. Just look at a system like http://scottmcpeak.com/elkhound/sources/ast/index.html that was specifically designed to store AST's and look how similar it is to .proto files.

(As an aside, programmers have spent the last 15 years or so attempting to use XML in this role of "generic language-independent tree structured serialization format," but it wasn't the right fit because most data is not markup. Protocol Buffers can deliver on everything people wanted XML to be).

Why should manipulating syntax trees require us to write in syntax trees? The answer is that it shouldn't, but this is non-obvious because of how inconvenient parsers currently are to use. One of my life's goals is to help change that. If you find this intriguing, please feel free to follow:


On one side you have people like the author who crave the power of code-as-data more than they care about nice syntax and therefore love Lisp.

I crave both the power of code-as-data and nice syntax, which is why I love Lisp.

Yes, I don't see how mainstream languages can possibly be considered aesthetically pleasant, with the possible exception of Python. I wouldn't visit a gallery that shows them off.

I find Lisp (particularly Clojure) much more aesthetically pleasant, in that it communicates better with me. With Paredit, it's even better to the touch.

(If there one day came to exist something even better on these metrics, then I'm sure I would start to prefer it aesthetically.)

I agree I don't get the syntax argument. I'm sure, as one of the other posters mentioned, that one is more favored than the other, but once you know the limited syntax of a Lisp it becomes fairly readable. For me it's all about figuring out scope: once I know what denotes scope, it is fairly easy to format the code into a readable form in my mind. The thing about the Lisp dialects is that the rules for syntax are so simple that once understood they become perfectly readable, at least to me.

> I agree I don't get the syntax argument

Doubly agreed! I learned Clojure just over a year ago and will never look back. My attitude to people complaining about parents is that they should just get over it. That one hang-up is actually holding them back.

That should have been 'parens', not 'parents', of course. Although, in another context, it's completely valid!

I'll explain it to you: where is the bug in...

(defun substitute-in-replacement ($-value replacement) (cond ((null $-value) replacement) ((null replacement) ()) ((eq (car replacement) '$) (cons $-value (cdr replacement))) (T (cons (car replacement)) (substitute-in-replacement $-value (cdr replacement)))))

From: http://www.csc.villanova.edu/~dmatusze/resources/lisp/lisp-e... with one paren moved.

I remember, when I was taking a class on AI, looking for some sort of style guideline that would help me get through the learning curve, but the FAQ (I want to say it was comp.lang.lisp) just had "coming soon." So this would be the allegro editor in 2001 or 2002. It may be obvious to an experienced hand, and perhaps if there was some sort of best practices when I was learning it I wouldn't have had the same problem, but I just remember the frustration of my mind playing tricks on me and (even with syntax highlighting) trying to match parens that I thought were there.

1 paren moved and all line breaks and indentation removed. This is intentionally obfuscated. Is there any language that is easy to read when you put five lines of code together like this?

I haven't read a lisp style guide, Emacs just takes care of indentation - it is immediately clear when a paren is wrong because the shape of the function is wrong. If you are writing lisp with an editor that doesn't do this, get a better editor, don't blame the language.

Oy. The mistake was actually in posting too late and not using the proper code tag. Don't drink and post, kids. ;) When I brought the code into vim I used the same indentation as the example, I saw that the indentation changed, but you're saying "The shape of the function is wrong."

It's quite possible my experience as a programmer today would be different than when I started -- I mean, I made it through a few chapters of SICP without such troubles, but in the back of my head was the memory of trying to figure out my logic error in a bit of code when it was really a misplaced paren.

>>Is there any language that is easy to read when you put five lines of code together like This?

Arguably Python -- but to get that, Python sacrificed the possibility of both usable anonymous functions and the possibility to cut-paste a code fragment and just ask the editor to reindent.

Hardly worth the price.

The Lisp compiler tells you:

    (defun substitute-in-replacement ($-value replacement)
      (cond ((null $-value) replacement)
            ((null replacement) ())
            ((eq (car replacement) '$)
             (cons $-value (cdr replacement)))
            (T (cons (car replacement))
               (substitute-in-replacement $-value (cdr replacement)))))

    CL-USER 5 > (compile 'substitute-in-replacement)
    ;;;*** Warning in SUBSTITUTE-IN-REPLACEMENT: CONS is called with the wrong
    ;;;     number of arguments: Got 1 wanted 2
Lisp compilers able to present these error messages have been in use for more than 40 years. Common Lisp has had them since day one.

Took me less than a minute to find it. Just pasted the code into my Lisp editor (I use CCL), added a few line breaks in the obvious places, hit TAB a few times, and it was immediately obvious.

Actually, it's pretty obvious even without doing all that. CONS always takes two arguments.

My code doesn't look like that. For 30-50% of my code, I try to use a style more like:

  (defn thingies [id]
    (->> id
         (map :thingy)))
(Of course, my code is often more complex and messier than that, even when using ->>, but some fairly significant percentage of my code does look that simple.)

I'm sure there's stuff to criticize about Clojure, but we can look at real-world code in another mainstream language (Javascript+node.js? PHP? Java?) and point out readability problems too. (Python maybe being an exception in terms of readability-in-the-small, for things that fit in the mainstream style. Though as someone pointed out, there's maybe some problems with manipulability.)

Some people like putting salt on grapefruit. I'm not saying that it's impossible to like Lisp's syntax, but empirically most people prefer the ALGOL-like syntax, which is why I referred to it in the next sentence as "conventional syntax."

I could attempt to prove to you that "conventional syntax" is inherently superior to Lisp syntax, but that would be a waste of both of our time.

Some people like putting salt on grapefruit.

You sound as if you think that putting salt on grapefruit is inherently strange, while in actuality there's a very good reason to do so: it reduces the perception of bitterness.

I'm not saying that it's impossible to like Lisp's syntax, but empirically most people prefer the ALGOL-like syntax

Empirically, most people prefer what they are already familiar with, so I'm not sure what this is supposed to prove, other than most people are already more familiar with Algol-like syntax.

For me, Lisp syntax has the definitive advantage that the first identifier in every expression tells me what to expect. I.e., I don't have to scan to the right to figure out what kind of expression this is. For me, this makes code much more readable. And this makes Lisp syntax more "nice".

Presumably salt on grapefruit is thought to be weird since grapefruit is thought to be good for your heart, while the perception of salt is quite the opposite.

(+ 1 2 3) =

add 1 and 2 and 3

The syntax is a bit terse, but if you teach people a good way to read it, it becomes much more readable than 1+2+3.

The only reason we prefer the latter is that we are taught that syntax when we do math in school. I have found it much easier to teach Lisp to people who have no, or very little, formal education in math.

This argument is based on too shallow an analysis and doesn't stand up to closer examination.

  (/ (+ (- b) (sqrt (- (* b b) (* 4 a c)))) (* 2 a))
Yeah, so it divides (the addition of (-b and the (sqrt of (the difference between (the product of b and b) and (the product of 4, a and c))) by (the multiplication of 2 and a))

Right, that's much easier than

  (-b + sqrt(b*b - 4*a*c)) / (2*a)
(-b plus the sqrt of ((b times b) - (4 times a times c))) divided by (2 times a)

New lines in the Lisp, PLEASE :)

I see you omitted some parentheses in the "conventional" expression, relying on the fact that multiplication takes priority over subtraction. Making this fact explicit is exactly what makes Lisp better, especially for more complex domains: delegating priorities to the notation, freeing brain capacity for the actual problem.

If math is a problem for the user, there is the option to use a modified parser. For example using an infix parser called by a readmacro:

    (defun foo (a b c)
      #I( (-b + sqrt(b*b - 4*a*c)) / (2*a) ))


    CL-USER 8 > (foo 1 2 3)
    #C(-1.0 1.4142135)

Sure, there are solutions and it's awesome that they're both possible and easy to use. I'm not arguing against Lisp; I just don't agree that its syntax is better. I agree it is not worse, if you survey a sufficiently large variety of cases.

It may be bikeshedding, but I would not let 'blue is better than red, because the sky is blue' pass either.

Once you add indentation, and know the simple rule that args line up vertically (unless they're so short that you'd rather leave them), the following is pretty easy to read:

    (/ (+ (- b)
          (sqrt (- (* b b)
                   (* 4 a c))))
       (* 2 a))
It tells me:

* there's a quotient of 2 things

* the first thing is a sum of -b and a sqrt

* the second thing is a product

and so on. Pretty nice. Of course, mathematical notation is more terse.

Weak argument. So rather than forming a language around our existing learning (in math, as you say), we should change all our existing learning to suit a particular language... then it's more readable... riiiiight, good luck with that. :)

Syntax is the Vietnam of programming languages.

> I could attempt to prove to you that "conventional syntax" is inherently superior to Lisp syntax, but that would be a waste of both of our time.

Yes, trying to prove falsehoods is a waste of time.

Conventional syntax is neither conventional nor suited to humans. (If it's "conventional", why isn't there more agreement as to what it is? If it's suited to humans, why aren't there more than 100 who actually know it for any given language?)

Lisp is effectively the parse tree in other languages. In the gap between human and machine, Lisp is closer to the machine. Some people like that, but the majority prefer a language that is closer to the human (and perhaps the closer you get to the human the less nice it gets depending on how you define "nice").

> the majority prefer a language that is closer to the human

Closer to "the human"? Do you know more than 3 people who know C++ operator precedence?

Humans don't handle operator precedence very well.

Why then has math (which is read and written only by humans) used infix notation for hundreds of years, whereas prefix/postfix notation were only developed in the 20th century and today are used only by Computer Scientists?

You don't have to know an entire operator precedence table to read and write idiomatic infix-notation code. Precedence is defined such that common expressions evaluate as people intuitively expect (a notable counterexample is "x & y == z" in C). Parentheses are always available to clarify more complicated expressions.
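That counterexample is worth spelling out, since Python and C actually group the very same source text differently (a sketch evaluating both groupings in Python):

```python
# 'x & y == z': Python groups it as (x & y) == z,
# while C groups it as x & (y == z).
x, y, z = 0b1100, 0b1010, 0b1000

python_reading = x & y == z      # (12 & 10) == 8  -> True
c_reading = x & (y == z)         # 12 & (10 == 8)  -> 12 & 0 -> 0

print(python_reading)  # True
print(c_reading)       # 0
```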

Humans who do a lot of math switch notations when convenient. For example, for addition we'll sometimes put a summation sign in prefix notation. For division we like to put the numerator above the denominator, a notation that's inconvenient in a programming language.

Come to think of it, humans usually add and subtract by stacking numbers vertically. I don't think you can point at infix notation as "the" human-friendly notation.

Have you seriously seen (non-computer) people write simple arithmetic as + 1 3 ?

This feels like a discussion based in fiction...

I've seen lots of people doing 1 enter 2 + on their HP calculators. Using them with RPL, reverse polish lisp.

Postfix notation is popular in the financial industries.

One could have made the same argument in favour of Roman numerals, in the face of the less familiar Arabic numerals.

Math was written on paper long before there were computers. As a result, the use of infix notation was an act of necessity not a calculated decision. Now that we have computers and keyboards we should use prefix notation.

So you think we should teach in schools (from an example above):

(/ (+ (- b) (sqrt (- (* b b) (* 4 a c)))) (* 2 a)) ?

Seems like a great idea to me.

I take it you think there is something intrinsically wrong with that idea?

Certainly, I would hope it would have been obvious, but we might each have different opinions of obvious!

I find it very hard, without bracket counting, to see exactly what the '+' and '/' bind to. With the more traditional:

(-b + sqrt(bb-4ac)) / (2a)

I find in only a glance I can tell what everything is binding to.

> I find in only a glance I can tell what everything is binding to.

It must be nice to live in a world with only 4 infix operators and expressions that have only 3 infix operators.

For example, lots of folks think that sqrt should be a prefix operator, not yet another function. I suppose you're going to assume that the top bar will serve as parentheses.

BTW "-b + sqrt(bb-4ac) / 2a" is the interesting expression. Is it "(-b + sqrt(bb-4ac)) / (2a)" or "-b + (sqrt(bb-4ac)) / 2a)" And, are you certain what "bb-4ac" means? (There's at least one major language where it doesn't mean "(bb)-(4ac)".)

We aren't talking about programming languages. We are talking about teaching math in school. In most of school mathematics, there are only 4 infix operators (well, and also the comparison operators).

> We are talking about teaching math in school. In most of school mathematics, there are only 4 infix operators (well, and also the comparison operators).

And that's how the exceptions swallow the rule. And, it's also how we get infix programming languages where that's definitely not true, and so on. Where should we make the switch?

Also, only four? What about set operations?

I would teach it like this:

  (/ (- (sqrt (discriminant a b c)) b)
     (* 2 a))

Why exactly is it a necessity to use infix on paper? And what exactly is the argument for using prefix just because we have computers? I'd argue quite the opposite: computers give us even more convenience to use whatever we like. I think it is a rather arbitrary choice, but it may relate to the prevalence of subject-verb-object in (spoken) languages; i.e., operators act like verbs.

if you look at math on paper, that's definitely not 'infix'.

It's even stronger, in that mathematics generalized arithmetic algebra into groups, fields, rings and other things I don't understand. Examples of specific algebras include: boolean, relational and Kleene (aka regular expressions).

Other notations are used, but with a frequency similar to pre-fix (lisp) and post-fix (forth). "Associativity" (not affected by order of evaluation) only makes sense for in-fix.

But it really could just be familiarity, I guess. I can't see how to determine it either way. But regardless of the cause, there's overwhelming evidence that people, in fact, prefer in-fix.

> But regardless of the cause, there's overwhelming evidence that people, in fact, prefer in-fix.

How many people have seen anything other than in-fix? Of those, how many got a fair shot at an alternative?

> You don't have to know an entire operator precedence table to read and write idiomatic infix-notation code.

If it's "idiomatic", why is there such disagreement?

> Why then has math (which is read and written only by humans)

Convention has a lot of value. That said, mathematicians don't have to worry about getting things wrong. It's just paper, and they're happy to let humans fix up the errors.

> Parentheses are always available to clarify more complicated expressions.

Unnecessary parentheses are how humans deal with the fact that they can't handle infix.

Closer to a virtual machine which treats most things as functions, nouns, and lists; fairly close to human thought. Naked, abstract, and systematic, hence easier to process; besides the car/cdr names, I don't see too much machinery here.

Here's IPL, an influence on Lisp and also a list-processing language (copied from Wikipedia):

  IPL-V List Structure Example
  L1	9-1	100
  100	S4	101
  101	S5	0
  9-1	0	200
  200	A1	201
  201	V1	202
  202	A2	203
  203	V2	0
How human does LISP feel now ;)?

I don't know Lisp; I only tried Clojure, but I think this applies. I don't think it's about code as data per se as much as it is about the philosophy that there are no "special cases" outside the few special forms needed to bootstrap the language. This simplicity allows you to shape the language the way you want without having to consider the impact of your modifications on existing code.

Consider the hell C# or Java teams have to go through when they introduce things such as async, lambda, or LINQ, and how those features interact with the existing language. Consider implementing pattern matching for C# and all the edge cases. Even if you had an open compiler it would be difficult.

Python is no better; e.g., it has a rigid class/type model that has been abused more than once to provide metadata for ORMs. I once tried to extend an ORM's functionality that was using metaclasses and multiple inheritance; the hell you go through with metaclass ordering is insane compared to writing a declarative DSL in Clojure with a macro and leveraging an existing ORM library. And when libraries overload operators to implement DSLs, you immediately get into problems with operator precedence. These problems don't go away even if you can redefine the precedence, because it's not consistent with the rest of the language.

So what I'm saying is that complexity is significantly reduced when you have a small, consistent core. As for readability, I think Clojure improves things by providing different literals for vectors and maps; those literals have consistent meaning in similar situations, so they provide nice visual cues. But with immutability by default, clear scoping, and functional programming, things like significant whitespace and pretty conditional-expression syntax become bikeshedding-level details.

Protocol buffers are a reimplementation of some whizzy stuff we did at a messaging start-up circa 1994.

They were great for messaging . . . but we found ourselves using them /everywhere/. And since our stuff worked in many different environments (C++, Java, Visual Basic were the ones we directly supported), you could have your choice of language.

It's flattering to see this rediscovered, several times over :-)

Yep, I believe that it will continue being "rediscovered" until some well-executed, open-source embodiment of them becomes a de facto standard. My goal is to make Protocol Buffers just that.

Another way of putting it is that I'm trying to beat Greenspun's Tenth Rule by making that "half of Common Lisp" separable from Common Lisp so that C programs (and high-level programs too) don't have to keep re-inventing it. As a bonus, this will help make languages more interoperable too.

I wrote a (batshit crazy) Python library to do exactly what you're talking about. It converts Python ASTs into S-expressions (just Python lists) and allows you to apply 'macros' to methods just by using decorators.

I guess it would've been helpful to actually link the project! https://github.com/daeken/transformana
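For anyone curious what "Python ASTs as nested lists" might look like, here's a toy sketch using only the stdlib `ast` module. This is not the linked library's actual API, just an illustration of the same idea:

```python
import ast

def to_sexp(node):
    """Recursively convert an AST node into a nested list whose head
    is the node's type name -- a rough s-expression analogue."""
    if isinstance(node, ast.AST):
        return [type(node).__name__] + [
            to_sexp(getattr(node, field)) for field in node._fields
        ]
    if isinstance(node, list):
        return [to_sexp(item) for item in node]
    return node  # leaf values: identifiers, constants, None

tree = ast.parse("x + 1", mode="eval")
print(to_sexp(tree))
# ['Expression', ['BinOp', ['Name', 'x', ['Load']], ['Add'], ['Constant', 1, None]]]
```

Once code is in this shape, "macros" are just ordinary list manipulation, which is the whole point being made upthread.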

I don't agree that "the other side" cares so much about conventional syntax. Syntax is a very superficial thing; I think it's rather that most of the time you don't want to have to deal with the power and complexity of something like Lisp. If you compare it with natural language, most of the time we don't speak in poetic or literary language, even though that might be the most beautiful; rather we naturally strive for efficiency and simplicity, using a much smaller vocabulary, re-using common expressions and being redundant etc.

I'm also in some kind of middle ground. I like the power Lisp provides and don't mind the syntax, but I also like and appreciate what languages with rich syntax provide. On the other hand, I don't particularly love either one. I think the only way I'll truly be happy is if the gap is bridged without really giving up either side's advantages.

But there are other factors that would make me happier with a language than closing the gap between expressive power and great syntax. For example, I would love a language with nice syntax and good metaprogramming (e.g. Python) that also had an unambiguous visual representation (something like Max) that you could switch between at will. I don't know how realistic that would be without adding complexity or ambiguity or ruining code formatting.

TXL is the best I've seen for transforming ASTs (sample: http://www.txl.ca/tour/tour9.html). As you can see, it's a little complex, which is because the problem is complex. I think they do a really good job. (TXL homepage http://www.txl.ca/)

Also: I recall Matz said Ruby was lisp with friendlier syntax (but I can't find the quote right now, so maybe he didn't).

I also think there is a way to have the best of both worlds, but I have so far taken a rather different approach. What do you think of what I've got here?

It's very much a prototype, and my ideas have evolved a lot (towards lisp), but I'm at least curious what you think of the ideas in the README.

In your example you say "Once in memory, this structure can be easily outputted as CSS." I think you may be underestimating how hard this will be. CSS may look like it's just a bunch of maps, but there is more to it than that. Consider something funky like:

  #id > p a.red:visited {
    background: url(foo.png) white;
    margin: 0 3px 5em 80% !important;
  }
There's a lot going on here. CSS isn't just key/value maps.

Also, I don't think you want a data language to be Turing-complete. PostScript was Turing-complete but PDF is not; this makes PDF easier to deal with because it's easier to analyze and there's no risk of it getting into an infinite loop.

True, I'll confess I didn't know much about the more arcane CSS selectors when I wrote that (still don't). Complicated properties, though, are not too hard, and the important thing is that it starts off in a form that's easy to process.

I don't intend it to be just a data language. What I've been moving toward in my daydreams is a DAG of, for lack of a better word, function calls (some interesting data doesn't really fit in a tree), some of which are generators. If Turing-completeness is a problem in your context, you can reject some or all generators and/or just not evaluate them, i.e. take them as pure data. But I don't want to limit myself. I would have no problem if it turned into a general purpose language with a nice data-oriented subset.

You can use the protocol buffer schema language to define your ASTs if you want, but I think that addresses only a relatively small part of the problem.

There are two larger problems in adding Lisp-style macros to non-Lisp languages, one social and one technical.

The social problem is that language designers must be persuaded to publish a specification of the internal representation of the AST of their language. This makes the AST a public interface, one which they are committed to and can't easily change. People don't like to do this without a good reason.

The technical problem is more difficult, though. To make a non-Lisp language as extensible as Lisp would require making the parser itself extensible. This is not too hard to implement, but perhaps not so easy to use. If you've ever tried to add productions to a grammar written by someone else, you know it can be nontrivial. You have to understand the grammar before you can modify it.

And if you overcome the difficulties of having one user in isolation add productions to the grammar, what happens when you try to load multiple subsystems written by different people using different syntax extensions which, together, make the grammar ambiguous?

I don't know that these problems are insurmountable, but a few people have taken a crack at them, and AFAIK no one has produced a system that any significant number of people want to use.

It's worth taking a look at how Lisp gets around these problems. Lisp has not so much a syntax as a simple, general metasyntax. Along with the well-known syntax rules for s-expressions, it adds the rule that a form is a list, and the meaning of the form is determined by the car of the list -- and if it's a macro, even the syntax of the form is determined thereby.

Add a package system like CL's, and you get pretty good composability of subsystems containing macros. You can get conflicts, but only when you explicitly create a new package and attempt to import macros from two or more existing packages into it.

Applying these ideas to a conventional language gives us, I think, the following:

(1) While the grammar is extensible, all user-added productions must be "left-marked": they must begin with an "extension keyword" that appears nowhere else in the grammar.

(2) Furthermore, those extension keywords are scoped: they are active only within certain namespaces; elsewhere they are just ordinary names. This requires parsing itself to be namespace-relative, which is a bit weird, but probably workable.

I think that by working along these lines it might be possible to add extensible syntax to a conventional language in a way that avoids both the grammatical difficulty and the composition problem. And if you do that, maybe you can then get the relevant committees or whoever to standardize the AST representation for the language.

I've never taken a crack at all this myself, though, because I'm happy writing Lisp :-)

My goal is not to add Lisp-like macros to every language. That would be a bit presumptuous; not all languages want Lisp-like macros.

My goal is to make ASTs as available and easy to traverse/transform as they are in Lisp. This is the foundation that makes things like Lisp's macros as powerful as they are. And easy access to ASTs enables so many other things, like static analysis, real syntax highlighting, and detecting syntax errors as you type.

In a way, Lisp-like macros are just a special-case of tree transformation that puts the tree transformer inline with the source tree itself. But this is not the only possible approach. You could easily imagine an externally-implemented tree transformer that implemented GCC's -finstrument-functions. This tree transformer could be written in any language; there's no inherent need to write it in C just because it's transforming C.
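As a concrete sketch of such an external tree transformer (in Python rather than C, purely for illustration; the class and names here are mine, not from any real tool), here is a toy analogue of `-finstrument-functions` built on the stdlib `ast` module:

```python
import ast

class Instrument(ast.NodeTransformer):
    """Prepend a trace call to every function body -- a toy analogue
    of GCC's -finstrument-functions, done as an external AST pass."""
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        trace = ast.parse(f"print('enter {node.name}')").body[0]
        node.body.insert(0, trace)
        return node

source = "def greet(name):\n    return 'hi ' + name\n"
tree = Instrument().visit(ast.parse(source))
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, "<instrumented>", "exec"), namespace)
namespace["greet"]("world")  # prints: enter greet
```

Note that the transformer never touches the source text; it rewrites the tree and recompiles, which is exactly the "not a perilous and fragile process like pure-text substitution" point.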

It's true that a compiler/interpreter could be reluctant to expose its internal AST format. But there's no reason that the AST being traversed/transformed has to use the same AST schema that is used internally; if you can translate the transformed AST back to text, it could then be re-parsed into a completely different format. And with a correctly implemented AST->text component, this would not be a perilous and fragile process like pure-text substitution is.

The author of Magpie has some interesting ideas about designing a language with extensible syntax. Sorry I can't find you a more specific link right away.


Have you looked at Nemerle?

So, I have a question. After reading this article and thinking about homoiconicity and macros, I remembered the Python source for the `namedtuple` function: http://dpaste.com/704870/

Is this considered a macro? Is it homoiconic? It's code as data and using input variables to generate code based on that input. It struck me as weird the first time I read through it but figured since I'm pretty stupid that there's a good reason for it.

It's not as powerful as what you could do with Lisp. The code is not a first-class object, so you can't, for example, substitute variable a for variable b. The "data" in "code as data" refers to an abstract syntax tree, not just a string which happens to contain code. Python does give you access to abstract syntax trees, but because of the complexity of the language this is less useful than with Lisp.
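To illustrate the distinction: with the stdlib `ast` module you can do the variable substitution structurally, which plain string templating (the `namedtuple` approach) can't do reliably. A minimal sketch (the `Rename` class is just for this example):

```python
import ast

class Rename(ast.NodeTransformer):
    """Substitute one variable name for another at the AST level."""
    def __init__(self, old, new):
        self.old, self.new = old, new
    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

tree = ast.parse("a + banana + a")
new_tree = Rename("a", "b").visit(tree)
print(ast.unparse(new_tree))  # b + banana + b
```

A naive string `replace("a", "b")` would have mangled "banana"; the AST pass only touches actual identifiers. That's the difference between code-as-string and code-as-data.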

Thanks for helping with the distinction

Macros are a method to do programmatically what read can do lexically. Using macros avoids having to load strings into sexps, modify them, and build callable constructs out of the modifications. With macros, you just define the modification as a transform on s-expressions and let the macro facility do the rest. The point is to modify the AST, and while read can be abused to do that, macros are designed to do just that.

Am I missing something, or should that be read-from-string, not read? My understanding is that read only operates on streams.

Technically, you are correct. Some Schemes implement read as a simple function that takes a string. "Lisp" here isn't referring to any specific Lisp, more just the idea of it.

Hm, guile doesn't seem to like it either. Anyway, the original article mentioned Common Lisp, so I assumed that's what you were working with in the examples.

I do a lot of data-driven development. I'm sure some of the time (20%?) it'd be nicer if the code/syntax were directly data. But 80% of the time, it's much easier for me to deal with them being separate.

I'm willing to give up that 20-25% to enjoy and be happy writing the other 80%.

I tried the example code in cygwin/clisp and it doesn't work. Way to come up with examples. At least mention which version it does work with?

The examples aren't valid Lisp or Scheme. "read" takes an input port, not a string, as someone else mentioned, so you should replace it with "read-from-string" in Common Lisp, or something like (read (string->input-port "(1 (0) 1 (0) 0)")) in Scheme. The second and third blocks of code are Scheme, not CL, so they won't work in clisp at all.
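For readers coming from other languages, the string-vs-stream distinction here has a loose analogue in Python's json module: `json.loads` parses a string (like read-from-string), while `json.load` parses a stream/port (like read). Using the list from the example above:

```python
import io
import json

text = "[1, [0], 1, [0], 0]"

# read-from-string analogue: parse directly from a string
assert json.loads(text) == [1, [0], 1, [0], 0]

# read analogue: parse from a stream/port (here an in-memory one)
assert json.load(io.StringIO(text)) == [1, [0], 1, [0], 0]
```

Same data, two entry points; the Lisp/Scheme examples in the article fail only because they hand a string to the stream-based entry point.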

Thank you for taking the time to explain. In my opinion, these types of attitudes are what are causing beautiful languages to die horrible deaths. The blog post extols the virtues of "Lisp" but fails to mention which dialect. Extremely off-putting to newcomers, if I may say so. At least put a tl;dr up top saying "This is for experts, noobs GTFO!" ...

I believe the examples are in "outlet", the dude's Lisp->JS compiler (https://github.com/jlongster/outlet).
