Right now there is a divide among programmers. On one side you have people like the author, who crave the power of code-as-data more than they care about nice syntax and therefore love Lisp. On the other side you have people who like conventional syntax more than they care about code-as-data and therefore don't love Lisp.
Neither side can understand the other: one side says "why do you resist ultimate power?" and the other side says "how can you possibly think that your code is readable?"
My belief (and what I am starting to consider my life's work) is that the gap can be bridged. Lisp's power comes from treating code as data. But all code becomes data eventually; turning code into data is exactly what parsers do, and every language has a parser. The author says "it's about read," but "read" (in his example) is just a parser.
The author asks "How would you do that in Python?" The answer is that it would be something like this:
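Something along these lines, using Python's built-in ast module (the `add`/`mul` names are illustrative, not from the original article):

```python
import ast

# Parse an expression into a syntax tree, rewrite the call
# target -- roughly the Lisp trick of swapping the car of a form.
source = "add(2, 3)"
tree = ast.parse(source, mode="eval")

class SwapCall(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "add":
            node.func.id = "mul"  # rewrite the call target
        return node

tree = SwapCall().visit(tree)
ast.fix_missing_locations(tree)
result = eval(compile(tree, "<example>", "eval"),
              {"mul": lambda a, b: a * b})
# result is 6: the add(2, 3) call was rewritten to mul(2, 3)
```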
This works alright, but what I'm after is a more universal solution. With syntax trees there's a lot of support functionality you frequently want: a way to specify the schema of the tree, convenient serialization/deserialization, and ideally a solution that is not specific to any one programming language.
My answer to this question might surprise some people, but after spending a lot of time thinking about this problem, I'm quite convinced of it. The answer is Protocol Buffers.
It's true that Protocol Buffers were originally designed for network messaging, but they turn out to be an incredibly solid foundation on which to build general-purpose solutions for specifying and manipulating trees of strongly-typed data without being tied to any one programming language. Just look at a system like http://scottmcpeak.com/elkhound/sources/ast/index.html that was specifically designed to store ASTs, and look how similar it is to .proto files.
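To make that concrete, here's a rough sketch of how a tiny expression AST might look in the protobuf schema language (the message and field names are hypothetical, not taken from Elkhound or any existing system):

```proto
syntax = "proto3";

// A hypothetical expression AST as a tree of strongly-typed messages.
message Expr {
  oneof kind {
    double number = 1;
    string variable = 2;
    BinOp bin_op = 3;
  }
}

message BinOp {
  string op = 1;  // "+", "-", "*", ...
  Expr left = 2;
  Expr right = 3;
}
```

Any language with protobuf support can then build, serialize, and walk such trees without hand-written parsing code.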
(As an aside, programmers have spent the last 15 years or so attempting to use XML in this role of "generic language-independent tree structured serialization format," but it wasn't the right fit because most data is not markup. Protocol Buffers can deliver on everything people wanted XML to be).
Why should manipulating syntax trees require us to write in syntax trees? The answer is that it shouldn't, but this is non-obvious because of how inconvenient parsers currently are to use. One of my life's goals is to help change that. If you find this intriguing, please feel free to follow:
Some people like putting salt on grapefruit. I'm not saying that it's impossible to like Lisp's syntax, but empirically most people prefer the ALGOL-like syntax, which is why I referred to it in the next sentence as "conventional syntax."
I could attempt to prove to you that "conventional syntax" is inherently superior to Lisp syntax, but that would be a waste of both of our time.
You sound as if you think that putting salt on grapefruit is inherently strange, while in actuality there's a very good reason to do so: it reduces the perception of bitterness.
I'm not saying that it's impossible to like Lisp's syntax, but empirically most people prefer the ALGOL-like syntax
Empirically, most people prefer what they are already familiar with, so I'm not sure what this is supposed to prove, other than most people are already more familiar with Algol-like syntax.
For me, Lisp syntax has the definitive advantage that the first identifier in every expression tells me what to expect. I.e., I don't have to scan to the right to figure out what kind of expression this is. For me, this makes code much more readable. And this makes Lisp syntax more "nice".
Weak argument. So rather than form a language around what we already know (from math, as you say), we should change all of our existing knowledge to suit a particular language... then it's more readable... riiiiight, good luck with that. :)
I see you omitted some parentheses in the "conventional" expression, relying on the fact that multiplication takes priority over subtraction. Making this fact explicit is exactly what makes Lisp better, especially for more complex domains: delegating priorities to the notation frees brain capacity for the actual problem.
Sure, there are solutions and it's awesome that they're both possible and easy to use. I'm not arguing against Lisp; I just don't agree that its syntax is better. I agree it is not worse, if you survey a sufficiently large variety of cases.
It may be bikeshedding, but I would not let 'blue is better than red, because the sky is blue' pass either.
> I could attempt to prove to you that "conventional syntax" is inherently superior to Lisp syntax, but that would be a waste of both of our time.
Yes, trying to prove falsehoods is a waste of time.
Conventional syntax is neither conventional nor suited to humans. (If it's "conventional", why isn't there more agreement as to what it is? If it's suited to humans, why aren't there more than 100 who actually know it for any given language?)
I agree; I don't get the syntax argument. I am sure, as one of the other posters mentioned, that one is more favored than the other, but once you know the limited syntax of a Lisp it becomes fairly readable. For me it's all about figuring out scope: once I know what denotes scope, it is fairly easy to format the code into a readable form in my mind. The thing about the Lisp dialects is that the rules for syntax are so simple that once understood they become perfectly readable, at least to me.
Doubly agreed! I learned Clojure just over a year ago and will never look back. My attitude toward people complaining about parens is that they should just get over it. That one hang-up is actually holding them back.
I remember, when I was taking a class on AI, looking for some sort of style guideline that would help me get through the learning curve, but the FAQ (I want to say it was comp.lang.lisp) just had "coming soon." So this would be the allegro editor in 2001 or 2002. It may be obvious to an experienced hand, and perhaps if there was some sort of best practices when I was learning it I wouldn't have had the same problem, but I just remember the frustration of my mind playing tricks on me and (even with syntax highlighting) trying to match parens that I thought were there.
1 paren moved and all line breaks and indentation removed. This is intentionally obfuscated. Is there any language that is easy to read when you put five lines of code together like this?
I haven't read a lisp style guide, Emacs just takes care of indentation - it is immediately clear when a paren is wrong because the shape of the function is wrong. If you are writing lisp with an editor that doesn't do this, get a better editor, don't blame the language.
Oy. The mistake was actually in posting too late and not using the proper code tag. Don't drink and post, kids. ;) When I brought the code into vim I used the same indentation as the example, I saw that the indentation changed, but you're saying "The shape of the function is wrong."
It's quite possible my experience as a programmer today would be different than when I started -- I mean, I made it through a few chapters of SICP without such troubles, but in the back of my head was the memory of trying to figure out my logic error in a bit of code when it was really a misplaced paren.
My code doesn't look like that. For 30-50% of my code, I try to use a style more like:
(defn thingies [id]
(Of course, my code is often more complex and messier than that, even when using ->>, but some fairly significant percentage of my code does look that simple.)
Lisp is effectively the parse tree in other languages. In the gap between human and machine, Lisp is closer to the machine. Some people like that, but the majority prefer a language that is closer to the human (and perhaps the closer you get to the human the less nice it gets depending on how you define "nice").
Why then has math (which is read and written only by humans) used infix notation for hundreds of years, whereas prefix/postfix notation were only developed in the 20th century and today are used only by Computer Scientists?
You don't have to know an entire operator precedence table to read and write idiomatic infix-notation code. Precedence is defined such that common expressions evaluate as people intuitively expect (a notable counterexample is "x & y == z" in C). Parentheses are always available to clarify more complicated expressions.
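Python, for what it's worth, chose the intuitive grouping for that exact case, which is easy to check:

```python
# In C, `x & y == z` parses as `x & (y == z)` because == binds
# tighter than &. Python gives & the tighter binding, so the
# expression means what most readers expect.
x, y, z = 0b1100, 0b1010, 0b1000
assert (x & y == z) == ((x & y) == z)
assert x & y == z  # 0b1100 & 0b1010 is 0b1000, which equals z
```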
Humans who do a lot of math switch notations when convenient. For example, for addition we'll sometimes put a summation sign in prefix notation. For division we like to put the numerator above the denominator, a notation that's inconvenient in a programming language.
Come to think of it, humans usually add and subtract by stacking numbers vertically. I don't think you can point at infix notation as "the" human-friendly notation.
Math was written on paper long before there were computers. As a result, the use of infix notation was an act of necessity not a calculated decision. Now that we have computers and keyboards we should use prefix notation.
> I find in only a glance I can tell what everything is binding to.
It must be nice to live in a world with only 4 infix operators and expressions that have only 3 infix operators.
For example, lots of folks think that sqrt should be a prefix operator, not yet another function. I suppose you're going to assume that the top bar will serve as parentheses.
BTW, "-b + sqrt(bb-4ac) / 2a" is the interesting expression. Is it "(-b + sqrt(bb-4ac)) / (2a)" or "-b + (sqrt(bb-4ac) / (2a))"? And are you certain what "bb-4ac" means? (There's at least one major language where it doesn't mean "(bb)-(4ac)".)
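In code, of course, you have to commit to one reading explicitly, e.g.:

```python
import math

# The usual quadratic-formula reading, parenthesized explicitly:
a, b, c = 1, -3, 2
x = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
# x is 2.0, one root of x^2 - 3x + 2
```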
Why exactly is it a necessity to use infix on paper? And what exactly is the argument for using prefix just because we have computers? I'd argue quite the opposite: computers give us even more freedom to use whatever notation we like. I think it is a rather arbitrary choice, but it may relate to the prevalence of subject-verb-object in (spoken) languages; i.e., operators act like verbs.
It's even stronger, in that mathematics generalized arithmetic algebra into groups, fields, rings and other things I don't understand. Examples of specific algebras include: boolean, relational and Kleene (aka regular expressions).
Other notations are used, but with a frequency similar to pre-fix (lisp) and post-fix (forth). "Associativity" (not affected by order of evaluation) only makes sense for in-fix.
But it really could just be familiarity, I guess. I can't see how to determine it either way. But regardless of the cause, there's overwhelming evidence that people, in fact, prefer in-fix.
Closer to a virtual machine which treats most things as functions, nouns, and lists; fairly close to human thought. Naked and abstract and systematic, hence easier to process, but aside from the car/cdr names I don't see too much machinery here.
Here's IPL, an influence on Lisp and also a list processing language (c/p from Wikipedia):
IPL-V List Structure Example
  Name  SYMB  LINK
  L1    9-1   100
  100   S4    101
  101   S5    0
  9-1   0     200
  200   A1    201
  201   V1    202
  202   A2    203
  203   V2    0
You can use the protocol buffer schema language to define your ASTs if you want, but I think that addresses only a relatively small part of the problem.
There are two larger problems in adding Lisp-style macros to non-Lisp languages, one social and one technical.
The social problem is that language designers must be persuaded to publish a specification of the internal representation of the AST of their language. This makes the AST a public interface, one which they are committed to and can't easily change. People don't like to do this without a good reason.
The technical problem is more difficult, though. To make a non-Lisp language as extensible as Lisp would require making the parser itself extensible. This is not too hard to implement, but perhaps not so easy to use. If you've ever tried to add productions to a grammar written by someone else, you know it can be nontrivial. You have to understand the grammar before you can modify it.
And if you overcome the difficulties of having one user in isolation add productions to the grammar, what happens when you try to load multiple subsystems written by different people using different syntax extensions which, together, make the grammar ambiguous?
I don't know that these problems are insurmountable, but a few people have taken a crack at them, and AFAIK no one has produced a system that any significant number of people want to use.
It's worth taking a look at how Lisp gets around these problems. Lisp has not so much a syntax as a simple, general metasyntax. Along with the well-known syntax rules for s-expressions, it adds the rule that a form is a list, and the meaning of the form is determined by the car of the list -- and if it's a macro, even the syntax of the form is determined thereby.
Add a package system like CL's, and you get pretty good composability of subsystems containing macros. You can get conflicts, but only when you explicitly create a new package and attempt to import macros from two or more existing packages into it.
Applying these ideas to a conventional language gives us, I think, the following:
(1) While the grammar is extensible, all user-added productions must be "left-marked": they must begin with an "extension keyword" that appears nowhere else in the grammar.
(2) Furthermore, those extension keywords are scoped: they are active only within certain namespaces; elsewhere they are just ordinary names. This requires parsing itself to be namespace-relative, which is a bit weird, but probably workable.
I think that by working along these lines it might be possible to add extensible syntax to a conventional language in a way that avoids both the grammatical difficulty and the composition problem. And if you do that, maybe you can then get the relevant committees or whoever to standardize the AST representation for the language.
I've never taken a crack at all this myself, though, because I'm happy writing Lisp :-)
My goal is not to add Lisp-like macros to every language. That would be a bit presumptuous; not all languages want Lisp-like macros.
My goal is to make ASTs as available and easy to traverse/transform as they are in Lisp. This is the foundation that makes things like Lisp's macros as powerful as they are. And easy access to ASTs enables so many other things, like static analysis, real syntax highlighting, and detecting syntax errors as you type.
In a way, Lisp-like macros are just a special-case of tree transformation that puts the tree transformer inline with the source tree itself. But this is not the only possible approach. You could easily imagine an externally-implemented tree transformer that implemented GCC's -finstrument-functions. This tree transformer could be written in any language; there's no inherent need to write it in C just because it's transforming C.
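As a sketch of what such an external transformer could look like (in Python rather than C, with an illustrative `trace` list standing in for the real profiling hooks):

```python
import ast

# Wrap every function body in enter/exit trace calls, in the
# spirit of -finstrument-functions. The `trace` list and the
# traced messages are made-up stand-ins for real hooks.
class Instrument(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        enter = ast.parse(f"trace.append('enter {node.name}')").body[0]
        exit_ = ast.parse(f"trace.append('exit {node.name}')").body[0]
        # try/finally so the exit call survives early returns
        node.body = [enter, ast.Try(body=node.body, handlers=[],
                                    orelse=[], finalbody=[exit_])]
        return node

source = "def greet():\n    return 'hi'"
tree = ast.fix_missing_locations(Instrument().visit(ast.parse(source)))
env = {"trace": []}
exec(compile(tree, "<instrumented>", "exec"), env)
env["greet"]()
# env["trace"] is now ['enter greet', 'exit greet']
```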
It's true that a compiler/interpreter could be reluctant to expose its internal AST format. But there's no reason that the AST being traversed/transformed has to use the same AST schema that is used internally; if you can translate the transformed AST back to text, it can then be re-parsed into a completely different format. And with a correctly implemented AST->text component, this would not be a perilous and fragile process the way pure-text substitution is.
I don't know Lisp, I only tried Clojure, but I think this applies. I don't think it's about code as data per se so much as the philosophy that there are no "special cases" outside the few special forms needed to bootstrap the language. This simplicity lets you shape the language the way you want without having to consider the impact of your modifications on existing code. Consider the hell C# or Java teams go through when they introduce things such as async, lambda, or LINQ, and how those features interact with the existing language. Consider implementing pattern matching for C# and all the edge cases; even if you had an open compiler it would be difficult. Python is no better: for example, it has a rigid class/type model that has been abused more than once to provide metadata, e.g. for ORMs. I once tried to extend ORM functionality that was using metaclasses and multiple inheritance, and the hell you go through with metaclass ordering is insane compared to writing a declarative DSL in Clojure with a macro and leveraging an existing ORM library. And when libraries overload operators to implement DSLs, you immediately get into problems with operator precedence. These problems don't go away even if you can redefine the precedence, because it's not consistent with the rest of the language.
So what I'm saying is that complexity is significantly reduced when you have a small, consistent core. As for readability, I think Clojure makes this better by providing different literals for vectors and maps, and those literals have consistent meaning in similar situations, so they provide nice visual cues. But immutability by default, clear scoping, and functional programming make things like significant whitespace and pretty conditional-expression syntax bikeshedding-level details.
Protocol buffers are a reimplementation of some whizzy stuff we did at a messaging start-up circa 1994.
They were great for messaging . . . but we found ourselves using them /everywhere/. And since our stuff worked in many different environments (C++, Java, Visual Basic were the ones we directly supported), you could have your choice of language.
It's flattering to see this rediscovered, several times over :-)
Yep, I believe that it will continue being "rediscovered" until some well-executed, open-source embodiment of them becomes a de facto standard. My goal is to make Protocol Buffers just that.
Another way of putting it is that I'm trying to beat Greenspun's Tenth Rule by making that "half of Common Lisp" separable from Common Lisp so that C programs (and high-level programs too) don't have to keep re-inventing it. As a bonus, this will help make languages more interoperable too.
I wrote a (batshit crazy) Python library to do exactly what you're talking about. It converts Python ASTs into S-expressions (just Python lists) and allows you to apply 'macros' to methods just by decorators.
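The core trick is small; here's a rough sketch of the AST-to-nested-lists half in plain Python (this is illustrative, not the library itself):

```python
import ast

# Dump a Python AST as nested lists -- a poor man's S-expression.
# Each node becomes [NodeName, field1, field2, ...].
def to_sexp(node):
    if isinstance(node, ast.AST):
        return [type(node).__name__] + [
            to_sexp(v) for _, v in ast.iter_fields(node)]
    if isinstance(node, list):
        return [to_sexp(n) for n in node]
    return node  # leaf values (ints, strings, None) pass through

sexp = to_sexp(ast.parse("x + 1", mode="eval"))
# sexp is a nested list starting ['Expression', ['BinOp', ...]]
```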
I don't agree that "the other side" cares so much about conventional syntax. Syntax is a very superficial thing; I think it's rather that most of the time you don't want to have to deal with the power and complexity of something like Lisp. If you compare it with natural language, most of the time we don't speak in poetic or literary language, even though that might be the most beautiful; rather we naturally strive for efficiency and simplicity, using a much smaller vocabulary, re-using common expressions and being redundant etc.
I'm also in some kind of middle ground. I like the power Lisp provides and don't mind the syntax, but I also like and appreciate what languages with rich syntax provide. On the other hand, I don't particularly love either one. I think the only way I'll truly be happy is if the gap is bridged without really giving up on either side's advantages.
But there are other factors which would make me happier with a language than closing the gap between expressive power and great syntax. For example, I would love a language with nice syntax and good metaprogramming (e.g. Python) that also had an unambiguous visual representation (something like, e.g., Max) that you can switch between at will. (Dunno how realistic that would be without adding complexity or ambiguity or ruining code formatting.)
In your example you say "Once in memory, this structure can be easily outputted as CSS." I think you may be underestimating how hard this will be. CSS may look like it's just a bunch of maps, but there is more to it than that. Consider something funky like:
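For instance, something in this spirit (a made-up snippet, not from the original discussion):

```css
/* Selectors, shorthand properties, and at-rules are more than key/value maps. */
@media (min-width: 600px) {
  ul > li:not(.active):hover::after {
    content: " *";
    font: italic bold 12px/1.5 "Helvetica Neue", sans-serif;
  }
}
```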
There's a lot going on here. CSS isn't just key/value maps.
Also, I don't think you want a data language to be Turing-complete. PostScript was Turing-complete but PDF is not; this makes PDF easier to deal with because it's easier to analyze and there's no risk of it getting into an infinite loop.
True, I'll confess I didn't know much about the more arcane CSS selectors when I wrote that (still don't). Complicated properties, though, are not too hard, and the important thing is that it starts off in a form that's easy to process.
I don't intend it to be just a data language. What I've been moving toward in my daydreams is a DAG of, for lack of a better word, function calls (some interesting data doesn't really fit in a tree), some of which are generators. If Turing-completeness is a problem in your context, you can reject some or all generators and/or just not evaluate them, i.e. take them as pure data. But I don't want to limit myself. I would have no problem if it turned into a general purpose language with a nice data-oriented subset.
It's not really about read either, it's about this:
> Wait a second, if you take a look at Lisp code, it’s really all made up of lists:
Haskell has `read`, most of your data types can just derive Read and Show and they'll "magically" get a representation allowing you to `read` and `show` them.
But that only works for datatypes, you can't do that with code.
In Lisp you trivially can, because code is represented via basic datatypes (through a reader macro if needed).
It's not macros. It's not read. It's much more basic than that: it's homoiconicity. From that everything falls out, and without that you need some sort of separate, special-purpose preprocessor (whether it's a shitty textual transformer — as in C — or a more advanced and structured one — as with Camlp4 or Template Haskell — does not matter).
And I don't get why the author got the gist (and title) so wrong when he himself notes read is irrelevant and useless in and of itself:
`read` does not matter if the language isn't homoiconic, you can `read` all you want it won't give you anything.
Yep, you're absolutely right. I was mainly offering a different viewpoint that might be refreshing to those who have always heard the "code is data and data is code" statement, but never really understood it.
Focusing on `read` was a way to anchor my article, even if it truly isn't about read either. I tried to tie that together at the end.
I disagree with this article's perspective, but I understand what the author intended to say and to a certain extent I like how the idea is presented.
Yes, Lisp's power comes from the embodiment of code and data together in one manner, and the ability to treat them this way when writing code is good, but `read` is a coincidence of that power, not a demonstration of how it is used. Macros are the method by which we harness the power of homoiconicity in an efficient, powerful manner.
As masklinn put it, it's really about the homoiconicity of Lisp. But with that said, I'm not convinced at how great that is. While macros in general can be useful, is homoiconicity generally a good thing?
It's rarely the case that I want a single representation for all my data -- and if we treat code as data, do I want a representation that is indistinguishable from all my other data?
While I can appreciate the AST form of s-exprs I also do like the richness of many standard languages -- and the semantic richness of their ASTs.
Lastly, treating code as data (and vice-versa) has been the bane of many programmers of days past. Go back 40 years and you can find many developers who did treat code as data (it was all actually viewed as sequence of bits by many) and this caused no end of problems. In most modern systems there are often safeguards to specify data and code segments and ensure that you don't treat one as the other. While not completely analogous to Lisp macros, it does show that you tread dangerous ground when you attempt to treat all forms of data as indistinguishable.
Given the special purpose nature of code, I don't mind (and actually appreciate) a well thought through syntax, and a special set of functionality to interact with it -- as I do most special purpose forms of data.
In emacs I represent everything in Lisp and I use separate color schemes for SXML documents and code. This effectively allows me to distinguish between these different classes of data. Furthermore, emacs has different sets of functionality for different types of data, so I feel that I have all the features you mentioned already.
Unfortunately, in practice, it's not really the case that "it’s really easy to parse Lisp code". Yes, you can do it in some cases, but to do it correctly in general, at least in CL, is the somewhat infamous "code-walker" problem, which needs to do all sorts of strange things:
Do you handle all varieties of lambda lists? recognize and descend into all the special forms? what do you do with macros? expand them (a mess)? try to walk into special-cased standard ones like 'loop' and ignore user-defined ones?
These are all issues of compilation, not reading. Reading Lisp code into cons-pairs, symbols, numbers, etc. is not that hard at all: you can do it in about 100 lines of code. Lisp can be trivially parsed by a recursive-descent parser. "'Recursive-descent'," as the UNIX-HATERS Handbook says, "is computer science jargon for 'simple enough to write on a liter of Coke.'"
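To make that concrete, here's a minimal recursive-descent reader for S-expressions sketched in Python; symbols stay plain strings, only integers get converted, and error handling is omitted:

```python
import re

# Tokenize parens and atoms, then read tokens recursively into
# nested lists -- the whole of "read" in miniature.
def tokenize(src):
    return re.findall(r"[()]|[^\s()]+", src)

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)  # consume ")"
        return form
    try:
        return int(tok)
    except ValueError:
        return tok  # a symbol, kept as a string

sexp = read(tokenize("(defun square (x) (* x x))"))
# sexp == ['defun', 'square', ['x'], ['*', 'x', 'x']]
```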
Usually resolving things like which identifiers refer to which kinds of things is considered part of the parsing step, not compilation; for example, the is-it-a-typedef-or-not resolution problem is frequently cited as the reason C's grammar isn't context-free. And you can't do even that level of parsing---determining which symbols are function calls and which aren't---for Lisp without expanding macros, or special-casing how to descend into known ones.
Macro-expansion is sometimes considered part of compilation, sometimes a separate stage between parsing and compilation. I've never seen it referred to in Lisp as a part of parsing, though I guess it might be possible to do that with C's text-based macros.
`read` does not perform macro-expansion: that would break data reading and the reading of quoted forms. macroexpand expands macros at a later stage. Once expanded, macros either refer to primitive special forms or function calls, and it's trivial to determine which. Primitive special forms can have their components macroexpand'd as appropriate. Function calls can be left as-is to be compiled (well, the arguments can be macroexpand'd). Eventually there will just be primitive special forms and raw function calls left, ready to be handed to the compiler.
I can buy that in terms of definitions. My objection is more based on the common use-case of "parsing" Lisp in this fashion to do some kind of source-to-source transformation, like the example in this post, which I've also wanted to do. The post saves itself by only replacing the first symbol in a form. But what if you wanted to do something fairly simple like replace every call to foo with some other bit of code?
In idiomatic Lisp code, you either miss lots of the calls, or you have to complicate your code-walker significantly. This is especially the case if you write CL in the (common, but not universal) style that makes significant use of the loop macro, because you either ignore it as a macro, and consider anything inside it opaque until macro-resolution time (because you don't know what it does to its arguments), or you special-case it as a new bit of CL syntax, in which case your parsing is now fancier. Usually you want something like the latter, because source-to-source transformations expect to also replace things inside loops. Same with, say, special-casing setf forms, if you want source-to-source transformations to "do what I mean" in a large number of cases.
It's true that it's very easy to literally get the list representing the code, but there's precious little sensible you can do with that list unless you're willing to descend into some of the more commonly used built-in macros that most CLers treat as de-facto syntax, which requires knowing something about the syntax they in effect define.
That's all non-trivial. Common Lisp tried to have a very small Lisp syntax, but it still has about 25 special operators, function calls, macro calls, symbol macros, lambda expressions, lexical and dynamic binding, ...
No, you don't READ Lisp. You READ S-Expressions. The Lisp syntax is defined on top of that. To parse Lisp you need to understand the complex Lisp syntax - sometimes done with a code-walker. Trivial parsing can be done with primitive Lisp functions, but a complete parser is not that easy.
> You can implement a macro system in 30 lines of Lisp. All you need is read, and it’s easy.
The linked pastebin isn't a macro system. It's merely a macroexpansion system, it needs to be evaluated. And it's not as simple as merely wrapping it in 'eval' because of subtleties in getting at the right lexical scope.
More generally, no fair claiming macros are easy because you managed to build them atop a lisp. You're using all the things other comments here refer to; claiming it's all 'read' is disingenuous.
I'm still looking for a straightforward non-lisp implementation of real macros. The clearest I've been able to come up with is an fexpr-based interpreter: http://github.com/akkartik/wart
With the interesting critique that "objects" are better than s-expressions for representing sourcecode. (BTW, Moon did a lot of work on Lisp.)
I've too thought that s-expressions don't necessarily contain as much information as you'd want. Using Rich Hickey's word from "Simple Made Easy", maybe they're used to "complect" visual presentation and internal representation.
Thanks for those links (wes-exp as well). But I meant a non-lisp implementation of lisp macros. Obviously common lisp and racket qualify, but I'd love to see an implementation that's as simple as possible without needing to be production-quality.
So, I have a question. After reading this article and thinking about homoiconicity and macros, I remembered the Python source for the `namedtuple` function: http://dpaste.com/704870/
Is this considered a macro? Is it homoiconic? It's code as data and using input variables to generate code based on that input. It struck me as weird the first time I read through it but figured since I'm pretty stupid that there's a good reason for it.
It's not as powerful as what you could do with Lisp. The code is not a first-class object, so you can't for example substitute variable a for variable b. The "data" in code as data refers to an abstract syntax tree, not just a string which happens to contain code. Python does give you access to abstract syntax trees, but because of the complexity of the language this is less useful than with Lisp.
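For comparison, the namedtuple technique boils down to something like this sketch: build the source as a string, then exec it (the class and field names here are illustrative):

```python
# Generate a class from a string template, namedtuple-style.
# This is code-as-text, not code-as-data: there is no tree to
# manipulate, just string formatting followed by exec.
template = """
class {name}:
    def __init__(self, {args}):
{assignments}
"""

def make_record(name, fields):
    source = template.format(
        name=name,
        args=", ".join(fields),
        assignments="\n".join(
            f"        self.{f} = {f}" for f in fields))
    namespace = {}
    exec(source, namespace)
    return namespace[name]

Point = make_record("Point", ["x", "y"])
p = Point(3, 4)
# p.x is 3 and p.y is 4; the class came from formatted text
```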
Macros are a method to do programmatically what read can do lexically. Using macros avoids having to load strings into sexps, modify them, and build callable constructs out of the modifications. With macros, you just define the modification as a transform on s-expressions and let the macro facility do the rest. The point is to modify the AST, and while read can be abused to do that, macros are designed to do exactly that.
The examples aren't valid Lisp or Scheme. "read" takes an input port, not a string, as someone else mentioned, so you should replace it with "read-from-string" in Common Lisp or something like (read (string->input-port "(1 (0) 1 (0) 0)")) in Scheme. The second and third blocks of code are Scheme, not CL, so they won't work in clisp at all.
Thank you for taking the time to explain. In my opinion, these kinds of attitudes are what cause beautiful languages to die horrible deaths. The blog post extols the virtues of "Lisp", but fails to mention which dialect. Extremely off-putting to newcomers, if I may say so. At least put a tl;dr up top saying "This is for experts, noobs GTFO!" ...