
Just what does “code as data” mean anyway? (2014) - jxub
https://adambard.com/blog/what-is-homoiconicity/
======
platz
"Code is data" does not come from Lisp or macros; those are simply affordances.

It comes from computer hardware architecture.

The von Neumann Architecture is named after the mathematician and early
computer scientist John von Neumann. von Neumann machines have shared signals
and memory for code and data. Thus, the program can be easily modified by
itself since it is stored in read-write memory.

Any self-modifying program, even if it is written in Borland Delphi, expresses
this code-is-data capability.

~~~
AnimalMuppet
It's even deeper than that. It's not just how code is handled in a von Neumann
architecture, it's how code is created.

Data is just a sequence of bytes. Source code is also just a sequence of bytes
- it's just a data file with a particular structure and intent. And then an
executable is just a sequence of bytes - it's another data file, though on
Unix it's one with the executable permission set. A compiler reads in a data
file and writes a data file - they just happen to be source code and
executable programs, respectively.

(Now, when Lisp people say "code is data", they mean much more than compile
time. For other languages, that is less true, other than self-modifying
code...)

~~~
platz
The hardware is more fundamental. (Unless you want to invoke mathematics e.g.
term rewriting)

The compiler/sequence-of-bytes example requires von Neumann, because in that
case it is the behavior of the _whole system_ that is self-modifying.

You could not have the beautiful executable and interchangeable files of bytes
without von Neumann.

~~~
AnimalMuppet
I don't think that's right. There have been systems that were Harvard
architecture (Motorola 88000, I think, and definitely others). But the code to
run on them still started out as a sequence of bytes in a text file, that is,
as data, and a compiler still took that data file and wrote another data file
that was the executable image.

~~~
platz
Sure, I guess

------
DannyB2
Code is data.

In Java, I could use a bytecode manipulation library to read in a class file
and manipulate it by using the library's API to add, modify, or delete members
of the class. Then write the class back out as a file.

Code becomes data, is manipulated as data, and is written back out.

In Lisp, 'code is data' is so much more natural because code _IS_ made of the
most elementary data structures of the language. No need for a library to
manipulate code behind an API. The API is the language primitives. The code
_is_ the data structure you manipulate using language primitives.
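
A rough parallel can be sketched in Python with the standard ast module. This is not bytecode manipulation, and the greet/welcome names are invented for illustration, but the read-modify-write shape described above is the same: code in, data manipulated through a library's API, code out.

```python
import ast

# Code comes in as text...
source = "def greet(name):\n    return 'hello ' + name\n"

tree = ast.parse(source)        # ...becomes data (an AST object)...
tree.body[0].name = "welcome"   # ...is manipulated via the library's API...
new_source = ast.unparse(tree)  # ...and is written back out as code

print(new_source)  # the rewritten source now defines welcome, not greet
```

Note that, per the parent comment's point, this still goes through a library (`ast`) rather than the language's own primitive data structures.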

------
peatmoss
And, as I wrote in another thread yesterday, it allows you to work with your
text editor at a higher level of abstraction than non-lisp-coding people may
have seen before
([https://news.ycombinator.com/item?id=16386380](https://news.ycombinator.com/item?id=16386380)).

Rather than thinking in letters, words, lines, or paragraphs, editor modes
like paredit let you edit in terms of the structure of your code. I find this
really hard to give up after lispy sessions.

~~~
jstimpfle
I really value C syntax. I'm convinced it's much easier to navigate with the
eyes than "oatmeal with fingerclips mixed in" (quoting Larry Wall).

And >90% of the time the relevant syntactic unit is on its own line or ranges
of lines, and that's really super easy to handle in vim. I don't think there
is much to improve by building more complex abstractions on top.

If you use vim as a C programmer, you should know the basic line operations
like dd, p, and Shift+V (line range select). Also, I often use just { and } to
navigate to the next/previous empty line. These keyboard shortcuts get those
>90% covered.

Beyond that, you can use Control+V (block select), and XaY where X can be c
(change) or d (delete) or v (visual select), and Y can be things like "
(string literals), w (identifier), { (braced block) or ( (parenthesized
expression).

I don't disagree that if all you have is parens, then you want something like
paredit. But for a C programmer there is no need for such a thing.

~~~
peatmoss
I find C syntax awkward, bloated, inconsistent, and gross compared to lisp
syntax, but that’s an aesthetic preference.

Beyond the personal preferences of individuals (individuals who probably
started with one kind of syntax or another) there are characteristics that
have some objective utility.

I appreciate that a lot of people start with C-ish syntax, and a similarly
large number of people prefer that syntax, but I’ve never heard anything that
demonstrated an intrinsic benefit of C syntax.

~~~
nickpsecurity
"I appreciate that a lot of people start with C-ish syntax, and a similarly
large number of people prefer that syntax, but I’ve never heard anything that
demonstrated an intrinsic benefit of C syntax."

Short stuff is easier to type and maybe to read. There's that. As far as its
"design" goes, it was made by tweaking BCPL to make it run on a PDP-7 and then
a PDP-11. The assignment change was admitted to be personal preference. BCPL
itself was an ALGOL with LISP features that had every feature for safety,
maintainability, etc. chopped off so it would compile on a terrible piece of
hardware they were stuck with. There was little to no design: it can't be
overemphasized that they literally just kept what that one machine could
compile. As for C, even structs originally weren't in it but got added after
their failed attempts to port UNIX from assembly. The presentation below has
proof from historical papers written by the BCPL and C inventors.

[https://vimeo.com/132192250](https://vimeo.com/132192250)

People just assume there was sensible design because of all the C code out
there (argument from popularity). The brain then starts rationalizing
attributes about it that were designed or hacked in for totally different
reasons in a past of constrained hardware lacking knowledge or tools of
modern, language design. That context no longer applies to most users of C.
It's just myth-making by users reinforcing use of it.

~~~
jstimpfle
> The brain then starts rationalizing attributes about it that were designed
> or hacked in for totally different reasons in a past of constrained hardware
> lacking knowledge or tools of modern, language design. That context no
> longer applies to most users of C. It's just myth-making by users
> reinforcing use of it.

That's just _your_ rationalization. I don't think it's commonly claimed that
all was set in stone from the beginning. The history is there for everyone to
read.

Another possible explanation is that because C is so minimal (in spirit) and
doesn't get in the way, people are able to pull off impressive things, which
makes them love C.

And now why exactly isn't (the gist of) C sensible design? I fail to see your
argument. By the way, please enjoy this cool video:
[https://www.youtube.com/watch?v=khmFGThc5TI](https://www.youtube.com/watch?v=khmFGThc5TI)

------
tombert
When I was first learning C, I wondered what the point of macros was,
especially given inline functions.

It wasn't until I taught myself Racket and Clojure a few years later that I
realized the utility, and it was immediately one of those "the world is
different now" moments; I could actually augment the language without having
to contribute to the core compiler.

~~~
yetanotheruser
I'm still wrapping my head around macros, but I have been able to come up with
a couple use cases that I quickly realized were already covered by existing
macros, the most recent one being a less flexible -> macro.

Anyways, I kept reading places that lisp macros and c macros were completely
distinct, which seems untrue after a couple years of trying clojure.

~~~
kbp
> I kept reading places that lisp macros and c macros were completely
> distinct, which seems untrue after a couple years of trying clojure.

They're similar in that they're both compile-time functions that generate
code, but they're very different in practice, because C macros are written in
a text substitution language that knows virtually nothing about C, which
naturally makes writing even simple macros very error-prone. Lisp macros, on
the other hand, are written in plain old Lisp and receive regular Lisp data
structures that you can use the whole language to operate on.

A lot of people learn from C that macros are very difficult to get right, so
it's important when they move to Lisp that they give them a second look,
because Lisp is a much more capable language for code transformation than cpp.
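
A hedged Python sketch of the difference: the C-style macro is simulated with plain string replacement, and the tuple trees stand in for Lisp forms. The classic precedence bug shows why text substitution is error-prone while structural transformation is not.

```python
# Simulating a C-style text macro, #define SQUARE(x) x*x,
# with plain string replacement:
def square_text_macro(arg: str) -> str:
    return "x*x".replace("x", arg)

expanded = square_text_macro("1+2")  # -> "1+2*1+2"
print(eval(expanded))                # 5, not 9: the precedence bug

# A structural "macro" receives the expression as data, not text,
# so the argument stays a single unit no matter what it contains:
def square_structural(expr):
    # expr is a tuple like ('+', 1, 2): a tree, not a string
    return ("*", expr, expr)

# Minimal evaluator for ('+'/'*', left, right) trees:
def evaluate(node):
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = evaluate(left), evaluate(right)
    return left + right if op == "+" else left * right

print(evaluate(square_structural(("+", 1, 2))))  # 9, as intended
```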

~~~
yetanotheruser
Right, that makes sense

------
mpweiher
The Elixir getting started link is now at:
[https://elixir-lang.org/getting-started/introduction.html](https://elixir-lang.org/getting-started/introduction.html)

------
dogfishbar
M-LISP: a representation-independent dialect of LISP with reduction semantics

In this paper we introduce M-LISP, a dialect of LISP designed with an eye
toward reconciling LISP's metalinguistic power with the structural style of
operational semantics advocated by Plotkin [28]. We begin by reviewing the
original definition of LISP [20] in an attempt to clarify the source of its
metalinguistic power. We find that it arises from a problematic clause in this
definition. We then define the abstract syntax and operational semantics of
M-LISP, essentially a hybrid of M-expression LISP and Scheme. Next, we tie the
operational semantics to the corresponding equational logic. As usual,
provable equality in the logic implies operational equality. Having
established this framework we then extend M-LISP with the metalinguistic eval
and reify operators (the latter is a nonstrict operator that converts its
argument to its metalanguage representation). These operators encapsulate the
metalinguistic representation conversions that occur globally in S-expression
LISP. We show that the naive versions of these operators render LISP's
equational logic inconsistent. On the positive side, we show that a naturally
restricted form of the eval operator is confluent and therefore a conservative
extension of M-LISP. Unfortunately, we must weaken the logic considerably to
obtain a consistent theory of reification.

------
divs1210
PigPen is an amazing wrapper over Pig that shows why and how macros are
important and useful.

[https://github.com/Netflix/PigPen](https://github.com/Netflix/PigPen)

------
quadcore
_See what I did there?_

Despite the good intention, no, because you're trying to explain Lisp using
Lisp. If I knew Lisp, I wouldn't need you to explain what code-as-data means.

Maybe it's time for a different strategy. As a matter of fact, I've tried to
explain code-as-data using OOP just this morning:
[https://news.ycombinator.com/item?id=16389514](https://news.ycombinator.com/item?id=16389514)

~~~
3pt14159
Your linked comment is great but it sweeps one thing under the rug.

There is a reason lisp didn't win. We put up with it (I use emacs, so I have
to write it at times) but in general it's harder to reason about transformable
lists of code than it is about plain old objects and their methods. Almost
every problem elegantly solved in lisp has a parallel elegant solution in Ruby
(using blocks, dynamic method definition, or otherwise) and I generally find
the resulting code both more maintainable as well as more readable. Sometimes
with lisp things get "spooky" in a way that just doesn't happen in practice
with Ruby, even if it is theoretically possible to, say, redefine String#inspect.
Though I completely agree that inheritance gets clunky and I strongly favour
composition.

~~~
quadcore
I agree, OO is ultimately good enough. Problem is, coding using OOP is like
printing ideas on paper: you don't wanna do that because it's hard to discard
ideas that have been printed on paper. You wanna use _post-its_ instead because
post-its are easy to discard. _Refactoring_ is hard using OO and _refactoring_
is the key difference between waterfall and good software development.
Refactoring is easier, even fun, using functional programming because the
medium is more malleable.

~~~
Retric
Refactoring lisp gets harder the more complex the program gets. Approaches that
are clever and reasonable for a 1,000-line program can become hell in a
100,000-line program.

Effectively, programming languages get less powerful the longer the program
becomes. Breaking up programs into more powerful little pieces is never a
clean separation, and you need to make real trade-offs between more separation
and more power.

~~~
DannyB2
> Refactoring lisp gets harder the more complex the program gets.

I have found that to be true in dynamically or duck typed languages.

I have found it to be not true in strongly typed languages with good tooling.
Example: refactoring large java program in, say, Eclipse.

> Breaking up programs into more powerful little pieces is never a clean
> separation

I would tend to disagree, to some extent.

In Lisp(s), there are lots of small, sometimes obviously correct functions
that do powerful operations on abstract data structures. Part of the reason
for this is that unlike OOP (and I write Java daily) which has lots of data
structures, Lisp has very few. In OOP you model your domain with new classes.
In Lisp you model your domain with combinations of a few basic structures (in
Clojure: list, set, vector, map).

Maybe you have layers of separation. With one layer being between primitives
that manipulate your domain objects, and another layer being the logic that
decides how and when to do the manipulations. I want to solve a puzzle. The
puzzle board has various states, pieces, and operations that transform a board
into a new board state. Then the logic layer is an algorithm or algorithms
written using the layer of primitive manipulations. That logic layer may, in
fact, use canned algorithms, because the data structures are common
primitives. Say, sort, search, A* search, depth first, breadth first, reduce,
map fn over a sequence, etc.

You can end up with a large program. But it can be a lot easier to reason
about than, say, a large Java program. It seems to take more deliberate effort
to keep a large java program easy to reason about.

~~~
Retric
> layers of separation

Layers are exactly the kind of power trade-off I am talking about. If code X
can't(1) impact code Y, you have less freedom in code X because it can't do
some things, but can more easily reason about code Y.

(1) either as an actual limit or a self imposed one.

Granted, in theory you may be able to find perfect points of separation such
that you don't lose power by doing so.

------
accatyyc
I think there’s a much simpler example of how code is data:

    (hello world)

That’s a list containing two symbols. So it’s data. However, if I evaluate
that list, it will call the function “hello” with the argument “world”. So
it’s also code.

Incidentally, lists and function calls are identical in lisp, hence code is
data.

I think the article just complicates it by introducing macros.
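
A minimal Python sketch of the same duality, with an invented hello function and a toy evaluator. Python lists aren't code the way Lisp lists are, but the data/code split is visible: the same list is inspected with ordinary list operations and then run as a call.

```python
def hello(arg):
    return f"hello, {arg}!"

env = {"hello": hello}     # symbol table mapping names to functions

expr = ["hello", "world"]  # a list containing two symbols: data

# As data: ordinary list operations apply.
print(len(expr))           # 2

# As code: evaluate the list by calling the first symbol on the rest.
def evaluate(form, env):
    fn = env[form[0]]
    return fn(*form[1:])   # the arguments here are plain strings

print(evaluate(expr, env))  # hello, world!
```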

~~~
rootlocus

    '(hello world)

is a list

    (hello world)

is a function call

~~~
Karunamon
Well, technically they're both lists, it's just that the ' prefix makes it a
list literal rather than having the compiler try to macro expand it. A bit
pedantic, but of massive importance when macros are in play.

~~~
tincholio
Evaluate it, not macro expand it ;)

------
jbrennan
See also Mark Miller’s excellent blog post on the same topic, which really
turned on a few lightbulbs for me:
[https://tekkie.wordpress.com/2010/07/05/sicp-what-is-meant-by-data/](https://tekkie.wordpress.com/2010/07/05/sicp-what-is-meant-by-data/)

------
err4nt
That example was great - I've been inspired by LISP and have been playing
around with my own idea of an HTML-building set of functions already this
week: [http://staticresource.com/html-js-4.html](http://staticresource.com/html-js-4.html)

It's great to see such a relevant example :D

------
whitten
For me, "code as data" means that the code I write provides a particular
structure (whether to create a report, or go through an editing task or find
particular information in the computer). I then use a table-driven "data"
approach to finding the particulars that are significant for this particular
class of information. That might mean that I have the unique code stored in a
table which I lookup using a relevant key, hence "table-driven code", or I
have "setup code", "one iteration", and "teardown code" accessible through an
indirect call because the structure of the code to do the task can stay the
same. It is also common to have a general "event" fire at the end of the code
to inform any other programs that need to know this data structure or real-
world event has been accounted for. I think this approach is amenable to
better testing, and can be viewed as a higher-level abstraction similar to
LISP macros. (Even though my language of choice - MUMPS - doesn't have LISP-
style syntactic macros.)
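
The setup/one-iteration/teardown pattern described above can be sketched in Python with a dispatch table; all names here (run_task, report_table, etc.) are invented for illustration. The control structure stays fixed while the per-task particulars live in a table looked up by key.

```python
# "Table-driven code": the generic driver never changes; behavior is
# selected by looking up the task's entry in a table.
def run_task(task, items, table):
    entry = table[task]
    state = entry["setup"]()                  # setup code
    for item in items:
        state = entry["step"](state, item)    # one iteration
    return entry["teardown"](state)           # teardown code

report_table = {
    "sum": {
        "setup": lambda: 0,
        "step": lambda acc, x: acc + x,
        "teardown": lambda acc: f"total={acc}",
    },
    "count": {
        "setup": lambda: 0,
        "step": lambda acc, _: acc + 1,
        "teardown": lambda acc: f"count={acc}",
    },
}

print(run_task("sum", [1, 2, 3], report_table))    # total=6
print(run_task("count", [1, 2, 3], report_table))  # count=3
```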

------
sbov
As someone who uses Clojure for only side projects, macros seem to get a lot
of attention, both good and bad, for something I very rarely write. Maybe I'm
missing something though, and programmers using these languages professionally
resort to them more often than I do.

~~~
dustingetz
code-is-data also enables structural editing: instead of typing strings we can
transform data. There are gifs of that here:
[https://cursive-ide.com/userguide/paredit.html](https://cursive-ide.com/userguide/paredit.html)

------
gameswithgo
Does anyone have some examples of great products that have been built largely
by leveraging lisp macros?

emacs?

~~~
flavio81
> Does anyone have some examples of great products that have been built
> largely by leveraging lisp macros?

Common Lisp itself.

The Common Lisp language is not only made up of data types, functions and
control constructs: many of the typical keywords a Common Lisp programmer
would use, like defun (for defining a function), are macros themselves.

Many control constructs are macros as well.

So a good part of the language is built using macros.

The compiler itself (compiling Lisp to machine language) is also mostly made
up of macros in many Lisp implementations.

Thus, the answer is: Common Lisp implementations are built largely by
leveraging lisp macros. And I'd say those are "great products."

------
scottmsul
Would it be possible to do the same thing in Haskell using partial function
application, where the first argument is the name of the tag and the second
argument makes the "<" ">" tags and applies string concatenation?
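
For comparison, here is that idea sketched with partial application in Python rather than Haskell; the tag helper is invented for illustration, and real HTML generation would also need escaping. Fixing the tag name as the first argument yields a family of specialized builders.

```python
from functools import partial

def tag(name, *children):
    # build "<name>...</name>" by string concatenation
    return "<" + name + ">" + "".join(children) + "</" + name + ">"

# Partially apply the tag name to get specialized builders.
html = partial(tag, "html")
body = partial(tag, "body")
p    = partial(tag, "p")

print(html(body(p("hello"), p("world"))))
# <html><body><p>hello</p><p>world</p></body></html>
```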

~~~
kamaal
Haskell comes from a family of languages called ML. And ML was known as Lisp
without parentheses.

So it's Lisp all over again.

~~~
andolanra
I don't know that I've ever seen someone call ML "Lisp without parentheses".
Regardless, even with some familial relationship and a few principles in
common (like a commitment to functional programming) the two lineages are far
more different than they are alike: Lisp has a powerful dynamic core and
intricate metaprogramming capabilities, while Haskell builds on a powerful
type system and non-strict evaluation. Saying that Haskell is "Lisp all over
again" is sort of like saying that cars are "trains all over again": a
statement so reductive, it's somewhere between wrong and nonsensical!

~~~
kamaal
>>I don't know that I've ever seen someone call ML "Lisp without parentheses".

My bad. It's called Lisp with types. From:
[https://en.wikipedia.org/wiki/ML_(programming_language)](https://en.wikipedia.org/wiki/ML_\(programming_language\))

Now, Haskell does look like an impractical Scheme. Beyond that, today if you
want it, you have Typed Racket.

For a lot of people, you could just go ahead and use Racket/Lisp instead of
Haskell.

------
marknadal
This is _exactly_ what we mean by "code as data" and why it is so profound
and has such exciting potential:

[https://vimeo.com/208899228/b9bc9eaaa4#t=13m50s](https://vimeo.com/208899228/b9bc9eaaa4#t=13m50s)

"The Future of Programming and Databases" ^ JSRemote Conf / NodeJS Italy

------
segmondy
Best language to understand this is prolog.

dog(fido).

This is data and it's also code.

It's data and code that states that there's a dog called fido.

I could have written it as

exists_dog(fido).

------
junke
Why call `(map eval inner)`?

------
didibus
In Java, you have code as string. That is, your code is represented using a
big string. In Lisp, your code is represented using the list data structure.

So it's really code-as-data-structure.

A string is difficult to parse and modify: inserting things in the middle,
removing elements, changing the order of the words, that's all really
difficult with a string. So if you want to transform Java code, it's going to
be hard and error prone.

A list is easy to manipulate in contrast. Inserting elements in the middle is
trivial, and so are deletes and swaps. So in Lisp, if you want to transform
code it's pretty easy.

Meta-programming is when you write a program that writes a program. An
example: say you wanted to add a semi-colon at the end of all your lines of
code. You need a macro to do it for you. A macro is a program that acts upon
your code to transform it. Eclipse has them. So now it's really easy to add a
semi-colon at the end of each line. What if you wanted to add a comma between
all words in a selection? Now it's trickier: if your macro operates over a big
string, you might need a regex, for example. This is meta-programming though.
Instead of adding the commas yourself, which are required for the program to
run, you write another program to add them for you.

Now if all the words were elements in a list, that macro would be a lot easier
to write.

This is in essence what code-as-data(structure) means. It's in contrast with
code-as-string. You don't have to choose lists as your data structure either,
as long as it's something that allows you to represent a Turing-complete
program and is easy to manipulate.

Now, homoiconicity is the fact that your text of code looks like a data
structure too, making it trivial to parse into one. So back to my example: you
could parse the Java code string into a list, then add commas, then convert it
back to a string. But Java code doesn't map logically into a list. Some
constructs don't nest like lists, and how do you define what goes into each
node of the list? Do you group public and String together? In Lisp, the syntax
is an unambiguous AST already; it's thus trivial to parse into a list of
lists. So you can easily get the AST you need to add commas where it makes
sense.

Finally, there's a third aspect. Code-as-data also implies that your language
can accept code as an argument in the form of raw data. The best way to think
of it is: how would you send a function over the wire to a program and have
that program run the code you sent? You need a way to serialize that function,
which is code, into raw data that can be transmitted over the wire. The
receiving program doesn't have that function defined, so it needs to know how
to deserialize it, but also at runtime it must be able to take this raw data,
which represents code, and be able to parse it, compile/interpret it and
execute it.
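
A minimal sketch of that round trip in Python, using nested lists as the wire format and a toy evaluator invented for illustration (it handles only two operators):

```python
import json

expr = ["+", 1, ["*", 2, 3]]   # code represented as plain data

wire = json.dumps(expr)        # sender: serialize the code as data
received = json.loads(wire)    # receiver: deserialize it...

def evaluate(form):
    # recursively evaluate [op, arg, arg, ...] trees
    if not isinstance(form, list):
        return form
    op, *args = form
    vals = [evaluate(a) for a in args]
    if op == "+":
        return sum(vals)
    if op == "*":
        out = 1
        for v in vals:
            out *= v
        return out
    raise ValueError(f"unknown operator: {op}")

print(evaluate(received))      # ...and execute it: 7
```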

Think of SQL: SQL is often used in a code-as-data way. You want "select * from
%s". Now you'd take this as a string, use a string replace, and replace %s
with something the user picked in a drop-down. At runtime, you are dynamically
creating the SQL code, and once you have it, running it. You might have
methods that accept SQL and return SQL. Now again, your SQL is a big string,
which isn't ideal, but this is still an example of code as data. Now in Java,
you cannot do that with Java code. Java does not have this concept of
code-as-data. In other words, there's no eval.

So when you combine all three aspects, a homoiconic syntax that parses easily
and logically into a data structure which is easy to manipulate, and where you
can then execute data which represents code at runtime and pass it around to
other functions, even over the wire, you get a very powerful combo that turns
into a meta-programming powerhouse. This is the strength of Lisps, the one
strength all Lisps share.

------
PaulHoule
One of the failures of conventional programming libraries is that parsing
libraries such as yacc work in only one direction. In 2018 we could easily
have libraries that work both ways by default, but we don't. Bidirectional
parsing is great for code generation and opens up a lot of things you could do
easily, but because common parsing libraries are unidirectional, people aren't
aware of what you can do and don't clamor for bidirectional parsing.

It is very possible and practical to parse conventional languages down to an
AST, work on the tree, and run that code. See

[https://github.com/lihaoyi/macropy](https://github.com/lihaoyi/macropy)
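
That parse/transform/run loop can be sketched with Python's standard ast module (not macropy itself); the add-to-multiply rewrite is an arbitrary example chosen to make the effect visible:

```python
import ast

source = "result = a + b"

class AddToMul(ast.NodeTransformer):
    # rewrite every addition node into a multiplication node
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

tree = ast.parse(source)                # parse down to an AST
tree = ast.fix_missing_locations(AddToMul().visit(tree))  # work on the tree

namespace = {"a": 3, "b": 4}
exec(compile(tree, "<transformed>", "exec"), namespace)   # run that code
print(namespace["result"])  # 12, not 7: the tree was rewritten before running
```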

Sometimes I wonder if the LISP cult is just trying to pretend Noam Chomsky was
never born.

~~~
icebraining
If by code you mean actual text (as opposed to macropy, which generates ASTs
that get directly compiled), what does that open up? I've found code
generation to be mostly annoying, unless it's just fixing up code written by
humans. What's the point of generating a bunch of code, so that it can be re-
parsed to be executed? Might as well skip the intermediate step.

To me it seems like having the computer physically press its own keys rather
than just generating virtual events.

~~~
kevas
Can you please elaborate a bit more on what was such a nuisance when working
with this method? I'm currently building something out that is using this
technique to (hopefully) save lines of code, dynamic user form generation
based on permissions, etc...

~~~
icebraining
I oversimplified a bit, but the usual flow I've experienced was:

The programmer runs the generator, which outputs some file(s) with code, which
the programmer proceeds to edit: changing some areas, removing others, etc.
Then for some reason the input to the generator (in your case, the
permissions) or the generator itself changes, and so it has to be run again.
Now the programmer is left with the task of having to merge all the changes
made to the original files with the new generated output. Alternatively, the
programmer says "to hell with that" and updates the code manually instead of
using the generator, in which case the tool was useful exactly one time.

As PaulHoule correctly pointed out, though, there are many exceptions to this;
generally, if the programmer has no reason to manually edit the files, then
the problem is avoided. In some of those cases, though, there's no point in
generating textual code, which will then be converted to some other format;
you can output that format directly.

------
RaycatRakittra
(2014)

"Code as data" is a wonderful thing but I also enjoy writing Lisp, so take
that with a grain of salt.

~~~
ZenoArrow
> "(2014)"

Thought it was odd when I saw the name "Nimrod", that explains why.

------
dmitriid
One of the greatest lies that Lispers tell is: "Lisp has no syntax". Syntax is
defined as "the structure of statements in a computer language."

What Lisp has, and is, is a syntax to describe an AST. If you get the syntax
wrong, your program won't run. And even that syntax isn't uniform across the
various Lisps (some will throw in weird chars and constructs here and there to
make dealing with common structures easier etc.)

It does make it somewhat easier to manipulate code as data if you wish to. And
it does make reasoning about some parts of your code somewhat easier. Is it as
good as it's glorified to be?

The author links to Korma as an example of the power of code as data, and
macros:

    (select users
      (where {:active true})
      (order :created)
      (limit 5)
      (offset 3))

:-\

To me, it's a chain of 5 functions which are neither shorter to write nor
better than Java's jOOQ [1]:

    create.select(a.FIRST_NAME, a.LAST_NAME, countDistinct(s.NAME))
          .from(a)
          .join(b).on(b.AUTHOR_ID.eq(a.ID))
          .join(t).on(t.BOOK_ID.eq(b.ID))
          .join(s).on(t.BOOK_STORE_NAME.eq(s.NAME))
          .groupBy(a.FIRST_NAME, a.LAST_NAME)
          .orderBy(countDistinct(s.NAME).desc())
          .fetch();

Look, it even has a similar number of parentheses ;)

[1] Example from [https://www.jooq.org/doc/3.10/manual-single-page/#sql-building](https://www.jooq.org/doc/3.10/manual-single-page/#sql-building)

~~~
kazinator
That particular _select_ instance could indeed just be an ordinary function,
whereby the _(where ...)_ and _(order ...)_ are just evaluated argument
expressions, also calling ordinary constructors for objects that influence the
query. It doesn't really demonstrate the ability to manipulate syntax.

Things start to get more interesting when some of the inputs need to be
lambdas. Still, the sugaring of those can be in the arguments, and _select_
can remain a function. Now suppose that a clause can, say, bind a lexical
variable that a later clause can somehow usefully refer to; things like that.

~~~
dmitriid
Unless I’m missing something, everything you said applies to the jOOQ code.

