

Automatic programming: write code that writes code - smallhands
http://programmers.stackexchange.com/questions/103840/automatic-programming-write-code-that-writes-code

======
colomon
I use straightforward techniques to automatically generate about a third of my
C++ code (108,000 lines of code or so). When I'm writing code, I keep an eye
out for things that I'm doing which are very repetitious.

For instance, I have nine libraries which do the same things for different
file formats. Ideally I'd like the main API for each library to be as similar
as possible. So the files which implement the API are generated by a Perl
template script. Each library implements a couple of core functions; then the
code generated from the template calls those functions in various ways. So I
implement LowLevelStepImport; the automatically generated code uses that to
implement ArrayStepImport, ClassStepImport, CCallableStepImport, etc. Then I
implement LowLevelJTImport, and the generated code uses it to implement
ArrayJTImport, ClassJTImport, CCallableJTImport, etc.
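
A minimal sketch of the template idea, written in Python rather than the author's Perl. The C++ template text, the `FORMATS` list and the `Entity`/`Model` types are hypothetical. Each library supplies only its hand-written `LowLevel<Fmt>Import`, and the template stamps out the wrappers around it:

```python
from string import Template

# Hypothetical C++ template: every wrapper is expressed in terms of the
# library's hand-written LowLevel${fmt}Import core function.
WRAPPER_TEMPLATE = Template("""\
// Generated file: edit the template, not this file.
#include "${lib}_lowlevel.h"

bool Array${fmt}Import(const char* path, std::vector<Entity>& out) {
    return LowLevel${fmt}Import(path, out);
}

bool Class${fmt}Import(const char* path, Model& model) {
    std::vector<Entity> entities;
    if (!LowLevel${fmt}Import(path, entities)) return false;
    model.add_all(entities);
    return true;
}

extern "C" int CCallable${fmt}Import(const char* path) {
    std::vector<Entity> entities;
    return LowLevel${fmt}Import(path, entities) ? 0 : 1;
}
""")

# One entry per library: (file-format tag, library name).
FORMATS = [("Step", "step"), ("JT", "jt")]


def generate_all(formats):
    """Return a dict mapping output file names to generated C++ source."""
    return {
        f"{lib}_api.cpp": WRAPPER_TEMPLATE.substitute(fmt=fmt, lib=lib)
        for fmt, lib in formats
    }


if __name__ == "__main__":
    for name, source in generate_all(FORMATS).items():
        print(f"--- {name} ---")
        print(source)
```

Adding another library is one more tuple in `FORMATS`, and changing the shared API means editing the template once.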

There are lots of advantages to this. It means I don't have to write a bunch
of boring code which is essentially the same. It makes it very easy to keep
the APIs consistent across all the libraries. If I need to change part of the
API, I just change the template and all the libraries update automatically.
And if I add a new library, it is trivial to get it added to the collection.

------
espeed
See Philip Greenspun's problem set 4 from MIT course 6.916: Software
Engineering of Innovative Web Services
(<http://philip.greenspun.com/teaching/psets/ps4/ps4.adp>).

Its objective says, "Teach students the virtues of metadata. More
specifically, they learn how to formally represent the requirements of a Web
service and then build a computer program to generate the computer programs
that implement that service."

This is one of the problem sets potential ArsDigita
(<http://en.wikipedia.org/wiki/ArsDigita>) recruits were required to solve
during the first bubble.

UPDATE: Philip explains automatic code generation here
(<http://philip.greenspun.com/seia/metadata>), and the "SQL for Web Nerds"
book he references in the pset has been moved to
(<http://philip.greenspun.com/sql/>).
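
To make "the virtues of metadata" concrete, here is a toy sketch (not a solution to the pset) in which one hypothetical description of a content type drives both the SQL table definition and an HTML entry form:

```python
# Hypothetical metadata for one content type. In a real system this
# description would more likely live in the database than in a literal.
ARTICLE = {
    "table": "articles",
    "fields": [
        # (column name, SQL type, HTML input type, required?)
        ("title", "varchar(200)", "text", True),
        ("body", "text", "textarea", True),
        ("published_on", "date", "date", False),
    ],
}


def create_table_sql(meta):
    """Generate a CREATE TABLE statement from the metadata."""
    cols = ["    id integer primary key"]
    for name, sql_type, _, required in meta["fields"]:
        cols.append(f"    {name} {sql_type}" + (" not null" if required else ""))
    return f"create table {meta['table']} (\n" + ",\n".join(cols) + "\n);"


def entry_form_html(meta):
    """Generate an HTML form for entering one row."""
    lines = [f'<form method="post" action="/{meta["table"]}/insert">']
    for name, _, input_type, required in meta["fields"]:
        req = " required" if required else ""
        if input_type == "textarea":
            lines.append(f'  <label>{name} <textarea name="{name}"{req}></textarea></label>')
        else:
            lines.append(f'  <label>{name} <input type="{input_type}" name="{name}"{req}></label>')
    lines.append('  <input type="submit">')
    lines.append("</form>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(create_table_sql(ARTICLE))
    print(entry_form_html(ARTICLE))
```

Adding a field to `ARTICLE` updates the schema and the form together, which is the point of keeping the requirements as data.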

------
pudakai
I'm probably biased by my experience writing compilers/VMs years ago, but
I'm a big fan of "code that writes code". Like any technique, though, there
is a time and place for it.

One use that is highly relevant to many shops is creating client tools for
their APIs: libraries in the common target client languages that provide
higher-order abstractions over the granular, repetitive calls and/or the
mechanical access steps.

Keeping this sort of stuff straight by hand as an API evolves is tedious and
error prone. Creating the infrastructure to generate these sorts of libraries
is a largish cost up front but pays off hugely downstream.

We are still at a very early stage with our tools, but our client-tool code
generator(s) work (or will work) off the same infrastructure that generates
our API model metadata (<https://sigkat.com/services/api/model>).
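
A minimal sketch of generating a client library from API model metadata. The endpoint list, `BASE_URL` and the function names are hypothetical (the real model format is not shown here); the emitted client uses only `json` and `urllib.request` from the standard library:

```python
# Hypothetical API model: one entry per endpoint.
API_MODEL = [
    {"name": "get_user", "method": "GET", "path": "/users/{user_id}", "params": ["user_id"]},
    {"name": "list_orders", "method": "GET", "path": "/users/{user_id}/orders", "params": ["user_id"]},
]

HEADER = '''\
# Generated client: regenerate from the API model instead of editing.
import json
import urllib.request

BASE_URL = "https://api.example.com"
'''

# One function per endpoint; {path} is filled in as a literal string, and the
# generated code then calls .format() on it at request time.
FUNCTION = '''
def {name}({args}):
    """{method} {path}"""
    url = BASE_URL + "{path}".format({kwargs})
    req = urllib.request.Request(url, method="{method}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
'''


def generate_client(model):
    """Return Python source for a client module covering every endpoint."""
    parts = [HEADER]
    for ep in model:
        parts.append(FUNCTION.format(
            name=ep["name"],
            args=", ".join(ep["params"]),
            kwargs=", ".join(f"{p}={p}" for p in ep["params"]),
            method=ep["method"],
            path=ep["path"],
        ))
    return "".join(parts)


if __name__ == "__main__":
    print(generate_client(API_MODEL))
```

When the API evolves, only the model changes; the client is regenerated rather than kept in sync by hand.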

------
reacweb
I worked on military communication software in Ada a few years ago. We had
to encode and decode messages with nested structures and irregular binary
layouts. These binary formats were very well specified.

We developed the encoding/decoding for a couple of message types by hand,
then I wrote a code generator in Perl. I can't give you the code (it's
classified), but the structure was trivial:

- first part: a copy of the format specification. Perl allows a syntax very
  close to the structure of the specifications, reducing the risk of mistakes.
- second part: reorganize these specifications into data structures better
  suited to code generation.
- third part: generate each source file.

It was very effective. Perl is ideal for generating text files, and Ada is
very verbose. With the large number of messages, writing so many lines of
code by hand would have introduced many hard-to-find bugs.

The generated code was very clean, fast and easily tested, and it was
developed very quickly. The business logic was added via inheritance, so it
was possible to update the generator and regenerate everything each time a
missing feature or a defect was identified.
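
A scaled-down sketch of that three-part structure, written in Python and generating Python rather than Ada. The `PositionReport` message and its field widths are invented for illustration:

```python
# Part 1: a copy of the format specification, kept close to the spec's layout.
# Hypothetical message: (field name, width in bits), most significant first.
SPEC = {
    "PositionReport": [
        ("msg_type", 4),
        ("unit_id", 12),
        ("heading", 9),
        ("speed", 7),
    ],
}


# Part 2: reorganize into a structure suited to generation (bit offsets, masks).
def layout(fields):
    total = sum(width for _, width in fields)
    offset = total
    out = []
    for name, width in fields:
        offset -= width
        out.append((name, offset, (1 << width) - 1))
    return total, out


# Part 3: generate one source file per message.
def generate(msg_name, fields):
    total, items = layout(fields)
    nbytes = (total + 7) // 8
    lines = [f"# Generated from the {msg_name} specification; do not edit.",
             f"def encode_{msg_name}({', '.join(n for n, _, _ in items)}):",
             "    word = 0"]
    for name, shift, mask in items:
        lines.append(f"    if not 0 <= {name} <= {mask}:")
        lines.append(f"        raise ValueError('{name} out of range')")
        lines.append(f"    word |= {name} << {shift}")
    lines.append(f"    return word.to_bytes({nbytes}, 'big')")
    lines.append("")
    lines.append(f"def decode_{msg_name}(data):")
    lines.append("    word = int.from_bytes(data, 'big')")
    lines.append("    return {")
    for name, shift, mask in items:
        lines.append(f"        '{name}': (word >> {shift}) & {mask},")
    lines.append("    }")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate("PositionReport", SPEC["PositionReport"]))
```

Hand-written logic would live in separate code that uses the generated functions, so the generated files can be thrown away and rebuilt whenever the generator changes.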

~~~
dkersten
FYI: use _two_ newlines to separate paragraphs, e.g. "abc\n\ndef" renders as:

abc

def

Other handy formatting:

_Italics_ are done by placing * around the text.

    
    
      indented text by putting two space characters before each line.
      Useful for code snippets and other preformatted text
      Or very long lines of text like this this this this this this this this this this this this this this this

------
rwmj
We really go for this in libguestfs, generating about 300,000 lines of C
boilerplate (mainly bindings for languages, and RPC generation). We use about
20,000 lines of OCaml to do this, and it has proved to be a smart move over
and over again.

[https://github.com/libguestfs/libguestfs/tree/e275786cb2bce7...](https://github.com/libguestfs/libguestfs/tree/e275786cb2bce728b1a5c8c705cdf76a904be386/generator)

    
    
        $ ./generator/generator 
        generated 300518 lines of code

~~~
_delirium
FFTW takes a similar approach, generating huge gobs of specialized FFT code in
C using an OCaml program.
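
FFTW's generator (genfft) is far more sophisticated, doing algebraic simplification and scheduling, but the basic move of emitting straight-line code for one fixed transform size can be sketched in a few lines of Python. This is a naive O(n²) DFT unrolled at generation time, not an FFT:

```python
import cmath


def generate_dft(n):
    """Emit Python source for a fully unrolled length-n DFT.

    The twiddle factors exp(-2*pi*i*t*k/n) are computed once, at generation
    time, and written into the source as complex literals, so the generated
    function contains no loops and no calls to exp().
    """
    lines = [f"def dft{n}(x):"]
    for k in range(n):
        terms = []
        for t in range(n):
            w = cmath.exp(-2j * cmath.pi * t * k / n)
            terms.append(f"{w!r} * x[{t}]")
        lines.append(f"    y{k} = " + " + ".join(terms))
    lines.append("    return [" + ", ".join(f"y{k}" for k in range(n)) + "]")
    return "\n".join(lines) + "\n"


def naive_dft(x):
    """Loop-based reference version, used to check the generated code."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


if __name__ == "__main__":
    print(generate_dft(4))
```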

------
tluyben2
Metaprogramming is a good example of this, but it's not 'automatic' enough for
my taste. I find inductive logic programming (ILP) and genetic programming a
lot more interesting. Most researchers dropped these ideas for general-purpose
software in the 90s because they were too slow and impractical, but I think
that with faster computers, new insights and completely new hardware
(memristors) on the horizon, they might come back into fashion and enable
actual automatic programming instead of 'just' DSLs and DSL-related code
generation.

~~~
nano_o
Here's an example of a recent tool that automatically generates code from a
declarative specification written in linear integer arithmetic with operations
on sets: <http://lara.epfl.ch/w/comfusy>. There is a plugin available for the
Scala (2.7.7) compiler.

Assuming you want to break a number of seconds contained in variable secnum
into hours, minutes, and seconds, you would write something like this:

    
    
      val secnum: Int = Console.readInt
      val (hours, minutes, seconds) = choose((h: Int, m: Int, s: Int) => (
           h * 3600 + m * 60 + s == secnum
        && 0 <= m
        && m < 60
        && 0 <= s
        && s < 60
      ))
    

Code that computes the hours, minutes, and seconds will be generated at
compile time (it's not interpreted).
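
For this particular constraint, the code a synthesizer has to produce amounts to integer division and remainder. The Python below is a hand-written equivalent (not Comfusy's actual output), checked against a brute-force search over the same constraints; `max_hours` is an arbitrary bound that keeps the search small:

```python
def split_seconds(secnum):
    """Closed-form solution of
    h*3600 + m*60 + s == secnum, 0 <= m < 60, 0 <= s < 60.

    Python's divmod floors, so the remainders stay in range even for
    negative secnum.
    """
    h, rest = divmod(secnum, 3600)
    m, s = divmod(rest, 60)
    return h, m, s


def brute_force(secnum, max_hours=10):
    """Search the constraint directly; only practical for small inputs."""
    for h in range(-max_hours, max_hours + 1):
        for m in range(60):
            for s in range(60):
                if h * 3600 + m * 60 + s == secnum:
                    return h, m, s
    return None
```

For example, `split_seconds(3725)` gives `(1, 2, 5)`. Deriving the divmod form automatically from the constraints is the interesting part of such a tool; the brute-force version is roughly what you would be left with without that synthesis step.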

~~~
dxbydt
This is super interesting, but I am really curious about its performance,
assuming they don't just brute-force it by exhausting all combinations. Tons
of portfolio optimization problems can be specified just as succinctly as the
example above, but the actual computation can be quite tricky. If you had to
split a hundred dollars between Google, Amazon and a risk-free interest-bearing
bond, the constraints look just like the choose function above, so I can
declaratively write out the Sharpe ratio, but maximizing it is fairly
nontrivial. I am going to check this out right now.

~~~
tluyben2
It's such a shame though that this kind of research usually stops when the
author is done with his/her thesis. A lot of worthwhile papers and software
are out there and together probably enough to create a small revolution in
software development, but the problems are very hard and often not even
concrete enough to be defined well. So it would take many of these
researchers working together to do something great; besides maybe DARPA, I
don't see much interest in that at the moment, unfortunately.

------
cubicle67
Ruby's attr_accessor is a great example of this. When you add the line

    
    
        attr_accessor :age
    

a bit of nifty metaprogramming kicks in and generates the following
getter/setter methods in your class

    
    
        def age= value
          @age = value
        end
    
        def age
          @age
        end
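
Ruby's `attr_accessor` is an ordinary method on `Module` that runs when the class body is evaluated, which is why similar helpers can be written by hand. As a rough Python analogue (the helper named `attr_accessor` and the `Person` class here are hypothetical, not built-ins), a function can add getter/setter properties to a class at runtime:

```python
def attr_accessor(cls, *names):
    """Add a read/write property for each name, backed by a '_name' attribute."""
    for name in names:
        storage = "_" + name

        # Default arguments bind the current storage name, avoiding the
        # late-binding closure pitfall inside the loop.
        def getter(self, _storage=storage):
            return getattr(self, _storage)

        def setter(self, value, _storage=storage):
            setattr(self, _storage, value)

        setattr(cls, name, property(getter, setter))
    return cls


class Person:
    pass


attr_accessor(Person, "age", "name")
```

As with the Ruby version, the accessors are created while the program runs rather than written out as source text.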

~~~
JeanPierre
Isn't that just syntactic sugar[1]? I cannot see any kind of metaprogramming
going on here - unless you can define attr_accessor and similar parts
yourself.

[1] <http://en.wikipedia.org/wiki/Syntactic_sugar>

~~~
stcredzero
_Isn't that just syntactic sugar[1]?_

Actually, it's an ordinary method that runs while the class body is being
evaluated, IIRC.

------
gouranga
I'm a firm believer that if you write code that writes code, you are using
inadequate tooling.

The primary concern of a problem should always be possible to solve without
metaprogramming.

Metaprogramming is one of those things which sounds good but rapidly increases
complexity beyond what is humanly manageable (even if you do it right, e.g.
with Lisp macros).

Building successive abstractions is the right way of doing things. I think
even SICP tries to nail that into people's heads.

~~~
jwilliams
You can build successive abstractions using meta-programming.

All you're talking about is taking your abstraction and calling it a "tool".

~~~
gouranga
You can, but you introduce a "forward-only" semantic translation prior to
compilation which changes the meaning of the code irreversibly.

A basic abstraction remains untouched in the underlying implementation. It's
just a call away.

There are semantic and technical differences between abstraction and meta-
programming.

------
dkersten
I once wrote a Python program that would generate Java classes for sending
SNMP notifications. It would take a text file describing the messages and
generate a MIB file and Java code for each message from it. A message
description looked something like this (don't quite remember exactly now):

    
    
      oid  "1.3.6.1.4.1.12345"
      group "1"
      message "MyNotification"
      field string message_body
    

and it would generate a Java class called MyNotification which contained about
100 lines of code to handle sending and receiving of these notification
messages, plus the MIB equivalent so that network management tools can
understand the message.

Once the messages were defined they rarely changed, since customers would have
had to change their network management tool configuration, so after the
original set stabilized the generator was mainly used to add new notification
types. Still, it saved a ton of time over hand-crafting each message type.
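
A small sketch of that kind of generator, assuming the approximate description format shown above. It parses the description and emits a Java class skeleton with one field and a constructor. Joining `oid` and `group` into one OID, and the `JAVA_TYPES` mapping, are guesses; the send/receive code and the MIB output are left out:

```python
DESCRIPTION = '''
oid  "1.3.6.1.4.1.12345"
group "1"
message "MyNotification"
field string message_body
'''

# Hypothetical mapping from description types to Java types.
JAVA_TYPES = {"string": "String", "int": "int"}


def parse_description(text):
    """Parse 'key value' lines into a message description dict."""
    desc = {"fields": []}
    for line in text.strip().splitlines():
        key, _, rest = line.strip().partition(" ")
        rest = rest.strip()
        if key == "field":
            field_type, name = rest.split()
            desc["fields"].append((field_type, name))
        else:
            desc[key] = rest.strip('"')
    return desc


def camel(name):
    """message_body -> messageBody"""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def generate_java(desc):
    """Emit a Java class skeleton for one notification type."""
    cls = desc["message"]
    oid = f'{desc["oid"]}.{desc["group"]}'  # assumption: group extends the OID
    fields = [(JAVA_TYPES[t], camel(n)) for t, n in desc["fields"]]
    lines = [
        "// Generated from the message description; do not edit.",
        f"public class {cls} {{",
        f'    public static final String OID = "{oid}";',
    ]
    for jtype, jname in fields:
        lines.append(f"    private final {jtype} {jname};")
    params = ", ".join(f"{jtype} {jname}" for jtype, jname in fields)
    lines.append(f"    public {cls}({params}) {{")
    for _, jname in fields:
        lines.append(f"        this.{jname} = {jname};")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_java(parse_description(DESCRIPTION)))
```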

------
skrebbel
Don't forget the entire 'model-driven software engineering' world, which is
very much about smart code generation from DSLs, and no longer about the
'executable UML' nonsense that was once associated with the term.

I think MDSD is undeservedly underrated in the web world.

------
softbuilder
Caveat: There's a difference between code that writes code to dynamically
expand/fulfill an abstraction, and code that is merely generated to satisfy
what would otherwise be a copy-and-paste need. I know I missed this when I
was first exposed to the concept of code generation.

I was ecstatic that I could write a large chunk of my code automatically. What
I eventually realized was that I had missed abstractions that would have
allowed me to achieve the same ends with generics and dynamic programming.
Also a change of language (C++ at the time) would have drastically changed my
perspective.

YMMV.
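
A toy Python illustration of that distinction (the `Point` and `Order` classes are hypothetical). The first approach generates a near-identical copy-and-paste function per class as text; the second gets the same behavior from one ordinary function that takes the varying part as a parameter, with nothing to regenerate:

```python
from dataclasses import dataclass, fields

# Approach 1: generate one to_dict function per class as source text.
TEMPLATE = """
def {name}_to_dict(obj):
    return {{{items}}}
"""


def generate_to_dict(name, field_names):
    items = ", ".join(f"'{f}': obj.{f}" for f in field_names)
    return TEMPLATE.format(name=name.lower(), items=items)


# Approach 2: one generic function; the "template" is just a parameter.
def to_dict(obj):
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Order:
    order_id: str
    quantity: int
```

Which one is simpler depends heavily on what the language offers for the second approach, which matches the point about a change of language changing the perspective.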

~~~
scott_w
I did a similar thing when writing Java code for university assignments.
Usually, generating test data required either writing a handler to pull it in
from a file, or just writing a lot of Java code to manually load it into
memory.

I wrote Python scripts which spat out the method calls with the appropriate
arguments.

Yes, I could probably have written an abstraction to create the data set in
Java, but it worked out faster to write it in Python, and one doesn't get
marks for generating test data.
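
Something along these lines, assuming a hypothetical Java method `store.addStudent(String, int)`: the Python script prints Java statements that load fixed test data, so the Java program needs no file parser. A fixed seed keeps the output reproducible between runs:

```python
import random


def java_test_data(n, seed=0):
    """Emit Java statements that add n pseudo-random students to 'store'."""
    rng = random.Random(seed)  # fixed seed, so the output is reproducible
    names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
    lines = []
    for i in range(n):
        name = f"{rng.choice(names)}{i}"
        grade = rng.randint(0, 100)
        lines.append(f'store.addStudent("{name}", {grade});')
    return "\n".join(lines)


if __name__ == "__main__":
    print(java_test_data(5))
```

The printed lines can be pasted into a Java setup method as-is.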

------
ericHosick
When we write source code, we are describing a process (ship an order) and
providing the support for that process (persistence, move, copy, etc.). They
are tightly coupled to each other via source code.

I am guessing code that writes code is an attempt to further decouple a
process from the code providing the support for that process (data driven
programming is another example but is often very domain specific).

It seems that full decoupling of a process's behavior from the code
supporting that process would be the end goal. I don't think this is
something that metaprogramming fully solves.

------
ubershmekel
At first glance the idea of "code that writes code" sounds so advanced and
high tech but once you see generated source you want to puke.

E.g.
[http://sourceforge.net/projects/malclassifier.adobe/files/Ad...](http://sourceforge.net/projects/malclassifier.adobe/files/AdobeMalwareClassifier.py/view)

I daydream of a programming language to which I can fuzzily explain my needs
and it automagically solves the problem. Natural language processing might
be hard to debug as part of a programming language, though.

~~~
jbri
Of course the idea behind "generated code" is that you _don't_ look at it.
Looking at generated code is like disassembling an .o file - unless you need
to debug your compiler (or code generator), there's no reason to treat it as
anything other than an opaque blob.

~~~
dochtman
That's a real problem with generated code, though. It almost always ends up
that there's some issue with either your original code or the code that
generates the generated code, and you end up having to dive into the ugly
generated code to figure out what the problem is. I'm pretty sure this is the
major reason code generation isn't used more often.

~~~
reddit_clone
Well, I guess you need a meta debugger here.

------
ZaftcoAgeiha
It's called a compiler :)

------
atdt
UglifyJS (<https://github.com/mishoo/UglifyJS>) comes with a JavaScript code
generator that is very handy, particularly for compiling DSLs into JavaScript.

------
Tycho
Interesting. I've often heard people on Hacker News, e.g. edw519, express
appreciation for code generators but I've never seen much documentation or
blogging specifically about it.

------
stcredzero
Never use this for any code you'd have to debug. (Though, it can be perfectly
fine for code that you might debug-through.)

------
sanxiyn
A compiler (which is what code that writes code is) is still relatively
expensive to design, implement, and maintain. I think you should avoid it in
general, although there are exceptions.

Protocol Buffers and Thrift seem to be good examples of code generation.

~~~
lucian1900
In languages with full macros (like all Lisps), code generation becomes
trivial.

~~~
sanxiyn
Macros definitely do not make code generation trivial. Rather, macros make
code generation that should be trivial actually trivial, but generating (say)
a serializer/deserializer does not become trivial.

~~~
warmfuzzykitten
It could be that macros are too low-level for the serializer/deserializer
case. What you really want is a definition of the thing to be (de)serialized
in some language and two different translations of the same definition. The
translator might need to make multiple passes over the same definition in the
course of even a single translation.

Another case that illustrates this point: Suppose you are defining a wire
protocol between two communicating processes. The tool you want is a language
in which the protocol is defined, from which sender and receiver code can be
generated. A flexible tool would be able to generate sender and receiver in
different languages, or, in fact, in multiple languages. It would be able to
generate language-independent documentation of the protocol in multiple
formats. It's not easy to see how macros embedded in a single programming
language could produce these results.
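
A minimal sketch of that idea in Python, with an invented protocol definition: one description is translated three ways, into sender code, receiver code and plain-text documentation. Both code outputs target Python here so they can be checked against each other; a real tool would presumably have one backend per target language. Field encoding uses the standard `struct` module, with a one-byte message tag:

```python
import struct

# Hypothetical protocol definition: message -> list of (field, struct code, doc).
PROTOCOL = {
    "Hello": [("version", "H", "protocol version"), ("client_id", "I", "client identifier")],
    "Move": [("x", "i", "x coordinate"), ("y", "i", "y coordinate")],
}


def gen_sender(proto):
    """Translation 1: one packing function per message."""
    out = ["import struct", ""]
    for tag, (msg, fields) in enumerate(proto.items()):
        fmt = "!B" + "".join(code for _, code, _ in fields)
        args = ", ".join(name for name, _, _ in fields)
        out += [f"def send_{msg.lower()}({args}):",
                f"    return struct.pack({fmt!r}, {tag}, {args})", ""]
    return "\n".join(out)


def gen_receiver(proto):
    """Translation 2: a single dispatching decoder."""
    out = ["import struct", "", "def receive(data):", "    tag = data[0]"]
    for tag, (msg, fields) in enumerate(proto.items()):
        fmt = "!" + "".join(code for _, code, _ in fields)
        names = [name for name, _, _ in fields]
        out += [f"    if tag == {tag}:",
                f"        values = struct.unpack({fmt!r}, data[1:])",
                f"        return {msg!r}, dict(zip({names!r}, values))"]
    # Plain string: the f-string below belongs to the *generated* code.
    out.append("    raise ValueError(f'unknown tag {tag}')")
    return "\n".join(out) + "\n"


def gen_docs(proto):
    """Translation 3: language-independent documentation."""
    out = []
    for tag, (msg, fields) in enumerate(proto.items()):
        out.append(f"{msg} (tag {tag})")
        for name, code, doc in fields:
            out.append(f"  {name}: {struct.calcsize('!' + code)} bytes, {doc}")
    return "\n".join(out)
```

Note that each translation makes its own pass over the same definition, as described above.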

------
shane_mcd
Rails' method_missing is a great example of this and awesome to read through
for a thorough mind fuck.

~~~
gnaritas
Rails doesn't have a method_missing, Ruby does, it copied it from Smalltalk's
#doesNotUnderstand:. Rails is a framework, not a language.

~~~
shane_mcd
Oh come on. I meant how it was implemented with AR

~~~
gnaritas
I have no idea what you meant; I can only comment on what you said. Had you
said ActiveRecord's implementation of method_missing, I'd have understood, but
you said Rails.
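
For readers unfamiliar with the trick under discussion: ActiveRecord's dynamic finders such as `find_by_name` were historically answered through Ruby's `method_missing` rather than defined up front. A rough Python analogue uses `__getattr__`, which Python calls only when normal attribute lookup fails; the `Table` class and its in-memory rows are hypothetical:

```python
class Table:
    """In-memory stand-in for a database table (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows

    def __getattr__(self, attr):
        # Only called when normal lookup fails, much like method_missing.
        prefix = "find_by_"
        if not attr.startswith(prefix):
            raise AttributeError(attr)
        columns = attr[len(prefix):].split("_and_")

        def finder(*values):
            if len(values) != len(columns):
                raise TypeError(f"{attr} expects {len(columns)} argument(s)")
            for row in self.rows:
                if all(row.get(c) == v for c, v in zip(columns, values)):
                    return row
            return None

        return finder
```

So `people.find_by_name_and_city("Ann", "Oslo")` works without any method of that name existing in the source.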

------
vimota
I see the rise of this type of coding leading to some applications of AI and
ML in everyday code.

------
hackermom
This method is not the least bit uncommon or "advanced" among those who still
program in assembler, in particular the coders in the so-called demoscene of
lower-level machines like the C-64 and Amiga. It is most often used to create
unrolled loops, saving a few cycles by removing the increment/compare/branch
portion (or similar) of the loop.
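
A sketch of that habit, assuming a copy of a fixed number of bytes on a 6502 (as in the C-64): instead of a loop with `INX`/`CPX`/`BNE`, a script emits one `LDA`/`STA` pair per byte using absolute addressing. The labels `src` and `dst` are placeholders for whatever the assembly source defines:

```python
def unrolled_copy(src, dst, count):
    """Emit 6502 assembly that copies count bytes with no loop overhead."""
    lines = [f"; unrolled copy of {count} bytes from {src} to {dst}"]
    for i in range(count):
        lines.append(f"        LDA {src}+{i}")
        lines.append(f"        STA {dst}+{i}")
    return "\n".join(lines)


def looped_copy(src, dst, count):
    """The loop it replaces, for comparison."""
    return "\n".join([
        "        LDX #0",
        f"loop:   LDA {src},X",
        f"        STA {dst},X",
        "        INX",
        f"        CPX #{count}",
        "        BNE loop",
    ])


if __name__ == "__main__":
    print(unrolled_copy("src", "dst", 8))
```

The trade-off is code size: the unrolled version grows linearly with `count`, which is why it tends to be generated rather than typed.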

~~~
dbaupp
If one were to be a smartass, one could say that all code is code that writes
assembler. (Possibly with many levels of indirection.)

