I use straightforward techniques to automatically generate about a third of my C++ code (108,000 lines of code or so). When I'm writing code, I keep an eye out for things that I'm doing which are very repetitious.
For instance, I have nine libraries which do the same things for different file formats. Ideally I'd like the main API for each library to be as similar as possible. So the files which implement the API are generated by a Perl template script. Each library implements a couple of core functions; then the code generated from the template calls those functions in various ways. So I implement LowLevelStepImport; the automatically generated code uses that to implement ArrayStepImport, ClassStepImport, CCallableStepImport, etc. Then I implement LowLevelJTImport, and it implements ArrayJTImport, ClassJTImport, CCallableJTImport, etc.
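The template script itself is nothing fancy. A rough sketch of the shape of it, in Python rather than Perl, with the format list, wrapper names, and C++ types invented for illustration:
# gen_api.py: illustrative sketch only; the format and wrapper names are invented.
# For each file format, emit C++ wrapper functions that call the hand-written
# LowLevel<Format>Import() core in different ways.
FORMATS = ["Step", "JT"]                    # one entry per library
WRAPPERS = ["Array", "Class", "CCallable"]  # API flavours generated for each
TEMPLATE = """\
// Generated file -- do not edit by hand.
Result {wrapper}{fmt}Import(const {wrapper}Args& args) {{
    // Adapt the generic arguments and delegate to the hand-written core.
    return LowLevel{fmt}Import(Adapt{wrapper}(args));
}}
"""
for fmt in FORMATS:
    with open(fmt.lower() + "_api_generated.cpp", "w") as out:
        out.write('#include "lowlevel_' + fmt.lower() + '.h"\n\n')
        for wrapper in WRAPPERS:
            out.write(TEMPLATE.format(wrapper=wrapper, fmt=fmt))
Adding a new library is then just another entry in the format list plus the hand-written core functions.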
There are lots of advantages to this. It means I don't have to write a bunch of boring code which is essentially the same. It makes it very easy to keep the APIs consistent across all the libraries. If I need to change part of the API, I just change the template and all the libraries update automatically. And if I add a new library, it is trivial to get it added to the collection.
Its objective says, "Teach students the virtues of metadata. More specifically, they learn how to formally represent the requirements of a Web service and then build a computer program to generate the computer programs that implement that service."
I'm probably biased because of my experiences years ago writing compilers and VMs, but I'm a big fan of "code that writes code". Like any technique, though, there is a time and place for it.
One use that is highly relevant to many shops is creating client tools for their APIs. That is, creating libraries in common/target client languages for higher order abstractions of some of the granular/repetitive stuff and/or some of the mechanical access actions.
Keeping this sort of stuff straight by hand as an API evolves is tedious and error prone. Creating the infrastructure to generate these sorts of libraries is a largish cost up front but pays off hugely downstream.
We are still working on our tools (very early stage), but our code generators work, or will work, off the same infrastructure that generates our API model metadata (https://sigkat.com/services/api/model).
A few years ago I worked on military communication software in Ada. We had to encode and decode messages with nested structures and irregular binary layouts. These binary formats were very well specified.
We developed the encoding/decoding of a couple of message types by hand, then I wrote a code generator in Perl. I can't share the code (classified), but the structure was trivial:
- first part: a copy of the format specification. Perl allows syntax very close to the structure of the specification, reducing the risk of mistakes.
- second part: reorganize the specification into data structures better suited to code generation.
- third part: generate each source file.
It was very effective. Perl is ideal for generating text files, and Ada is very verbose. With the large number of messages, writing so many lines of code by hand would have introduced many hard-to-find bugs.
The generated code was very clean, fast, and easy to test, and it was developed very quickly. The business logic was added via inheritance, so it was possible to update the generator and regenerate everything each time a missing feature or defect was identified.
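The real generator is classified, but the three-part shape of it looks roughly like this in Python (message and field names are invented, and the emitted Ada is only a skeleton):
# Part 1: the format specification, kept as close to the paper spec as possible.
MESSAGES = {
    "PositionReport": [("latitude", 25), ("longitude", 26), ("speed", 10)],  # (field, bits)
    "StatusUpdate":   [("unit_id", 16), ("status", 4)],
}

# Part 2: reorganize the spec into a form convenient for emission (bit offsets).
def layout(fields):
    rows, offset = [], 0
    for name, bits in fields:
        rows.append((name, offset, bits))
        offset += bits
    return rows

# Part 3: emit one Ada package spec per message; Encode/Decode bodies follow the same pattern.
for msg, fields in MESSAGES.items():
    with open(msg.lower() + ".ads", "w") as out:
        out.write("package " + msg + " is\n")
        for name, offset, bits in layout(fields):
            out.write("   --  %s: %d bits at offset %d\n" % (name, bits, offset))
        out.write("end " + msg + ";\n")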
FYI - use two newlines to separate paragraphs. Eg "abc\n\ndef":
abc
def
Other handy formatting:
italics are done by placing * around the text
indented text by putting two space characters before each line.
Useful for code snippets and other preformatted text
Or very long lines of text like this this this this this this this this this this this this this this this
We really go for this in libguestfs, generating about 300,000 lines of C boilerplate (mainly bindings for languages, and RPC generation). We use about 20,000 lines of OCaml to do this. This has proved itself over and over again to have been a smart move.
Metaprogramming is a good example of this, but it's not 'automatic' enough for my taste. I find inductive logic programming (ILP) and genetic programming a lot more interesting. Most researchers dropped these approaches for general-purpose software in the '90s because they were too slow and impractical, but with faster computers, new insights, and completely new hardware (memristors) on the horizon, I think they might come back into fashion and enable actual automatic programming instead of 'just' DSLs and DSL-related code generation.
Here's an example of a recent tool that automatically generates code from a declarative specification written in linear integer arithmetic with operations on sets: http://lara.epfl.ch/w/comfusy.
There is a plugin available for the Scala (2.7.7) compiler.
Assuming you want to break a number of seconds contained in variable secnum into hours, minutes, and seconds, you would write something like this:
val secnum: Int = Console.readInt
val (hours, minutes, seconds) = choose((h: Int, m: Int, s: Int) => (
     h * 3600 + m * 60 + s == secnum
  && 0 <= m
  && m < 60
  && 0 <= s
  && s < 60
))
Code that computes the hours, minutes, and seconds will be generated at compile time (it's not interpreted).
This is super interesting, but I am really curious about its performance, assuming they don't just brute-force it by exhausting all combinations. Tons of portfolio optimization problems can be specified very succinctly just like the above, but the actual computation can be quite tricky. If you had to split a hundred dollars between Google, Amazon, and a risk-free interest-bearing bond, the constraints look just like the choose function above... so I can write out the Sharpe ratio declaratively, but maximizing it is fairly nontrivial. I am going to check this out right now.
It's such a shame, though, that this kind of research usually stops when the author finishes his/her thesis. A lot of worthwhile papers and software are out there, together probably enough to spark a small revolution in software development, but the problems are very hard and usually not even concrete enough to define that well. So it would take a lot of these researchers working together to do something great; besides maybe DARPA, I don't see much interest in that at the moment, unfortunately.
I think it's much better than brute force search.
The example above produces the following code:
val (hours, minutes, seconds) = {
  val loc1 = secnum div 3600
  val num2 = secnum + ((-3600) * loc1)
  val loc2 = min(num2 div 60, 59)
  val loc3 = secnum + ((-3600) * loc1) + (-60 * loc2)
  (loc1, loc2, loc3)
}
Note that it works for constraints expressed in (parameterized) linear arithmetic and with operations on sets (like taking cardinality, union, intersection...).
Another intriguing approach in the same family as ILP and GP is some recent work on just doing exhaustive search, but over heavily type-constrained Haskell code, with the semantic information being used to greatly narrow the search space: http://nautilus.cs.miyazaki-u.ac.jp/~skata/MagicHaskeller.ht...
This is not what I would call real code, or what the questioner was after, I feel. It is just mandatory syntactical verbiage to appease the idol of object-oriented programming.
not sure what makes code "real" or not to you, but I used this as a very simple example of "code that writes code" that probably all Ruby users have used, possibly without thinking about what it's doing
and yes, pretty much all code we write is mandatory syntactical verbiage
Fair enough. I should be more precise. I don't think automatic programming is (or should be) just about syntactical translations. Just as nowadays we smile about the fact that the first Fortran compiler was considered to be AI.
Isn't that just syntactic sugar[1]? I cannot see any kind of metaprogramming going on here - unless you can define attr_accessor and similar parts yourself.
This is pretty much the same as @synthesize in Objective-C.
I think a better example of metaprogramming might be the usage of method_missing in Ruby, or, say, __call and __callStatic in PHP, which all allow one to do interesting things with non-existent method invocations.
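For anyone who hasn't seen it, a rough Python analog of that trick is __getattr__. A minimal sketch (the find_by_* convention and the data here are made up, just to mirror what people do with method_missing):
class DynamicFinder:
    """Answers calls like find_by_name("bob") without defining them in advance."""
    def __init__(self, rows):
        self.rows = rows  # a list of dicts standing in for database rows

    def __getattr__(self, name):
        # Intercept lookups of methods that don't exist, like method_missing in Ruby.
        if name.startswith("find_by_"):
            column = name[len("find_by_"):]
            return lambda value: [r for r in self.rows if r.get(column) == value]
        raise AttributeError(name)

people = DynamicFinder([{"name": "bob", "age": 42}, {"name": "alice", "age": 35}])
print(people.find_by_name("bob"))   # [{'name': 'bob', 'age': 42}]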
I'd argue that method_missing isn't metaprogramming at all.
What about ActiveRecord (Rails' default ORM)? That makes great use of code-writing-code to generate a ton of methods on your class all based on db fields.
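Not ActiveRecord itself, but the pattern boiled down to a few lines of Python (the column list is hard-coded here, where Rails would read the schema from the database):
def make_model(name, columns):
    # Build a class with one property per column, roughly the way an ORM might.
    def make_prop(col):
        return property(lambda self: self._data[col],
                        lambda self, value: self._data.__setitem__(col, value))
    attrs = {"__init__": lambda self, **kw: setattr(self, "_data", dict(kw))}
    attrs.update({col: make_prop(col) for col in columns})
    return type(name, (), attrs)

User = make_model("User", ["id", "email", "created_at"])  # columns would come from the schema
u = User(id=1, email="someone@example.com", created_at=None)
print(u.email)   # someone@example.com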
I'm a firm believer that if you write code that writes code, you are using inadequate tooling.
The primary concern of a problem should always be possible to solve without metaprogramming.
Metaprogramming is one of those things which sounds good but rapidly increases complexity beyond what is humanly manageable (even if you do it right i.e. LISP macros etc).
Building successive abstractions is the right way of doing things. I think even SICP tries to nail that into people's heads.
> I'm a firm believer that if you write code that writes code, you are using inadequate tooling.
> Building successive abstractions is the right way of doing things.
You seem to think that "code that writes code" only addresses problems that are solvable by building successive abstractions.
Most "code that writes code" that I've seen had nothing to do with implementing abstractions. Most were implementing code based on data, database-driven-code generation if you will.
When it comes down to it, isn't "translating" and "meta-programming" the same thing? It's an honest question -- how do you make the distinction?
To add some context, I'm in the "interpreters all the way down" camp. In other words, it seems to me that hardware vs software is an arbitrary distinction. After all, someone had to decide how the hardware would respond to code, in other words, someone had to physically program the hardware to behave a certain way. In that sense, isn't the code we write (in lisp, C, C++, etc...) just a way to meta-program the hardware?
I'm not sure if that was very clear... Another way of saying it is that simply declaring "software should never modify software" or "software should never be treated as data" is an arbitrary and almost meaningless declaration.
I once wrote a Python program that would generate Java classes for sending SNMP notifications. It would take a text file describing the messages and generate a MIB file and Java code for each message from it. A message description looked something like this (don't quite remember exactly now):
oid "1.3.6.1.4.1.12345"
group "1"
message "MyNotification"
field string message_body
and it would generate a Java class called MyNotification which contained about 100 lines of code to handle sending and receiving of these notification messages, plus the MIB equivalent so that network management tools can understand the message.
Once the messages were defined, they rarely changed since customers would have to change their network management tool configuration, so it was mainly only used to add new notification types once the original set stabilized, but it saved a ton of time over having to hand craft each message type.
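Reconstructed from memory, a cut-down version of the generator looked roughly like this (the Java skeleton and the SnmpNotification base class here are simplified placeholders, not the real output):
JAVA_TYPES = {"string": "String", "int": "int"}

def parse(spec_text):
    # Read the little message-description language shown above.
    msg = {"fields": []}
    for line in spec_text.strip().splitlines():
        key, _, rest = line.strip().partition(" ")
        if key == "field":
            ftype, fname = rest.split()
            msg["fields"].append((ftype, fname))
        else:
            msg[key] = rest.strip('"')
    return msg

def emit_java(msg):
    # Emit a skeleton of the ~100-line class; the real one had the send/receive plumbing.
    lines = ["public class " + msg["message"] + " extends SnmpNotification {",
             '    public static final String OID = "%s.%s";' % (msg["oid"], msg["group"])]
    for ftype, fname in msg["fields"]:
        lines.append("    private %s %s;" % (JAVA_TYPES[ftype], fname))
    lines.append("    // ... generated send()/receive() methods ...")
    lines.append("}")
    return "\n".join(lines)

spec = '''
oid "1.3.6.1.4.1.12345"
group "1"
message "MyNotification"
field string message_body
'''
print(emit_java(parse(spec)))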
don't forget the entire 'model-driven software engineering' world, which is very much about smart code generation from DSLs, and very much not anymore about that 'executable UML' nonsense that was once associated with the term.
I think MDSD is sorely underrated in the web world.
Caveat: There's a difference between code that writes code to dynamically expand/fulfill an abstraction, and code that is merely generated to satisfy what would otherwise be a copy-and-paste need. I know I missed this when I first was exposed to the concept of code generation.
I was ecstatic that I could write a large chunk of my code automatically. What I eventually realized was that I had missed abstractions that would have allowed me to achieve the same ends with generics and dynamic programming. Also a change of language (C++ at the time) would have drastically changed my perspective.
I did a similar thing when writing Java code for university assignments. Usually, generating test data required either writing a handler to pull it in from a file, or just writing a lot of Java code to manually load it into memory.
I wrote Python scripts which spat out the method calls with the appropriate arguments.
Yes, I could probably have written an abstraction to create the data set in Java, but it worked out faster to write it in Python, and one doesn't get marks for generating test data.
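It was never more than a few lines, something like this (paraphrasing; the file format and the Java method names are invented):
# Turn a plain data file into the Java calls the assignment needed.
with open("students.txt") as src, open("LoadTestData.java.inc", "w") as out:
    for line in src:
        name, grade = line.strip().split(",")
        out.write('roster.addStudent(new Student("%s", %s));\n' % (name, grade))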
When we write source code, we are describing a process (ship an order) and providing the support for that process (persistence, move, copy, etc.). They are tightly coupled to each other via source code.
I am guessing code that writes code is an attempt to further decouple a process from the code providing the support for that process (data driven programming is another example but is often very domain specific).
It seems that full decoupling of a process's behavior from the code supporting that process would be the end goal. I don't think this is something that meta-programming fully solves.
I daydream of a programming language to which I can fuzzily explain my needs and it automagically solves problem. Natural language processing might be hard to debug as part of a programming language though.
Of course the idea behind "generated code" is that you don't look at it. Looking at generated code is like disassembling an .o file - unless you need to debug your compiler (or code generator), there's no reason to treat it as anything other than an opaque blob.
That's a real problem with generated code, though. It almost always ends up that there's some issue with either your original code or the code that generates the generated code, and you end up having to dive into the ugly generated code to figure out what the problem is. I'm pretty sure this is the major reason code generation isn't used more often.
> I daydream of a programming language to which I can fuzzily explain my needs and it automagically solves problem
That's the core of our startup :) We're building robots who can build games. You just give us basic simple instructions like "this type of gameplay, with these features and these characters" that can be described in 5 lines of code. And we build a whole game with it. To the user, it looks automagic. But inside, it's just a compiler that converts one data format into another.
> Natural language processing might be hard to debug as part of a programming language though.
You don't really need to go there. Just limit your input to what your compiler knows. Instead of trying to guess intent from ambiguous input, you just give them options instead. If you really want the user to have the freedom to type their ideas out, then use auto-complete and/or code suggestions. That "feels" like natural typing, but dodges the problem with NLP that is trying to guess what ambiguous input means.
Agreed, I've been in this situation twice and both times it was completely terrible.
The worst codebase I ever worked on was some C++ generated using a scripting language. It made awkward and unwieldy looking C++ code that leaked smart pointers everywhere and barely worked. But man, the authors that wrote it thought they were so clever.
Apple's Objective-C to MSVC C++ clang compiler is a close second. I'm not sure if it ever made it out into the wild or if it was just an Apple internal thing, but there's nothing like debugging a 3000 character post-preprocessor C++ expansion of several nested objc_msgSend calls.
It's rough when you long for the C preprocessor. At least the CPP limitations prevent people from stretching themselves too far into meta territory. It's almost kind of elegant in its badness, now that I've seen what people can do to imperative/OO languages when more capable macro processors are involved.
When I do it, I write my own generator in C#. The resulting code looks however I want it to look.
On one occasion, this worked out better than expected, when I realized I could convert my compiler to an interpreter and get a really flexible program configurable at runtime.
UglifyJS (https://github.com/mishoo/UglifyJS) comes with a JavaScript code generator that is very handy, particularly for compiling DSLs into JavaScript.
Interesting. I've often heard people on Hacker News, eg. edw519, express appreciation for code generators but I've never seen much documentation or blogging specifically about it.
A compiler (which is what code that writes code is) is still a relatively expensive thing to design, implement, and maintain. I think you should avoid it in general, although there are exceptions.
Protocol Buffers and Thrift seem to be good examples of code generation.
Macros definitely do not make code generation trivial. Rather, macros make code generation that should be trivial actually trivial, but generating, say, a serializer/deserializer does not become trivial.
It could be that macros are too low-level for the serializer/deserializer case. What you really want is a definition of the thing to be (de)serialized in some language and two different translations of the same definition. The translator might need to make multiple passes over the same definition in the course of even a single translation.
Another case that illustrates this point: Suppose you are defining a wire protocol between two communicating processes. The tool you want is a language in which the protocol is defined, from which sender and receiver code can be generated. A flexible tool would be able to generate sender and receiver in different languages, or, in fact, in multiple languages. It would be able to generate language-independent documentation of the protocol in multiple formats. It's not easy to see how macros embedded in a single programming language could produce these results.
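A toy version of what such a tool does, just to make the point concrete (the message definition, field types, and emitted helper names are all invented): one definition, walked over several times to produce a C sender, a Python receiver, and documentation.
# One message definition, several independent translations of it.
MESSAGE = ("Heartbeat", [("sequence", "u32"), ("status", "u8")])

def emit_c_sender(name, fields):
    body = "\n".join("    write_%s(buf, msg->%s);" % (t, f) for f, t in fields)
    return "void send_%s(Buffer *buf, const %s *msg) {\n%s\n}\n" % (name, name, body)

def emit_python_receiver(name, fields):
    body = "\n".join("    msg['%s'] = read_%s(buf)" % (f, t) for f, t in fields)
    return "def receive_%s(buf):\n    msg = {}\n%s\n    return msg\n" % (name.lower(), body)

def emit_docs(name, fields):
    rows = "\n".join("  %s: %s" % (f, t) for f, t in fields)
    return "%s wire format (fields in order):\n%s\n" % (name, rows)

name, fields = MESSAGE
print(emit_c_sender(name, fields))        # sender in one language...
print(emit_python_receiver(name, fields)) # ...receiver in another...
print(emit_docs(name, fields))            # ...plus language-independent docs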
I have no idea what you meant, I can only comment on what you said. Had you said activerecord's implementation of method_missing, I'd have understood, but you said Rails.
This method is not at all uncommon or "advanced" among those who still program in assembler, in particular coders in the so-called demoscene on lower-level machines like the C-64 and Amiga. It is most often used for creating unrolled loops that save a few cycles by removing the increment/compare/branch portion (or similar) of the loop.
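The generator for that kind of thing can be tiny. A sketch in Python (the mnemonics below are illustrative pseudo-6502, not a tested routine):
# Emit an unrolled copy loop so the inner loop pays no increment/compare/branch cost.
UNROLL = 8
for i in range(UNROLL):
    print("    lda source+%d" % i)
    print("    sta dest+%d" % i)
# The emitted block is then wrapped in an outer loop that advances source/dest
# by UNROLL each pass, so the loop overhead is paid once per 8 copies instead of once per copy.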