I use straightforward techniques to automatically generate about a third of my C++ code (108,000 lines of code or so). When I'm writing code, I keep an eye out for things that I'm doing which are very repetitious.
For instance, I have nine libraries which do the same things for different file formats. Ideally I'd like the main API for each library to be as similar as possible. So the files which implement the API are generated by a Perl template script. Each library implements a couple of core functions; then the code generated from the template calls those functions in various ways. So I implement LowLevelStepImport; the automatically generated code uses that to implement ArrayStepImport, ClassStepImport, CCallableStepImport, etc. Then I implement LowLevelJTImport, and it implements ArrayJTImport, ClassJTImport, CCallableJTImport, etc.
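The template script itself is nothing fancy. A rough sketch of the shape of it, in Python rather than Perl, with the format list, wrapper names, and C++ types invented for illustration:
# gen_api.py: illustrative sketch only; the format and wrapper names are invented.
# For each file format, emit C++ wrapper functions that call the hand-written
# LowLevel<Format>Import() core in different ways.
FORMATS = ["Step", "JT"]                    # one entry per library
WRAPPERS = ["Array", "Class", "CCallable"]  # API flavours generated for each
TEMPLATE = """\
// Generated file -- do not edit by hand.
Result {wrapper}{fmt}Import(const {wrapper}Args& args) {{
    // Adapt the generic arguments and delegate to the hand-written core.
    return LowLevel{fmt}Import(Adapt{wrapper}(args));
}}
"""
for fmt in FORMATS:
    with open(fmt.lower() + "_api_generated.cpp", "w") as out:
        out.write('#include "lowlevel_' + fmt.lower() + '.h"\n\n')
        for wrapper in WRAPPERS:
            out.write(TEMPLATE.format(wrapper=wrapper, fmt=fmt))
Adding a new library is then just another entry in the format list plus the hand-written core functions.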
There are lots of advantages to this. It means I don't have to write a bunch of boring code which is essentially the same. It makes it very easy to keep the APIs consistent across all the libraries. If I need to change part of the API, I just change the template and all the libraries update automatically. And if I add a new library, it is trivial to get it added to the collection.
Its objective says, "Teach students the virtues of metadata. More specifically, they learn how to formally represent the requirements of a Web service and then build a computer program to generate the computer programs that implement that service."
I'm probably biased because of my experiences years ago writing compilers and VMs, but I'm a big fan of "code that writes code". Like any technique, though, there is a time and place for it.
One use that is highly relevant to many shops is creating client tools for their APIs. That is, creating libraries in common/target client languages for higher order abstractions of some of the granular/repetitive stuff and/or some of the mechanical access actions.
Keeping this sort of stuff straight by hand as an API evolves is tedious and error prone. Creating the infrastructure to generate these sorts of libraries is a largish cost up front but pays off hugely downstream.
We are still working on our tools (very early stage), but our code generators work, or will work, off the same infrastructure that generates our API model metadata (https://sigkat.com/services/api/model).
A few years ago I worked on military communication software in Ada. We had to encode and decode messages with nested structures and irregular binary layouts. These binary formats were very well specified.
We developed the encoding/decoding of a couple of message types by hand, then I wrote a code generator in Perl. I can't share the code (classified), but the structure was trivial:
- first part: a copy of the format specification. Perl allows syntax very close to the structure of the specification, reducing the risk of mistakes.
- second part: reorganize the specification into data structures better suited to code generation.
- third part: generate each source file.
It was very effective. Perl is ideal for generating text files, and Ada is very verbose. With the large number of messages, writing so many lines of code by hand would have introduced many hard-to-find bugs.
The generated code was very clean, fast, and easy to test, and it was developed very quickly. The business logic was added via inheritance, so it was possible to update the generator and regenerate everything each time a missing feature or defect was identified.
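The real generator is classified, but the three-part shape of it looks roughly like this in Python (message and field names are invented, and the emitted Ada is only a skeleton):
# Part 1: the format specification, kept as close to the paper spec as possible.
MESSAGES = {
    "PositionReport": [("latitude", 25), ("longitude", 26), ("speed", 10)],  # (field, bits)
    "StatusUpdate":   [("unit_id", 16), ("status", 4)],
}

# Part 2: reorganize the spec into a form convenient for emission (bit offsets).
def layout(fields):
    rows, offset = [], 0
    for name, bits in fields:
        rows.append((name, offset, bits))
        offset += bits
    return rows

# Part 3: emit one Ada package spec per message; Encode/Decode bodies follow the same pattern.
for msg, fields in MESSAGES.items():
    with open(msg.lower() + ".ads", "w") as out:
        out.write("package " + msg + " is\n")
        for name, offset, bits in layout(fields):
            out.write("   --  %s: %d bits at offset %d\n" % (name, bits, offset))
        out.write("end " + msg + ";\n")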
FYI - use two newlines to separate paragraphs. Eg "abc\n\ndef":
abc
def
Other handy formatting:
italics are done by placing * around the text
indented text by putting two space characters before each line.
Useful for code snippets and other preformatted text
Or very long lines of text like this this this this this this this this this this this this this this this
We really go for this in libguestfs, generating about 300,000 lines of C boilerplate (mainly bindings for languages, and RPC generation). We use about 20,000 lines of OCaml to do this. This has proved itself over and over again to have been a smart move.
Metaprogramming is a good example of this, but it's not 'automatic' enough for my taste. I find inductive logic programming (ILP) and genetic programming a lot more interesting. Most researchers dropped these approaches for general-purpose software in the '90s because they were too slow and impractical, but with faster computers, new insights, and completely new hardware (memristors) on the horizon, I think they might come back into fashion and enable actual automatic programming instead of 'just' DSLs and DSL-related code generation.
Here's an example of a recent tool that automatically generates code from a declarative specification written in linear integer arithmetic with operations on sets: http://lara.epfl.ch/w/comfusy.
There is a plugin available for the Scala (2.7.7) compiler.
Assuming you want to break a number of seconds contained in variable secnum into hours, minutes, and seconds, you would write something like this:
val secnum: Int = Console.readInt
val (hours, minutes, seconds) = choose((h: Int, m: Int, s: Int) => (
     h * 3600 + m * 60 + s == secnum
  && 0 <= m
  && m < 60
  && 0 <= s
  && s < 60
))
Code that computes the hours, minutes, and seconds will be generated at compile time (it's not interpreted).
This is super interesting, but I am really curious about its performance, assuming they don't just brute-force it by exhausting all combinations. Tons of portfolio optimization problems can be specified very succinctly just like the above, but the actual computation can be quite tricky. If you had to split a hundred dollars between Google, Amazon, and a risk-free interest-bearing bond, the constraints look just like the choose function above... so I can write out the Sharpe ratio declaratively, but maximizing it is fairly nontrivial. I am going to check this out right now.
It's such a shame, though, that this kind of research usually stops when the author finishes his/her thesis. A lot of worthwhile papers and software are out there, together probably enough to spark a small revolution in software development, but the problems are very hard and usually not even concrete enough to define that well. So it would take a lot of these researchers working together to do something great; besides maybe DARPA, I don't see much interest in that at the moment, unfortunately.
I think it's much better than brute force search.
The example above produces the following code:
val (hours, minutes, seconds) = {
  val loc1 = secnum div 3600
  val num2 = secnum + ((-3600) * loc1)
  val loc2 = min(num2 div 60, 59)
  val loc3 = secnum + ((-3600) * loc1) + (-60 * loc2)
  (loc1, loc2, loc3)
}
Note that it works for constraints expressed in (parameterized) linear arithmetic and with operations on sets (like taking cardinality, union, intersection...).
Another intriguing approach in the same family as ILP and GP is some recent work on just doing exhaustive search, but over heavily type-constrained Haskell code, with the semantic information being used to greatly narrow the search space: http://nautilus.cs.miyazaki-u.ac.jp/~skata/MagicHaskeller.ht...
This is not what I would call real code, or what the questioner was after, I feel. It is just mandatory syntactical verbiage to appease the idol of object-oriented programming.
not sure what makes code "real" or not to you, but I used this as a very simple example of "code that writes code" that probably all Ruby users have used, possibly without thinking about what it's doing
and yes, pretty much all code we write is mandatory syntactical verbiage
Fair enough. I should be more precise. I don't think automatic programming is (or should be) just about syntactical translations. Just as nowadays we smile about the fact that the first Fortran compiler was considered to be AI.
Isn't that just syntactic sugar[1]? I cannot see any kind of metaprogramming going on here - unless you can define attr_accessor and similar parts yourself.
This is pretty much the same as @synthesize in Objective-C.
I think a better example of metaprogramming might be the usage of method_missing in Ruby, or, say, __call and __callStatic in PHP, which all allow one to do interesting things with non-existent method invocations.
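For anyone who hasn't seen it, a rough Python analog of that trick is __getattr__. A minimal sketch (the find_by_* convention and the data here are made up, just to mirror what people do with method_missing):
class DynamicFinder:
    """Answers calls like find_by_name("bob") without defining them in advance."""
    def __init__(self, rows):
        self.rows = rows  # a list of dicts standing in for database rows

    def __getattr__(self, name):
        # Intercept lookups of methods that don't exist, like method_missing in Ruby.
        if name.startswith("find_by_"):
            column = name[len("find_by_"):]
            return lambda value: [r for r in self.rows if r.get(column) == value]
        raise AttributeError(name)

people = DynamicFinder([{"name": "bob", "age": 42}, {"name": "alice", "age": 35}])
print(people.find_by_name("bob"))   # [{'name': 'bob', 'age': 42}]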
I'd argue that method_missing isn't metaprogramming at all.
What about ActiveRecord (Rails' default ORM)? That makes great use of code-writing-code to generate a ton of methods on your class all based on db fields.
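Not ActiveRecord itself, but the pattern boiled down to a few lines of Python (the column list is hard-coded here, where Rails would read the schema from the database):
def make_model(name, columns):
    # Build a class with one property per column, roughly the way an ORM might.
    def make_prop(col):
        return property(lambda self: self._data[col],
                        lambda self, value: self._data.__setitem__(col, value))
    attrs = {"__init__": lambda self, **kw: setattr(self, "_data", dict(kw))}
    attrs.update({col: make_prop(col) for col in columns})
    return type(name, (), attrs)

User = make_model("User", ["id", "email", "created_at"])  # columns would come from the schema
u = User(id=1, email="someone@example.com", created_at=None)
print(u.email)   # someone@example.com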
I'm a firm believer that if you write code that writes code, you are using inadequate tooling.
The primary concern of a problem should always be possible to solve without metaprogramming.
Metaprogramming is one of those things which sounds good but rapidly increases complexity beyond what is humanly manageable (even if you do it right i.e. LISP macros etc).
Building successive abstractions is the right way of doing things. I think even SICP tries to nail that into people's heads.
> I'm a firm believer that if you write code that writes code, you are using inadequate tooling.
> Building successive abstractions is the right way of doing things.
You seem to think that "code that writes code" only addresses problems that are solvable by building successive abstractions.
Most "code that writes code" that I've seen had nothing to do with implementing abstractions. Most were implementing code based on data, database-driven-code generation if you will.
When it comes down to it, isn't "translating" and "meta-programming" the same thing? It's an honest question -- how do you make the distinction?
To add some context, I'm in the "interpreters all the way down" camp. In other words, it seems to me that hardware vs software is an arbitrary distinction. After all, someone had to decide how the hardware would respond to code, in other words, someone had to physically program the hardware to behave a certain way. In that sense, isn't the code we write (in lisp, C, C++, etc...) just a way to meta-program the hardware?
I'm not sure if that was very clear... Another way of saying it is that simply declaring "software should never modify software" or "software should never be treated as data" is an arbitrary and almost meaningless declaration.
I once wrote a Python program that would generate Java classes for sending SNMP notifications. It would take a text file describing the messages and generate a MIB file and Java code for each message from it. A message description looked something like this (don't quite remember exactly now):
oid "1.3.6.1.4.1.12345"
group "1"
message "MyNotification"
field string message_body
and it would generate a Java class called MyNotification which contained about 100 lines of code to handle sending and receiving of these notification messages, plus the MIB equivalent so that network management tools can understand the message.
Once the messages were defined, they rarely changed since customers would have to change their network management tool configuration, so it was mainly only used to add new notification types once the original set stabilized, but it saved a ton of time over having to hand craft each message type.
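Reconstructed from memory, a cut-down version of the generator looked roughly like this (the Java skeleton and the SnmpNotification base class here are simplified placeholders, not the real output):
JAVA_TYPES = {"string": "String", "int": "int"}

def parse(spec_text):
    # Read the little message-description language shown above.
    msg = {"fields": []}
    for line in spec_text.strip().splitlines():
        key, _, rest = line.strip().partition(" ")
        if key == "field":
            ftype, fname = rest.split()
            msg["fields"].append((ftype, fname))
        else:
            msg[key] = rest.strip('"')
    return msg

def emit_java(msg):
    # Emit a skeleton of the ~100-line class; the real one had the send/receive plumbing.
    lines = ["public class " + msg["message"] + " extends SnmpNotification {",
             '    public static final String OID = "%s.%s";' % (msg["oid"], msg["group"])]
    for ftype, fname in msg["fields"]:
        lines.append("    private %s %s;" % (JAVA_TYPES[ftype], fname))
    lines.append("    // ... generated send()/receive() methods ...")
    lines.append("}")
    return "\n".join(lines)

spec = '''
oid "1.3.6.1.4.1.12345"
group "1"
message "MyNotification"
field string message_body
'''
print(emit_java(parse(spec)))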
don't forget the entire 'model-driven software engineering' world, which is very much about smart code generation from DSLs, and very much not anymore about that 'executable UML' nonsense that was once associated with the term.
I think MDSD is sorely underrated in the web world.
Caveat: There's a difference between code that writes code to dynamically expand/fulfill an abstraction, and code that is merely generated to satisfy what would otherwise be a copy-and-paste need. I know I missed this when I first was exposed to the concept of code generation.
I was ecstatic that I could write a large chunk of my code automatically. What I eventually realized was that I had missed abstractions that would have allowed me to achieve the same ends with generics and dynamic programming. Also a change of language (C++ at the time) would have drastically changed my perspective.
I did a similar thing when writing Java code for university assignments. Usually, generating test data required either writing a handler to pull it in from a file, or just writing a lot of Java code to manually load it into memory.
I wrote Python scripts which spat out the method calls with the appropriate arguments.
Yes, I could probably have written an abstraction to create the data set in Java, but it worked out faster to write it in Python, and one doesn't get marks for generating test data.
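It was never more than a few lines, something like this (paraphrasing; the file format and the Java method names are invented):
# Turn a plain data file into the Java calls the assignment needed.
with open("students.txt") as src, open("LoadTestData.java.inc", "w") as out:
    for line in src:
        name, grade = line.strip().split(",")
        out.write('roster.addStudent(new Student("%s", %s));\n' % (name, grade))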
When we write source code, we are describing a process (ship an order) and providing the support for that process (persistence, move, copy, etc.). They are tightly coupled to each other via source code.
I am guessing code that writes code is an attempt to further decouple a process from the code providing the support for that process (data driven programming is another example but is often very domain specific).
It seems that full decoupling of a process's behavior from the code supporting that process would be the end goal. I don't think this is something that meta-programming fully solves.
I daydream of a programming language to which I can fuzzily explain my needs and it automagically solves problem. Natural language processing might be hard to debug as part of a programming language though.
Of course the idea behind "generated code" is that you don't look at it. Looking at generated code is like disassembling an .o file - unless you need to debug your compiler (or code generator), there's no reason to treat it as anything other than an opaque blob.
That's a real problem with generated code, though. It almost always ends up that there's some issue with either your original code or the code that generates the generated code, and you end up having to dive into the ugly generated code to figure out what the problem is. I'm pretty sure this is the major reason code generation isn't used more often.
> I daydream of a programming language to which I can fuzzily explain my needs and it automagically solves problem
That's the core of our startup :) We're building robots who can build games. You just give us basic simple instructions like "this type of gameplay, with these features and these characters" that can be described in 5 lines of code. And we build a whole game with it. To the user, it looks automagic. But inside, it's just a compiler that converts one data format into another.
> Natural language processing might be hard to debug as part of a programming language though.
You don't really need to go there. Just limit your input to what your compiler knows. Instead of trying to guess intent from ambiguous input, you just give them options instead. If you really want the user to have the freedom to type their ideas out, then use auto-complete and/or code suggestions. That "feels" like natural typing, but dodges the problem with NLP that is trying to guess what ambiguous input means.
Agreed, I've been in this situation twice and both times it was completely terrible.
The worst codebase I ever worked on was some C++ generated using a scripting language. It made awkward and unwieldy looking C++ code that leaked smart pointers everywhere and barely worked. But man, the authors that wrote it thought they were so clever.
Apple's Objective-C to MSVC C++ clang compiler is a close second. I'm not sure if it ever made it out into the wild or if it was just an Apple internal thing, but there's nothing like debugging a 3000 character post-preprocessor C++ expansion of several nested objc_msgSend calls.
It's rough when you long for the C preprocessor. At least the CPP limitations prevent people from stretching themselves too far into meta territory. It's almost kind of elegant in its badness, now that I've seen what people can do to imperative/OO languages when more capable macro processors are involved.
When I do it, I write my own generator in C#. The resulting code looks however I want it to look.
On one occasion, this worked out better than expected, when I realized I could convert my compiler to an interpreter and get a really flexible program configurable at runtime.
UglifyJS (https://github.com/mishoo/UglifyJS) comes with a JavaScript code generator that is very handy, particularly for compiling DSLs into JavaScript.
Interesting. I've often heard people on Hacker News, eg. edw519, express appreciation for code generators but I've never seen much documentation or blogging specifically about it.
A compiler (which is what code that writes code is) is still a relatively expensive thing to design, implement, and maintain. I think you should avoid it in general, although there are exceptions.
Protocol Buffers and Thrift seem to be good examples of code generation.
Macros definitely do not make code generation trivial. Rather, macros make code generation that should be trivial actually trivial, but generating, say, a serializer/deserializer does not become trivial.
It could be that macros are too low-level for the serializer/deserializer case. What you really want is a definition of the thing to be (de)serialized in some language and two different translations of the same definition. The translator might need to make multiple passes over the same definition in the course of even a single translation.
Another case that illustrates this point: Suppose you are defining a wire protocol between two communicating processes. The tool you want is a language in which the protocol is defined, from which sender and receiver code can be generated. A flexible tool would be able to generate sender and receiver in different languages, or, in fact, in multiple languages. It would be able to generate language-independent documentation of the protocol in multiple formats. It's not easy to see how macros embedded in a single programming language could produce these results.
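A toy version of what such a tool does, just to make the point concrete (the message definition, field types, and emitted helper names are all invented): one definition, walked over several times to produce a C sender, a Python receiver, and documentation.
# One message definition, several independent translations of it.
MESSAGE = ("Heartbeat", [("sequence", "u32"), ("status", "u8")])

def emit_c_sender(name, fields):
    body = "\n".join("    write_%s(buf, msg->%s);" % (t, f) for f, t in fields)
    return "void send_%s(Buffer *buf, const %s *msg) {\n%s\n}\n" % (name, name, body)

def emit_python_receiver(name, fields):
    body = "\n".join("    msg['%s'] = read_%s(buf)" % (f, t) for f, t in fields)
    return "def receive_%s(buf):\n    msg = {}\n%s\n    return msg\n" % (name.lower(), body)

def emit_docs(name, fields):
    rows = "\n".join("  %s: %s" % (f, t) for f, t in fields)
    return "%s wire format (fields in order):\n%s\n" % (name, rows)

name, fields = MESSAGE
print(emit_c_sender(name, fields))        # sender in one language...
print(emit_python_receiver(name, fields)) # ...receiver in another...
print(emit_docs(name, fields))            # ...plus language-independent docs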
I have no idea what you meant, I can only comment on what you said. Had you said activerecord's implementation of method_missing, I'd have understood, but you said Rails.
This method is not at all uncommon or "advanced" among those who still program in assembler, in particular coders in the so-called demoscene on lower-level machines like the C-64 and Amiga. It is most often used for creating unrolled loops that save a few cycles by removing the increment/compare/branch portion (or similar) of the loop.
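The generator for that kind of thing can be tiny. A sketch in Python (the mnemonics below are illustrative pseudo-6502, not a tested routine):
# Emit an unrolled copy loop so the inner loop pays no increment/compare/branch cost.
UNROLL = 8
for i in range(UNROLL):
    print("    lda source+%d" % i)
    print("    sta dest+%d" % i)
# The emitted block is then wrapped in an outer loop that advances source/dest
# by UNROLL each pass, so the loop overhead is paid once per 8 copies instead of once per copy.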