But it appears that what they're actually pitching is a simple and flexible code generation environment. It's a way to generate statically-typed code at runtime that targets LLVM but looks nicer than this: http://llvm.org/releases/2.6/docs/tutorial/JITTutorial2.html (C++ code that conjures up LLVM SSA IR directly). You could almost think of this as a high-level API for LLVM code generation and execution that is exceptionally well-integrated into Lua.
For example, in their example where they create a Terra function from a BF program, the equivalent in plain Lua would be to compile the BF program into a Lua program (represented as a big string), load it into the interpreter, and then let LuaJIT JIT it. But with Terra, you can represent the code you're generating symbolically with the "quote" construct instead of having to compile it to a big string. Of course you could just write a BF interpreter in Lua directly, but if you compile it instead you'll get better performance because you won't pay an interpreter overhead and the optimizer can analyze the program flow to look for optimization opportunities.
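To make the "big string" approach concrete, here's a minimal sketch. It's in Python rather than Lua purely for illustration, and it is not Terra's implementation; the names compile_bf and load_bf are made up for this example. It translates a BF program into host-language source text and then loads that text into the running interpreter:

```python
def compile_bf(program: str) -> str:
    """Translate a BF program into Python source text (one big string)."""
    lines = [
        "def run(input_bytes=b''):",
        "    tape = [0] * 30000",  # fixed-size tape; the size is an arbitrary choice
        "    ptr = 0",
        "    inp = iter(input_bytes)",
        "    out = bytearray()",
    ]
    depth = 1  # current indentation level inside run()
    for ch in program:
        pad = "    " * depth
        if ch == ">":
            lines.append(pad + "ptr += 1")
        elif ch == "<":
            lines.append(pad + "ptr -= 1")
        elif ch == "+":
            lines.append(pad + "tape[ptr] = (tape[ptr] + 1) % 256")
        elif ch == "-":
            lines.append(pad + "tape[ptr] = (tape[ptr] - 1) % 256")
        elif ch == ".":
            lines.append(pad + "out.append(tape[ptr])")
        elif ch == ",":
            # Assumption: reading past the end of the input yields 0.
            lines.append(pad + "tape[ptr] = next(inp, 0)")
        elif ch == "[":
            lines.append(pad + "while tape[ptr] != 0:")
            # Emit 'pass' so an empty loop body is still valid syntax.
            lines.append(pad + "    pass")
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 1:
                raise ValueError("unmatched ']'")
        # Any other character is a comment in BF and is ignored.
    if depth != 1:
        raise ValueError("unmatched '['")
    lines.append("    return bytes(out)")
    return "\n".join(lines)


def load_bf(program: str):
    """Load the generated source into the interpreter and return the function."""
    namespace = {}
    exec(compile_bf(program), namespace)
    return namespace["run"]


if __name__ == "__main__":
    # 8 * 8 + 1 = 65, the byte value of "A".
    run = load_bf("++++++++[>++++++++<-]>+.")
    print(run())
```

Everything the generator knows about the target program is flattened into text and then re-parsed by the host. That is the step a symbolic "quote" lets you avoid. (CPython also limits how deeply blocks can be nested, so this sketch won't handle deeply nested BF loops.)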
[EDIT: removed incorrect criticism about the BF codegen being incomplete]
It's an interesting approach and I look forward to learning more about it.
Since we are primarily using it for dynamic code generation, I haven't done much benchmarking against LuaJIT directly. Instead, we have compared it to C by implementing a few of the language benchmarks (nbody and fannkuchredux; performance is normally within 5% of C), and by comparing it against ATLAS, which implements BLAS routines by autotuning x86 assembly. In the case of ATLAS, we're 20% slower, but we are comparing auto-tuned Terra and auto-tuned x86 assembly.
Small note, the BF description on the website does go on to implement the '[' and ']' operators below. I just left them out of the initial code so it was easier to grok what was going on. The full implementation is at (https://github.com/zdevito/terra/blob/master/tests/bf.t).
I'm taking a different, albeit related, approach to dynamic runtime code gen, but either way this is rock-solid work, though I'm pretty terrible at deciphering the Lua- and macro-heavy code in your examples.
edit: I'm doing something more akin to the Accelerate Haskell EDSL approach, with some changes
Just what they've done is pretty solid. That said, it's not really done as part of a framework for numerics, which just means it's a great validation benchmark of their code gen.
Here's one perhaps relevant paper about LuaJIT for dynamic code generation in QEMU-esque instruction set simulation: http://ieee-hpec.org/2012/index_htm_files/Steele.pdf
It would be better to compare it to LuaJIT with the FFI.