Cog: Use pieces of Python code as generators in your source files (nedbatchelder.com)
121 points by ingve 49 days ago | 72 comments

To be honest, what I don't like about this is that it once again operates on the character level. I feel this brings us back to all the issues we had with the C preprocessor and, in addition, makes any IDE analysis/assistance hard to impossible.

I feel the tool would be more useful if you could process the target language's AST instead. This would result in hygienic macros as well as making the code easier to analyse (and might solve the whitespace problems as well, as a formatter could render the whole tree in the end, after any code generation was already applied)
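To make the character-level versus AST-level distinction concrete, here is a minimal sketch that uses Python's own `ast` module as a stand-in for "the target language's AST". The helper names (`expand_textually`, `expand_on_ast`) are made up for illustration, it needs Python 3.9+ for `ast.unparse`, and it only demonstrates precedence safety, not full macro hygiene.

```python
import ast
import copy


def expand_textually(template, param, arg):
    """Character-level expansion: paste the argument text in place of the name."""
    return template.replace(param, arg)


class _Substitute(ast.NodeTransformer):
    """Replace every Name node called `param` with a copy of `replacement`."""

    def __init__(self, param, replacement):
        self.param = param
        self.replacement = replacement

    def visit_Name(self, node):
        if node.id == self.param:
            return copy.deepcopy(self.replacement)
        return node


def expand_on_ast(template, param, arg):
    """Tree-level expansion: splice the parsed argument into the parsed template."""
    tree = ast.parse(template, mode="eval")
    replacement = ast.parse(arg, mode="eval").body
    tree = ast.fix_missing_locations(_Substitute(param, replacement).visit(tree))
    return ast.unparse(tree)


textual = expand_textually("x * x", "x", "1 + 2")
structural = expand_on_ast("x * x", "x", "1 + 2")
print(textual, "=", eval(textual))
print(structural, "=", eval(structural))
```

The textual expansion evaluates to 5 because the pasted `1 + 2` is torn apart by operator precedence, while the tree-based expansion keeps the argument as one subexpression and evaluates to 9. This is the same class of problem as an unparenthesized C macro argument.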

What you want is something akin to Terra: http://terralang.org/

What you want is access to all the intermediate forms in the compiler, not just the AST.

Looks similar to https://ivorylang.org/

Didn't know about that yet. That indeed looks interesting. Thanks a lot!

There is a big difference between Cog and the C preprocessor -- the generated code actually lives within the file, and your IDE / type checker / linter will process it. Sure, they aren't hygienic, but for lightweight macros (which, in non lisps, is the majority of the macros people write imo) hygiene isn't a must-have.

I have found Cog to be very helpful while writing Rust code. I love Rust, but I'd prefer to write my macros in a different language (like Python). I also find debugging macro expansions to be much more painful than debugging Cog macros.

Maybe I don't understand the nuances here, but I don't see how Cog code isn't hygienic: the Python namespace it operates in has nothing to do with the namespaces in the containing file as a whole. There's no way for names chosen by the Cog writer to conflict with the names in the larger file.

Rather language-specific, but this is exactly the use case of Lisp's homoiconicity. The language's code and data are structured identically, so you can write code that takes other code as input.

The file containing the code is text. What AST do you mean? Cog doesn't have to understand anything about the structure of the containing file, so it can be used on any text file.

Processing the language’s abstract syntax tree is much less error-prone, for one. C macros (and any other text based macro system, including Cog) can be harder to debug than an AST version. While you could implement something complex, like list comprehensions in Cog, it would be a lot more complex and buggy than the AST-modifying equivalent[0].

That’s probably not what the author (you?) had in mind for Cog, but it is effectively a macro system so tying into the AST could be a big advantage. The only nice part about the text version is that it doesn’t presuppose a host language.


Less error-prone in one sense, more error-prone in another. Any time you have an AST integration, you will run into versioning and integration issues. Consider for example a modern ES6 stack with Babel -- will all your third party tooling recognize the latest syntaxes recognized by Babel? And if so, will it all output AST to text in the correct way? Probably not. Same goes for versioning in languages like Python 2 vs 3 where the AST is only slightly different.

It's much simpler for a tool like this to be "dumb" -- leave the correct syntax to the human, since it's a lot easier for a human to deal with on a one-off basis than to have a group of developers writing and maintaining dozens of AST parsers and code generators.

Given the number of code injection vulnerabilities and escaping confusions we see, I disagree with the assertion that humans are good at understanding code in the same way that parsers are.

Also, any modern IDE already needs to parse the AST anyway to provide any half-decent inspection features. The grammars/specs of most popular languages are also readily available and well-maintained.

On the contrary, if you add search/replace-style macro processing, you actually make the code more difficult to formally analyze, because then not even an IDE could build a meaningful AST without actually expanding the macros (which in this case would mean executing arbitrary Python code).
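As an illustration of the escaping confusion mentioned above, here is a small sketch of a text-level generator emitting a C string constant. The function names are hypothetical, and `c_string_literal` handles only a handful of characters, so it is an assumption-laden toy and not a complete C escaper.

```python
# Only a few escapes are covered; real C literals need more care than this.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}


def c_string_literal(text):
    """Quote `text` as a C string literal, escaping the characters listed above."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in text) + '"'


def naive_define(name, value):
    """Paste the value between quotes without escaping (the error-prone way)."""
    return f'const char* {name} = "{value}";'


def escaped_define(name, value):
    """Same declaration, but the value goes through the escaper first."""
    return f"const char* {name} = {c_string_literal(value)};"


message = 'he said "hi"'
print(naive_define("MSG", message))    # quotes inside the value end the literal early
print(escaped_define("MSG", message))
```

A generator that works on text has to get this right by hand for every target language, which is the cost the comment above is pointing at.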

> Also, any modern IDE already needs to parse the AST anyway to provide any half-decent inspection features. The grammars/specs of most popular languages are also readily available and well-maintained.

But do those IDEs actually execute the in-language generators? Which AST are they reporting, the preprocessed or postprocessed?

I've heavily used M4 with C code before and what I liked was the ability to see and run tools against the postprocessed code. Textual replacement can be thought of as a worse is better approach, a principle behind many of the systems that people enjoy and even find to be elegant (at least as long as they don't look too closely).

The point with Cog is that the output also ends up in the file. The IDE doesn't need to understand the generator.
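A toy re-implementation of the idea may help show why the IDE never has to run anything. This is a sketch and not Cog's actual code: it assumes the `[[[cog` / `]]]` / `[[[end]]]` markers each sit on their own line, and it captures `print` output instead of providing Cog's own `cog.out` / `cog.outl` functions.

```python
import contextlib
import io


def expand(text):
    """Run each embedded generator and place its output before the end marker."""
    lines = text.splitlines()
    result = []
    i = 0
    while i < len(lines):
        line = lines[i]
        result.append(line)
        if "[[[cog" in line:
            # Whatever precedes the marker (e.g. "// ") is the comment prefix.
            prefix = line[: line.index("[[[cog")]
            i += 1
            code = []
            while "]]]" not in lines[i]:
                result.append(lines[i])  # the generator source stays in the file
                code.append(lines[i].removeprefix(prefix))
                i += 1
            result.append(lines[i])  # the closing "]]]" line
            i += 1
            while "[[[end]]]" not in lines[i]:
                i += 1  # drop output left over from a previous run
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(code), {})
            result.extend(buffer.getvalue().splitlines())
            result.append(lines[i])  # the "[[[end]]]" line
        i += 1
    return "\n".join(result) + "\n"


SOURCE = """\
// [[[cog
// for name in ["red", "green"]:
//     print(f"void paint_{name}();")
// ]]]
// [[[end]]]
"""

expanded = expand(SOURCE)
print(expanded)
```

After one pass the file contains both the generator (inside comments) and plain declarations that any C tool can read, and running the pass again produces the same file.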

I like Cog. It is useful for a certain set of problems, e.g. writing big enums for C++ with enum-to-string and back functions, etc.
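For the enum case, the generator can be a few lines of Python. The sketch below is one possible shape, with a hypothetical `cpp_enum` helper and a made-up `Color` enum. It only emits the enum-to-string direction, and inside a Cog block the text would be written with Cog's output functions instead of `print`.

```python
def cpp_enum(name, members):
    """Return C++ source for an enum class plus a matching to_string function."""
    lines = [f"enum class {name} {{"]
    lines += [f"    {member}," for member in members]
    lines += ["};", ""]
    lines.append(f"inline const char* to_string({name} value) {{")
    lines.append("    switch (value) {")
    lines += [f'        case {name}::{member}: return "{member}";' for member in members]
    lines.append("    }")
    lines.append('    return "<unknown>";')
    lines.append("}")
    return "\n".join(lines)


generated = cpp_enum("Color", ["Red", "Green", "Blue"])
print(generated)
```

Adding a member to the list updates the enum and the switch together, so the two cannot drift apart.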

I can understand the priority of making the tool language-neutral, but by making it agnostic of the underlying structure, this can also cause lots of "code injection"-like problems where the generated code behaves in many unintuitive ways. See e.g. https://stuff.mit.edu/afs/athena/project/rhel-doc/3/rhel-cpp... for examples.

But cog can be used for other file types. I can imagine adding it to documents to get a kind of poor man's Mathcad.

I wonder if this could be integrated with Emacs Org-mode?

The Swift code base uses something similar that the core team wrote: GYB, Generate Your Boilerplate. It’s used to generate several variants of similar code that would be cumbersome to maintain otherwise.


> GYB, Generate Your Boilerplate

Love the name :)

I use this extensively in a production codebase to help facilitate keeping things DRY across multiple languages/filetypes (java, xml, less, html) while not locking me into a framework since, at the end of the day, if I want to stop using cog, I'm still left with completely normal code.

I've layered a kind of DSL (more Python in comments with a different marker) on top of cog so multiple files can reference the same metadata (domain model fields in my case) when doing the codegen.

Having worked with script generated C++ in the past, this looks annoying as hell to debug.

Whereas macros and templates have compiler support to give line numbers inside the macro/template, generated code errors have an extra step of having to look at the C++, find the error in the generator, rinse, repeat.

Lambdas are better, but if you have to repeat yourself in a way that's too syntactically weird for a template or lambda, you can "#define MY_MACRO(...)" and end it with "#undef MY_MACRO" to keep the namespace clean.

Funny you say that, as we use Cog to avoid having to deal with C++ templates and their associated pains. The nice thing about Cog is that it operates like any Linux command-line tool, just at the text level, and as a result, if something has gone wrong at the compilation phase, you can see what was fed into the compiler to see what exactly was generated. Interpreting C++ template error outputs is an art in and of itself.

Sometimes you're working with a language that doesn't support a sophisticated preprocessor, or where it's convenient to be able to use a higher-level language to generate constants/do math at compile time.

AFAIK, VC doesn't support line numbers in macros. If you want better line numbers in generated code, you can generate #line directives.
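A sketch of what generating `#line` directives could look like from a Python generator. The `with_line_directives` helper, the `shapes.gen.py` file name, and the chunk format are assumptions made for illustration; the `#line N "file"` form itself is standard C/C++.

```python
def with_line_directives(chunks, generator_file):
    """Prefix each generated chunk with a #line directive naming its origin.

    `chunks` is a list of (line_in_generator, c_source) pairs.
    """
    out = []
    for line_no, source in chunks:
        out.append(f'#line {line_no} "{generator_file}"')
        out.append(source)
    return "\n".join(out)


generated = with_line_directives(
    [
        (12, "int area(int w, int h) { return w * h; }"),
        (27, "int perimeter(int w, int h) { return 2 * (w + h); }"),
    ],
    "shapes.gen.py",
)
print(generated)
```

With these directives in place, a compiler diagnostic inside `perimeter` would be attributed to line 27 of the generator file and not to a line in the generated output.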

Typically, I don't debug any generated code though, because it's only boilerplate that has been tested a million times already. So that's a bit of a non-issue in the real world.

This is cool and I could see myself using this, but I wonder why it's necessary to `import cog` in the examples? Seems like it'd be better to just include cog implicitly in the namespace by design for these snippets since you're practically always going to use it.

Explicit is better than implicit. Automatically importing it makes the code look like magic.

As it happens, "cog" is also implicitly imported, though tbh I forget why....!

Awesome, that makes a lot more sense to me. I'm a long-time fan and reader, keep up the great work!

Oh geeze, this reminds me of the people who use perl to autogen a bunch of verilog (which is super common for whatever reason).

Just don't, instead find a macro system that actually understands your language.

What macro system should I use if I want to define a data schema in one place, and then generate C code and SQL code from it?
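The schema case is roughly this shape: one Python data structure, two renderers. Everything named here (`SCHEMA`, the type maps, `to_c_struct`, `to_sql_table`) is hypothetical, and the type mapping is deliberately tiny.

```python
# Single source of truth: field names with abstract types.
SCHEMA = {
    "name": "reading",
    "fields": [("id", "int"), ("sensor", "int"), ("value", "double")],
}

C_TYPES = {"int": "int32_t", "double": "double"}
SQL_TYPES = {"int": "INTEGER", "double": "DOUBLE PRECISION"}


def to_c_struct(schema):
    """Render the schema as a C struct definition."""
    body = "\n".join(f"    {C_TYPES[kind]} {field};" for field, kind in schema["fields"])
    return f"struct {schema['name']} {{\n{body}\n}};"


def to_sql_table(schema):
    """Render the schema as a SQL CREATE TABLE statement."""
    columns = ",\n".join(f"    {field} {SQL_TYPES[kind]}" for field, kind in schema["fields"])
    return f"CREATE TABLE {schema['name']} (\n{columns}\n);"


c_source = to_c_struct(SCHEMA)
sql_source = to_sql_table(SCHEMA)
print(c_source)
print(sql_source)
```

Each renderer could live in a generator block inside the `.h` and `.sql` files respectively, both importing the same schema module.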

I wrote a generic version of cog that can use any language as the generator code. It's called gocog, because it's written in go, but once compiled, it's a static binary, and you don't need go on the host machine.


It's directly built off of cog's ideas and mimics much of cog's interface. (I worked with Ned, cog's author back in the day, and really enjoyed having cog to write boilerplate for me).

gocog is some of the first code I wrote in Go, so it's not super pretty code, but it's a very useful little tool for generating boilerplate.

This is better in the sense that the user doesn't need to learn anything new beyond the language they are programming in and Python (even though theoretically manipulating the AST would be less error-prone, etc.). I use JavaScript everywhere, but I still haven't learned how to make Babel plugins/macros, because copy/pasting snippets of code two or three times is easier than learning. It's a pity that people still haven't made a language that has a super intuitive macro system like Lisp (homoiconicity, the AST is the language) and an intuitive syntax like Python. I actually believe that this is partly because most Lisp users don't like the idea of new syntax and because none of the major Lisps (CL, Clojure, Scheme) has syntax sugar by default. I would appreciate it if a new CL tutorial appeared that uses infix notation with reader macros (`#I`), or a Clojure tutorial that uses the infix package (https://github.com/rm-hull/infix). It would be great for beginners because 1. they wouldn't be scared of prefix notation and 2. it shows (a part of) what Lisp macros can do (introduce syntax sugar in a way that is natural to the language).

I built something like this in ~2006 called "PHPinPHP" because I wanted to generate PHP classes from my mysql schema. It even used "[[[...]]]" blocks like cog does.

I eventually realized that A) generated code is completely unmaintainable; and B) the reason I thought I needed code generation is because my base language wasn't flexible enough.

Later on I switched to python and haven't yet hit a problem that I need code generation to solve.

So basically what you want out of your jinja templates in python.

Does it handle line numbers correctly on errors?

Python errors in your Cog-generator will report the correct file and line number in the larger containing file.
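One way a tool can get this behavior (not necessarily how Cog does it) is to pad the snippet with blank lines so that its line numbers match the containing file, and to compile it under that file's name. The `run_embedded` helper and the `widgets.cpp` name below are made up for the sketch.

```python
import traceback


def run_embedded(snippet, filename, first_line):
    """Execute `snippet` as if it started at `first_line` of `filename`."""
    padded = "\n" * (first_line - 1) + snippet
    exec(compile(padded, filename, "exec"), {})


try:
    # Pretend the generator block begins on line 40 of widgets.cpp.
    run_embedded("x = 1\nraise ValueError('boom')\n", "widgets.cpp", 40)
except ValueError as exc:
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    reported = (frame.filename, frame.lineno)

print(reported)
```

The traceback points at line 41 of `widgets.cpp`, which is where the failing `raise` would sit in the containing file.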

Reminds me of lips ( https://github.com/zc1036/lips and https://github.com/rbryan/guile-lips ).

Cog looks a little over-the-top for my purposes, but still looks saner than M4 ;)

I wrote something very similar some years back: https://github.com/zwegner/prethon/

I can't remember if I saw Cog first, and wrote a different version that fit my needs better, or if I only found Cog afterwards...

It uses a more PHP-esque syntax for inserting Python code. It has inline expression syntax, and quote functions, which IMO make it nicer than Cog for using as a code preprocessor--it's easy to make e.g. function specializations, or loop over code blocks. It's not very well documented though, and is probably missing some nice features.

Why does this need to be a standalone tool? I can already do this in Emacs by pasting emacs-lisp code snippets and executing them while editing the file, inserting the output into it. Do other editors not have this feature?

What I like about it (I use it) is that it lets me drive code in low-level, somewhat math-incapable languages at compile time. I have a project right now that does real-time signal processing in an FPGA. The actual project itself is organized in Verilog, which is cumbersome to do raw math in. I use cog in my build to do a bunch of preprocessing math in Python. For example, since I use cog to calculate a bunch of constants (for coefficients and the like), I can change the sample rate in one control file and the compilation process will re-do all the math for me.
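A sketch of that workflow, under stated assumptions: the filter (a one-pole low-pass), the Q15 fixed-point format, and every name below are invented for illustration and are not taken from the project described above.

```python
import math


def lowpass_alpha_q15(sample_rate_hz, cutoff_hz):
    """Coefficient of a one-pole low-pass filter, quantized to Q15 fixed point."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    return min(round(alpha * 32768), 32767)  # clamp to the largest positive Q15 value


def verilog_constants(sample_rate_hz, cutoff_hz):
    """Render the derived constants as Verilog localparam declarations."""
    alpha = lowpass_alpha_q15(sample_rate_hz, cutoff_hz)
    return "\n".join([
        f"localparam integer SAMPLE_RATE_HZ = {sample_rate_hz};",
        f"localparam signed [15:0] ALPHA_Q15 = 16'sd{alpha};",
    ])


# In a real build the sample rate would come from the shared control file.
constants = verilog_constants(sample_rate_hz=48_000, cutoff_hz=1_000)
print(constants)
```

Changing the sample rate in one place re-derives every dependent constant on the next build, which is the part Verilog itself makes awkward.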

> Why does this need to be a standalone tool? I can already do this in Emacs...

It needs to be standalone so that you can make it part of your build process instead of messing around in some text editor that not everybody wants to use.

Doing it manually in an editor is fine, but it's useful to automate the process also. Just because something can be done in an editor doesn't mean you don't want to be able to do it at the command line too.

Not to mention using "C-u M-|" (shell-command-on-region).

Have you actually looked at what cog does? I don't see how you would manually use Emacs commands to do the same more than once.

I'm not exactly sure I understand the use-case, but as far as the example used in the article, shell-command-on-region accomplishes the same kind of thing. Why not leave the "generation code" as a comment in your file the same way cog does?

Template-generators like cog are meant to run periodically, for example every time you 'compile' your project. Often they contain dynamic elements which can change between each run.

Using your Emacs command would defeat that purpose, because you would need to find the region at every run and re-execute it manually again and again. And you would need to document the command anyway, because nobody can remember all those regions. So why not automate this task?

Seems like most languages where you'd use this already have macros?

C++ macros can't read a configuration file to generate code (at least I don't want to know that they can!). And Cog works in any text file, so it can be used for languages (like HTML) that don't have macros.

They kind of can, via #include, but the configuration file has to be in a particular format and you're limited in what you can do with it.

Go doesn't, and I know that some people have suggested using code generation in lieu of generics.

it doesn't have macros, but it does have code generation.

One use case for this could be dumping generated algorithms from sympy. I was doing some constraint programming and ended up almost writing something very similar albeit poorly and ad-hoc, by generating .c files that I #included into other .c files, it was very messy. The use case was to write some mathematical relations and generate C functions to calculate their differentials. It was a lot of manual copy-pasting until I came up with the #include trick, but this would have been better.

Your example could do with some syntax highlighting. There's so much ugly punctuation in it I didn't even notice the actual code at the end. Plus my first impression was that I would never put something so unreadable in my source, whereas in a nice green comment I wouldn't care so much.

In Java I solved the problem of whitespace by just running my result through google-java-format, but I see how Python's offside rule would make that totally impossible.

Good idea: I added syntax highlighting to the first example on the page.

InGenR [1] is a similar utility I wrote sometime back.

It is similar to cog in that generated code resides alongside the source code.

Some differences/advantages are:

1. Generators can be pure JavaScript or declarative dot templates.

2. The generators can be distributed as npm packages as the generators are resolved through npm's require resolution.

[1] https://github.com/lorefnon/InGenR

Interesting stuff. Quite a while ago I wrote something vaguely similar using JScript to make a demo of a sort of 'mathematical' document editor as part of the VB Classic Wikibook, see https://en.wikibooks.org/wiki/Visual_Basic/JArithmetic

Might be interesting to revisit the idea.

I think jq would be a better DSL for this, not least because it's easy to integrate libjq into C/C++/Rust programs.

I've also been thinking of building a trivial little library to use jq for configuration files, where jq syntax is more convenient than JSON, and where you can always write or alter configuration objects using path-based assignments, so you get to choose JSON-style or TOML-style.

We use it at work to:

* Insert generated C++ and Python boilerplate code
* Generate parametrized tests based on external data files
* Copy code from one place to another and keep the copies up to date
* Pin dependencies across multiple projects using a single source of truth

So this tool is immensely useful.

I can't tell whether this is better or worse than header2whatever (piping C++ headers into Jinja2): https://github.com/virtuald/header2whatever

i've been using a similar thing, a pre-processor written in powershell, that lets me embed a few different languages in any type of file.

i've used it for writing books, and interactive checklists, and for generating static html websites.

this is the one i implemented: https://github.com/secretGeek/pre

I hear LISP has a pretty good macro system...

It's like PHP but Python, eh?

My thought exactly... Python Hypertext Preprocessor...

Or you can generate pieces of code with non-embedded Python scripts at the earliest stage of make and inline them with the host language's preprocessor.

This way you'll have the same functionality but with standard tooling for every language. This means conventional debugging, static analysis, testing etc.

And no extra dependencies, too.

I'm not sure what you mean by "standard tooling for every language." Cog will work with any text file. Or do you mean that cog itself is non-standard?

No-no, I mean if generating code is in a separate Python file, then I can debug it with pdb, I can profile it with cProfile, I can run Pylint on it, - all the standard tools.

I like the idea of everything sharing the same file, too, but it does make working with Python part a bit more difficult.

Also, with keeping it separate, the "every language" part comes up. It doesn't have to be Python if it's a separate code generator. Whatever suits you will work. You can generate code for C in C++. Or assembly in Common Lisp. Everything in anything.

I’d be interested to hear any use cases that people could imagine for this.

I use it in my presentations' HTML files to add computed content, such as the results of running code, or more complicated tables and diagrams.

We have something at work that's very similar but perl-based for our VHDL source. It saves a lot of boilerplate for conversion functions, null types, read/write cpu functions etc. Vhdl has pretty awful templating so it's really useful, you just need to use it sparingly else it becomes unreadable very quickly.

I do this using cog for Verilog code. It lets me generate signal processing constants/math in the signal processing code itself, driven by a single control file (with, e.g., the system sample rate), without having to rely on Verilog's awkward (where even available) math.

Could be useful for templating config files. It sounds a lot like Jinja2. The question for me is what makes this better than Jinja2?

So, it's like the m4 macro processor, which the author used as XSLT. It could be a good call today if Cog had access to Python's AST instead of plain text and interacted with Swagger/OpenAPI or something; I mean tools like AutoRest.
