And m4 works in combination with a bunch of C code. Virtually any system with m4 can also ship an interpreter for a higher-level language without running out of space (I can't imagine many uses for m4 at runtime on systems where, say, a Perl interpreter wouldn't fit). Why is relying on such a runtime a problem?
What does “autonomous” mean? There are a ton of template engines that don’t depend on a parent framework, and there is no reason you can’t implement all of their features in C.
Jinja2 is an API: it needs a Python program to determine the values for the substitution variables and to invoke the template engine. In contrast, the m4 executable is the engine.
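
To make the distinction concrete, here is a rough sketch of what "needs a host program" looks like. It uses the standard library's `string.Template` as a stand-in for a library-style engine, not Jinja2's actual API, and the template text and the `render_greeting` helper are made up for illustration.

```python
from string import Template

# Stand-in for a library-style engine such as Jinja2. Assumption: string.Template
# is used only because it ships with Python; it is not Jinja2's API.
TEMPLATE = Template("Hello, $name! You have $count new messages.")


def render_greeting(name: str, count: int) -> str:
    """Host-program logic: decide the values, then invoke the engine."""
    return TEMPLATE.substitute(name=name, count=count)


if __name__ == "__main__":
    # Without this surrounding program, the template is never rendered.
    print(render_greeting("world", 3))
```

With m4 there is no such wrapper: the values can be passed straight to the executable, for example `m4 -D name=world template.m4` (the file name here is hypothetical), and the expanded text comes out on stdout.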