Implementing a Type-safe printf in Rust

RcouF1uZ4gsC · on Aug 14, 2020

Note that this is similar to a technique Andrei Alexandrescu used in his book Modern C++ Design, which targeted C++98.

IIRC he called it TypeList and used it to implement tuple as a library type.

When C++11 variadic templates came out, a lot of C++ metaprogrammers breathed a huge sigh of relief at being able to abandon that technique. In addition to being a pain to work with, it relies heavily on recursion, which can take a large toll on compile times.

Variadic templates and being able to manipulate tuples generically are probably the two biggest things I miss when programming Rust.
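For readers who haven't hit this: without variadic generics, a common Rust workaround (the standard library does something similar for its tuple trait impls) is to stamp out one impl per tuple arity with a declarative macro. A minimal sketch, where the `JoinDisplay` trait and the `impl_join_display!` macro are hypothetical names made up for illustration:

```rust
use std::fmt::Display;

// Hypothetical trait: join every element of a tuple using its Display impl.
pub trait JoinDisplay {
    fn join_display(&self, sep: &str) -> String;
}

// Generates the impl for one tuple arity. Each `$T => $idx` pair names a
// type parameter and the tuple field index holding a value of that type.
macro_rules! impl_join_display {
    ($($T:ident => $idx:tt),+) => {
        impl<$($T: Display),+> JoinDisplay for ($($T,)+) {
            fn join_display(&self, sep: &str) -> String {
                let parts: Vec<String> = vec![$(self.$idx.to_string()),+];
                parts.join(sep)
            }
        }
    };
}

// No variadics, so every supported arity is spelled out by hand.
impl_join_display!(A => 0);
impl_join_display!(A => 0, B => 1);
impl_join_display!(A => 0, B => 1, C => 2);
impl_join_display!(A => 0, B => 1, C => 2, D => 3);
```

Each arity needs its own line, and a tuple longer than the largest listed arity simply has no impl; that ceiling is the price of not having variadics.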

pjmlp · on Aug 14, 2020

And with constexpr and its variants, alongside auto templates, fold expressions, and now concepts, yet another legion breathed a huge sigh of relief at being able to abandon all those tag-dispatch-based techniques.

WalterBright · on Aug 14, 2020

D simply checks the arguments to printf against the format string. This facility exists as an extension in most C and C++ compilers.

D's writefln() function uses variadic template specializations to achieve a similar result, meaning you can just use `%s` for the formats and writefln() knows how to print each type.

quietbritishjim · on Aug 14, 2020

Are you saying the compiler itself does the checking? That is a pity. Compiler magic for checking one fixed, specific format system (I don't know about D, but that's what you're describing for C's printf) is significantly worse than compile-time checking written in the language itself.

What if you, as library author (not compiler writer!), think of a much better formatting system? Or something that's maybe not better in general, but better for an unusual situation you're deploying in? Or you want to write something that's different to a formatting system but uses similar tricks? If you don't have language facilities to do this yourself then you're stuffed.

As well as the original post showing it's possible in Rust, this is also possible in C++. For example, the C++ fmt [1] library allows compile-time checking of the number of slots in a format string against the number of arguments, and even checks that the format specifications (.03d or whatever) match the types passed in. And it doesn't rely on the compiler doing its own magic checking outside the language.

[1] https://github.com/fmtlib/fmt

MaxBarraclough · on Aug 14, 2020

Edit: I see I'm too late; Walter already answered this. I'll leave this here anyway.

I don't think D's writefln is getting any special treatment from the compiler. It seems to use D's compile-time features to perform the check.

https://github.com/dlang/phobos/blob/v2.093.0/std/stdio.d#L4...

AnIdiotOnTheNet · on Aug 14, 2020

Zig does it in a similar way, using the compile time type information available to all Zig code, and is in fact implemented in the std library. No magic necessary.

renox · on Aug 14, 2020

But both fall short of Python/bash format strings, where you can reference variables "in place", which IMHO is much more readable.

ben-schaaf · on Aug 14, 2020

I completely agree that string interpolation is often superior to a format string, but it's important to note that they are not equivalent. String interpolation is generally evaluated in situ and has full access to the local scope. Format strings, on the other hand, can be passed around and manipulated (even at compile time) and can only ever access the data they're formatted with. strftime and strptime are good examples of format-string usage that can't be replaced by string interpolation in Python.

D also has the capability of implementing string interpolation using mixins - though the resulting code isn't quite as neat.

WalterBright · on Aug 14, 2020

I did write a proposal for "in place" reference (i.e. string interpolation) but could not reach consensus with the D community for it.

CyberDildonics · on Aug 14, 2020

Formatting text output is not often a problem in the larger scope of a program, but performance bottlenecks can be.

rightbyte · on Aug 14, 2020

Come on ... printf is a monstrosity, and I understand why you would put hard-coded checks in the compiler.

quietbritishjim · on Aug 14, 2020

Sure, I'm not disputing that. In fact I think that proves my point.

There are languages (C++, Rust) where you can write your own, safe formatting library just using the language. It's a world away from printf.

Compared to that, the parent comment says that in D the compiler has to have formatting checks built into it, and compares it to printf. Yuk!

WalterBright · on Aug 14, 2020

Allow me to clarify. D does have writef() which provides a fully typesafe print formatting routine. It is unknown to the compiler (i.e. no magic).

https://dlang.org/phobos/std_stdio.html#.writef

D can also call C's printf, and checking that is magical indeed. It exactly follows the C99 Standard for printf. A major feature of D is being able to call C functions directly, and printf is one of them. (D doesn't recognize printf specifically, it has to have a special attribute for it `pragma(printf)` which the compiler recognizes. This is so the checking can be applied to user-defined functions that use conforming printf-style format strings.)

https://dlang.org/spec/pragma.html#printf

And the code to do it:

https://github.com/dlang/dmd/blob/master/src/dmd/chkformat.d

quietbritishjim · on Aug 14, 2020

OK, thanks for clarifying. Reading back your original comment, I can see that its literal wording does say that. I was confused by you leading with printf handling, which is not really relevant to the original article (certainly not as much as writef()), so I inferred you were only mentioning it as context for writef() working similarly. Your use of the word "simply" also threw me off; it sounded like you thought compiler magic was a better solution than a language-level one.

I now see you were just making an unrelated point first. And having a specific compiler feature just for printf (and co.) is totally reasonable given how widespread it is.

Your first link answers what would've been a follow-up question: OK, writef() is a variadic template implemented with just normal language features, but does it do compile-time checking of the format? The docs say that it does:

> fmt: The format string. When passed as a compile-time argument, the string will be statically checked against the argument types passed.

pests · on Aug 15, 2020

Walter Bright is the creator of the D language so this is the best place to ask these questions.

dthul · on Aug 14, 2020

That's actually very similar to Rust, where println is also type safe. Depending on the format string it uses different traits to print each value. For example the format specifier "{}" will use the value's "Display" trait implementation to format it, while "{:b}" will use the "Binary" trait. If a value doesn't implement the required trait, that's a compile time error. Rust doesn't have variadic argument lists (yet?) so println is implemented as a macro. This has the downside of requiring the format string to be a compile time literal but the upside of not needing to parse it at runtime.
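To make the trait-per-specifier point concrete, here is a sketch with a hypothetical `Flags` type that implements `Display` (used by `{}`) and `Binary` (used by `{:b}`). Delegating `Binary` to the inner `u8` is a design choice that keeps width and fill flags like `{:08b}` working:

```rust
use std::fmt;

// Hypothetical example type wrapping a small bit set.
pub struct Flags(pub u8);

// Used by `{}`.
impl fmt::Display for Flags {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Flags({})", self.0)
    }
}

// Used by `{:b}`. Delegating to u8's Binary impl passes the Formatter
// through, so width/fill flags such as `{:08b}` keep working.
impl fmt::Binary for Flags {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Binary::fmt(&self.0, f)
    }
}

pub fn display_vs_binary() -> (String, String) {
    let flags = Flags(5);
    (format!("{}", flags), format!("{:08b}", flags))
}
```

Writing `format!("{:x}", flags)` against this type would be rejected at compile time, since `Flags` has no `LowerHex` impl.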

hyperman1 · on Aug 14, 2020

One thing I really miss from C's printf or Java's MessageFormat is the ability to change argument positions: different languages can display the arguments in different orders, and you can keep the format string in a resource bundle.

It is of course very hard to have type safety when the format string is in a resource bundle.

dthul · on Aug 14, 2020

The position of arguments is actually not required to follow the format string. From the documentation:

  println!("{1} {} {0} {}", 1, 2); // => "2 1 1 2"

But you are right, you cannot dynamically change the format string. If you need this facility, there is the runtime_fmt crate which will parse the format string at runtime instead of compile time. That of course means that it can fail at runtime if the format string does not match the arguments.
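A rough idea of what runtime parsing involves, and why it moves errors from compile time to runtime. This is a hypothetical `format_runtime` function written for illustration, not the runtime_fmt crate's actual API; it handles only `{}`, `{N}` and `{{`/`}}` escapes, with no format specs:

```rust
use std::fmt::Display;

// Hypothetical runtime formatter, for illustration only (not runtime_fmt's API).
// Supports `{}` (next argument), `{N}` (argument N) and `{{` / `}}` escapes.
// Missing arguments and malformed placeholders come back as Err at runtime.
pub fn format_runtime(fmt: &str, args: &[&dyn Display]) -> Result<String, String> {
    let mut out = String::new();
    let mut auto_idx = 0; // counter for bare `{}`, independent of `{N}`
    let mut chars = fmt.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '{' if chars.peek() == Some(&'{') => {
                chars.next();
                out.push('{');
            }
            '}' if chars.peek() == Some(&'}') => {
                chars.next();
                out.push('}');
            }
            '{' => {
                // Collect the placeholder contents up to the closing brace.
                let mut spec = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(ch) => spec.push(ch),
                        None => return Err("unclosed '{'".to_string()),
                    }
                }
                let idx = if spec.is_empty() {
                    auto_idx += 1;
                    auto_idx - 1
                } else {
                    spec.parse::<usize>()
                        .map_err(|_| format!("unsupported placeholder {{{}}}", spec))?
                };
                let arg = args
                    .get(idx)
                    .ok_or_else(|| format!("no argument at index {}", idx))?;
                out.push_str(&arg.to_string());
            }
            '}' => return Err("unmatched '}'".to_string()),
            _ => out.push(c),
        }
    }
    Ok(out)
}
```

Like `println!`, bare `{}` placeholders count up independently of explicit `{N}` ones, so `format_runtime("{1} {} {0} {}", &[&1, &2])` gives "2 1 1 2". A missing argument comes back as an `Err` instead of a compile error.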

amelius · on Aug 14, 2020

Why use numbers and not names, though?

dthul · on Aug 14, 2020

You can!

  println!("{a} {c} {b}", a="a", b='b', c=3);  // => "a 3 b"

flohofwoe · on Aug 14, 2020

POSIX printf() actually supports positional arguments:

https://www.godbolt.org/z/xG15M5

Both clang and gcc support this, but unfortunately not the Microsoft C compiler.

PS: actually, it does, just not in the normal printf call: https://docs.microsoft.com/en-us/cpp/c-runtime-library/print...

andralex · on Aug 15, 2020

D also supports POSIX positional arguments: https://dlang.org/library/std/format/formatted_write.html

skohan · on Aug 14, 2020

One feature I really miss in Rust is being able to insert variables in place in strings, like you have in Swift, or JS template literals.

_hrfd · on Aug 14, 2020

This is being worked on: https://github.com/rust-lang/rust/issues/67984

skohan · on Aug 15, 2020

Neat. It's interesting that they've chosen plain braces to denote argument insertion, i.e. "text {value}" rather than something like "text ${value}", which seems to be what most languages use. I guess there must be a way to escape it if you just want braces in your string.

_hrfd · on Aug 15, 2020

Right, there is a way - to escape braces and print "text {value}" rather than the actual value, you'd use "text {{value}}"
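For reference, the inline capture of identifiers discussed here was later stabilized in Rust 1.58, so on a current toolchain both forms below work, and doubled braces remain the escape. A small sketch (`capture_demo` is a hypothetical name):

```rust
pub fn capture_demo() -> (String, String, String) {
    let value = 42;
    // Explicit positional argument: works on every Rust version.
    let explicit = format!("text {}", value);
    // Implicit capture of the local `value` (Rust 1.58 and later).
    let captured = format!("text {value}");
    // Doubled braces are escapes, so this prints the braces literally.
    let escaped = format!("text {{value}}");
    (explicit, captured, escaped)
}
```

Only bare identifiers can be captured this way; something like `{value + 1}` inside the string is still a compile error.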

Groxx · on Aug 15, 2020

Since procedural macros are non-hygienic, it seems like this shouldn't be too hard? Though it may not be in the stdlib (yet?) for Reasons (e.g. exact syntax, edge cases).

lmm · on Aug 14, 2020

It'd be worth reusing the HList (and the macro for instantiating it) from frunk rather than making a custom HList type.

(This is how the Scala ecosystem ended up working with records; technically the language doesn't have proper record types, but the whole ecosystem uses Shapeless to the extent that IDEs and other tooling are expected to understand the Shapeless macro, so de facto it does.)

kelnos · on Aug 14, 2020

Pretty sure that's what the author was doing. The quick HList implementation shown in the post was just to explain what HLists are; the author later mentions (and presumably uses) frunk.
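To sketch how HLists give a type-safe printf, here is a hand-rolled, hypothetical version: `HNil`/`HCons`, the `Lit`/`Hole` pieces and the `Render` trait are illustrative names, not the article's code or frunk's API. The format string is represented as an HList of pieces, the arguments as another HList, and the trait impls walk both lists in lockstep:

```rust
use std::fmt::Display;

// Hypothetical hand-rolled HList; names are illustrative, not frunk's API.
pub struct HNil;
pub struct HCons<H, T> {
    pub head: H,
    pub tail: T,
}

// Pieces of a parsed format string: literal text, or a `{}` hole.
pub struct Lit(pub &'static str);
pub struct Hole;

// A list of pieces can render an HList of arguments `Args` only if the holes
// and argument types line up; otherwise no impl exists and it won't compile.
pub trait Render<Args> {
    fn render(&self, args: Args, out: &mut String);
}

// No pieces left: only an empty argument list is accepted.
impl Render<HNil> for HNil {
    fn render(&self, _args: HNil, _out: &mut String) {}
}

// Literal: write the text and pass every argument on to the rest.
impl<Args, Rest: Render<Args>> Render<Args> for HCons<Lit, Rest> {
    fn render(&self, args: Args, out: &mut String) {
        out.push_str(self.head.0);
        self.tail.render(args, out);
    }
}

// Hole: consume exactly one argument, which must implement Display.
impl<A: Display, Args, Rest: Render<Args>> Render<HCons<A, Args>> for HCons<Hole, Rest> {
    fn render(&self, args: HCons<A, Args>, out: &mut String) {
        out.push_str(&args.head.to_string());
        self.tail.render(args.tail, out);
    }
}

pub fn hlist_printf_demo() -> String {
    // Equivalent of the format string "x = {}, y = {}".
    let pieces = HCons {
        head: Lit("x = "),
        tail: HCons {
            head: Hole,
            tail: HCons { head: Lit(", y = "), tail: HCons { head: Hole, tail: HNil } },
        },
    };
    let args = HCons { head: 1, tail: HCons { head: "two", tail: HNil } };
    let mut out = String::new();
    pieces.render(args, &mut out);
    out
}
```

Passing a third argument, or a value that doesn't implement `Display`, leaves no matching `Render` impl, so the mismatch is reported at compile time. A real implementation would generate the `pieces` list from a string literal with a macro rather than writing it out by hand.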

dependenttypes · on Aug 14, 2020

Here is one in Idris. https://gist.github.com/chrisdone/672efcd784528b7d0b7e17ad9c...

Because Idris supports dependent types it is prettier both in the implementation and usage.

signa11 · on Aug 14, 2020

imho, with c++'s variadic templates doing the equivalent there is quite trivial.

dan-robertson · on Aug 14, 2020

The issue with templates is that you can’t really type check them at the point of definition. Instead they need to be checked each time the template is used.

This can make compilation slow but it can also make template writing very hard because a bug might only appear when a template is instantiated with certain types.

quietbritishjim · on Aug 14, 2020

The parent comment's point is just that variadic templates would make it easier to do the exact job talked about in this blog post. It sounds like you're objecting but you haven't quite said that outright. Are Rust macros less susceptible to the problem you're describing?

monadic2 · on Aug 14, 2020

Yea, traits make this much simpler to avoid. The problematic type would not match the trait, likely leading to much simpler communication of the error.
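A minimal sketch of the difference: a Rust generic body is checked once against its declared bounds, and an unsuitable type is rejected at the call site. `bracket_all` is a hypothetical function:

```rust
use std::fmt::Display;

// Hypothetical helper. The body is type-checked once, against the bound
// `T: Display`: using `{}` on `item` is accepted only because the bound
// promises a Display impl, whichever T callers later choose.
pub fn bracket_all<T: Display>(items: &[T]) -> String {
    items
        .iter()
        .map(|item| format!("[{}]", item))
        .collect::<Vec<_>>()
        .join(" ")
}

// Calling `bracket_all` with a type that has no Display impl is rejected at
// the call site with an E0277 error naming the unsatisfied bound; the body
// is not re-checked for that type.
```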

quietbritishjim · on Aug 14, 2020

I'm not really convinced. In C++, you could start your hypothetical format function with static_assert(formatter<T>::exists) or similar and get a very similar error message. (In C++ template terminology, a class like that is also called a trait, funnily enough, but that's just a convention rather than a language feature.) I think you could use C++ concepts to make the check part of the function signature, but the functional difference would be very small. And none of these – including Rust traits – help you with compile-time checking of format specifiers.

signa11 · on Aug 14, 2020

> The issue with templates is that you can’t really type check them at the point of definition. Instead they need to be checked each time the template is used.

> This can make compilation slow but it can also make template writing very hard because a bug might only appear when a template is instantiated with certain types.

yup. templates are 'compiled' in 2 phases as follows:

- without instantiation at definition time, the code itself is checked for correctness, completely ignoring the template parameters.

- at instantiation time, the code is checked again to ensure that all parts depending on template parameters are valid.

the 2-phase translation breaks the canonical notion of compile+link, where a function's declaration is sufficient to compile its use...

saagarjha · on Aug 14, 2020

Templates are basically dynamic typing in the compiler, to be honest.

dthul · on Aug 14, 2020

Yes, they make solving such problems quite elegant. The Wikipedia page on variadic templates has a toy printf C++ implementation showcasing that.

AshamedCaptain · on Aug 14, 2020

I kind of agree. The worst part is that most of the problems with C++'s classic "typesafe_printf" implementation come from the fact that there is an extra pass over the format string in order to do the actual typecheck. But this Rust implementation does not have this problem (no type specifiers in the format string, just placeholders as in {fmt}), and it still looks pretty ugly (and when your reference is an ugly C++ metaprogramming version, that says something).

mFixman · on Aug 14, 2020

Which raises the obvious questions: why didn't C++11 implement template printf along with variadic templates?

`cout <<` is ugly and hard to use for anything but the simplest printing. Type safe printf seems like the obvious use case of variadic templates.

signa11 · on Aug 14, 2020

> Which raises the obvious questions: why didn't C++11 implement template printf along with variadic templates?

https://en.cppreference.com/w/cpp/utility/format and the actual proposal https://www.zverovich.net/2019/07/23/std-format-cpp20.html

monadic2 · on Aug 14, 2020

Yea, but then you have c++ to deal with.

jganetsk · on Aug 14, 2020

They should have used existential types instead of having so many type parameters. I rewrote it with existential types and it was cleaner. This is probably due to the author importing an example from Haskell, a language where existential types aren't as prevalent.
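The rewrite mentioned above isn't shown, so this is only a guess at the general idea: in Rust, trait objects (`dyn Trait`) and return-position `impl Trait` are the usual ways to express "some type implementing a trait" without a type parameter per argument. `join_args` and `labelled` are hypothetical:

```rust
use std::fmt::Display;

// Each argument is erased to `&dyn Display` ("some type that implements
// Display"), so no type parameter is needed per argument.
pub fn join_args(args: &[&dyn Display]) -> String {
    args.iter().map(|a| a.to_string()).collect::<Vec<_>>().join(", ")
}

// Return-position `impl Trait` is the other existential form: callers only
// know they get "some iterator of Strings", not its concrete type.
pub fn labelled<'a>(args: &'a [&'a dyn Display]) -> impl Iterator<Item = String> + 'a {
    args.iter().enumerate().map(|(i, a)| format!("#{}={}", i, a))
}
```

The trade-off is that the number of arguments is no longer part of the type, so a count mismatch against a format string would have to be caught at runtime.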

The_rationalist · on Aug 14, 2020

It's still a shame that rust doesn't support variadic functions and hence the hacky need of implementing printf through a macro.
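For context on why a macro is used: a `macro_rules!` matcher can accept any number of arguments via repetition, which a plain `fn` can't. A minimal hypothetical example, `sum_all!`, that peels off one argument per recursive expansion:

```rust
// Hypothetical macro: adds up any number of expressions. Each expansion
// takes the first argument and recurses on the rest, bottoming out at 0.
macro_rules! sum_all {
    () => { 0 };
    ($first:expr $(, $rest:expr)*) => { $first + sum_all!($($rest),*) };
}

pub fn total() -> i32 {
    sum_all!(1, 2, 3, 4)
}
```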

irh · on Aug 14, 2020

Rust's macros aren't hacky though? I would understand this point better if you were talking about C's macros, but macros in Rust are hygienic so don't suffer the same pitfalls.
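A quick illustration of that hygiene. The hypothetical `add_ten!` macro declares its own `x`; with C-style textual substitution the caller's `x` would resolve to that inner binding, but in Rust it does not:

```rust
// Hypothetical macro that introduces its own local named `x`.
macro_rules! add_ten {
    ($e:expr) => {{
        let x = 10;
        x + $e // `$e` keeps referring to whatever the caller meant
    }};
}

pub fn hygiene_demo() -> (i32, i32) {
    let x = 1;
    let result = add_ten!(x); // uses the caller's x, not the macro's
    (x, result)
}
```

With naive textual substitution, `add_ten!(x)` would expand to `10 + 10`; here the caller's `x` (1) is used, giving 11.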

kelnos · on Aug 14, 2020

I wouldn't say they're hacky, but rust's macro specification is nearly a different language than rust itself. Writing -- and more importantly, reading and understanding -- rust macros still requires quite a bit more knowledge than just regular rust code.

methyl · on Aug 14, 2020

What are the downsides of current printf macro implementation vs a variadic fn one?

perryizgr8 · on Aug 14, 2020

One thing I don't like about the macro is that it looks like the passed variable has been borrowed and you can't use it again. But that's not true. You can use it again and again even though you passed by value.

mqus · on Aug 18, 2020

Well, the variables are only borrowed for the short time it takes to print that line. The borrow is terminated as soon as the print is finished. Because the println! macro does not return anything, there is no reason to extend the borrow.
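A small sketch of that behaviour: the formatting macros take references to their arguments internally, so an owned `String` passed to them is only borrowed for the duration of the call and is not moved. `borrow_demo` is a hypothetical name:

```rust
pub fn borrow_demo() -> (String, String, String) {
    let s = String::from("hello");
    // Both calls only borrow `s`, even though it is written without `&`.
    let first = format!("{}", s);
    let second = format!("{} again", s);
    // `s` can still be moved out here, which would not compile if either
    // format! call had taken ownership of it.
    (first, second, s)
}
```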

tsimionescu · on Aug 14, 2020

One big difference is that you can't call it at runtime, e.g. to map it over a list of tuples.
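Concretely, `format!` can't be handed to `map` as a function value, but wrapping it in a closure gives something that can be. A minimal sketch with a hypothetical `render_pairs` helper:

```rust
// The closure is an ordinary value, so it can be passed to `map`;
// the macro itself cannot.
pub fn render_pairs(pairs: &[(&str, i32)]) -> Vec<String> {
    pairs.iter().map(|(name, n)| format!("{}: {}", name, n)).collect()
}
```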

lmm · on Aug 14, 2020

- Possibility of implementation errors

- A reader can't really reason about what a macro might be doing, since they're immensely powerful.

- Tooling can't necessarily understand macros correctly, for the same reason. E.g. automated refactoring like extracting a repeated expression can't be relied on to work correctly.

zaarn · on Aug 14, 2020

The thing is, though, that Rust macros are hygienic; the macro must generate a syntax tree and cannot mangle the outside scope. Most Rust tooling I've seen handles and understands printf as a macro easily, and tooling is improving to be able to fully introspect macros.

methyl · on Aug 14, 2020

These are downsides of macros in general, but I don't see how they are relevant to a built-in macro from stdlib.

lmm · on Aug 14, 2020

They apply just the same in the standard lib surely? I suppose tooling might be expected to have support for the standard lib, but you're creating a lot more work for tool implementors if there is a large (and presumably evolving) set of "standard" macros. The concern about the reader is definitely still there unless you're expecting every Rust user to memorize which macros are in the standard library.

The_rationalist · on Aug 14, 2020

Well, there might be no downsides beyond needing the "!" in println!. But my point stands for the general use cases of variadic functions where a macro wouldn't be appropriate, and anyway macros are not readable.

naavis · on Aug 14, 2020

I think only accepting string literals as the format string is a pretty big downside.
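One partial workaround worth knowing about: the format string must be a literal, but `concat!` can build one from other literals at compile time, and the formatting macros accept its expansion (early versions of `println!` were themselves defined this way). A hypothetical `log_line!` macro as a sketch:

```rust
// Hypothetical macro: prepends a fixed prefix to a caller-supplied format
// string. `$fmt` must be a string literal; concat! glues the literals together
// at compile time, and format! then checks the result against the arguments.
macro_rules! log_line {
    ($fmt:literal $(, $arg:expr)*) => {
        format!(concat!("[log] ", $fmt) $(, $arg)*)
    };
}

pub fn log_demo() -> String {
    let x = 5;
    log_line!("x = {}", x)
}
```

This only helps when every piece is known at compile time; a format string read from a file still needs runtime parsing.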

hardwaresofton · on Aug 14, 2020

tl;dr rust is cool, type level programming excites haskellers, you should take rust for a spin because it somehow manages to seemingly do it all.

Note that this is the kind of type machinery that Haskellers get excited about. If you're somewhat familiar with what type-level programming in this area looks like there, check out HList[0], Vinyl[1][2], and, if you're really interested, phantom types[3].
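Since phantom types come up here, a Rust sketch of the same idea using `std::marker::PhantomData`: the unit lives only in the type, so mixing units is a compile error and costs nothing at runtime. `Length`, `Meters` and `Feet` are hypothetical:

```rust
use std::marker::PhantomData;

// Hypothetical unit markers: zero-sized types that exist only at the type level.
pub struct Meters;
pub struct Feet;

// Stores just an f64; `PhantomData<U>` records the unit without storing data.
pub struct Length<U> {
    pub value: f64,
    unit: PhantomData<U>,
}

impl<U> Length<U> {
    pub fn new(value: f64) -> Self {
        Length { value, unit: PhantomData }
    }

    // Only lengths with the same unit type can be added.
    pub fn plus(&self, other: &Length<U>) -> Length<U> {
        Length::new(self.value + other.value)
    }
}

impl Length<Feet> {
    // Conversion is explicit and changes the type.
    pub fn to_meters(&self) -> Length<Meters> {
        Length::new(self.value * 0.3048)
    }
}
```

Calling `Length::<Meters>::new(1.0).plus(&Length::<Feet>::new(1.0))` is rejected by the type checker, while `Length<Meters>` stays the size of a bare `f64`.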

One of the best things about rust is that it manages to have both a near-research-grade type system, approaching C speed, and novel data safety features all rolled in one language. If those features weren't enough, it's got an extremely good module system with public package hosting, helpful compiler, relatively easy cross compilation and toolchain convenience other languages only dream of along with a welcoming community.

I'm not associated with the rust team/mozilla in any way, but there are very few reasons to not be excited about where rust will be in 10 years. Whether you're writing code to run in browsers (compiling your frontend code to wasm is a pretty intense step I'll admit -- I like my TS/JS just fine), web servers, or embedded code to run on your small ESP32 (or smaller), rust somehow fits.

[0]: https://hackage.haskell.org/package/HList

[1]: https://hackage.haskell.org/package/vinyl-0.13.0/docs/Data-V...

[2]: https://github.com/VinylRecords/Vinyl/blob/master/tests/Intr...

[3]: https://wiki.haskell.org/Phantom_type

akiselev · on Aug 14, 2020

> I'm not associated with the rust team/mozilla in any way

Is there still a Rust team at Mozilla?

hardwaresofton · on Aug 14, 2020

This came up on r/rust: https://twitter.com/rustlang/status/1294024734804508679 (Reddit thread: https://www.reddit.com/r/rust/comments/i994km/core_team_stat...)