Implementing a Type-safe printf in Rust (willcrichton.net)
119 points by lukastyrychtr on Aug 14, 2020 | 59 comments



Note that this is a similar technique to one done in Andrei Alexandrescu's book Modern C++ Design that used C++98.

IIRC he called it TypeList and used that to actually implement tuple as library type.

When C++11 variadic templates came out, a lot of C++ metaprogrammers breathed a huge sigh of relief at being able to abandon that technique. In addition to being a pain to work with, its extensive use of recursion can take a large toll on compile times.

Variadic templates and being able to manipulate tuples generically are probably the two biggest things I miss when programming Rust.


And with constexpr and its variants, alongside auto templates, fold expressions, and now concepts, yet another legion breathed a huge sigh of relief to be able to abandon all those tag dispatch based techniques.


D simply checks the arguments to printf against the format string. This facility exists as an extension in most C and C++ compilers.

D's writefln() function uses variadic template specializations to achieve a similar result, meaning you can just use `%s` for the formats and writefln() knows how to print each type.


Are you saying the compiler itself does the checking? That is a pity. Compiler magic for checking a fixed, specific format system (I don't know about D, but that's what you're describing in C printf) is significantly worse than compile-time checking written in the language itself.

What if you, as library author (not compiler writer!), think of a much better formatting system? Or something that's maybe not better in general, but better for an unusual situation you're deploying in? Or you want to write something that's different to a formatting system but uses similar tricks? If you don't have language facilities to do this yourself then you're stuffed.

As well as the original post showing it's possible in Rust, this is also possible in C++. For example, the C++ fmt [1] library allows compile-time checking of the number of slots in format strings against the number of arguments, and even that the format specifications (.03d or whatever) match the passed-in types. And it's not reliant on the compiler doing its own magic checking outside of the language.

[1] https://github.com/fmtlib/fmt


edit I see I'm too late, Walter already answered this. I'll leave this here anyway.

I don't think D's writefln is getting any special treatment from the compiler. It seems to use D's compile-time features to perform the check.

https://github.com/dlang/phobos/blob/v2.093.0/std/stdio.d#L4...


Zig does it in a similar way, using the compile time type information available to all Zig code, and is in fact implemented in the std library. No magic necessary.


But both fall short of Python/bash format strings, where you can have an "in place" reference to a variable, which IMHO is much more readable.


I completely agree that string interpolation is often superior to a format string, but it's important to note that they are not equivalent. String interpolation is generally evaluated in-situ and has full access to the local scope. Format strings on the other hand can be passed around and manipulated (even at compile-time) and can only ever access the data they're formatted with. strftime and strptime are a good example of a format string usage that can't be replaced by string interpolation in python.

D also has the capability of implementing string interpolation using mixins - though the resulting code isn't quite as neat.


I did write a proposal for "in place" reference (i.e. string interpolation) but could not reach consensus with the D community for it.


Formatting text output is not often a problem in the larger scope of a program, but having performance bottlenecks can be.


Come on ... printf is a monstrosity and I understand why you would put hard-coded checks in the compiler.


Sure, I'm not disputing that. In fact I think that proves my point.

There are languages (C++, Rust) where you can write your own, safe formatting library just using the language. It's a world away from printf.

Compared to that, parent comment says that in D the compiler has to have checks for formatting built in to it and compares it to printf. Yuk!


Allow me to clarify. D does have writef() which provides a fully typesafe print formatting routine. It is unknown to the compiler (i.e. no magic).

https://dlang.org/phobos/std_stdio.html#.writef

D can also call C's printf, and checking that is magical indeed. It exactly follows the C99 Standard for printf. A major feature of D is being able to call C functions directly, and printf is one of them. (D doesn't recognize printf specifically, it has to have a special attribute for it `pragma(printf)` which the compiler recognizes. This is so the checking can be applied to user-defined functions that use conforming printf-style format strings.)

https://dlang.org/spec/pragma.html#printf

And the code to do it:

https://github.com/dlang/dmd/blob/master/src/dmd/chkformat.d


OK, thanks for clarifying. Reading back your original comment, I can see that its literal wording does say that. I was confused by you leading with printf handling, which is not really relevant to the original article (certainly not as much as writef()), so I inferred you were only mentioning it as context to writef() working similarly. Your use of the word "simply" also threw me off; it sounded like you thought compiler magic was a better solution than a language-level one.

I now see you were just making an unrelated point first. And having a specific compiler feature just for printf (and co.) is totally reasonable given how widespread it is.

Your first link answers what would've been a follow-up question: OK, writef() is a variadic template implemented with just normal language features, but does it do compile-time checking of the format? The docs say that it does:

> fmt: The format string. When passed as a compile-time argument, the string will be statically checked against the argument types passed.


Walter Bright is the creator of the D language so this is the best place to ask these questions.


That's actually very similar to Rust, where println is also type safe. Depending on the format string it uses different traits to print each value. For example the format specifier "{}" will use the value's "Display" trait implementation to format it, while "{:b}" will use the "Binary" trait. If a value doesn't implement the required trait, that's a compile time error. Rust doesn't have variadic argument lists (yet?) so println is implemented as a macro. This has the downside of requiring the format string to be a compile time literal but the upside of not needing to parse it at runtime.
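To make the trait dispatch concrete, here is a minimal sketch. `Flags` and `render` are hypothetical names invented for this example; only `std::fmt::Display` and `std::fmt::Binary` come from the standard library.

```rust
use std::fmt;

// Hypothetical type used only for illustration.
struct Flags(u8);

// "{}" dispatches to the Display implementation.
impl fmt::Display for Flags {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Flags({})", self.0)
    }
}

// "{:b}" dispatches to the Binary implementation.
impl fmt::Binary for Flags {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegate to u8's Binary impl so width and fill options still apply.
        fmt::Binary::fmt(&self.0, f)
    }
}

fn render(flags: &Flags) -> String {
    // Each placeholder is resolved to a trait at compile time. Using "{:x}"
    // here would be rejected, because Flags does not implement LowerHex.
    format!("{} {:b}", flags, flags)
}
```

With these impls, `render(&Flags(5))` should produce `Flags(5) 101`.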


One thing I really miss from C's printf or Java's MessageFormat is the ability to change argument positions: different languages can display the arguments in different orders, and the format string can live in a resource bundle.

It is of course very hard to have type safety when the format string is in a resource bundle


The position of arguments is actually not required to follow the format string. From the documentation:

  println!("{1} {} {0} {}", 1, 2); // => "2 1 1 2"

But you are right, you cannot dynamically change the format string. If you need this facility, there is the runtime_fmt crate which will parse the format string at runtime instead of compile time. That of course means that it can fail at runtime if the format string does not match the arguments.
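As a rough illustration of that trade-off, here is a hand-rolled sketch of runtime formatting. It is not the runtime_fmt crate's API; `format_runtime` is a hypothetical function that understands only `{}` and `{N}` (no `{{` escapes, no format specs), and a mismatch between the template and the arguments surfaces as an `Err` at runtime instead of a compile error.

```rust
use std::fmt::{Display, Write};

/// Hypothetical runtime formatter supporting only "{}" and "{N}".
fn format_runtime(template: &str, args: &[&dyn Display]) -> Result<String, String> {
    let mut out = String::new();
    let mut next = 0; // index consumed by the next implicit "{}"
    let mut rest = template;

    while let Some(open) = rest.find('{') {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..open]);
        let close = rest[open..]
            .find('}')
            .ok_or_else(|| "unclosed '{'".to_string())?
            + open;

        // Empty spec means "next argument"; otherwise it must be an index.
        let spec = &rest[open + 1..close];
        let index = if spec.is_empty() {
            let i = next;
            next += 1;
            i
        } else {
            spec.parse::<usize>()
                .map_err(|_| format!("bad placeholder {{{}}}", spec))?
        };

        // This lookup is the check that println! performs at compile time.
        let arg = args
            .get(index)
            .ok_or_else(|| format!("no argument at index {}", index))?;
        write!(out, "{}", arg).map_err(|e| e.to_string())?;
        rest = &rest[close + 1..];
    }

    out.push_str(rest);
    Ok(out)
}
```

Because the template is an ordinary `&str`, it could be loaded from a resource bundle; the price is that `format_runtime("{} {}", &[&1])` compiles fine and only reports the missing argument when it runs.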


Why use numbers and not names, though?


You can!

  println!("{a} {c} {b}", a="a", b='b', c=3);  // => "a 3 b"


POSIX printf() actually supports positional arguments:

https://www.godbolt.org/z/xG15M5

Both clang and gcc support this, but unfortunately not the Microsoft C compiler.

PS: actually, it does, just not in the normal printf call: https://docs.microsoft.com/en-us/cpp/c-runtime-library/print...


D also supports POSIX positional arguments: https://dlang.org/library/std/format/formatted_write.html


One feature I really miss in Rust is being able to insert variables in place in strings, like you have in Swift, or JS template literals.



Neat. It's interesting that they've chosen just plain braces to denote the argument insertion; i.e: "text {value}" rather than something like "text ${value}" which seems to be what most languages use. I guess there must be a way to escape it if you just want braces in your string.


Right, there is a way - to escape braces and print "text {value}" rather than the actual value, you'd use "text {{value}}"
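A small sketch of the escaping rule, using `format!` so the results can be inspected; `brace_examples` is a hypothetical helper written for this comment.

```rust
fn brace_examples(value: i32) -> (String, String, String) {
    // "{{" and "}}" produce literal braces; nothing is substituted here.
    let literal = format!("text {{value}}");
    // A single pair of braces is a placeholder, filled by the named argument.
    let filled = format!("text {value}", value = value);
    // Escapes and placeholders can be mixed: literal braces around a value.
    let wrapped = format!("text {{{}}}", value);
    (literal, filled, wrapped)
}
```

For `brace_examples(3)` the three strings should be `text {value}`, `text 3` and `text {3}`.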


Since procedural macros are non-hygienic, it seems like this shouldn't be too hard? Though it may not be in the stdlib (yet?) for Reasons (e.g. exact syntax, edge cases).


It'd be worth reusing the HList (and the macro for instantiating it) from frunk rather than making a custom HList type.

(This is how the Scala ecosystem ended up working with records; technically the language doesn't have proper record types, but the whole ecosystem uses Shapeless to the extent that IDEs and other tooling are expected to understand the Shapeless macro, so de facto it does)


Pretty sure that's what the author was doing. The quick HList implementation shown in the post was just to explain what HLists are; the author later mentions (and presumably uses) frunk.
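For readers who haven't seen the idea, a bare-bones HList might look like the sketch below. This is neither frunk's API nor the article's code; `HNil`, `HCons`, `Render` and `render_all` are names assumed for illustration. The point is that each element keeps its own type and the recursion happens through trait impls.

```rust
use std::fmt::Display;

// Empty list and "cons" cell; the list's shape is encoded in its type.
struct HNil;
struct HCons<H, T> {
    head: H,
    tail: T,
}

// Hypothetical trait that walks the list at the type level.
trait Render {
    fn render(&self, out: &mut Vec<String>);
}

// Base case: nothing left to render.
impl Render for HNil {
    fn render(&self, _out: &mut Vec<String>) {}
}

// Recursive case: render the head, then recurse into the tail.
impl<H: Display, T: Render> Render for HCons<H, T> {
    fn render(&self, out: &mut Vec<String>) {
        out.push(self.head.to_string());
        self.tail.render(out);
    }
}

fn render_all<L: Render>(list: &L) -> Vec<String> {
    let mut out = Vec::new();
    list.render(&mut out);
    out
}
```

A list such as `HCons { head: 1, tail: HCons { head: "a", tail: HNil } }` has type `HCons<i32, HCons<&str, HNil>>`, which is the kind of heterogeneous structure a type-safe printf can match against its format.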


Here is one in Idris. https://gist.github.com/chrisdone/672efcd784528b7d0b7e17ad9c...

Because Idris supports dependent types it is prettier both in the implementation and usage.


imho, with c++'s variadic templates doing the equivalent there is quite trivial.


The issue with templates is that you can’t really type check them at the point of definition. Instead they need to be checked each time the template is used.

This can make compilation slow but it can also make template writing very hard because a bug might only appear when a template is instantiated with certain types.


The parent comment's point is just that variadic templates would make it easier to do the exact job talked about in this blog post. It sounds like you're objecting but you haven't quite said that outright. Are Rust macros less susceptible to the problem you're describing?


Yea, traits make this much simpler to avoid. The problematic type would not match the trait, likely leading to much simpler communication of the error.
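A tiny sketch of what that looks like in practice; `describe` is a hypothetical function. The body is type checked once, at the definition, against the `Display` bound alone, so a bad argument type is reported at the call site as an unsatisfied bound.

```rust
use std::fmt::Display;

// The bound is the whole contract: the body may only use what Display offers.
fn describe<T: Display>(value: T) -> String {
    format!("<{}>", value)
}

// describe(vec![1, 2]) would be rejected at the call site, since Vec<i32>
// does not implement Display; the body of describe is never re-checked.
```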


I'm not really convinced. In C++, you could start your hypothetical format function with static_assert(formatter<T>::exists) or similar and get a very similar error message. (In C++ template terminology, a class like that is also called a trait, funnily enough, but that's just a convention rather than a language feature.) I think you could use C++ concepts to check it as part of the function signature, but the functional difference would be very small. And none of these – including Rust traits – help you with compile-time checking of format specifiers.


> The issue with templates is that you can’t really type check them at the point of definition. Instead they need to be checked each time the template is used.

> This can make compilation slow but it can also make template writing very hard because a bug might only appear when a template is instantiated with certain types.

yup. templates are 'compiled' in 2 phases as follows:

- without instantiation at definition time, the code itself is checked for correctness, completely ignoring the template parameters.

- at instantiation time, the code is checked again to ensure that all parts depending on template parameters are valid.

the 2-phase translation breaks canonical notions about compile+link, where the declaration of a function is sufficient to compile its use...


Templates are basically dynamic typing in the compiler, to be honest.


Yes, they make solving such problems quite elegant. The Wikipedia page on variadic templates has a toy printf C++ implementation showcasing that.


I kind of agree. The worst part is that most of the problems with C++'s classic "typesafe_printf" implementation come from the fact that there is an extra pass over the formatting string in order to do the actual typecheck. But this Rust implementation does not have this problem (no type specifiers in the format string, just placeholders as in {fmt}), and it still looks pretty ugly (and when your reference is an ugly C++ metaprogramming version, that says something).


Which raises the obvious question: why didn't C++11 implement template printf along with variadic templates?

`cout <<` is ugly and hard to use for anything but the simplest printing. Type safe printf seems like the obvious use case of variadic templates.


> Which raises the obvious question: why didn't C++11 implement template printf along with variadic templates?

https://en.cppreference.com/w/cpp/utility/format and the actual proposal https://www.zverovich.net/2019/07/23/std-format-cpp20.html


Yea, but then you have c++ to deal with.


They should have used existential types instead of having so many type parameters. I rewrote it with existential types and it was cleaner. This is probably due to the author importing an example from Haskell, a language where existential types aren't as prevalent.


It's still a shame that rust doesn't support variadic functions and hence the hacky need of implementing printf through a macro.


Rust's macros aren't hacky though? I would understand this point better if you were talking about C's macros, but macros in Rust are hygienic so don't suffer the same pitfalls.
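A minimal sketch of what hygiene buys you with `macro_rules!` (declarative macros only; `double!` and `hygiene_demo` are hypothetical names). The macro introduces its own `x`, and it neither captures nor clobbers the caller's `x`.

```rust
macro_rules! double {
    ($e:expr) => {{
        // This `x` lives in the macro's own syntax context.
        let x = $e;
        x * 2
    }};
}

fn hygiene_demo() -> i32 {
    let x = 10;
    // `x + 1` refers to the caller's x, even though the macro body also
    // declares a variable named x around the expansion.
    let doubled = double!(x + 1);
    // The caller's x is still 10 afterwards.
    doubled + x
}
```

Here `hygiene_demo()` evaluates to 32: the macro computes (10 + 1) * 2 = 22, and the caller's untouched `x` adds 10. The equivalent C macro would need careful naming to avoid the clash.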


I wouldn't say they're hacky, but rust's macro specification is nearly a different language than rust itself. Writing -- and more importantly, reading and understanding -- rust macros still requires quite a bit more knowledge than just regular rust code.


What are the downsides of current printf macro implementation vs a variadic fn one?


One thing I don't like about the macro is that it looks like the passed variable has been borrowed and you can't use it again. But that's not true. You can use it again and again even though you passed by value.


Well, the variables are only borrowed for the short time it takes to print that line. The borrow is terminated as soon as the print is finished. Because the println! macro does not return anything, there is no reason to extend the borrow.
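A short sketch of that behaviour; `print_twice` is a hypothetical function. The formatting macros take references to their arguments internally, so even a non-`Copy` value such as a `String` is still owned by the caller afterwards.

```rust
fn print_twice(name: String) -> String {
    // Looks like `name` is passed by value, but the macro only borrows it.
    println!("hello, {}", name);
    // So it can be used again...
    println!("hello again, {}", name);
    // ...and still be moved out at the end.
    name
}
```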


One big difference is that you can't call it at runtime, e.g. to map it over a list of tuples.
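To illustrate: a macro is not a value, so it cannot be handed to `map` the way a function can, and the usual workaround is to wrap the invocation in a closure. A sketch, with `labels` as a hypothetical helper:

```rust
fn labels(pairs: &[(&str, i32)]) -> Vec<String> {
    pairs
        .iter()
        // `.map(format!)` is not expressible; the macro has to be expanded
        // inside a closure with a fixed format string and arity.
        .map(|(name, n)| format!("{}={}", name, n))
        .collect()
}
```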


- Possibility of implementation errors

- A reader can't really reason about what a macro might be doing, since they're immensely powerful.

- Tooling can't necessarily understand macros correctly, for the same reason. E.g. automated refactoring like extracting a repeated expression can't be relied on to work correctly.


The thing is, though, that Rust macros are hygienic; the macro must generate the syntax tree and cannot mangle outside scope. Most Rust tooling I've seen handles and understands printf as a macro easily, and tooling is improving to be able to fully introspect a macro.


These are downsides of macros in general, but I don't see how they are relevant to a built-in macro from stdlib.


They apply just the same in the standard lib surely? I suppose tooling might be expected to have support for the standard lib, but you're creating a lot more work for tool implementors if there is a large (and presumably evolving) set of "standard" macros. The concern about the reader is definitely still there unless you're expecting every Rust user to memorize which macros are in the standard library.


Well, there might be no downsides beyond the need to put "!" for println. But my point stands for general use cases of variadic functions where a macro wouldn't be appropriate, and anyway macros are not readable.


I think only accepting string literals as the format string is a pretty big downside.


tl;dr rust is cool, type level programming excites haskellers, you should take rust for a spin because it somehow manages to seemingly do it all.

Note that this is the kind of type machinery that Haskellers get excited about. If you're somewhat familiar with what type-level programming in this area looks like there, check out HList[0], Vinyl[1][2], and if you're really interested, phantom types[3].

One of the best things about rust is that it manages to have both a near-research-grade type system, approaching C speed, and novel data safety features all rolled in one language. If those features weren't enough, it's got an extremely good module system with public package hosting, helpful compiler, relatively easy cross compilation and toolchain convenience other languages only dream of along with a welcoming community.

I'm not associated with the rust team/mozilla in any way, but there are very few reasons to not be excited about where rust will be in 10 years. Whether you're writing code to run in browsers (compiling your frontend code to wasm is a pretty intense step I'll admit -- I like my TS/JS just fine), web servers, or embedded code to run on your small ESP32 (or smaller), rust somehow fits.

[0]: https://hackage.haskell.org/package/HList

[1]: https://hackage.haskell.org/package/vinyl-0.13.0/docs/Data-V...

[2]: https://github.com/VinylRecords/Vinyl/blob/master/tests/Intr...

[3]: https://wiki.haskell.org/Phantom_type


> I'm not associated with the rust team/mozilla in any way

Is there still a Rust team at Mozilla?




