""" Wuffs' goal is to produce software libraries that are as safe as Go or Rust, roughly speaking, but as fast as C, and that can be used anywhere C libraries are used.
[...] Wuffs the Library is available as transpiled C code. Other C/C++ projects can use that library without requiring the Wuffs the Language toolchain. Those projects can use Wuffs the Library like using any other third party C library. """
It seems like this would be an interesting approach to a lot of security programming where it involves data structures, since those have typically been a source of issues. Having a memory-safe ASN.1 parser would be really nice, considering how much difficulty that has caused in the security space.
With a lot of security programming there's a reliance on constant-time algorithms, which this may not be well-suited to, however.
I prefer the annotation approach, TBH, but I'm glad to see people working on safety. Specifically, Wuffs (like Rust) has this problem where it tries to fill the same niche as C and C++ and improve on these older languages in specific ways, but in addition to being different in ways necessary to achieve the language's objective is also different from C and C++ in totally gratuitous ways that are basically just aesthetic differences --- for example, "type var" vs "var: type".
I'm a big believer in technical continuity. IMHO, a lot of recent language developments should have instead been extensions of C, C++, Java, or Python.
If that's what you really wanted, and if we always did that, then you would probably want extensions to Perl instead of Python, since it predated it slightly and was much more popular much earlier.
Instead, I think what you're really asking for is language extensions to the things you are familiar with. Those happen to be things a lot of other people are also familiar with, but that doesn't change that the same strategy results in Perl still getting the lion's share of attention. I, for one, would be perfectly happy with that, as I love Perl, but many people would not, and possibly not for any real technical reason.
Anyone that thinks the case for Rust or D is better served by extensions to C or C++ but thinks that Python is objectively a better language than Perl should probably look deeply at exactly why they hold those views.
P.S. That's not really to say you hold those views. Consider this comment less a critique of your specific statement than slight tangent spurred by it.
The C and C++ code in many relevant stacks is not going away, and many of us won't be ever be able to use D or Rust regardless of how much we like them.
UNIX clones, Windows, .NET and Java runtime implementation, WinUI/WinRT, Android (regardless of this week's annoucement)
So Microsoft Checked C, Microsoft SAL, Visual C++/clang/gcc lifetime analysers, Frama-C, hardware memory tagging, Pluton sandbox, CHERI,... are all welcomed improvements.
Even if Rust were to fully supplant C and C++ for at least systems programming, we should hope that within a few decades it's a risk of being supplanted by something else. To want otherwise is likely to inadvertently want stagnation and/or a language that while likely still improving, is having new technologies and concepts bolted on in increasingly convoluted manner, as they were not and could not have been considered when designing the language originally.
1: I think this is unlikely, if only because I don't see one language supplanting C, I see C's market cannibalized by a few different languages, with C always retaining some use.
So am I. D is designed to be an easy transition from C and C-With-Classes. For example, here is some code in C that was translated to D (it's part of the Digital Mars C++ compiler):
They look pretty much the same. The code generated is the same. In those repositories you can also see how I translated the C versions to D with plenty of examples.
The biggest impediment is the C preprocessor. You wouldn't really want to carry that forward.
After removing dependency on the C preprocessor, most of the work is global search/replacing things like `->` to `.`.
IMO this approach hasn't taken off because maintaining compatibility with C while adding safety (or really just about any property) means implementing your own sublanguage that can't arbitrarily call C functions while maintaining your safety properties. On the other hand, C being able to call into your sublanguage easier is a benefit versus jury-rigging Cargo into your build system (in the case of Rust).
On the other hand, this approach works great for adding extensions that increase the expressive power of C with new abstractions, for example algebraic data types, C++-like templating, etc.
> Software has a Peter Principle. If a piece of code is comprehensible, someone will extend it, so they can apply it to their own problem. If it’s incomprehensible, they’ll write their own code instead. Code tends to be extended to its level of incomprehensibility.
Wuffs types are not parameterized with lifetimes, so they may contain references only to items with static lifetime. This means that Wuffs cannot implement linked lists, hash tables, or other common data structures. To support those structures, Wuffs will need to either add lifetime parameters and a borrow-checker (like Rust) or add reference counting (like Rust `Arc`, C++ `shared_ptr`, Golang, Java, etc).
There are many papers that cover how to do this, starting with Alexis Beissegners thesis, up to the latest Rust belt project paper on GhostCell.
There are also many libraries implementing this, so that you don't have to do it yourself.
In practice, however, like the Wuffs project says, most code doesn't care about this. You only need this for very specific types of code, but if you need it, you have it in Rust.
On the other hand, I've asked above what happens if you try to open a png that doesn't fit in RAM, like one of those huge space photos that are >1 Tb in size.
Firefox just handles these fine, rendering only the part of the picture that you are looking at, using level of detail, etc.
This Wuff example assumes that you have enough RAM to decode the whole image at once, which is something that widely-used libraries are not allowed to assume, because they must support from embedded systems with little RAM up to absurdly huge images of our universe.
Given the existence of , I imagine this is doable.
> Did you know you can just fork() the @rustlang compiler from your proc_macro?
> Unfortunately rustc is multithreaded and fork() only forks one thread. We'll just assume the other threads weren't doing anything important.
> In today's rustc, this means we're only missing the main thread in the fork. But that one was only going to call exit() after joining our thread anyway, so not very important. It just means the process always exits with a success status. So instead we grep the output for "error".
Isn't that pretty much what `#![no_std]` is for? You can still use an allocator with no_std, but you have to explicitly import it.
* input buffer -> prepared by caller, must live as long as you are decoding it
* output buffer -> prepared by caller, must live as long decoder outputs to it
* fixed size decoder state -> allocated during initialization
* decoder state with size depending on input -> various compression formats formats typically include information about expected sizes, used dictionary sizes and similar so that required memory for it can be allocated ahead of time. You don't want memory allocations during the hot loop anyway. And if it turns out that content doesn't match sizes from header it's an invalid file.
In fact there are tons of GC implementations across all Java implementations, including reference counting.
For example, "An on-the-fly reference-counting garbage collector for java"
And here is one for Go.
There’s a famous list of NASA’s rules for safety-critical code that you’ll see pop up now and then, that subscribes to the “avoid heap memory allocation” philosophy. https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...
It’s not nearly as true anymore today, but working in console games 10-20 years ago, it was very common to try to avoid dynamic memory allocation because it was viewed as slow, unsafe, and unnecessary. A common strategy was to statically pre-allocate a single block of the worst-case memory needed for a given feature, because you need to be prepared to hit the worst case without crashing, so you might as well just reserve the max and focus on reducing what the maximum possible usage is.
I once debugged a crash in a console game that was caused by someone trying to be too clever by half with their allocation and copy constructors. We had a team of a dozen people working on the crash through a weekend during crunch because the crash only occurred in a release build on the console dev kit. The cost of this one stupid bug due to using memory allocation could literally accounted for in the thousands of dollars. I basically had to write a mini one-off release-build stack-tracing sampling debugger in order to trap it. So, anyway, long story short, I’m old enough to appreciate why some people suggest that avoiding allocations is a good thing.
There are vastly fewer interacting elements when you restrict yourself to the stack: basically logic errors and pointer arithmetic become the key concerns (and Wuffs supposedly deals with the latter). Really smart people may be able to completely reason about this space.
Developers are known for imposter syndrome, but we have an unhealthy amount of overconfidence at the same time.
I'm not so sure about the usefulness for an image decoder.
But if you are designing safety critical realtime software for controlling a car, then you kind of want the extra reliability guarantees that avoiding the heap gives you.
> ...Dependent types are a way to implement compile-time bounds checking, but they're not the only way, and there's more to programming languages than their type systems. Wuffs does not use dependent types.
Wuffs looks really interesting! I don't think I've ever even heard of a language that is designed to only be used in an auxiliary role along side another general purpose programming language!
Edit: As pointed out below, scripting languages like Lua are used in an auxiliary role, but in a very different way! Wuffs is designed for use in the deepest most crucial parts of an app, where as scripting languages are typically GC'd and used at the very highest levels, often times by animations, game designers, or even end users!
> A type that depends on another value. For example, a variable n’s type might be “the length of s”, for some other slice-typed variable s. Dependent types are a way to implement compile-time bounds checking, but they’re not the only way, and there's more to programming languages than their type systems. Wuffs does not use dependent types.
Embeddedable scripting languages come to mind. E.g.:
"Lua is a lightweight, high-level, multi-paradigm programming language designed primarily for embedded use in applications."
Lua is used at the very top level of a program, sometimes its even user accessible!
Wuffs is the exact opposite, it is designed for use in the deepest levels of a program!
Very cool idea.
It's positioned somewhat separately as you don't codegen from it, but TLA+? It exists to model and verify a program, but you still write the program separately.
Mozilla's imagelib has a neat trick to gain performance while working with smaller memory buffers when you want to render the output image at a smaller resolution than it is encoded. They call it downscale-during-decode.
So this library pulls zlib decompression into the decode loop. Imagelib on the other hand optimizes at the other end of the pipeline and pulls downscaling into the decode loop.
imagelib's optimization has the advantage that it helps with both small images as long as they're not displayed pixel-exact and also makes decoding humungous images feasible without causing the browser to explode with OOMs.
Am I missing something, or is this statement simply wrong? Have not used Go, but Rust checks its stuff at compile time as far as I know.
Some amount of runtime checking is inherent in any program. Even dependently typed programs must, for example, check input at runtime. They can make it easier to have absolutely minimal runtime checks, however. Rust is adding a limited form of dependent types for this reason.
It would be nice if you could pass (n, t) where n is supplied at runtime, the type of t depends on the value of n, and the compiler only lets you use t if your code is valid for all reachable values of n.
eg. fn(n: usize, arr1: &[i32; n], arr2: &[i32, n]) allowed the function body to assume the two slices can be zipped without losing elements.
Note that i don't have enough experience with either dependent types, zz, or wuffs to explain much further.
Const generics aren't dependent types though; you're still dealing with known constants at compile time. For it to be dependent types, you need something like in your latter example, where a type is dependent on an actual _value_ passed to a function at runtime.
Often you can wrangle the optimizer into eliminating all but one bounds check by passing slices instead and then slicing one of the inputs in terms of length of the other. The Zip iterator adapter also has optimizations for the case of two slices where iteration will only require two length queries at the beginning and no further bounds checks.
let a = [0;4];
let a = 128u8 + 128u8;
Edit: actually, it would help if I read the actual quote in your post. I think they’re right in saying that those things are checked at runtime in Rust, though I could be mistaken.
> SMHasher is a test and benchmark suite for a variety of hash function implementations. It can provide data for claims like “our new Foo hash function is faster than the widely used Bar, Baz and Qux hash functions”. However, when comparing Foo to CRC-32, be aware that a SIMD-accelerated CRC-32 implementation can be 47x faster than SMHasher’s simple CRC-32 implementation.
In fact smhasher implements both, the slow soft crc32 he cites, and the fastest crc32_pclmul, which is not just 47x faster, but 5000x faster. Other than wuff's implementation of the crc32_pclmul variant.
crc32 392.10 vs crc32_pclmul 1972140.38 MiB/sec
Ah, I'm based on aappleby/smasher, which is what the original location (http://code.google.com/p/smhasher) redirects to. I didn't realize that you have a more active fork.
> not just 47x faster, but 5000x faster
It's 47x faster:
I'm especially pleased with his approach to practicality re: compile times.
I'm also glad it's Apache2 licensed because I'm going to be reading through its source for inspiration at a future point for reference for my own implementations.
I also see a strong likelihood that I will use it extensively for a few projects I had been working on in embedded for the ESP32 on my mobile device prototype and for non-embedded projects with my PL and otherwise (like for my Neovim UI frontend). I'm so glad this was posted.
I have been playing around with this, and it is fast enough to encode a 1080p+ 8bpp framebuffer in under 5 milliseconds. This is fast enough for interactive applications. Most web browsers likely use something similar to this, so as long as you encode quickly on the server you can expect fast decode on the client.
All you need are 2 streams: 1 from client-to-server w/ all touch/keyboard events, and another from server-to-client with JPEG frames. One could easily build client devices using FPGAs or other integrated technologies with this sort of architecture.
The only caveats are latency & bandwidth requirements, but there are strategies for managing this, as well as problem domains where sufficient constraints exist on both counts (i.e. secure local networks).
I would be interested how much their approach scales in terms of code size, since they use likely SMT solvers to guarantee having no arithmetic overflows (which is more than Rust guarantees).
Ideally as comparison with compile times of Rust programs.
It used to take a couple of seconds to compile Wuffs' standard library. It's back down to a few hundred milliseconds.
At least based on a quick skim of the docs, they use a combination of programmer assertions, interval arithmetic, and their type system for bounds checking and ensuring no arithmetic overflows.
Also loop invariants etc must all be annotated. I would be more interested how they plan to handle IO effects or if they want to omit that.
I'm sorry but I don't understand this statement. Can you re-phrase it?
> I would be more interested how they plan to handle IO effects or if they want to omit that.
I'm not sure exactly what you're asking for, but see the Coroutines, Effects and I/O pages in https://github.com/google/wuffs/tree/main/doc/note
I assume this would not be useful on a web server, unless it could be used by Image-Magick or GD.
Is this targeting Software devs (like Chrome or Affinity Designer)? Is this meant to replace the default decoder somehow, for a user that wants to speed up his system? Is there some other use I am missing?
I didn't really put any effort into optimizing it yet, though it does use miniz_oxide and it decompresses the entire image at once instead of per-scanline.
I'm curious if it is at all competitive with the Wuffs decoder, I'm guessing probably not because I haven't put any effort into using SIMD.
But overall the implementation was pretty straightforward, I just read the RFC and wrote it out over a few weeks. One quirk of my library currently is it always returns an RGBA buffer, instead of potentially letting you save memory on a grayscale or RGB image.
You could try adapting https://github.com/google/wuffs/tree/main/script/bench-rust-...
Note that statements like:
let bytes_per_scanline =
((header.width * header.bit_depth as u32 * header.color_type.sample_multiplier()) + 7) / 8;
can silently overflow a u32 if width is 0x800_0000, bit_depth is 8 and sample_multiplier is 4. Part of the point of Wuffs is to catch things like that (at compile time).
Do you accept PRs for the Rust benchmarks? My code is nowhere near competitive to the Wuffs implementation, but it would be a fun goal to work towards.
> Note that statements like ... can silently overflow a u32
Thank you for catching that, I'll go through the entire codebase and try to fix those up.
That's a part of Rust that always makes me feel uneasy - the compiler will tell you if you're doing math with numbers of different bit widths, but it won't warn you in most cases about overflow. That would be a good addition to the Rust compiler, or at least a Clippy lint.
Sure! I'll take a PR that adds script/bench-rust-png-decoder to the Wuffs repo.
I'll also take patches to script/bench-rust-png. I'm not as fluent in Rust as I am in C/C++ or Go, so I might easily be doing something silly there.
Lossy codecs with lossless modes tend to be a lot more efficient although may be harder to decode.
// image: 1165 x 859 (1943 KB) 24bpp
libpng: 19.2 ms 494.7 ms
wuffs: 14.9 ms 0.0 ms
mango: 10.0 ms 84.7 ms
// image: 312 x 442 (43 KB) 32bpp
libpng: 2.9 ms 18.3 ms
wuffs: 0.4 ms 0.0 ms
mango: 0.1 ms 7.3 ms
// image: 160 x 120 (13 KB) 8bpp
libpng: 0.7 ms 9.5 ms
wuffs: 0.1 ms 0.0 ms
mango: 0.1 ms 3.2 ms
// image: 90 x 112 (23 KB) 24bpp
libpng: 0.9 ms 3.9 ms
wuffs: 0.5 ms 0.0 ms
mango: 0.2 ms 1.3 ms
1) variables have to be declared before the code (I had to do this a long time ago and it was annoying)
2) the separated section for uninitialised variable in struct, why not =undef like all the other languages?
Also having to use base.u32 everywhere instead of just u32 must be annoying fast.
> Note that gcc10 performs slightly faster than clang9 in Wuffs’ benchmark numbers at the top of this post (as well as in an earlier non-SIMD Adler-32 implementation), although clang9 sometimes performs better for the other C mimic libraries.
Aren’t all apps (anything not one-off scripts) performance and security conscious? Being performance and security conscious seems like the rule, not the exception.
Choosy is sort of vtable-ish. It selects between method implementations (which all have the same signature), typically based on cpuid (e.g. if the CPU is SSE4.2+PCLMUL capable, set foo = foo_x86_sse42; if not, use the default foo impl).
up (the private method) just implements update (the public method). Only private methods can be choosy.
Any comparison to "slow" is generally the wrong direction.
Given the project goals, I guess most encoders don't make a lot of sense: For image encoding you basically provide an "x * y * bytes_per_pixel" memory area and an encoder does its magic on that. There isn't really any complicated untrusted input in that case.
However it isn’t obvious whether such companies have any incentive to share their code, even just because the code is too specific to their hardware?
Certainly in number of images (as opposed to number of bytes), those might tilt things the other way.
There isn't one yet, but I'd like to have one at some point.
If you're encoding PNGs from NumPY arrays you don't care at all about safety.
If, for some reason I can't quite comprehend, whatever encoder you have on hand isn't fast enough, you can just disable compression optimizations -- but at that point you may as well be writing out a bitmap.
Sad that it's a Google project, but I can look past every now and again.