Hacker News new | past | comments | ask | show | jobs | submit login
Wuffs’ PNG image decoder (nigeltao.github.io)
462 points by est31 13 days ago | hide | past | favorite | 138 comments





Interesting. First time I've head of Wuffs. As someone that still uses C, this in particular sounds neat:

(from https://github.com/google/wuffs#goals-and-non-goals)

""" Wuffs' goal is to produce software libraries that are as safe as Go or Rust, roughly speaking, but as fast as C, and that can be used anywhere C libraries are used.

[...] Wuffs the Library is available as transpiled C code. Other C/C++ projects can use that library without requiring the Wuffs the Language toolchain. Those projects can use Wuffs the Library like using any other third party C library. """


The language itself is a little unreadable at first glance, but the idea of it is a very good one. sbt [1] is an amazing project for small embeddable toy programs, but using fuzzers rapidly shows how unsafe the code is, and the benchmarks in the original article show the extent of performance compromises made to make it work.

It seems like this would be an interesting approach to a lot of security programming where it involves data structures, since those have typically been a source of issues. Having a memory-safe ASN.1 parser would be really nice, considering how much difficulty that has caused in the security space.

With a lot of security programming there's a reliance on constant-time algorithms, which this may not be well-suited to, however.

[1] https://github.com/nothings/stb


*stb; sbt is something entirely different :)

All of Windows (including the NT kernel) is written using the same basic proof model that Wuffs uses, except that the constraints are specified as annotations on top of C and C++ instead of as a new programming language.

I prefer the annotation approach, TBH, but I'm glad to see people working on safety. Specifically, Wuffs (like Rust) has this problem where it tries to fill the same niche as C and C++ and improve on these older languages in specific ways, but in addition to being different in ways necessary to achieve the language's objective is also different from C and C++ in totally gratuitous ways that are basically just aesthetic differences --- for example, "type var" vs "var: type".

I'm a big believer in technical continuity. IMHO, a lot of recent language developments should have instead been extensions of C, C++, Java, or Python.


> I'm a big believer in technical continuity. IMHO, a lot of recent language developments should have instead been extensions of C, C++, Java, or Python.

If that's what you really wanted, and if we always did that, then you would probably want extensions to Perl instead of Python, since it predated it slightly and was much more popular much earlier.

Instead, I think what you're really asking for is language extensions to the things you are familiar with. Those happen to be things a lot of other people are also familiar with, but that doesn't change that the same strategy results in Perl still getting the lion's share of attention. I, for one, would be perfectly happy with that, as I love Perl, but many people would not, and possibly not for any real technical reason.

Anyone that thinks the case for Rust or D is better served by extensions to C or C++ but thinks that Python is objectively a better language than Perl should probably look deeply at exactly why they hold those views.

P.S. That's not really to say you hold those views. Consider this comment less a critique of your specific statement than slight tangent spurred by it.


Not the OP, but I do share the sentiment of improving existing unsafe languages.

The C and C++ code in many relevant stacks is not going away, and many of us won't be ever be able to use D or Rust regardless of how much we like them.

UNIX clones, Windows, .NET and Java runtime implementation, WinUI/WinRT, Android (regardless of this week's annoucement)

So Microsoft Checked C, Microsoft SAL, Visual C++/clang/gcc lifetime analysers, Frama-C, hardware memory tagging, Pluton sandbox, CHERI,... are all welcomed improvements.


Yes, I'm not against improving existing languages. I just don't think it's always sufficient, so we shouldn't accept the false dichotomy of improving existing languages or creating and promoting the use of new ones to take advantage of new ideas and technologies to their fullest.

Even if Rust were to fully supplant C and C++ for at least systems programming[1], we should hope that within a few decades it's a risk of being supplanted by something else. To want otherwise is likely to inadvertently want stagnation and/or a language that while likely still improving, is having new technologies and concepts bolted on in increasingly convoluted manner, as they were not and could not have been considered when designing the language originally.

1: I think this is unlikely, if only because I don't see one language supplanting C, I see C's market cannibalized by a few different languages, with C always retaining some use.


Yeah, that I can go along with.

> I'm a big believer in technical continuity.

So am I. D is designed to be an easy transition from C and C-With-Classes. For example, here is some code in C that was translated to D (it's part of the Digital Mars C++ compiler):

C:

https://github.com/DigitalMars/Compiler/blob/dmc-cxx/dm/src/...

D:

https://github.com/DigitalMars/Compiler/blob/master/dm/src/d...

They look pretty much the same. The code generated is the same. In those repositories you can also see how I translated the C versions to D with plenty of examples.

The biggest impediment is the C preprocessor. You wouldn't really want to carry that forward.

After removing dependency on the C preprocessor, most of the work is global search/replacing things like `->` to `.`.


I work on (well, mostly near) an extensible C compiler, designed so extension authors can independently create extensions, and users can import them as easily as libraries: https://github.com/melt-umn/ableC/ (and I'd love to answer any questions you have about our approach or code.)

IMO this approach hasn't taken off because maintaining compatibility with C while adding safety (or really just about any property) means implementing your own sublanguage that can't arbitrarily call C functions while maintaining your safety properties. On the other hand, C being able to call into your sublanguage easier is a benefit versus jury-rigging Cargo into your build system (in the case of Rust).

On the other hand, this approach works great for adding extensions that increase the expressive power of C with new abstractions, for example algebraic data types, C++-like templating, etc.


But the C++ spec has become quite unwieldy, so I can understand the desire to start a totally new language.

The previous post on the blog has an observation that applies to languages as well

> Software has a Peter Principle. If a piece of code is comprehensible, someone will extend it, so they can apply it to their own problem. If it’s incomprehensible, they’ll write their own code instead. Code tends to be extended to its level of incomprehensibility.


As info for others, here are the annotations,

https://docs.microsoft.com/en-us/cpp/code-quality/using-sal-...


What tool/framework do they use for annotation and are there standards like SMTlib2, but for annotating those kind of things? Can this tools verify that the annotations are correct wrt the code below or above?


Wuffs touts its lack of memory allocation as a safety feature [0]. Rust could add a `#![forbid(alloc)]` declaration and achieve the same effect. I think this would be a good idea.

Wuffs types are not parameterized with lifetimes, so they may contain references only to items with static lifetime. This means that Wuffs cannot implement linked lists, hash tables, or other common data structures. To support those structures, Wuffs will need to either add lifetime parameters and a borrow-checker (like Rust) or add reference counting (like Rust `Arc`, C++ `shared_ptr`, Golang, Java, etc).

[0] https://github.com/google/wuffs/blob/v0.3.0-beta.1/doc/note/...


That is true, but on the other hand, Wuffles allows safe unchecked array indexing, which is a very hard problem that Rust doesn't attempt to tackle--in part because it spends most of its complexity budget on the borrow checker! I for one would be extremely interested in using Wuffles in the inner loop of many routines I currently write in Rust which are performance critical, and for which I know bounds checking is hurting performance (sometimes in ways it's hard to convince LLVM to remove), but for which I can't justify the use of `unsafe`. I would particularly love it as an embedded language, rather than one needing a full-blown application.

Generational indices allow safe unchecked array indexing in Rust.

There are many papers that cover how to do this, starting with Alexis Beissegners thesis, up to the latest Rust belt project paper on GhostCell.

There are also many libraries implementing this, so that you don't have to do it yourself.

In practice, however, like the Wuffs project says, most code doesn't care about this. You only need this for very specific types of code, but if you need it, you have it in Rust.

---

On the other hand, I've asked above what happens if you try to open a png that doesn't fit in RAM, like one of those huge space photos that are >1 Tb in size.

Firefox just handles these fine, rendering only the part of the picture that you are looking at, using level of detail, etc.

This Wuff example assumes that you have enough RAM to decode the whole image at once, which is something that widely-used libraries are not allowed to assume, because they must support from embedded systems with little RAM up to absurdly huge images of our universe.


> I would particularly love it as an embedded language, rather than one needing a full-blown application.

Given the existence of [0], I imagine this is doable.

[0]: https://docs.rs/inline-python/0.6.0/inline_python/


That inline Python is the closest thing to black magic I've seen in a while. I guess I have to find an excuse to play around with Rust some time soon, it seems like such a cool ecosystem.

If you like that sort of thing you'll enjoy following the creater on twitter. Here's another of her creations

> Did you know you can just fork() the @rustlang compiler from your proc_macro?

> Unfortunately rustc is multithreaded and fork() only forks one thread. We'll just assume the other threads weren't doing anything important.

> In today's rustc, this means we're only missing the main thread in the fork. But that one was only going to call exit() after joining our thread anyway, so not very important. It just means the process always exits with a success status. So instead we grep the output for "error".

https://mobile.twitter.com/m_ou_se/status/136884418660735795...


> Rust could add a `#![forbid(alloc)]` declaration and achieve the same effect.

Isn't that pretty much what `#![no_std]` is for? You can still use an allocator with no_std, but you have to explicitly import it.

https://docs.rust-embedded.org/book/intro/no-std.html


It seeems like Wuffs is not really designed to support dynamic memory allocation at all, which is typical for usecases like oneshot parsers.

As I understand wuffs goal is to be good enough for parsers coders decoders. It looks like they can still achieve that by sacrificing some of the nice features of other languages.

Rust already has a "mode" that forbids allocations: no-std without the alloc crate.

Goals and Non-Goals explain that wuffs is not a general purpose programming language. If all the allocations are done by code invoking the function implemented in wuffs there is no point analyzing lifetimes. For the intended usecase lifetime of the objects and their lifetimes fall into following categories:

* input buffer -> prepared by caller, must live as long as you are decoding it

* output buffer -> prepared by caller, must live as long decoder outputs to it

* fixed size decoder state -> allocated during initialization

* decoder state with size depending on input -> various compression formats formats typically include information about expected sizes, used dictionary sizes and similar so that required memory for it can be allocated ahead of time. You don't want memory allocations during the hot loop anyway. And if it turns out that content doesn't match sizes from header it's an invalid file.


Go & Java don't do reference counting. They have a GC instead.

Wrong, reference counting is a GC algorithm, and both language specifications don't state what exact algorithm is to be implemented.

In fact there are tons of GC implementations across all Java implementations, including reference counting.

For example, "An on-the-fly reference-counting garbage collector for java"

https://dl.acm.org/doi/10.1145/1111596.1111597

And here is one for Go.

https://github.com/sendilkumarn/gopurerc


IMHO if Rust had a "proper" way to transpile to C, then a "C-style subset" of Rust would be a perfect library-implementation language and something like Wuff wouldn't be needed.

MISRA-C(++), Ada/SPARK also forbid it, yet they are there powering lots of embedded systems, there is a lot of stuff that can still be implemented with such restrictions.

Isn't that stupidly limiting?

Some people consider avoiding heap allocations to be smartly limiting. ;) It’s not really that limiting either, when you learn the techniques for it.

There’s a famous list of NASA’s rules for safety-critical code that you’ll see pop up now and then, that subscribes to the “avoid heap memory allocation” philosophy. https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...

It’s not nearly as true anymore today, but working in console games 10-20 years ago, it was very common to try to avoid dynamic memory allocation because it was viewed as slow, unsafe, and unnecessary. A common strategy was to statically pre-allocate a single block of the worst-case memory needed for a given feature, because you need to be prepared to hit the worst case without crashing, so you might as well just reserve the max and focus on reducing what the maximum possible usage is.

I once debugged a crash in a console game that was caused by someone trying to be too clever by half with their allocation and copy constructors. We had a team of a dozen people working on the crash through a weekend during crunch because the crash only occurred in a release build on the console dev kit. The cost of this one stupid bug due to using memory allocation could literally accounted for in the thousands of dollars. I basically had to write a mini one-off release-build stack-tracing sampling debugger in order to trap it. So, anyway, long story short, I’m old enough to appreciate why some people suggest that avoiding allocations is a good thing.


Depends. AFAICT it’s designed to be embedded in a c project; in that context, it’s merely saying “the wuffs part does not allocate; do that in the c code”.

Makes sense.

Seems like an honest question. "Stupidly" is a key word there because humans are, without exception, vastly too stupid to deal with the combinatorial explosion that the heap presents. This is a key goal of languages like Rust: allow the compiler to deal with the complexity for you.

There are vastly fewer interacting elements when you restrict yourself to the stack: basically logic errors and pointer arithmetic become the key concerns (and Wuffs supposedly deals with the latter). Really smart people may be able to completely reason about this space.

Developers are known for imposter syndrome, but we have an unhealthy amount of overconfidence at the same time.


Yes.

I'm not so sure about the usefulness for an image decoder.

But if you are designing safety critical realtime software for controlling a car, then you kind of want the extra reliability guarantees that avoiding the heap gives you.


After NoSQL, now we also have NoAlloc :)

You should have a look at Frama-C (https://frama-c.com/) as well for this kind of stuff.

I don't understand why he just doesn't use Rust instead. The speed difference can't be that big.

From reading the docs, one main reason is that they want the output of compilation to be plain C. The Wuffs toolchain source-to-source compiles to plain C source, which can then be compiled and linked anywhere that has a C compiler and/or an FFI to the C ABI, which is a lot of places. Wuffs verifies its safety properties entirely at compile time, so once verified, it can translate the code to completely unchecked C code that is nonetheless known to be safe.

My favorite part of the Wuffs github page has to be the definition it gives for Dependent Types

> ...Dependent types are a way to implement compile-time bounds checking, but they're not the only way, and there's more to programming languages than their type systems. Wuffs does not use dependent types.

Wuffs looks really interesting! I don't think I've ever even heard of a language that is designed to only be used in an auxiliary role along side another general purpose programming language!

Edit: As pointed out below, scripting languages like Lua are used in an auxiliary role, but in a very different way! Wuffs is designed for use in the deepest most crucial parts of an app, where as scripting languages are typically GC'd and used at the very highest levels, often times by animations, game designers, or even end users!


I don't like that your quote elided the actual definition of dependent types. It reflects badly on the project for people who didn't bother to read the full quote. For completeness, this seems to be the full quote:

> A type that depends on another value. For example, a variable n’s type might be “the length of s”, for some other slice-typed variable s. Dependent types are a way to implement compile-time bounds checking, but they’re not the only way, and there's more to programming languages than their type systems. Wuffs does not use dependent types.


Ragel might be a related example: http://www.colm.net/open-source/ragel/

>I don't think I've ever even heard of a language that is designed to only be used in an auxiliary role along side another general purpose programming language!

Embeddedable scripting languages come to mind. E.g.:

"Lua is a lightweight, high-level, multi-paradigm programming language designed primarily for embedded use in applications."


Ah, of course. But opposite use case!

Lua is used at the very top level of a program, sometimes its even user accessible!

Wuffs is the exact opposite, it is designed for use in the deepest levels of a program!

Very cool idea.


There is T0, part of BearSSL. This presentation has an explanation of it (starts slide 19):

https://t1lang.github.io/NorthSec-20190516.pdf


Ah - forgot about this. Thx for posting this. These little languages really keep the system footprint tiny compared to OpenSSL.

Halide (https://halide-lang.org/) is one example, designed for implementing low-level image processing functions.

Futhark is a auxillary language that targets parallel processors (e.g. GPUs) to be used alongside a general purpose language.

Wasm does at runtime what Wuffs does at the compile time for a typical slowdown of 2-3 times. As with Wuffs there is no allocation in the basic Wasm, the program works on pre-allocated buffers.

You can grow wasm memory while running a wasm module, can't you?

> Wuffs looks really interesting! I don't think I've ever even heard of a language that is designed to only be used in an auxiliary role along side another general purpose programming language!

It's positioned somewhat separately as you don't codegen from it, but TLA+? It exists to model and verify a program, but you still write the program separately.


It compiles to C.

I don't think TLA+ compiles to C, not as part of the normal usage / workflow.

Right, but Wuffs does. I suspect I misunderstood your previous comment.

lex, yacc?

Regex?

> decompressing the entire image all-at-once (into one large intermediate buffer) instead of one row at a time (into smaller, re-usable buffers). All-at-once requires more intermediate memory but allows substantially more of the image to be decoded in the zlib-decompressor’s fastest code paths.

Mozilla's imagelib has a neat trick to gain performance while working with smaller memory buffers when you want to render the output image at a smaller resolution than it is encoded. They call it downscale-during-decode.

So this library pulls zlib decompression into the decode loop. Imagelib on the other hand optimizes at the other end of the pipeline and pulls downscaling into the decode loop.

imagelib's optimization has the advantage that it helps with both small images as long as they're not displayed pixel-exact and also makes decoding humungous images feasible without causing the browser to explode with OOMs.


> Also, unlike Go or Rust, Wuffs’ memory safety is enforced at compile time, not by inserting runtime checks that e.g. the i in a[i] is within bounds or that (x + y) doesn’t overflow a u32.

Am I missing something, or is this statement simply wrong? Have not used Go, but Rust checks its stuff at compile time as far as I know.


Rust does as much as possible at compile time, and some of its most famous features are entirely at compile time, but it will also use runtime checks where appropriate.

Some amount of runtime checking is inherent in any program. Even dependently typed programs must, for example, check input at runtime. They can make it easier to have absolutely minimal runtime checks, however. Rust is adding a limited form of dependent types for this reason.


I've not been following recent Rust development as closely, can you elaborate on what limited form of dependent types Rust is adding?

Functions and types can take integers as monomorphization-time template parameters (const generics).

It would be nice if you could pass (n, t) where n is supplied at runtime, the type of t depends on the value of n, and the compiler only lets you use t if your code is valid for all reachable values of n.

eg. fn(n: usize, arr1: &[i32; n], arr2: &[i32, n]) allowed the function body to assume the two slices can be zipped without losing elements.

Note that i don't have enough experience with either dependent types, zz, or wuffs to explain much further.


> Functions and types can take integers as monomorphization-time template parameters (const generics).

Const generics aren't dependent types though; you're still dealing with known constants at compile time. For it to be dependent types, you need something like in your latter example, where a type is dependent on an actual _value_ passed to a function at runtime.


> eg. fn(n: usize, arr1: &[i32; n], arr2: &[i32, n]) allowed the function body to assume the two slices can be zipped without losing elements.

Often you can wrangle the optimizer into eliminating all but one bounds check by passing slices instead and then slicing one of the inputs in terms of length of the other. The Zip iterator adapter also has optimizations for the case of two slices where iteration will only require two length queries at the beginning and no further bounds checks.


If the error is flagrant enough e.g.

    let a = [0;4];
    println!("{}", a[5]);
or

    let a = 128u8 + 128u8;
the compiler may tell you, but in general you can add any two `u8` or index an array with any `usize`, the issues will be checked for at runtime (or not at all for the overflow in release mode, by default).

Yes, Rust checks a lot of things at compile time, but not all. For example, dynamic bounds checks (indexing into a Vec or slice).


This is true for Rust in most cases but with const generics and fixed-size arrays, Rust can check those as part of the type system at compile time.

Rust will sometimes have to insert implicit runtime checks. Wuffs appears to require the user to insert explicit runtime checks when needed (it will reject any flow that could possibly overflow, but if you check for overflow it's smart enough to allow that).

Statement is correct, Rust does insert runtime bounds checking when needed.

Rust does use runtime checks for overflows and will panic if an overflow happens. In cases where you don't care or where it's intentional you can use `overflowing_add()` and its variants.

... in debug builds only, by default. In release, by default, there are no checks, and you get two’s compliment overflow, though the correct thing to do is use those variants, yes.

Both of these checks happen at runtime in Rust. Overflow checks in particular only happen in debug mode, not in release mode.

* if llvm does not optimize them out

This was my understanding as well, at least for the majority of memory related tasks.

Edit: actually, it would help if I read the actual quote in your post. I think they’re right in saying that those things are checked at runtime in Rust, though I could be mistaken.


How would Rust check at compile-time whether an array index read from a file on disk (at runtime) is out of bounds?

Looks good, but has somewhat widely false claims:

> SMHasher is a test and benchmark suite for a variety of hash function implementations. It can provide data for claims like “our new Foo hash function is faster than the widely used Bar, Baz and Qux hash functions”. However, when comparing Foo to CRC-32, be aware that a SIMD-accelerated CRC-32 implementation can be 47x faster than SMHasher’s simple CRC-32 implementation.

In fact smhasher implements both, the slow soft crc32 he cites, and the fastest crc32_pclmul, which is not just 47x faster, but 5000x faster. Other than wuff's implementation of the crc32_pclmul variant.

Compare https://github.com/rurban/smhasher/#smhasher

    crc32  392.10 vs crc32_pclmul 1972140.38 MiB/sec
Reminds me a lot on ATS

> smhasher implements both

Ah, I'm based on aappleby/smasher, which is what the original location (http://code.google.com/p/smhasher) redirects to. I didn't realize that you have a more active fork.

> not just 47x faster, but 5000x faster

It's 47x faster: https://github.com/rurban/smhasher/issues/189


Thanks a lot for detecting this bug! Nigel's numbers were right, mine were wrong

That number doesn't seem realistic. Even for future generations of ram and PCIe the bandwidth is on the order of 50GB/s not 2TB/s .

The author is now a new favorite person of mine. I've been working on my own language, and I have an incredible overlap with the features that Wuff has, specifically the coroutines, refinement types, the "facts" (which I had been inspired to implement from ATS with its proof threading), the io_buffer slice type, and the effects system. It's kind of shocking how much stuff there is on overlap, but it does give a good indication to me that this is a "good idea."

I'm especially pleased with his approach to practicality re: compile times.

I'm also glad it's Apache2 licensed because I'm going to be reading through its source for inspiration at a future point for reference for my own implementations.

I also see a strong likelihood that I will use it extensively for a few projects I had been working on in embedded for the ESP32 on my mobile device prototype and for non-embedded projects with my PL and otherwise (like for my Neovim UI frontend). I'm so glad this was posted.


Thanks for the kind words!

Another tool along these lines is Galois' Ivory language https://ivorylang.org/ , a Haskell-embedded language for writing safe/reliable C.

You might be interested in cakeml and cogent. What Safety Integrity Level can ivory guarantee? level2?

CakeML is interesting, but I don't think any language can guarantee a SIL.

Another one is verifast.

I know there are information theory tradeoffs, but if you want to go really fast, use JPEG. Specifically, LibJpegTurbo.

I have been playing around with this, and it is fast enough to encode a 1080p+ 8bpp framebuffer in under 5 milliseconds. This is fast enough for interactive applications. Most web browsers likely use something similar to this, so as long as you encode quickly on the server you can expect fast decode on the client.


Interesting! Do you have an example of how such a thing could be used in an interactive language?

Think about extreme forms of server-side rendered applications where you need to keep the client device as simple, cheap and locked-down as possible.

All you need are 2 streams: 1 from client-to-server w/ all touch/keyboard events, and another from server-to-client with JPEG frames. One could easily build client devices using FPGAs or other integrated technologies with this sort of architecture.

The only caveats are latency & bandwidth requirements, but there are strategies for managing this, as well as problem domains where sufficient constraints exist on both counts (i.e. secure local networks).


Are there any compiletime benchmarks?

I would be interested how much their approach scales in terms of code size, since they use likely SMT solvers to guarantee having no arithmetic overflows (which is more than Rust guarantees). Ideally as comparison with compile times of Rust programs.


> Are there any compiletime benchmarks?

It used to take a couple of seconds to compile Wuffs' standard library. It's back down to a few hundred milliseconds.

https://nigeltao.github.io/blog/2020/dumbindent.html#compila...


Wuffs does not use a SMT solver. https://github.com/google/wuffs/blob/main/doc/note/assertion... explains.

> since they use likely SMT solvers to guarantee having no arithmetic overflows

At least based on a quick skim of the docs, they use a combination of programmer assertions, interval arithmetic, and their type system for bounds checking and ensuring no arithmetic overflows.


Checked as well. They can only do this, because they require that heap memory owned by pointers does not get fragmented. So programmers must manage pointer offsets to each heap structure themself. Hence you never get potential pointer soup (pointers pointing to subparts and pointing to other things), which Rust resolves with pointer lifetime checking.

Also loop invariants etc must all be annotated. I would be more interested how they plan to handle IO effects or if they want to omit that.


> programmers must manage pointer offsets to each heap structure

I'm sorry but I don't understand this statement. Can you re-phrase it?

> I would be more interested how they plan to handle IO effects or if they want to omit that.

I'm not sure exactly what you're asking for, but see the Coroutines, Effects and I/O pages in https://github.com/google/wuffs/tree/main/doc/note


Nice. Still putting everything into memory make algorithm go brrrr is not too surprising :-). I'm really happy that folks continue to improve it though.

Can someone please explain usage?

I assume this would not be useful on a web server, unless it could be used by Image-Magick or GD.

Is this targeting Software devs (like Chrome or Affinity Designer)? Is this meant to replace the default decoder somehow, for a user that wants to speed up his system? Is there some other use I am missing?


I recently wrote a Rust PNG decoder mostly for use in no_std contexts, and it was motivated out of wanting to decode emoji for a font rasterizer.

I didn't really put any effort into optimizing it yet, though it does use miniz_oxide and it decompresses the entire image at once instead of per-scanline.

I'm curious if it is at all competitive with the Wuffs decoder, I'm guessing probably not because I haven't put any effort into using SIMD.

https://github.com/bschwind/png-decoder

But overall the implementation was pretty straightforward, I just read the RFC and wrote it out over a few weeks. One quirk of my library currently is it always returns an RGBA buffer, instead of potentially letting you save memory on a grayscale or RGB image.


> I'm curious if it is at all competitive with the Wuffs decoder

You could try adapting https://github.com/google/wuffs/tree/main/script/bench-rust-...

> https://github.com/bschwind/png-decoder

Note that statements like:

let bytes_per_scanline = ((header.width * header.bit_depth as u32 * header.color_type.sample_multiplier()) + 7) / 8;

can silently overflow a u32 if width is 0x800_0000, bit_depth is 8 and sample_multiplier is 4. Part of the point of Wuffs is to catch things like that (at compile time).


> You could try adapting ...

Do you accept PRs for the Rust benchmarks? My code is nowhere near competitive to the Wuffs implementation, but it would be a fun goal to work towards.

> Note that statements like ... can silently overflow a u32

Thank you for catching that, I'll go through the entire codebase and try to fix those up.

That's a part of Rust that always makes me feel uneasy - the compiler will tell you if you're doing math with numbers of different bit widths, but it won't warn you in most cases about overflow. That would be a good addition to the Rust compiler, or at least a Clippy lint.


> Do you accept PRs for the Rust benchmarks?

Sure! I'll take a PR that adds script/bench-rust-png-decoder to the Wuffs repo.

I'll also take patches to script/bench-rust-png. I'm not as fluent in Rust as I am in C/C++ or Go, so I might easily be doing something silly there.


Wuffs == Wrangling Untrusted File Formats Safely it appears

The check-summing in PnG is poorly designed. It uses 2 different checksum on top of each other. The format itself checksum each chunk, but then the pixel data is using zlibs compression header that have a second checksum on top of that, using a different algorithm.

Most everything about PNG is poorly designed. Using a text codec to encode image data doesn't make any sense even if you pre-filter it. A consequence of this is how much larger PNG files get with alpha, even if the alpha channel has no information in it.

Lossy codecs with lossless modes tend to be a lot more efficient although may be harder to decode.


why did the title change on this submission? is it not the fastest in the world?

The title changed multiple times. I've seen at least 3 in the last 30 minutes (safest, fastest and now this one).

why? usually I only see this happen for misleading headlines and appending "(year)" to old articles that might be construed as recent. what's misleading about the actual article's headline? as far as I can tell it seems to be true.

The author of Mango posted a benchmark on Reddit: https://www.reddit.com/r/programming/comments/mld1ob/the_fas...

                  load         save             
  ----------------------------------------------
  // image: 1165 x 859 (1943 KB) 24bpp
  libpng:       19.2 ms     494.7 ms 
  wuffs:        14.9 ms       0.0 ms 
  mango:        10.0 ms      84.7 ms 

  // image: 312 x 442 (43 KB) 32bpp
  libpng:        2.9 ms      18.3 ms 
  wuffs:         0.4 ms       0.0 ms 
  mango:         0.1 ms       7.3 ms 

  // image: 160 x 120 (13 KB) 8bpp
  libpng:        0.7 ms       9.5 ms 
  wuffs:         0.1 ms       0.0 ms 
  mango:         0.1 ms       3.2 ms 

  // image: 90 x 112 (23 KB) 24bpp
  libpng:        0.9 ms       3.9 ms 
  wuffs:         0.5 ms       0.0 ms 
  mango:         0.2 ms       1.3 ms

Ah cool, it would be neat to see a response from OP about this!

Interesting 'single purpose' language though I'm surprised that

1) variables have to be declared before the code (I had to do this a long time ago and it was annoying)

2) the separated section for uninitialised variable in struct, why not =undef like all the other languages?

Also having to use base.u32 everywhere instead of just u32 must be annoying fast.


Fun fact, author is Terence Tao’s brother.

Despite the intent to push “wuffs a 21st century programming language”, nothing here is presented that couldn’t be achieved by just applying the same limitations/optimizations to the current libpng code.

That could get to parity with "Fastest", but it wouldn't match it with "Safest".

The README on GitHub looks useful:

https://github.com/google/wuffs


So what happens if you try to decode a really big PNG, e.g., like any of the > 1Tb large pngs of space photos that are around?

First time hearing about Wuffs. It says Wuffs compiles to C. In this case, will the choice of C compiler affect the performance?

Yes:

> Note that gcc10 performs slightly faster than clang9 in Wuffs’ benchmark numbers at the top of this post (as well as in an earlier non-SIMD Adler-32 implementation), although clang9 sometimes performs better for the other C mimic libraries.


> Wuffs is not a general purpose programming language. It is for writing libraries, not programs. The idea isn't to write your whole program in Wuffs, only the parts that are both performance-conscious and security-conscious.

Aren’t all apps (anything not one-off scripts) performance and security conscious? Being performance and security conscious seems like the rule, not the exception.


Generally code that deals with untrusted input often needs to balance the scale more towards security than other parts of the program might.

What does `choosy_up` stand for? Looks like choosy stands for vtable, and up for parent, right?

Sorry, that part is relatively new and poorly documented.

Choosy is sort of vtable-ish. It selects between method implementations (which all have the same signature), typically based on cpuid (e.g. if the CPU is SSE4.2+PCLMUL capable, set foo = foo_x86_sse42; if not, use the default foo impl).

up (the private method) just implements update (the public method). Only private methods can be choosy.


Surprisingly, Go's png decoder is half as slow as the default libpng :(

https://nigeltao.github.io/blog/2021/fastest-safest-png-deco...


Ugh, what does "half as slow" mean? Twice as fast? Half the speed?

Any comparison to "slow" is generally the wrong direction.


This isn't that surprising. In general Go is about half the speed of C to barring specifically optimized code or calling out to a faster library this would be expected.

I wonder if the same performance benefits apply to Encoding PNGs.

Looks like there isn't an encoder for PNG: https://github.com/google/wuffs/tree/main/std/png

Given the project goals, I guess most encoders don't make a lot of sense: For image encoding you basically provide an "x * y * bytes_per_pixel" memory area and an encoder does its magic on that. There isn't really any complicated untrusted input in that case.


The performance tradeoff isn't as justifiable, either: most images are decompressed orders of magnitude more times than they're compressed (generally just once).

I presume there are companies that do a lot of image compression which have the financial incentive to employ people to improve the compression efficiency. Either because of the CPU costs where the company will get a return on engineering hours to improve the speed of the algorithm; or because of reduced bandwidth costs where the company has an incentive to reduce the size of the compressed image even if the CPU usage might be higher.

However it isn’t obvious whether such companies have any incentive to share their code, even just because the code is too specific to their hardware?


I’m not sure I would take that bet. How much security camera footage is compressed, recorded, never looked at, and discarded after x days?

Certainly in number of images (as opposed to number of bytes), those might tilt things the other way.


optipng and pngcrush come to mind, but in the spirit of big O notation a run through those might be considered one macro-encoding process.

> Looks like there isn't an encoder for PNG

There isn't one yet, but I'd like to have one at some point.

https://github.com/google/wuffs/blob/main/doc/roadmap.md


Darn, I had a handful of NumPy arrays which could use a fast encoder.

But I'm sure there are plenty of fast encoders? Wuffs isn't the path to a fast encoder. It's a path to something safe that is also pretty fast.

If you're encoding PNGs from NumPY arrays you don't care at all about safety.

If, for some reason I can't quite comprehend, whatever encoder you have on hand isn't fast enough, you can just disable compression optimizations -- but at that point you may as well be writing out a bitmap.


semi-related since the post is talking about Rust, oxipng is a decently fast PNG optimizer, albeit not an encoder per se: https://github.com/shssoichiro/oxipng

So will it makes its way into browsers?

Isn't imagelib already very good and does a few special things like dynamic downscale?

Chrome and fuchsia. So to your future phone also

Wuffs sounds really cool, surprised I haven't heard about it. It claims memory safety but I'd be interested in how it has proven this - is it verified?

Sad that it's a Google project, but I can look past every now and again.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: