Parallelizing PNG: choosing Rust for mtpng (brionv.com)
137 points by fanf2 5 months ago | 48 comments

“I ended up not using the fancy iterator systems (yet?) but the ThreadPool is perfect.”

Please give Rayon iterators another try instead of using thread pool + message passing, it's one of the best features of Rust.

> Please give Rayon iterators another try (...) it's one of the best features of Rust

Isn't it possible to implement these iterators in any language? Why are they a feature of specifically Rust?

Sure, but when you use a parallel iterator like Rayon in Rust, memory and data-race safety is guaranteed at compile time. And using Rayon is as simple as changing “.iter()” to “.par_iter()” in your code thanks to the trait system.

Not that you lose that safety doing the thread pools and such manually, it’s just unnecessary.

The first time I added rayon to a project and switched my main `iter` to `par_iter` and it compiled with no other changes was decidedly creepy. That particular code didn't run any faster, but the fact that it was still guaranteed to be safe just felt so counter to everything the last 10 years of programming had taught me. Rust is not a perfect language (whatever that means), but more so than any language I've ever used it gives me these strange sort of mind-blowing moments.

Is there any reason to use .iter() over .par_iter()? Does the latter have higher constant costs that only amortize on lists larger than some value? Or could you just replace .iter() with .par_iter() under the hood and reap the benefits for free?

I guess if you are iterating over something declared unsafe? But the compiler should be able to recognize that and fall back to the linear version automatically.

Rayon has relatively small overhead, but that doesn't mean you can find'n'replace iter() with par_iter():

- not every collection/combination of iterators supports parallel work. For some it doesn't make sense, for some it's just not implemented.

- while Rust prevents data races, it can't prevent deadlocks, so you need to be mindful of locks.

- .iter() is allowed to mutate state outside of the iterator and use non-thread-safe objects, but .par_iter() isn't (without explicit synchronization primitives). So par_iter() will force you to use more expensive atomic and mutex-locked objects.

For data-bound problems most of the time it just works great. I've run into problems with some networking libraries having global state, which prevented use of par_iter() (but no crashes - caught at compile time!)
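The third point above can be sketched with std alone (no rayon): a sequential loop may mutate a plain local, while the threaded equivalent is forced by the compiler to use a thread-safe counter such as an atomic. The function names and chunking are illustrative:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Sequential: mutating a plain local outside the iterator is fine.
fn count_even_seq(data: &[u32]) -> usize {
    let mut count = 0;
    for x in data {
        if x % 2 == 0 {
            count += 1;
        }
    }
    count
}

// Parallel: the shared counter must be thread-safe; a plain
// `&mut usize` shared across threads is rejected at compile time.
fn count_even_par(data: &[u32]) -> usize {
    let count = AtomicUsize::new(0);
    let count_ref = &count;
    thread::scope(|s| {
        for chunk in data.chunks(data.len() / 4 + 1) {
            s.spawn(move || {
                for x in chunk {
                    if x % 2 == 0 {
                        count_ref.fetch_add(1, Ordering::Relaxed);
                    }
                }
            });
        }
    });
    count.into_inner()
}

fn main() {
    let data = [1, 2, 3, 4, 5, 6];
    assert_eq!(count_even_seq(&data), count_even_par(&data));
}
```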

There are a lot of cases where using par_iter wouldn't be correct even without unsafe. If you are doing anything with side effects, par_iter would execute those side effects out of order.

iter is in the standard library, and par_iter isn't, for one.

I'm not quite sure about the constant cost thing, but I would imagine that it's true, as it's inherently doing more work.

Rust’s concurrency guarantees make it much easier to not introduce problems. The API can be similar in other languages, but not copied directly, as those guarantees are encoded in the type signature of said API.

Ok, but if one function you call uses unsafe Rust, then it might break down. Which isn't a problem, except if you don't know that it happens. Would Rust warn about that? Just curious.

It would not warn; that's not how unsafe works in Rust.

Or rather, if you are calling something unsafe, you have to write "unsafe { }". That's a sort of "warning" in a sense. But if that unsafe code is wrapped in a safe interface, it's that code's responsibility to be correct. Bugs do happen!

Ok, I guess it would be nice then if code which is unsafe but still thread-safe can be marked as such, and Rust should give a warning if used incorrectly. That way, you can parallelize any piece of code without having to think about it.

Almost all code ends up going through unsafe, as calling into the OS to do any I/O ends up being unsafe. A warning like this would warn on almost all useful programs, making the warning useless.

You already don't need to think about it. If you're not calling unsafe yourself, then it's not your job.

What I meant is that sometimes, when you have to write unsafe code, it's easier to make it not thread-safe. Then later on, someone uses that piece of code and tries to parallelize it, and runs into problems. This can be prevented with language support.

Rust does disallow you from doing some kinds of unsafe operations between threads, unless you explicitly opt in.

That said, unsafe is not generally more convenient than safe and so people empirically don’t do this in the first place.

Let me ask you then: would you use a Rust library in multithreaded code, without checking if that library is thread-safe? Wouldn't you like your compiler to tell you?

All the time.

The compiler does tell you if it’s thread safe. That’s the point.

There are so-called marker traits, Send and Sync, that track whether something is thread-safe. https://doc.rust-lang.org/beta/nomicon/send-and-sync.html Regardless of whether you use unsafe, thread safety is tracked with these, and they allow the compiler to detect and prohibit data races.
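A std-only sketch of what Send buys you: Arc<T> uses atomic reference counting and implements Send, so it may cross a thread boundary, while the non-atomic Rc<T> does not, and the same code with Rc is rejected at compile time.

```rust
use std::sync::Arc;
use std::thread;

fn send_across_thread() -> i32 {
    // Arc<i32> is Send, so moving it into another thread compiles.
    // Replacing Arc with std::rc::Rc here fails to compile:
    // "`Rc<i32>` cannot be sent between threads safely".
    let shared = Arc::new(42);
    let handle = thread::spawn(move || *shared);
    handle.join().unwrap()
}

fn main() {
    assert_eq!(send_across_thread(), 42);
}
```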

It would be better to use a real concurrency-safe language like pony or distributed pascal for the fast ones, or e.g. clojure or perl6, ..., and not rust which requires a mutex on all stateful objects. But hype-driven worse is better always succeeds.

Rust does not require a mutex on all stateful objects.

either a mutex or an rwlock. Both are a sign of bad concurrency design and deadlocks. Why don't you finally fix your docs regarding your wrong claim of "fearless" concurrency? There are still many people believing your hype.

Not even that! It requires that you prevent data races. Scoped threads, for example, can have no synchronization primitives at all! When you do need one, there are far more options than Mutex or RwLock; you have atomics, for example. Someone made a monitor the other day, I think. Really, anything works.
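A minimal std-only sketch of the scoped-thread case (std::thread::scope, Rust 1.63+): two threads write into disjoint halves of local state, and the borrow checker alone guarantees safety, with no Mutex, RwLock, or atomics in sight.

```rust
use std::thread;

// Sum the two halves of a slice on two threads. The compiler proves
// the two mutable borrows are disjoint and outlive the scope.
fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    let mut sums = (0u64, 0u64);
    thread::scope(|s| {
        let (a, b) = (&mut sums.0, &mut sums.1);
        s.spawn(move || *a = left.iter().sum());
        s.spawn(move || *b = right.iter().sum());
    });
    sums.0 + sums.1
}

fn main() {
    assert_eq!(parallel_sum(&[1, 2, 3, 4]), 10);
}
```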

That's what I meant, totally concurrency unsafe. A proper and "fearless" concurrency safe language is thread-safe without manual mutex/locks/atomics/semaphores.

Rust also supports atomics, plus plain access of immutable data.

I really enjoy posts like these. It’s pleasant to read and gives you some insight into what people do to get their idea materialised.

> But it adds the C++ standard library as a dependency, which might or might not fly for GNOME.

You can also statically link C++'s standard library, just as you do now with Rust's.

This might not always be an option. GCC's libstdc++ is actually under the GPLv3[1]. There's an exception for code produced via an "Eligible Compilation Process", but I'm not sure whether that covers statically linking libstdc++ in. That said, I'd be a little surprised if static linking changed things here; I don't see any language about it like there is in the LGPL, which also considers replacement of the LGPL code without needing to touch the original inputs.

[1] https://gcc.gnu.org/onlinedocs/libstdc++/manual/license.html

It isn't under plain GPLv3, but a license which includes the following exception:

> When you use GCC to compile a program, GCC may combine portions of certain GCC header files and runtime libraries with the compiled program. The purpose of this Exception is to allow compilation of non-GPL (including proprietary) programs to use, in this way, the header files and runtime libraries covered by this Exception.

See https://gcc.gnu.org/onlinedocs/libstdc++/manual/license.html

Does that include libstdc++ or just libgcc? In any case, I guess you could always use libc++ instead.

As the page says, that license and exception apply to libstdc++. I don't know if they apply to libgcc.

GCC is not the only C++ compiler around.

> Rust’s type system is powerful, with a generics system that’s in some ways more limited than C++ templates but much easier to grok.

I'm definitely not experienced enough in C++ to know the details around this but I would love to hear about it from someone who is. I know some of the limitations of Rust's current type system but there is a decent amount of work underway to implement things like constant generics.

There are two core differences.

The first is what happens when you get something wrong. Let's say you write a templated function. This is Rust syntax, mapping it to C++ is left as an exercise for the reader:

  fn foo<T>(x: T) {
      x.bar();
  }
Rust checks the types before expansion, not after. So you get this error:

  error[E0599]: no method named `bar` found for type `T` in the current scope
   --> src/lib.rs:2:9
  2 |       x.bar();
    |         ^^^
In C++, this stuff is checked after expansion, so if you only pass things that have bar to foo, you're all good! It will compile. But when you pass something that doesn't, you'll get an error then.

This is a restriction, but one that leads to better error messages, and stronger checks. You'd need to write

  fn foo<T: Bar>(x: T)
where Bar is a trait that provides a bar method.
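A complete, compiling version of that sketch (the trait name Bar and its String return type are illustrative):

```rust
// The bound `T: Bar` tells the compiler, before any expansion,
// exactly what `foo` is allowed to do with `T`.
trait Bar {
    fn bar(&self) -> String;
}

impl Bar for i32 {
    fn bar(&self) -> String {
        format!("bar({})", self)
    }
}

fn foo<T: Bar>(x: T) -> String {
    x.bar() // checked against the trait, not against each call site
}

fn main() {
    // foo("hi") would be rejected here, at the call site, with a clear
    // "the trait `Bar` is not implemented for `&str`" error.
    println!("{}", foo(42));
}
```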

The second difference is what is allowed in generics: Rust only lets you use type parameters. We have accepted an RFC to allow constant expressions (the most straightforward of which is 'integers'), but it hasn't been implemented yet. C++ lets you do this today https://stackoverflow.com/questions/499106/what-does-templat... and https://en.cppreference.com/w/cpp/language/template_paramete... (they also have "template template parameters" aka higher kinded types https://en.cppreference.com/w/cpp/language/template_paramete...)

It is worth adding that the C++ community has wanted to add something called "concepts" for a very long time. Rust traits sound similar. So C++ will likely move in the direction of Rust here.

Yes. They're similar in ways, but also very different. I know that they have been added to the C++20 draft, but haven't gotten a chance to really dig in yet.

C++ templates are weird. They are kind of like a mix between macros and the kind of generic types you would see in other languages. As a macro system, you can use templates to "generate code" but in a very limited way; it's nowhere near as expressive as a full macro system like you'd see in Lisp, Rust, or Haskell. As a system for creating generic types, templates are too general-purpose... they give you a lot of power to underspecify things, and you can't typecheck a template until you instantiate it (like a dynamically typed program).

It interacts in strange ways with operator overloading, namespace resolution, and other language features like constructors. There are all sorts of template helpers that are necessary but leave you scratching your head until you actually see them in use and find the corner case that they're needed for, like std::declval. There are also all the rules about ADL that interact with templates, and then the cherry on top is SFINAE which lets you control whether a problem with a template is an error or not.

Simple templates are not too hard to grok, but writing actual generic types ends up being painful due to the proliferation of corner cases you have to consider, and there are a lot of things that you could do with a macro system that are frustrating and difficult with templates, but since they're "almost possible" you end up contorting the code a little bit and sprinkling on some ad-hoc helper definitions until it works.

It's better than the C preprocessor but worse than almost everything else.

Languages like Lisp have proper macros, so anything you want to do with a template you can just achieve by defining it in a macro... you have more rope to shoot yourself in the foot with, but the rules for Lisp macros are straightforward and easier to understand (you just write a function that spits out code, basically).

Languages like C#, Java, Haskell, etc. have better generic types, so you can do something like say "this function takes an array of type T[], and T must implement the Widget interface, it will give you back a single T". In C++ you can accomplish this goal but you can't as easily encode the constraint that "T must be a Widget", you have to do something weird like put a static assert in the function body, or put a std::enable_if in the type, or accept that there will be template substitution errors with possibly long and difficult-to-read error messages.

Mind you, this has really underscored (for me) the amazing developments in our understanding of language design over the past decades. C has preprocessor macros which are barely serviceable and impossible to use for anything complicated, C++ has templates which are more powerful but complicated, Java generics with type erasure are very clean but interact in subtle ways with e.g. covariance and contravariance, C# addresses the covariance and contravariance problems but interacts poorly with operator overloading. TBH I think Go made the right decision to keep both generics and operator overloading out of the core language at the beginning, since not having it at all is in many ways preferable to having a bad version.

You can't provide specializations for generics in C#, and that often leads to a performance penalty (for example, you can't provide a specialization that would avoid boxing).

Yes, that's exactly the point I was making. C++ templates sit in this odd place between a macro system and a generics system--generics are the least powerful and most restrictive, macros are the most powerful and least restrictive, and C++ sits in the middle but its usability and ergonomics are exceptionally poor compared to either option.

However, in practice you often avoid boxing and unboxing in C#. This is due to the differences between the type systems of C# and Java: Java generics are built on type erasure, which forces you to box and unbox everything. With C#, internally a List<T> will have a T[] inside it, and if T is a struct type you will not need to box or unbox it.

I'm not sure what examples you are thinking of where you are paying for boxing and unboxing due to generics in C#, I guess I haven't run into that particular problem. In general, generics in C# avoid boxing and unboxing. That's the difference between System.Collections (which doesn't use generics, and requires boxing/unboxing) and System.Collections.Generic (which does use generics, and as a direct consequence, avoids boxing/unboxing).

You can't provide a single generic method that works with T for both value types and reference types, but works without boxing/unboxing for value types. With specialization, you would be able to specialize the relevant part depending on the type.

Either that's factually incorrect or we have a disagreement about what "specialization" means.

In my mind, "specialization" means ad-hoc polymorphism. That is, you can change the behavior of a generic function, depending on its instantiated types. C# does not have this.

However, that does not mean that you get the same machine code for both value and reference types. As an optimization, when you use a generic in C#, code is generated for the instantiation you are actually using. For value types, this means that the code will not box or unbox. This is clearly not specialization, at least by the definition I'm using, but it also avoids boxing.

> The interesting question is how does .NET compile the generic IL of the server to machine code. It turns out that the actual machine code produced depends on whether the specified types are value or reference type. If the client specifies a value type, then the JIT compiler replaces the generic type parameters in the IL with the specific value type, and compiles it to native code. However, the JIT compiler keeps track of type-specific server code it already generated. If the JIT compiler is asked to compile the generic server with a value type it has already compiled to machine code, it simply returns a reference to that server code. Because the JIT compiler uses the same value-type-specific server code in all further encounters, there is no code bloating.

from https://msdn.microsoft.com/en-us/library/ms379564(v=vs.80).a...

Also this gem,

> Because the generic code does not force the boxing and unboxing of value types, or the down casting of reference types, performance is greatly improved.

https://stackoverflow.com/questions/47675493/equivalent-of-s... notes const generics, const specialisation, associated type constructors, type-quantified higher-rank trait bounds and variadic generics.

This is good work. Widespread parallel png codecs are a practical need that we've had for a while, everyone benefits.

No complaints about his implementation strategy, but I would recommend this get merged into libpng to make the largest impact.

Compression time is heavily mentioned, but how about the output file size?

Over the years there've been wildly different PNG file sizes, and a good encoder can secure something like a 50% smaller file against a naïve implementation.

To be honest, I hadn't expected a speed problem here.


* gzipping 7680×2160x3 bytes from /dev/urandom on my thinkpad x220t takes 1.6 seconds (not much difference between -1 and -9)

* gzipping /dev/zero takes 0.4 seconds

So it turns out this might be a problem after all!
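The measurement can be reproduced with something like this (sizes from the comment above; the exact timings will of course vary by machine):

```shell
# 7680 x 2160 pixels x 3 bytes/pixel ~= 47.5 MiB of raw data.
SIZE=$((7680 * 2160 * 3))

# Worst case: incompressible random bytes.
time head -c "$SIZE" /dev/urandom | gzip -1 > /dev/null

# Best case: a stream of zeroes.
time head -c "$SIZE" /dev/zero | gzip -1 > /dev/null
```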

I'm not sure what you expected?

Random data doesn't compress well, and requires a lot more effort by the compressor.

Images aren't anywhere near that random.

I expected the throughput of the gzip process to be a lot higher, to be honest. Both the random data and the stream of zeros took a noticeable time to compress.

In a video game or real-time mpeg stream decoding job, the video card has to generate this amount of data 30-60 times per second. So 0.4-1.6 seconds to compress the result seems long in comparison.

I tested the two extreme situations I could think of: pure randomness and pure order. Images will fall somewhere between these two limits, so on my PC, the PNG compression with the current algorithm has a lower bound of 0.4-1.6 seconds to write an image.

I must admit I upvoted for the puns.
