
Writing a JPEG Decoder in Rust – Part 2: Implementation I - adamnemecek
https://mht.technology/post/jpeg-rust-2/#
======
pslam
If you scan through the source (on github), the most notable thing is the lack
of the "unsafe" keyword. I've seen too many people basically transliterate
from C to Rust, along with all the unsafe operations. This one is pretty much
unoptimized, but still seems to be performant enough, and doesn't do anything
"unsafe".

That's not to say you could just throw this into the back-end of a public-
facing website. It's still going to panic (abort) if something unexpected
happens, and it's still going to chew indeterminate amounts of CPU, memory or
storage unless sandboxed (maybe even just ulimit). And there's still the
danger one of the Rust standard library functions has a flaw in it. But this
is the kind of starting point you wouldn't get with the plain C equivalent.

~~~
technion
Given the history of vulnerabilities in tools like ImageMagick, that panic
might still be a better position than an exploitable memory bug.

~~~
conradev
If you run it in a spawned thread, the thread will actually just unwind and
disappear if it panics (by default).

But to detect panics, you can use `afl.rs` to fuzz a parser:
[https://github.com/frewsxcv/afl.rs#user-content-trophy-case](https://github.com/frewsxcv/afl.rs#user-content-trophy-case)
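
A sketch of that default unwind behavior (the `run_isolated` helper is a made-up name, not a std API):

```rust
use std::thread;

// Run a closure on a spawned thread. A panic unwinds only that
// thread; `join` reports it to the caller as an Err.
fn run_isolated<F>(f: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(f).join().is_ok()
}
```

So `run_isolated(|| panic!("unexpected marker"))` returns `false` while the calling thread keeps going.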

------
0x0
I don't know rust, but what happens if you get a 0xff as the last byte here,
wouldn't i+1 go out of array bounds?

    while i < vec.len() {
        encoded_data.push(vec[i]);
        if vec[i] == 0xff && vec[i + 1] == 0x00 {
            // Skip the 0x00 part here.
~~~
grayrest
Since Rust people are reading this.

I find myself wanting to walk an iterator pairwise regularly for stuff like
this and I'm using:

    for (&cur, &next) in vec.iter().zip(vec.iter().skip(1)) {
        encoded_data.push(cur);
        if cur == 0xff && next == 0x00 {
            // ...
        }
    }

Is there a better way to write this? What I'd really like is the equivalent of
Clojure's partition[1]. I know that the stdlib has windows `(partition n 1
seq)` and chunks `(partition n n seq)` implemented on slices, but I usually
want this on iterators for the lookahead-like behavior shown here.

[1]
[https://clojuredocs.org/clojure.core/partition](https://clojuredocs.org/clojure.core/partition)
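
The zip-plus-skip pattern above can at least be packaged as a std-only helper (a sketch; `pairwise` is a made-up name):

```rust
// Walk a slice pairwise by zipping its iterator with itself shifted by one.
fn pairwise<T: Copy>(s: &[T]) -> impl Iterator<Item = (T, T)> + '_ {
    s.iter().zip(s.iter().skip(1)).map(|(&a, &b)| (a, b))
}
```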

~~~
steveklabnik
I'm not 100% sure, but
[https://crates.io/crates/itertools](https://crates.io/crates/itertools) might
have what you're looking for.

~~~
elktea
I think [http://bluss.github.io/rust-itertools/doc/itertools/struct.Zip.html](http://bluss.github.io/rust-itertools/doc/itertools/struct.Zip.html) would fit their use case?

------
bluejekyll
this is a really elegant way to take parallel arrays and put them into a
table:

    
    
      let codes: Vec<HuffmanCode> = data_table.iter()
                .zip(code_lengths.iter())
                .zip(code_table.iter())
                .map(|((&value, &length), &code)| {
                    HuffmanCode {
                        length: length,
                        code: code,
                        value: value,
                    }
                })
                .collect();
Nice!

~~~
lmm
It seems like a lot of work? In Scala this would just be:

    
    
        val codes = (data_table, code_lengths, code_table).zipped map HuffmanCode

~~~
mtanski
Rust is not Scala

~~~
lmm
It's not, but they're both ML-family languages. I had hoped that Rust would be
able to offer a similar level of elegance to Scala.

~~~
nercury
Looking at the example, I can't find a thing in there that is not required
(except the obvious `: Vec<_>` type, which is not required). We need `.iter()`
to say that the iterator is immutable (there is a mutable version), and we
need `.collect()` to actually run the iteration. I also don't think objects
should have default constructor functions.

It may be possible to implement `.zipped` on tuples though.

~~~
steveklabnik

> (except the obvious `:Vec<_>` type that is not required)

It is required; otherwise `collect()` doesn't know what type of collection to
collect into.

> It may be possible to implement `.zipped` on tuples though.

This is impossible without varargs, no?
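
On the `collect()` point, the target type can alternatively be written on the call itself with the turbofish instead of on the binding (a minimal sketch):

```rust
fn doubled() -> Vec<u32> {
    // Equivalent to `let v: Vec<u32> = (1..4).map(|x| x * 2).collect();`
    (1..4).map(|x| x * 2).collect::<Vec<u32>>()
}
```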

~~~
pcwalton
> This is impossible without varargs, no?

Nah, you'd just implement it as an impl over and over on tuples of reasonable
size (up to 16 or so).

It wouldn't be elegant, but it'd work in practice.
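
A sketch of that approach for the three-slice case (`Zipped3` and `zipped` are invented names here, not std APIs; real code would repeat an impl like this for each arity):

```rust
// One impl per tuple arity: here, a tuple of three slices.
trait Zipped3<'a, A, B, C> {
    fn zipped(self) -> Box<dyn Iterator<Item = (&'a A, &'a B, &'a C)> + 'a>;
}

impl<'a, A, B, C> Zipped3<'a, A, B, C> for (&'a [A], &'a [B], &'a [C]) {
    fn zipped(self) -> Box<dyn Iterator<Item = (&'a A, &'a B, &'a C)> + 'a> {
        Box::new(
            self.0
                .iter()
                .zip(self.1.iter())
                .zip(self.2.iter())
                .map(|((a, b), c)| (a, b, c)),
        )
    }
}
```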

~~~
steveklabnik
Oh right, I see now.

~~~
__s
This technique is used a lot by specs: [https://github.com/slide-rs/specs/blob/master/src/join.rs#L50](https://github.com/slide-rs/specs/blob/master/src/join.rs#L50)

------
Artlav
I feel like i'm missing something.

In the "Why" section he claims that Rust is a performant language with a low-
level feel.

In this part, he claims that decoding a tiny JPG image took 2 seconds with his
code.

How is that "performant" by any definition?

~~~
balducien
Somebody on reddit found the bottleneck:
[https://www.reddit.com/r/rust/comments/4yinbt/writing_a_jpeg...](https://www.reddit.com/r/rust/comments/4yinbt/writing_a_jpeg_decoder_in_rust_part_2/d6obz7r?context=2)

After fixing it, it apparently runs in 100 ms.

~~~
martinhath
It's worth mentioning that those numbers are for my test program, which only
reads a single file, using my and his implementations. In addition, the file
read was 20MB; lena.jpeg is 90K.

------
daconvergence
Rust code looks very similar to ES6 or TypeScript. The great CS language
convergence has begun!

~~~
quotemstr
Yes: now we all forget how to correctly manage memory. Rust: where resources
are unlimited and aborting when you run out of them is okay.

~~~
vertex-four
The vast majority of programs that you run on a day to day basis will abort
when they run out of memory. On a default Linux system, there's not even any
way to prevent that - you need to faff with the config to make it actually
return errors when allocating memory.
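
Concretely, the knob is the `vm.overcommit_memory` sysctl; mode 2 (together with `vm.overcommit_ratio`) is what makes the kernel refuse allocations up front (a sketch):

```shell
# Show the current policy (0 = heuristic overcommit, the default).
sysctl vm.overcommit_memory

# Mode 2: commit limit = swap + overcommit_ratio% of RAM, so malloc()
# can actually fail instead of the OOM killer picking a victim later.
sudo sysctl -w vm.overcommit_memory=2
sudo sysctl -w vm.overcommit_ratio=80
```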

~~~
wahern
That's all irrelevant for a language which is supposed to be a systems
programming language, however that term is defined.

~~~
vertex-four
I don't think it is - less, bash, gcc, systemd, most interpreters, and Firefox
will all pretty much just crash if malloc() fails - they're varying levels of
things you'd want a "systems language" for. Unless you're trying to define
systems language to mean one used for kernel or bare-metal embedded
development (both cases where even in C, you skip the standard library and
write your own memory allocation functions etc), there are really very few
cases where you (a) are allowed to allocate memory and (b) need to fail
gracefully somehow if you can't.

~~~
wahern
I'm not saying that aborting on OOM never makes sense. I'm saying there are
many situations where you shouldn't abort on OOM, and those situations overlap
considerably with the situations in which a systems language is ideal[1]. So a
systems language that doesn't allow for robust and fine-grained OOM handling
isn't much of a systems language.

Example: Basically any highly concurrent network daemon that multiplexes many
clients on the same thread. In that case, you want much more control over
where to put your recovery point. Even if the process doesn't abort, if the
recovery point is beyond the scope of the kernel thread (i.e. a controller
thread, which was the recommended solution before catch_unwind), that can be
really inconvenient, and also requires a lot of unnecessary passing of mutable
state between threads, which is usually something you try to avoid.
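
A minimal sketch of such an in-thread recovery point using `catch_unwind` (the `recover` wrapper is a made-up name):

```rust
use std::panic::{self, UnwindSafe};

// Turn a panic in one unit of work (e.g. one client's request)
// into a None, without unwinding past the caller.
fn recover<T, F>(f: F) -> Option<T>
where
    F: FnOnce() -> T + UnwindSafe,
{
    panic::catch_unwind(f).ok()
}
```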

[1] Lua has robust support for OOM recovery, which is noteworthy because there
can be situations where you both want to handle OOM but where a scripting
language is more preferable. Example: An image manipulation program with
scriptable filters, where you don't want an operation that can't complete to
take down your process or thread.

~~~
vertex-four
On the other hand... who actually disables over-committing on their Linux
servers or the machines they run image editors on so that you could actually
handle out-of-memory errors? I've never come across anyone who's changed the
setting unless told to by a specific piece of software (almost always database
software).

~~~
wahern
I do. Not only do I disable overcommit, I disable swap.

Most software is riddled with buffer overflows and other exploits, and yet
it's rare that you come across an intruder while he's installing his rootkit.
That doesn't mean it's not happening, just that people are ignorant about it;
and that things can appear normal even with rootkits installed.

Like buffer bloat, people can be experiencing a problem without even realizing
it's a problem. When software crashes under load they just think that it's
_normal_ to crash under load.

Or when it crawls to a snail's pace under load because it's swapping like mad,
they think that's normal, even though QoS would have been much better if the
software failed the requests it couldn't serve rather than slowing everybody
down until they _all_ timeout, sometimes even preventing administrators from
diagnosing and fixing the problem.

OOM provides back pressure. Back pressure is much more reliable and responsive
than, e.g., relying on magic constants for what kind of load you _think_ can
be handled.

------
juliangoldsmith
Original thread:
[https://news.ycombinator.com/item?id=12319557](https://news.ycombinator.com/item?id=12319557)

~~~
martinhath
Oh well - not getting karma on HN isn't the worst thing that could've happened
:)

------
namelezz
[0] An old survey (2010) of the decoding speeds of existing JPEG decoders.

[0] - [http://www.briancbecker.com/blog/2010/analysis-of-jpeg-decoding-speeds/](http://www.briancbecker.com/blog/2010/analysis-of-jpeg-decoding-speeds/)

