
Puffs: Parsing Untrusted File Formats Safely - ingve
https://github.com/google/puffs
======
evincarofautumn
Some curious things about the language:

• All operators have equal precedence, so parentheses are required when mixing
operators. I guess they’ve seen one too many bugs from precedence confusion.

• It doesn’t have dependent types; rather, it uses a proof checker with some
built-in knowledge about the language, and a way to specify preconditions,
postconditions, and invariants that the checker can reason about.

• There is limited effect typing: functions can be marked pure, impure (!), or
impure coroutine (?).

• There doesn’t appear to be any kind of polymorphism—over types, effects,
refinements, or proofs.

~~~
masklinn
> All operators have equal precedence, so parentheses are required when mixing
> operators.

Doesn't that mean it _rejects_ precedence? In a language with equal precedence
(e.g. Smalltalk), operators would just be executed from LTR or RTL.

~~~
evincarofautumn
I agree, although I suppose different operators could still have different
associativity in a language without precedence. I took that phrasing from the
“Puffs the Language” documentation.

~~~
masklinn
> I suppose different operators could still have different associativity in a
> language without precedence

That… is true, and I have a hard time computing how it would even work.

~~~
klodolph
Think exponentiation, where 2^3^4 might be 2^(3^4), but where subtraction
2-3-4 is (2-3)-4.
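
To make the distinction concrete, here is a minimal Rust sketch (not Puffs
code; both function names are made up for illustration). Each function
evaluates a chain of a single operator with a fixed associativity. Rust has no
exponent operator, so `u64::pow` stands in for `^` here.

```rust
/// Left-associative subtraction: `2 - 3 - 4` is read as `(2 - 3) - 4`.
fn sub_left_assoc(operands: &[i64]) -> i64 {
    let (first, rest) = operands.split_first().expect("need at least one operand");
    // Fold from the left: the accumulator is always the left-hand side.
    rest.iter().fold(*first, |acc, x| acc - x)
}

/// Right-associative exponentiation: `2 ^ 3 ^ 2` is read as `2 ^ (3 ^ 2)`.
/// Tall towers overflow u64 quickly, so this is only meant for small inputs.
fn pow_right_assoc(operands: &[u32]) -> u64 {
    let (last, rest) = operands.split_last().expect("need at least one operand");
    // Fold from the right: the accumulator is always the exponent.
    rest.iter()
        .rev()
        .fold(u64::from(*last), |acc, base| u64::from(*base).pow(acc as u32))
}
```

With these, `sub_left_assoc(&[2, 3, 4])` gives -5, while
`pow_right_assoc(&[2, 3, 2])` gives 512 rather than the 64 a left-to-right
reading would produce. Neither function says anything about how `-` and `^`
rank against each other, which is the separate question of precedence.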

~~~
mnarayan01
Consider:

    
    
      2-3^4

~~~
irishsultan
Since there is no precedence in this (hypothetical) language that would be an
invalid expression.

------
nigeltao
Hi, Puffs author here.

ingve linked to the github page instead of the announcement e-mail:
[https://groups.google.com/forum/#!topic/puffslang/2z61mNTAMn...](https://groups.google.com/forum/#!topic/puffslang/2z61mNTAMns/discussion)

That announcement has more to say about the comparison to Rust, which is
probably the most frequently asked question.

There are also some more words, on Rust and on other related work like Dafny, at
[https://github.com/google/puffs/blob/master/doc/related-
work...](https://github.com/google/puffs/blob/master/doc/related-work.md)

Edit: It's also not Rust per se, but the numbers at
[https://github.com/google/puffs/blob/master/doc/benchmarks.m...](https://github.com/google/puffs/blob/master/doc/benchmarks.md)
show that, on Puffs' benchmarks, gcc 7.2 noticeably outperforms clang/llvm
5.0. I'm sure this is a solvable problem, and not a fundamental flaw with
llvm, but fixing that's beyond my llvm knowledge.

~~~
vvanders
> Some of the safe languages (Go, Java, JavaScript, Python, Rust, etc.) are
> faster than others, but generally speaking, they're not as fast as C/C++.
> Consider an array indexing expression like "a[i]".

I don't think you'll win any friends in the Rust camp by making broad
statements like that. Performance of "unsafe" vs "safe" languages largely
boils down to memory access patterns, cache usage and how many levels of
indirection you tend to hit. Rust easily matches C/C++ while keeping high
level abstractions (bounds checks are done at slice level and not per-element
access).

I don't think Mozilla would have used Rust for Servo/Quantum if it was slower
than C++. It's certainly held true in all the cases I've used Rust in place of
C++.

~~~
nigeltao
First, as I said in another comment, Rust is great tech, written by great
engineers. Puffs is still different (with different trade-offs).

And, yes, memory access patterns, cache usage, etc. affect performance.

And, yes, in general, Rust performs comparably to C/C++. As I noted elsewhere,
Rust with runtime arithmetic overflow checks currently performs worse than
Rust without such checks. So, yes, Rust without those checks is as fast as C,
and in general, arithmetic overflow isn't the biggest concern.

steveklabnik, a Rust expert, commented on this page that, in the future, "if
the runtime checks get cheap enough, we can do them in release mode as well".
If so, that's great, I'm happy to be proven wrong. Cheap still isn't zero,
though, and see "nanoseconds become milliseconds" at
[https://groups.google.com/forum/#!topic/puffslang/2z61mNTAMn...](https://groups.google.com/forum/#!topic/puffslang/2z61mNTAMns/discussion)

In contrast, Puffs today performs as fast as C, _with_ arithmetic overflow
checks. They just happen to be compile time checks. And sometimes overflow is
indeed a concern (search for "underflow" in
[https://blog.chromium.org/2012/05/tale-of-two-pwnies-
part-1....](https://blog.chromium.org/2012/05/tale-of-two-pwnies-
part-1.html)).
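
For readers unfamiliar with the run-time side of this trade-off, here is a
small sketch using Rust's standard integer methods. This is ordinary Rust, not
Puffs, and the function names are hypothetical. The checked version pays for a
branch on every operation; the wrapping version shows what silently ignoring
overflow looks like.

```rust
/// Computes `y * stride + x`, returning None if any step overflows u32.
fn pixel_offset_checked(y: u32, stride: u32, x: u32) -> Option<u32> {
    y.checked_mul(stride)?.checked_add(x)
}

/// Computes the same expression with wrap-around (modulo 2^32) arithmetic,
/// so an overflow yields a small, wrong offset instead of an error.
fn pixel_offset_wrapping(y: u32, stride: u32, x: u32) -> u32 {
    y.wrapping_mul(stride).wrapping_add(x)
}
```

A compile-time approach aims to reject the program unless the checker can
prove that the `None` case is unreachable, so no branch is needed at run time.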

I'm sorry, but I don't understand what you mean by bounds checks being done at
the slice level and not per-element access. A statement like "pixels[y *
stride + x] = etc" is per-element, right?

~~~
vvanders
> I'm sorry, but I don't understand what you mean by bounds checks being done
> at the slice level and not per-element access. A statement like "pixels[y *
> stride + x] = etc" is per-element, right?

Yes, if you index a slice you have to check each access. However the idiomatic
way to work with strides of data in Rust is to use Iterators.

Bounds are checked at the entry of an iteration and the inner loop is nice
and fast. So your example would be:

    
    
      for pixel in &mut pixels[0..y*stride+x] {
        *pixel = etc
      }
    

I tried to do something similar on the playground[1] but it turns out
Rust/LLVM is too smart and folded the whole loop down to a constant.

[1] [https://play.rust-
lang.org/?gist=f3699d6456a561c3874395bff36...](https://play.rust-
lang.org/?gist=f3699d6456a561c3874395bff36feaa1&version=stable)
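
A side-by-side sketch of the two styles, filling one row of an image buffer.
The function names and the row-fill task are assumptions made for
illustration, and whether the optimizer removes the per-index checks in the
first version depends on the compiler and the surrounding code.

```rust
/// Indexes the buffer once per pixel. Each `pixels[...]` access carries its
/// own bounds check unless the optimizer can prove it redundant.
fn fill_row_indexed(pixels: &mut [u8], stride: usize, y: usize, width: usize, value: u8) {
    for x in 0..width {
        pixels[y * stride + x] = value;
    }
}

/// Slices the row once, then iterates. The range is checked a single time
/// when the sub-slice is created; the loop body needs no further checks.
fn fill_row_sliced(pixels: &mut [u8], stride: usize, y: usize, width: usize, value: u8) {
    let start = y * stride;
    for pixel in &mut pixels[start..start + width] {
        *pixel = value;
    }
}
```

Both functions panic on an out-of-range row; the difference is only in where
the check happens.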

~~~
bjz_
[https://rust.godbolt.org/](https://rust.godbolt.org/) is good for playing
around with this stuff:
[https://godbolt.org/g/9TgVXg](https://godbolt.org/g/9TgVXg)

------
margorczynski
> The aim is to produce software libraries that are as safe as Go or Rust,
> roughly speaking, but as fast as C, and that can be used anywhere C
> libraries are used

One question about this to people more versed in language compilation -
wouldn't it be possible for SAFE Rust code to be faster than C code,
considering that the Rust compiler has much more syntax to play around with
and optimize? Many of those safe constructs are part of the language and can
be taken into account.

~~~
chrismorgan
In theory, yes, some Rust code can be faster than the C or C++ code people
would write, especially as pertains to the guarantees offered by its aliasing
control.

In practice, LLVM hasn’t had any incentive to implement those sorts of
optimisations at this stage (because before Rust came along nothing would
benefit from them), so the benefits are generally theoretical only.

It remains to be seen what this will effect.

~~~
muizelaar
LLVM does implement these sorts of optimizations:

C:

    
    
      int foo(int *x, int *y) {
        *x = 0;
        *y = 1;
        return *x;
      }
    

gives:

    
    
      foo(int*, int*): # @foo(int*, int*)
        mov dword ptr [rdi], 0
        mov dword ptr [rsi], 1
        mov eax, dword ptr [rdi]
        ret
    

whereas in Rust:

    
    
      fn foo(x: &mut i32, y: &mut i32) -> i32 {
        *x = 0;
        *y = 1;
        *x
      }
    

which compiles to:

    
    
      example::foo:
        mov dword ptr [rdi], 0
        mov dword ptr [rsi], 1
        xor eax, eax
        ret

~~~
wolf550e
That optimization is useful to C and C++ code, they use "restrict":
[https://en.wikipedia.org/wiki/Restrict](https://en.wikipedia.org/wiki/Restrict)

Does LLVM implement optimization that only Rust can use, or at least those
that cannot be used from C/C++?

~~~
steveklabnik
&mut T pointers are basically restrict by default.

LLVM is adding some semantics specific to non-C or C++ languages. I’m on my
phone so I can’t link you, but they’re adding an intrinsic related to infinite
loops because languages like Rust have different semantics here.

~~~
Manishearth
Not anymore, LLVM was buggy about this and we had to remove this.

~~~
steveklabnik
That’s why I said “basically”, it’s been fixed upstream and will be turned on
again soon.

------
tc
This DSL seems largely motivated by the desire for provable compile-time
bounds checking. One general-purpose modern language that can check bounds at
compile time and generate code that is performance-competitive with C is ATS:

[http://www.ats-lang.org/](http://www.ats-lang.org/)

Dependent types are meant to solve exactly this class of problem. Rust has an
RFC for adding these:

[https://github.com/rust-lang/rfcs/issues/1930](https://github.com/rust-
lang/rfcs/issues/1930)

~~~
throwaway613834
Aren't dependent types undecidable?

~~~
tomp
integers with additions are decidable - that's pretty much what you need to
verify array bounds (multiplication with a constant is included as well, so
matrices should work too).

~~~
throwaway613834
Do you mean "in practice" that's what you need? I could buy that, but I'm
thinking about it in terms of the language spec. What would it say? Are we
moving toward a language spec that says the meaning is "whatever the reference
compiler can handle"? Or do you mean one that says "array indices may only be
(constant) linear combinations of variables in order for the code to be
compilable" or something like that?

~~~
tomp
Good point. The way I imagined is, the language is defined as "safe", and has
both "static" and "dynamic" guarantees. Whatever the compiler is able to prove
statically, amazing, or else it adds dynamic checks. The programmer can
optionally tag the function/expression so that the compiler will warn if it
can't prove it. Long-term, I think it's very reasonable to assume the solvers
will improve, so best to design a language with this in mind.
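
A rough sketch of that "prove it statically, else fall back to a dynamic
check" decision, assuming the compiler already knows a lower and upper bound
for the index. The enum and function are hypothetical and do not describe any
existing compiler.

```rust
#[derive(Debug, PartialEq)]
enum BoundsCheck {
    /// Every possible index is in range: no run-time check is emitted.
    ProvedStatically,
    /// Some possible indices are in range and some are not: emit a check
    /// (or a warning, if the programmer asked for a static proof).
    NeedsRuntimeCheck,
    /// No possible index is in range: report an error at compile time.
    AlwaysFails,
}

/// Classifies an access `a[i]` where `index_lo <= i <= index_hi` is known
/// and `a` has `len` elements.
fn classify_index(index_lo: i64, index_hi: i64, len: i64) -> BoundsCheck {
    if index_lo >= 0 && index_hi < len {
        BoundsCheck::ProvedStatically
    } else if index_hi < 0 || index_lo >= len {
        BoundsCheck::AlwaysFails
    } else {
        BoundsCheck::NeedsRuntimeCheck
    }
}
```

As the solver gets better at narrowing the bounds, more accesses move from the
second case into the first without any change to the program's meaning.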

------
CapacitorSet
> The aim is to produce software libraries that are as safe as Go or Rust,
> roughly speaking, but as fast as C, and that can be used anywhere C
> libraries are used.

I'm all for having different implementations of software, but does Rust not
fulfill these requirements?

~~~
Ded7xSEoPKYNsDd
There are C compilers for many architectures that the Rust compiler (today)
does not support. They may also be referring to the fact that you can't just
drop it into existing projects with their complicated build systems in the
same way that you can drop in a few generated ".c" and ".h" files.

~~~
yaantc
Great points, plus from the Puffs readme: "In Rust, integer overflow is checked
at run time in debug mode and silently ignored in release mode by default, as
the run time performance penalty was deemed too great.". With Puffs, it's
statically checked. I haven't delved into Puffs' details, but to support as
many environments as possible in an easy way today, a DSL generating C seems a
pragmatic decision.

~~~
tveita
By default, but it is possible to enable integer overflow checks for release
builds with "-C overflow-checks".

~~~
nigeltao
And that comes with a performance penalty. Maybe it won't, in a future version
of Rust, but it does for now.

------
microcolonel
This is very interesting, after kicking the tires on a few parsers, I've come
to the conclusion that most parsers are absolute garbage, and it's nobody's
fault in particular. This should make it close to feasible to write fast
native parsers (especially validators) which aren't a liability.

------
tom_mellior
Does anyone know what analysis or prover they use to discharge proof
obligations? I poked around the .md files in the repository, but I haven't
seen this addressed anywhere. I'm not (yet) motivated to dig into the actual
source...

~~~
haskellandchill
Looks like it is not a general system, they prove the hardcoded binary
operations:
[https://github.com/google/puffs/blob/master/lang/check/asser...](https://github.com/google/puffs/blob/master/lang/check/assert.go#L270)

~~~
tom_mellior
Thanks. From what I've seen in that file and another one in the same
directory, it seems to be a standard abstract interpreter on a domain of
intervals. I would have expected a more refined relational domain that can
express certain inequalities, but this seems to be enough for a first
prototype.
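
For anyone who hasn't met the term, here is a toy interval domain in Rust.
This is a sketch of the general technique only, not a transcription of the
Puffs checker; the type and method names are invented, and overflow of the
bounds themselves is ignored.

```rust
/// A closed integer interval [lo, hi] that over-approximates the values a
/// variable may hold.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Interval {
    lo: i64,
    hi: i64,
}

impl Interval {
    /// Bounds of `a + b` given bounds of `a` and `b`.
    fn add(self, other: Interval) -> Interval {
        Interval { lo: self.lo + other.lo, hi: self.hi + other.hi }
    }

    /// Bounds of `a * k` for a constant `k` (a negative `k` swaps the ends).
    fn mul_const(self, k: i64) -> Interval {
        let (a, b) = (self.lo * k, self.hi * k);
        Interval { lo: a.min(b), hi: a.max(b) }
    }

    /// True if every value in the interval is a valid index into an array
    /// of length `len`.
    fn fits_len(self, len: i64) -> bool {
        self.lo >= 0 && self.hi < len
    }
}
```

With `y` in [0, 9], `x` in [0, 19], and a stride of 20, the index
`y * 20 + x` lands in [0, 199], so an access into a 200-element buffer is
proved safe. What intervals alone cannot express is a relation between two
variables, such as `n_bits < width + 8`, which is where a relational domain or
an explicit assertion comes in.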

~~~
nigeltao
I'm not entirely sure what you had in mind, but you can express certain
inequalities. For example,
[https://github.com/google/puffs/blob/master/std/gif/decode_l...](https://github.com/google/puffs/blob/master/std/gif/decode_lzw.puffs)
has this line:

assert n_bits < (width + 8) via etc

The 'via' syntax is discussed at
[https://github.com/google/puffs/blob/master/doc/puffs-the-
la...](https://github.com/google/puffs/blob/master/doc/puffs-the-language.md)

~~~
tom_mellior
Thanks for your answer. Yes, this kind of symbolic inequality was what I had
in mind. The solution via "via" is quite clever.

In the beginning I had thought that maybe you were using something like
Pentagons: [https://www.microsoft.com/en-
us/research/publication/pentago...](https://www.microsoft.com/en-
us/research/publication/pentagons-a-weakly-relational-abstract-domain-for-the-
efficient-validation-of-array-accesses/) which were developed explicitly in
the context of proving array bounds checks.

You might want to give them a look. They would cost you in implementation
effort, of course, but they might make Puffs (almost as) fast _and_ pretty
smart!

------
ekr
An interesting observation: this was written by the brother of Terence Tao.

~~~
dullgiulio
If I am not wrong, he is a member of the Go core team?

~~~
nigeltao
Yep, a member of the Go core team, although not as active in the Go community
these days as I was in past years.

And yep, I'm also Terry's brother.

------
eridius
> _The aim is to produce software libraries that are as safe as Go or Rust,
> roughly speaking, but as fast as C, and that can be used anywhere C
> libraries are used._

This is kind of a weird thing to say, because Rust _is_ as fast as C, and can
be used to create C-compatible libraries.

~~~
nigeltao
See my other comment (written after your comment). Search this page for "That
announcement has more to say about the comparison to Rust, which is probably
the most frequently asked question."

------
andrewchambers
What makes this language specific to parsing files? It mentions that it is a
DSL for parsing files a few times... why not network packets, or any other
task.

~~~
fixermark
I interpret that to be more a statement of creator intent than a restriction
on the end-user. The general vibe I get is "This language is verbose as hell
and puts a lot of constraints on what your functions are allowed to do;
writing a full-sized application in Puffs alone is going to be a super-tedious
exercise."

------
Santosh83
Can this DSL be extended to cover other classes of bugs and undefined
behaviour in C without spilling over into runtime checks?

~~~
nigeltao
Possibly. I'd like Puffs (and its generated C code) to be free of UB (and I've
skimmed the blog.regehr.org posts), but I'm not 100% certain if it is.

------
rurban
Looks like a mix of ATS and Pony, but without the rich proof system of ATS,
without the safeties provided by both and without the performance advantages
of both. Wonder why they still try to reinvent the wheel, when there are
already established and better langs out there. Well, it's Google, so they are
still academics.

------
actionowl
We're google, we can pick any name we want, even if a very similar project is
already using it.
[https://www.netbsd.org/docs/puffs/](https://www.netbsd.org/docs/puffs/)

~~~
acdha
Creating a filesystem in user-space is “very similar”? I mean, they both
involve files but…

[http://netbsd.gw.com/cgi-bin/man-
cgi?puffs+4.i386+NetBSD-7.1](http://netbsd.gw.com/cgi-bin/man-
cgi?puffs+4.i386+NetBSD-7.1)

~~~
deepsun
What will I get if I type "sudo apt install puffs"?

What do I have if I see libpuffs.so somewhere?

~~~
acdha
That's a valid question but not the same as saying the projects are very
similar. In that case in particular, the answer would come down to “Are you
running on NetBSD?” and so the most likely answer is that the NetBSD package
maintainer would package it as "google-puffs" or something like that.

~~~
nigeltao
Yeah, it'd be unfortunate, but not unprecedented. For example, Debian has both
epiphany and epiphany-browser, which are two separate things.

~~~
nigeltao
I'm considering a name change:
[https://groups.google.com/d/topic/puffslang/cSrH-s7UqwA/disc...](https://groups.google.com/d/topic/puffslang/cSrH-s7UqwA/discussion)

As for the "We're google, we can pick any name we want" assumption that
somebody else wrote earlier,
[https://github.com/google/puffs](https://github.com/google/puffs) says
"Disclaimer: This is not an official Google product, it is just code that
happens to be owned by Google". Also, before the launch, I didn't find many
hits for "puffs" when searching github.com projects, or searching "apt search
puffs", or searching the Web in general for queries like [puffs programming],
or uses of the ".puffs" file extension. It's not like I cackled maniacally as
I deliberately screwed over the NetBSD puffs project, whether for myself or on
behalf of Google, I just didn't find it. I am bad at searching. Sorry.

------
tambourine_man
That's a great acronym

