
WebAssembly doesn’t make unsafe languages safe - paraboul
https://00f.net/2018/11/25/webassembly-doesnt-make-unsafe-languages-safe/
======
pcwalton
The general point that this article makes is true--C and C++ are still unsafe,
even if you execute them in a VM--but there are some very important caveats to
note:

* ROP should be impossible in Web Assembly, because the call stack is a separate stack, inaccessible from the heap and effectively invisible to the program.

* For the same reason, classic buffer overflow attacks involving attacker-supplied machine code are impossible. In addition, Web Assembly bytecode is never mapped into the heap, so attackers cannot inject shellcode via the usual methods.

* Mmap functionality is proposed for the future [1], which would allow for the implementation of guard pages. (Guard pages are fairly weak defenses anyway...)

I'm sure that it's _possible_ to attack a Web Assembly program's control flow
by overwriting vtables and function pointers and taking advantage of the
resulting type confusion. But it's significantly more difficult than it is in
native code.

[1]:
[https://github.com/WebAssembly/design/blob/master/FutureFeat...](https://github.com/WebAssembly/design/blob/master/FutureFeatures.md)
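To make that last point concrete, here's a toy Python model (all names and offsets are illustrative, not real wasm semantics). Indirect calls in wasm go through a bounds- and signature-checked function table, but the *index* used for the call is ordinary data in linear memory, so a plain buffer overflow can still redirect control flow to another in-table function:

```python
# Toy model of wasm control flow: code and the call stack live outside
# linear memory, but a function-table index stored *in* linear memory
# can still be corrupted by an overflow (names here are illustrative).

def grant_access():
    return "access granted"

def deny_access():
    return "access denied"

# The indirect-call table: the only functions an indirect call can reach.
table = [deny_access, grant_access]

memory = bytearray(64)   # the module's linear memory
memory[16] = 0           # a table index stored at offset 16 (deny_access)

def memcpy(dst, src):
    # An unchecked copy, as compiled C would perform it.
    memory[dst:dst + len(src)] = src

# Attacker-controlled input overflows a 16-byte buffer at offset 0
# and overwrites the adjacent index with 1.
memcpy(0, b"A" * 16 + bytes([1]))

def call_indirect(addr):
    # wasm checks the index is in-bounds and the signature matches,
    # but any in-table function with that signature is reachable.
    return table[memory[addr]]()

result = call_indirect(16)   # → "access granted"
```

This is the "significantly more difficult" part: the attacker can only pivot among functions already in the table with a matching signature, not to arbitrary code.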

~~~
pjmlp
If WebAssembly had tagged pointers, that could be avoided, as proven by Solaris
for SPARC with ADI in production.

Because when something is possible in security attacks, it will eventually be
exploited.

~~~
pcwalton
ADI doesn't seem to me to have enough bits to effectively protect against
vtable type confusion.

Besides, nothing stops a wasm compiler from emitting such checks itself, if it
wants to.

~~~
pjmlp
Not sure about that, it does seem quite effective, though.

Having the compiler emit it wouldn't be an option, as hardware-enabled
solutions seem to be the only ones accepted by C devs, as shown by SPARC and
now ARM.

~~~
sitkack
I think there is the possibility of integrating the source maps and the WASM
with special purpose execution envs to apply these checks.

------
archgoon
Making unsafe languages safe is not a design goal of WebAssembly. The author
seems to believe that WebAssembly should provide many things that are the
responsibility of the operating system and standard C library.

I do not believe that is a good way of thinking about WebAssembly. It
shouldn't be thought of as an operating system target (like Windows, Linux, or
Mac); it should be thought of as an architecture target (like ARM vs x86).
What the author wants is an operating system and runtime, which you can write
for (and standardize on) WebAssembly and then load your program into it.

~~~
yellowapple
Making unsafe languages safe (at least to some degree) _has_ to be a design
goal of WebAssembly; else, it has zero business being used to run arbitrary
code that's automatically downloaded from the Internet.

~~~
saagarjha
There are different kinds of safety, though. There's sandboxing, which means
that applications running in WebAssembly should not have access to resources
they are not privileged to access. And there's memory safety, which means the
WebAssembly application itself shouldn't try to break guarantees about memory.
These are two separate things, and only a failure of the first can do damage
outside of breaking the website you're on.

------
danShumway
> _Technically, this can be implemented already. However, for successful
> adoption, a standard interface has to be defined._

This is literally the opposite of how the web works. The web is based on the
Extensible Web Manifesto[0], where _first_ we come to a consensus and see
widespread adoption, and _then_ we make a standard that reflects that
consensus.

> _We have a fantastic and highly secure execution environment from a host
> perspective. But from the guest perspective, that very same environment
> looks like MS-DOS, where memory is a giant playground with no rules._

Well, yeah. Add me to the list of people who are just kind of confused about
why fixing C's problems is the responsibility of the browser, and why people
thought that it was ever a priority for WASM. WASM is a VM-like isolation
chamber for low-level code. It's not designed to make your code safe, it's
designed to protect the host environment from unsafe code.

I've seen multiple people make the argument that manually laying out memory in
WASM is a design flaw rather than one of the biggest points of the entire
implementation. Manually laying out memory with close to zero restrictions or
paradigms or safeguards is the thing that makes WASM good.

I feel like I missed something, I don't know where people got this idea that
WASM was trying to be a higher level tool. It's an extremely low-level
language-agnostic compile target for both safe and _unsafe_ languages, and
languages that care about safety should add their own safeguards as they see
fit.

[0]:
[https://github.com/extensibleweb/manifesto](https://github.com/extensibleweb/manifesto)

------
ncmncm
It is worse than the title says: a key tool for trapping corrupted process
state, the null-pointer segfault, is turned off in WebAssembly targets.

The second most common source of process corruption, integer overflow, usually
ignored in native targets because direct memory corruption is so much worse,
is also adopted into WebAssembly wholesale.

This is all overwhelmingly worse than in the native case, because we make some
effort to run only trustworthy native code, but browsers actively solicit
unknown code from known-hostile sources--most particularly, web ads.

The existence of safe-ish languages doesn't help, because there is no
incentive to deploy them in user-hostile code fragments.

~~~
AgentME
>This is all overwhelmingly worse than in the native case, because we make
some effort to run only trustworthy native code, but browsers actively solicit
unknown code from known-hostile sources--most particularly, web ads. The
existence of safe-ish languages doesn't help, because there is no incentive to
deploy them in user-hostile code fragments.

WebAssembly strongly sandboxes WASM modules that are executed, so the
situation where an attacker creates and serves a malicious WASM module to the
user's browser is one that is handled well.

The article is saying that WASM's protections stop a module from escaping its
sandbox, but they don't prevent a buggy module's own memory from being
corrupted by the memory-related bugs common in unsafe languages.

------
nickcw
I'm not sure the point about NULL pointer access being allowed is valid.

The C standard allows for the NULL pointer to be numerically any value, not
just 0, and on some architectures it isn't 0. So I would hope C compilers
targeting wasm would just use a different value for the NULL pointer, eg
0x8000000000000000 not 0.

Whether they actually do or not I don't know, but it isn't an architectural
failing of wasm.

~~~
azakai
It's true that in theory a NULL pointer could be 0xffffffff or such, and that
would prevent some of the problems the article mentions. However, that would
only help when you assign NULL to a pointer explicitly. The problem is that in
practice NULL pointers are often caused by other things, like thinking that
zero-initialized memory contains a pointer - in fact, this is even more common
in wasm than natively, since wasm guarantees memory and stack are initialized
to zero.

This is a serious enough issue that, for debugging purposes, emscripten has a
SAFE_HEAP mode which will check for NULL pointers on every access. We'd still
need that to look for 0 even if the compiler considered NULL to be something
other than 0.
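A toy sketch of that failure mode (Python standing in for wasm semantics; a hedged illustration, not emscripten's actual implementation). Address 0 is ordinary zero-initialized linear memory, so a load through a NULL pointer silently returns 0 instead of trapping:

```python
# Why NULL dereferences go unnoticed in wasm: address 0 is valid,
# zero-initialized linear memory, so reading through NULL just
# returns whatever bytes are there (initially all zero).

memory = bytearray(65536)  # one 64 KiB wasm page, zero-initialized

def load_i32(addr):
    # wasm only traps on out-of-bounds accesses, never on address 0
    if addr + 4 > len(memory):
        raise MemoryError("out-of-bounds access traps")
    return int.from_bytes(memory[addr:addr + 4], "little")

NULL = 0
p = load_i32(128)     # "pointer" field read from zeroed memory: 0, i.e. NULL
value = load_i32(p)   # no fault: silently reads the bytes at address 0
```

Natively this load would usually segfault immediately; here the bug is masked and the zero propagates, which is exactly what a SAFE_HEAP-style check has to catch.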

~~~
int_19h
> The problem is that in practice NULL pointers are often caused by other
> things, like thinking that zero-initialized memory contains a pointer

Which, to be clear, is not a valid assumption wrt portable C or C++.

~~~
spc476
While C may not mandate that a zero-initialized pointer is NULL, POSIX does.

------
azakai
> there would be clear benefits in also delegating basic dynamic memory
> management to the host.

It would be quite hard to spec malloc/free into the Web platform, because of
the necessary detail: the behavior of that malloc/free pair would need to be
identical (same pointers returned from malloc) in all browsers, for every
possible sequence of allocations and deallocations. GC is actually easier to
spec since so much of the underlying details are unobservable.

It's not impossible in theory, but it seems like it would mandate all browsers
use the exact same allocator implementation (say, dlmalloc version x.y.z). In
addition, this would not allow optimization over time, again, unlike GC.

In the non-browser case, server-side implementations may not need to worry
about specs, so more options may be open there.

~~~
pedrocr
_> the behavior of that malloc/free pair would need to be identical (same
pointers returned from malloc) in all browsers, for every possible sequence of
allocations and deallocations_

Why would you ever spec it like that? That doesn't seem needed at all. Plenty
of programs run across platforms with wildly different mallocs and everything
runs fine. Replacing the malloc with LD_PRELOAD tricks is even done sometimes.

~~~
azakai
For almost all platforms that's true, but the culture of standardization on
the Web is unique. It comes from wanting a Web page to have the maximal chance
of running across all standard-compliant browsers, forever - that's just not
the case for practically every other platform.

~~~
pedrocr
If only that were true. Javascript has had plenty of inconsistent behaviors
across browsers, and the feature matrix for what is and isn't supported in each
browser is hugely complex. And even if it were true, standardizing the results
of malloc makes no sense: it gives you no extra compatibility guarantees for
any sane programs, and no one cares about insane ones.

------
jillesvangurp
C is unsafe because it comes with no memory protections. In an environment
like a typical operating system (depending on the OS of course), there's a
chance it will escape its sandbox. This risk does not exist in wasm. You can
write the most atrociously misguided C ever and run it in wasm and know for
sure that it will never do anything worse than corrupt its own program state.

You get similar benefits by using a decent OS and/or by putting the
untrustworthy code in e.g. a docker container. It's still code that can't be
trusted but at least you are setting some hard boundaries that have no
easy/known ways of being bypassed.

However, it would still be bad if the program gets compromised. If, say, you
are handling some credit card details and are using some C code to do that and
it gets compromised, all of the interesting stuff (i.e. the credit card
details) would be inside those boundaries. This is why using C to do that is
not a great idea.

This is a concept that is often misunderstood by people: most of the
interesting stuff (from a hacker point of view) happens inside the sandbox.
This is where you access and handle sensitive data and access protected
resources (e.g. a third party website). Many attacks use relatively low tech
mechanisms such as script injection, man in the middle attacks, social
engineering, etc.

Wasm indeed does nothing to protect against that. If you are handling user
input in any way, that is potentially a way for hackers to inject code in your
sandbox. If that sandbox has access to or control over anything interesting,
that just got compromised. If you are using a language that is notorious for
its decades long history of input validation mechanisms getting compromised
(cough C cough), that means you can't trust input validation to function
properly even inside a wasm container.

This is exactly how many browser attacks work. They don't install viruses on
your machine or whatever but they simply trick you into revealing your
credentials, visiting some evil website, or entering your credit card. A bit
of injected javascript or a redirect is all that this takes. Any injected code
never leaves the sandbox; it doesn't have to.

~~~
pjmlp
Which is exactly why C++ modules in the CLR are marked as unsafe.

There used to be the possibility of restricting the C++ language to a safe
subset via the /clr:pure and /clr:safe switches, but now Microsoft's official
position is that C# or other .NET languages are the only way to effectively
write safe and verifiable code.

[https://docs.microsoft.com/en-us/cpp/build/reference/clr-res...](https://docs.microsoft.com/en-us/cpp/build/reference/clr-restrictions?view=vs-2017)

[https://docs.microsoft.com/en-us/cpp/dotnet/pure-and-verifia...](https://docs.microsoft.com/en-us/cpp/dotnet/pure-and-verifiable-code-cpp-cli?view=vs-2017)

------
cyberbullets
There is also another blog post (link to white paper inside) that talks about
different memory safety issues with WebAssembly if using a memory-unsafe
language: [https://www.forcepoint.com/blog/security-labs/new-whitepaper...](https://www.forcepoint.com/blog/security-labs/new-whitepaper-memory-safety-old-vulnerabilities-become-new-webassembly)

------
kbumsik
> First, hosts have no visibility on how memory is being managed within a
> guest. Want to diagnose memory leaks? Good luck with that.

Is that really true? I think a memory allocator can be implemented by the
host. Using the "import" feature of WASM, a guest could implement a memory
allocator that imports functions from the host, asking the host to look
through the guest's memory directly (this is possible in JS code) and then
allocate from it. It may be slower, but it is certainly possible for hosts to
know what is happening to the memory this way.
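A rough sketch of that idea (Python standing in for the host; the interface names like `malloc`/`free`/`leak_report` are hypothetical, not a real wasm import ABI). Because every allocation goes through the host, the host can track what is live and diagnose leaks:

```python
# Sketch of a host-provided allocator: the guest imports malloc/free
# from the host, so the host sees every allocation and can report leaks.
# (Hypothetical interface; real imports would be declared in the module.)

class HostAllocator:
    def __init__(self, memory_size=65536):
        self.memory = bytearray(memory_size)  # the guest's linear memory
        self.next = 16                        # simple bump pointer
        self.live = {}                        # addr -> size, host-visible

    def malloc(self, size):                   # imported by the guest
        addr = self.next
        self.next += size
        self.live[addr] = size
        return addr

    def free(self, addr):                     # imported by the guest
        self.live.pop(addr, None)

    def leak_report(self):
        # Unlike an in-guest allocator, the host can enumerate live blocks.
        return sum(self.live.values())

host = HostAllocator()
a = host.malloc(100)
b = host.malloc(50)
host.free(a)
leaked = host.leak_report()   # the host sees 50 bytes still live
```

The real obstacle, as azakai notes elsewhere in the thread, is standardizing such an interface rather than implementing it.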

------
pier25
I love the idea of WebAssembly in the browser, but what's the point of using
it as a universal module language everywhere else?

We can already run most popular languages pretty much anywhere, and I imagine
at better performance than WebAssembly, no?

~~~
josephg
The advantage is that code run through WASM is fast, sandboxed and platform-
independent. Nothing else is capable of giving you all 3 of those features.

Embedded scripting languages (JS / Lua) are platform-independent and
sandboxed, but much slower than native code.

C is fast and small. It can be very carefully sandboxed by the OS (eg iOS
apps) but the sandbox has historically been very leaky. Also binaries must
target a specific hardware architecture.

The JVM & .NET environments are platform independent, but their heavy reliance
on garbage collection slows them down. They also both have large runtime
environments, which makes them more awkward to embed. I wouldn't want to embed
the JRE inside the linux kernel or in a web browser because it's so big and it
has such a terrible security track record. But WASM would work great.

What other technology would be a good fit for making mods for a multiplayer
game? Java? Too bloated. Lua? Workable but slow. C? You can't ship 3rd party
mods safely and even if you could they would need to be recompiled for every
platform. But WASM will work great. What other tech could you use to run a
user-supplied filesystem driver in-kernel? Everything else is too slow (lua)
or too unsafe (C).

~~~
pier25
Ok, plugins/mods for real time applications are a good use case. Not only
games, but also databases, web servers, etc.

I'm starting to see the value in using WebAssembly outside the browser.

------
abecedarius
> Memory is represented as a single linear block

iirc you can have multiple instances or modules, each with its own private
linear memory, and they can be set up to call each other's exported functions.
It's just that current compilers don't use that encapsulation when compiling a
single program. Am I wrong about this?

(I haven't used WebAssembly, only read about it some time ago. But here's a
link in support:
[https://groups.google.com/forum/#!topic/e-lang/3A6zYWF6u5E](https://groups.google.com/forum/#!topic/e-lang/3A6zYWF6u5E)
"Multiple module instances only share an address space if they explicitly
export/import a linear memory, which is just another kind of module-level
definition. A module can define its own linear memory and not export it, in
which case nobody else can access it.")

~~~
kbumsik
It might be possible when exchanging only numbers. But I think it's going to
be quite tricky when exchanging pointers to buffers between modules, since the
value of a pointer in one module is meaningless to the other module if the
memory block is not shared.
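A quick sketch of why numbers cross module boundaries fine but raw pointers don't (Python model, illustrative only; each bytearray stands in for a module's private linear memory):

```python
# Each module owns a private linear memory, so an address is only
# meaningful relative to the memory it was allocated in.

module_a_memory = bytearray(256)
module_b_memory = bytearray(256)

# Module A writes a string at address 32 in *its* memory.
msg = b"hello"
module_a_memory[32:32 + len(msg)] = msg

# Passing the integer 32 to module B and dereferencing it there
# reads B's memory: garbage (zeros), not A's string.
via_pointer = bytes(module_b_memory[32:32 + len(msg)])

# The workable pattern: copy the bytes into B's memory first,
# then hand B an address valid in *its* address space.
module_b_memory[64:64 + len(msg)] = module_a_memory[32:32 + len(msg)]
via_copy = bytes(module_b_memory[64:64 + len(msg)])
```

So cross-module calls degenerate into a copy-in/copy-out ABI, much like crossing a process boundary.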

------
dmitrygr
This is complete nonsense. A normal C program running on a modern operating
system has the MMU control what memory it can access. So WebAssembly's "just
a block of memory" is no better - it is the same.

~~~
sitkack
Correct.

A normal C program has free rein over the entire process memory space.

A wasm program only has access to the call interface provided to it and the
linear memory region allocated to it.

The guarantees are much different and greatly in favor of wasm. In wasm the
stack and the code segments are off limits.

~~~
kllrnohj
You missed the author's point by miles.

C only has access to its process space as does WASM. WASM's only improvement
is it makes stack smashing bugs harder (which ironically makes real security
worse as it means you can't use retpoline, but let's ignore that for now)

HOWEVER it gives up on using guard pages and address randomization (systemic
preventions against common bugs like buffer overflows), making WASM _in
practice_ less safe than C.

WASM is only safer for the host embedding untrusted code. For trusted code,
though, it's across the board worse in every way - including security. Which
shouldn't be all that contentious of a statement since WASM had no goals or
intentions to ever replace trusted native code execution. That's not a thing
it tries to do, and it shouldn't be a surprise that it didn't do it.

~~~
monocasa
> which ironically makes real security worse as it means you can't use
> retpoline

Why wouldn't the WASM JIT retpoline all code (probably only if the underlying
processor needs it) rather than relying on the guest programs to do it
themselves?

> HOWEVER it gives up on using guard pages and address randomization (systemic
> preventions against common bugs like buffer overflows), making WASM in
> practice less safe than C.

Once again, wouldn't the JIT be able to enforce this at runtime?

It's not like WASM binaries are referencing raw address offsets.

~~~
kllrnohj
> Why wouldn't the WASM JIT retpoline all code (probably only if the
> underlying processor needs it) rather than relying on the guest programs to
> do it themselves?

Possible, but is that then a runtime option in the WASM header? How does it
know if the program needs the cost of retpoline or not? And how can a program
ensure that it's getting such mitigations when it needs them?

> Once again, wouldn't the JIT be able to enforce this at runtime?

The JIT has no insight into the app's malloc/free usage, so no, it can't.
There is no system allocator in WASM, just a big chunk of linear memory that
the app does whatever it wants with.

For an actual case study in why this is risky all you have to do is look at
OpenSSL's heartbleed. If it had been using the system's malloc/free (which
WASM doesn't have) instead of using its internal free lists then heartbleed
largely wouldn't have existed on platforms like OpenBSD.
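A toy sketch of that failure mode (Python, illustrative only; not OpenSSL's actual allocator). An internal free list hands back a recycled chunk without clearing it, so an unchecked over-read leaks whatever the previous owner left behind:

```python
# Heartbleed in miniature: a custom free list recycles chunks without
# zeroing them, and an attacker-controlled length reads past the data
# the new owner actually wrote.

memory = bytearray(256)
free_list = []

def custom_malloc(size):
    if free_list:
        return free_list.pop()   # recycle without zeroing, as OpenSSL did
    return 64                    # fixed offset in this toy single-chunk heap

def custom_free(addr):
    free_list.append(addr)       # chunk contents left intact

secret = b"CARD-4242"
addr = custom_malloc(len(secret))
memory[addr:addr + len(secret)] = secret
custom_free(addr)

# A later request reuses the chunk and writes only 4 bytes...
addr2 = custom_malloc(4)
memory[addr2:addr2 + 4] = b"ping"

# ...but responds with an attacker-supplied length of 9.
leak = bytes(memory[addr2:addr2 + 9])   # includes stale secret bytes
```

A hardened allocator that unmapped or junk-filled freed chunks (as OpenBSD's malloc does) would have turned this leak into a crash.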

~~~
monocasa
> Possible, but is that then a runtime option in the WASM header? How does it
> know if the program needs the cost of retpoline or not? And how can a
> program ensure that it's getting such mitigations when it needs them?

You're looking at it the wrong way. Untrusted code doesn't have a "I pinky
promise that I'm safe" bit they can twiddle in their header. If the code
running is untrusted, it doesn't get to make that decision, all code under
that context needs to be retpolined. (V8 already does this
[https://chromium.googlesource.com/v8/v8/+/7d356ac4927e9da3e0...](https://chromium.googlesource.com/v8/v8/+/7d356ac4927e9da3e0c9757e214aa4b20f68acca)
)

> The JIT has no insight into the app's malloc/free usage, so no, it can't.
> There is no system allocator in WASM, just a big chunk of linear memory that
> the app does whatever it wants with.

It can retarget the code so that the virtual base address of the linear region
isn't fixed. That's about all ASLR does anyway.

> For an actual case study in why this is risky all you have to do is look at
> OpenSSL's heartbleed. If it had been using the system's malloc/free (which
> WASM doesn't have) instead of using its internal free lists then heartbleed
> largely wouldn't have existed on platforms like OpenBSD.

You can implement all of the malloc protection strategies that OpenBSD uses in
user space. It's completely legitimate to use your own malloc in those
circumstances, libc's malloc isn't magically special, and it makes sense that
they'd want to rely on memory allocation with additional guarantees than libc
gives you. OpenSSL's sin wasn't re-implementing malloc; it was re-implementing
it poorly (and not bounds checking input from attacker controlled packets).

~~~
kllrnohj
> You're looking at it the wrong way. Untrusted code doesn't have a "I pinky
> promise that I'm safe" bit they can twiddle in their header. If the code
> running is untrusted, it doesn't get to make that decision, all code under
> that context needs to be retpolined. (V8 already does this
> [https://chromium.googlesource.com/v8/v8/+/7d356ac4927e9da3e0...](https://chromium.googlesource.com/v8/v8/+/7d356ac4927e9da3e0..).
> )

No, you're not understanding.

retpoline doesn't keep you from attacking others, it keeps you from _being_
attacked. But not everything handles sensitive data; code like that doesn't
care about being attacked and doesn't want to pay the cost of a retpoline it
didn't need at all.

How can a WASM app signal "I need retpoline because I have secrets to protect"
from "I don't give a damn about retpoline because there's no secret data in my
process space at all"?

That's not a decision WASM can unanimously make on behalf of others.

> You can implement all of the malloc protection strategies that OpenBSD uses
> in user space.

No, you literally can't. WASM doesn't have the APIs necessary to do that. It
doesn't have mmap & mprotect. Those are in the "proposals we might consider"
future section of WASM. But WASM _today_ cannot implement a malloc/free on par
with those in actual libc implementations on major platforms. To say nothing
of debug tools like valgrind.

> OpenSSL's sin wasn't re-implementing malloc; it was re-implementing it
> poorly (and not bounds checking input from attacker controlled packets).

So the solution to people re-implementing malloc badly is to force _everyone_
to re-implement malloc? How does that follow at all?

I get WASM doesn't want to include malloc/free, but the problem is it didn't
include paging-based allocation APIs. It used a nonsense linear growth heap
design that doesn't match anything about how memory works on any platform and
has been obsolete for 20+ years.
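For reference, the MVP's only allocation primitive is `memory.grow`, which appends whole 64 KiB pages to one contiguous region that can never shrink or change protection. A minimal Python sketch of those semantics (illustrative, not the spec's formal definition):

```python
# The wasm MVP memory model: one contiguous linear region that only
# ever grows, in 64 KiB pages, up to a declared maximum.

PAGE_SIZE = 65536

class LinearMemory:
    def __init__(self, initial_pages, max_pages):
        self.data = bytearray(initial_pages * PAGE_SIZE)
        self.max_pages = max_pages

    def grow(self, delta_pages):
        """Returns the old size in pages, or -1 on failure,
        mirroring the memory.grow instruction's result."""
        old_pages = len(self.data) // PAGE_SIZE
        if old_pages + delta_pages > self.max_pages:
            return -1
        self.data.extend(bytearray(delta_pages * PAGE_SIZE))
        return old_pages
    # No shrink, no unmap, no mprotect: every byte in the region
    # stays readable and writable forever.

mem = LinearMemory(initial_pages=1, max_pages=4)
old = mem.grow(2)        # old size was 1 page; memory is now 3 pages
size = len(mem.data)     # one contiguous 3-page block
```

With no way to mark a page inaccessible, guard pages and page-granularity use-after-free detection simply can't be expressed inside the module.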

This isn't a big deal in the hyper specific context that WASM is actually
targeting, but it _is_ a big deal if you're talking about shoving WASM in
places it never had any intention of being.

~~~
monocasa
> That's not a decision WASM can unanimously make on behalf of others.

I literally linked to the commit where V8 is indeed unanimously making that
decision. If you clear the indirect branch table when transitioning to and
from all untrusted programs, what's your threat model again?

> I get WASM doesn't want to include malloc/free, but the problem is it didn't
> include paging-based allocation APIs. It used a nonsense linear growth heap
> design that doesn't match anything about how memory works on any platform
> and has been obsolete for 20+ years.

Go read the WASM spec. It's explicitly designed for multiple linear regions,
much like how mmap works. It just didn't make it into the MVP.

~~~
kllrnohj
> I literally linked to the commit where V8 is indeed unanimously making that
> decision.

And for things where retpoline isn't needed it's unambiguously wrong (wrong in
that runtime is slower than it should be for no gain). What are you not
getting about that? They made a decision that when it's wrong it's still
_safe_ , but it's still going to be wrong for a non-trivial amount of users.

> Go read the WASM spec. [...] It just didn't make it into the MVP.

Yes, I know, that's why I said what I said. Which was correct and supported by
both the spec and your statement just now...?

We're not talking about hypothetical future WASM, we're talking about WASM as
it exists today and what problems it can/can't effectively handle.

------
devit
NULL pointing to valid memory seems like a serious flaw that can and should be
easily fixed.

~~~
DannyBee
There have been plenty of real environments where NULL pointed to valid memory
(AIX, for example).

It also has a reasonable use case ( or did, pre Spectre).

It allows you to hoist and speculate loads that may be null. Since it is valid
memory, instead of a fault you would just get a result (usually zero on most
of these platforms) that you would throw away if it was not needed.

------
kiriakasis
> While escaping the sandbox itself may be difficult to achieve, application
> security doesn’t benefit from the mitigations commonly found in traditional
> environments.

I feel this is the most important point regarding server-side use.

------
kiriakasis
There is a "yet" in the original title.

