
Warning: Implicit Backdoor - ssklash
https://flak.tedunangst.com/post/warning-implicit-backdoor
======
comex
For those wondering how this works:

In C, like most languages, all binary operators on numbers – including
arithmetic as well as comparisons – 'natively' require two operands of the
same type. When you pass operands of different types, the language usually
converts one of the operands to the other's type. (Sometimes it converts both
operands to a different type instead.)

In this case, without the cast, the comparison will be performed by converting
the `int` to `size_t`. The dangerous case is when `num` is negative. Since
`size_t` is unsigned, the conversion will turn negative numbers into very high
positive numbers, so `num > limit` will be true, and the function will safely
reject the allocation. On the other hand, with `num > (int)limit`, no
conversion occurs; the comparison is performed between two signed integers,
and `num > limit` will be false if `num` is negative.

It would be better to be more explicit. Perhaps use two comparisons (the
compiler will optimize them into one if possible):

    
    
        if (num < 0 || num > limit)
    

Or, better yet, make `allocatebufs` take `size_t` instead of `int` in the
first place. Taking `int` is risky even without the comparison bug: if someone
calls the function with a `size_t` on a platform where `size_t` is larger than
`int` (e.g. typical 64-bit platforms), the upper bits will be truncated. For instance,
if the passed-in value is 0x100000001, it will be truncated to 0x1, causing
allocatebufs to return a much smaller buffer than expected.

If you're wondering about the exact rules for what types get converted to
what, they're very complicated, unnecessarily so. It mostly boils down to
"larger type wins; in case of a tie, unsigned wins over signed". But there are
exceptions. Here's the spec language:

- [http://c0x.coding-guidelines.com/6.3.1.8.html](http://c0x.coding-guidelines.com/6.3.1.8.html)

- [http://c0x.coding-guidelines.com/6.3.1.1.html](http://c0x.coding-guidelines.com/6.3.1.1.html)

~~~
thegeomaster
I still don't understand what the security issue is. In case of num < 0,
malloc will be called with a negative number, which will be converted to
size_t, resulting in a huge number which will most likely fail to allocate,
returning NULL. Even if it doesn't, I don't see any way to exploit this
further. What am I missing?

~~~
bennofs
The negative number is multiplied by 64, so it can then underflow and become
positive again.

~~~
a3_nm
Ah interesting... but then what could happen? Is the risk that the program
could try allocating too much memory and crash? Calling this a "backdoor"
seems excessive, no? -- at worst it's just a denial of service.

~~~
bennofs
If you take (1 << 63) + 1 as size (which is a large negative int64), multiply
that by 64 and you get 64. So it will call malloc(64) which will succeed. If
other parts do the correct unsigned comparison then you have a heap overflow.

~~~
firebacon
This makes sense, but it also implies that there is no actual backdoor in TFA.
The code as shown in the article is not exploitable without assuming more
(actually exploitable) bugs somewhere else at the call site (which the article
doesn't mention). Or maybe we haven't figured out the actual vulnerability
yet...

~~~
bennofs
Well, the rest of the code is also correct on its own though. If I tell the
function to allocate X bytes, it should either fail or return a buffer that
has space for X bytes. If the function returns less than X bytes when X happens
to be a large unsigned int (that is, a negative value if interpreted as
signed), then that is a bug and is exploitable.

~~~
firebacon
But that is not what happens in the code shown in TFA.

Passing a large value into the method shown in the article will do nothing
nefarious (assuming sizeof(size_t) >= sizeof(int)). It will either return a
large allocation, or, more likely, fail because the amount of requested memory
is too large.

If you have a narrowing/casting bug _somewhere else_ in your program, which
BTW would produce an obvious warning, that would of course cause trouble as
you have described.

And while mixing signed and unsigned arithmetic for buffer sizes is, of
course, a recipe for disaster, I think it's incorrect to claim that the
allocatebufs method shown in TFA has a "backdoor" because of this. I feel that
is a bit like saying memcpy has a backdoor because you might get your pointer
arithmetic wrong when calling it.

------
nayuki
In my opinion, the interactions between unsigned integers and signed integers
are too hard for a programmer like myself to handle correctly in practice,
especially being consistently vigilant all the time.

I admit that I didn't understand how the code in the article worked (i.e. why
the unsigned check was correct and why the signed check was incorrect) until I
read the answer in this comment thread.

I like Java's design where all integer types (except char) are signed, and all
arithmetic is done in signed mode. It simplifies the mental model and reduces
the opportunities for logic errors.

Signed integers are definitely useful sometimes (e.g. difference of two
unsigned indexes). A wide-enough signed type (e.g. int32) can hold all values
of a narrower unsigned type (e.g. uint16). So it is possible to design a
language without unsigned integer types without losing functionality.

~~~
marcosdumay
My current position is that a high level language should have either explicit
numeric conversions with very strong typing (like Haskell), or a single
integral type (like Python). Anything else is unnecessary complexity that won't
even bring any sizeable gains. Strong typing with implicit conversions (like
SQL) isn't too bad, but Java's system is.

But C isn't a high level language, so none of the above applies.

~~~
saagarjha
> C isn't a high level language

Depends on your viewpoint, of course :)

------
5-
This was potentially in response to this event, posted to HN earlier today:
[https://news.ycombinator.com/item?id=20874470](https://news.ycombinator.com/item?id=20874470)
(a Spectre Linux kernel patch rendered ineffective by LTS maintainers in an
attempt to silence a compiler warning).

The scenario described in this submission was the first thing that came to my
mind upon reading about that patch.

------
windsurfer
To me, as someone that doesn't use C in their day to day life, C appears to be
a security nightmare. I cannot imagine using it in a production environment
that takes any amount of user input.

~~~
carapace
Yep. It's _insane_ to use tools/languages susceptible to problems like these
in 2019.

~~~
chupa-chups
To underscore your general point: You're totally right. But... a but:

Implicit type conversions in C are cumbersome. In JavaScript, for example,
they are _way_ more so. If JavaScript had a similar role (i.e. bare-metal
code), the world would be _way_ messier than it already is.

Languages without strict type checking are in general open to problems like
this, more or less depending on how leniently potential type errors are
checked. C, being a bare-metal language, is especially ugly since it allows
casting anything to anything else, granted - but at least a cast is spottable
and reviewable, and in new code you won't get away with ugly casts. In
JavaScript you don't even see casts; they just happen.

Still, JavaScript doesn't have an unsigned integer type, so _this_ particular
attack vector would not work.

~~~
tsimionescu
More to the point, memory-safe languages _cannot have_ these types of
vulnerabilities regardless of other aspects of the type system.

~~~
saagarjha
That's assuming, of course, that the runtimes for those languages are correct:
as numerous web browser exploits have taught us, this isn't always the case ;)

------
pilif
I’m quite sure this is a response to
[https://news.ycombinator.com/item?id=20874470](https://news.ycombinator.com/item?id=20874470)
submitted earlier today. I don’t believe there was any malicious intent there,
but it’s yet another safety issue in C of the kind everybody tells me they are
aware of, that would never happen to them, because memory safety issues only
happen to other, less experienced programmers.

I must only know the cream of the crop of C programmers /s

~~~
mirimir
Thanks. I was wondering what he was ~pointing at.

------
SketchySeaBeast
As someone who doesn't understand the language's nuances - what does adding a
cast to a value being assigned in the function open? Will it return NULL more
often?

~~~
jchw
Sign confusion: malloc takes size_t, so negative numbers are just really large
numbers to malloc. A negative int passes the check after adding the cast,
because we’re now comparing with signs and now a negative number is smaller
than a positive one, so malloc will be called with the larger value. Since it
gets multiplied you will also overflow and get pretty much any size. There’s a
DoS opportunity at least.

The correct fix is to cast the input to unsigned, I think.

~~~
Sniffnoy
That doesn't make sense -- how would a negative value of num be greater than
the positive value that is (int)limit? It must be something else...

Edit: Oh, I see what I missed -- I was thinking of the danger as _directly_
passing too large a number to malloc, when actually the danger is passing a
negative number, which then gets implicitly converted to a large positive
number. Oops.

~~~
pavon
Because C's implicit integer promotion rules are horrible. If you are
operating on two signed types, or two unsigned types, it does what you would
expect and converts both to the type with greater rank (essentially more
bits). Likewise, if you have one signed type and one unsigned type and the
signed type has greater rank then it does the sensible thing and converts them
both to the signed type.

However, if you have one signed type and one unsigned type, and the unsigned
type has greater or equal rank, then it converts to the unsigned type, despite
the fact that the unsigned type can't represent any of the negative values of
the signed type, so they get wrapped!

This is the cause of so many bugs.

~~~
wahern
But the irony is that the [immediate] bug in this case was adding an explicit
cast, which is _also_ _exactly_ what people would have done in languages that
didn't implicitly promote to unsigned.

The cleverness of this implicit backdoor hypothetical is that it would have
worked substantially the same way in most other low-level languages, including
Rust. Modern C compilers literally complain in the same way as stricter
languages, which had the effect of inducing a poorly considered cast.

~~~
pavon
Yeah, it frustrates me that both Go and Rust have decided to require casting
for all numeric conversions, whether there is a chance of information loss or
not. This causes casting to proliferate, and you grow numb to it. If lossless
conversions were implicit, then you would only need casts in potentially lossy
situations so they would stand out more, and hopefully make you think harder
about correctness.

------
aasasd
I feel that it says something about the problem area when over forty comments
in the thread are dedicated to the discussion of how exactly the five lines of
code work.

~~~
3pt14159
Half of the problem is programmers putting their head in the sand. Half of the
problem is lack of real leadership from people that know better.

Computers are not securable. Not from these types of attacks, anyway. State
intelligence agencies are working overtime trying to stop the fires from
spreading, but at the end of the day we've had Petya, NotPetya, and WannaCry.
We have Schneier with a book called "Click Here to Kill Everybody" and we still
have computers running everything.

Ok fine. But one day we're going to get an event and people are going to say:
"Oh the humanity, how did this happen? How could we have known?!" I don't know
what it is going to be, but I know it's going to be just like 2008 all over
again. Industry insiders know the score. If it does happen, my deepest hope is
that it ends up being strictly financial, but my gut says that it will
probably be cyberphysical. The financial sector has had decades to harden
their systems.

~~~
adrianN
> State intelligence agencies are working overtime trying to stop the fires
> from spreading

I think you're mistaken. Intelligence agencies do the opposite by hoarding
exploits and purposely weakening security standards.

------
twiceaday
The code is 'totally safe and secure' in a nasty, obfuscated way. It takes too
much language / system knowledge to tell that negative numbers are covered.
And there isn't even a separate block whose purpose is to handle negative
numbers; instead the behavior is hidden inside another block that has a simple
face-value interpretation. This is (1) fragile and (2) not self-aware that
it's fragile. If you don't refactor to avoid the fragility, at least leave a
comment. Or how about unit tests? If you don't question the motives of the
author, you should at least question their coding ability.

------
nickpsecurity
Obligatory mention of early attempt to really understand and address
subversion as a threat plus a follow-up by pioneers that built on that:

[http://seclab.cs.ucdavis.edu/projects/history/papers/myer80....](http://seclab.cs.ucdavis.edu/projects/history/papers/myer80.pdf)

[https://apps.dtic.mil/dtic/tr/fulltext/u2/a435312.pdf](https://apps.dtic.mil/dtic/tr/fulltext/u2/a435312.pdf)

The current state of the art in formal methods, automated analyses, and test
generators can get us pretty close to their goals with such designs, too.
Way less cost-prohibitive than it used to be. Although, high-assurance
security today has moved far away from security kernels they advocated to
focus on securing hardware/software and distributed systems in ways that allow
expressing more flexible and useful policies. The design and assurance methods
of the past still work, though. Stuff like Rust, SPARK Ada, RV-Match, Why3,
and CompCert handle more code-level problems than older methods, too.

------
gyanchawdhary
Or just introduce a format string vulnerability by removing the format
specifier from printf et al. ... or an off-by-one or double-free bug if you
want to look more genuine

------
jancsika
How about this in stdlib.h:

    
    
      void *malloc(signed long long int untrusted_size) {
          size_t size;
          if (untrusted_size < 0) {
              fprintf(stderr, "warning: exploit "
                  "detected. Click <ok> to cancel");
          }
          size = (size_t)untrusted_size;
          etc.
      }
    
    

Or better yet redefine malloc with a wrapper and include the __LINE__ and
__FILE__ in the exploit error.

Edit: markdown

~~~
pornel
The hack depends on numeric overflow in the caller that flips a negative
number into a positive one, so malloc can't possibly know it got the wrong size.

It's even funnier that signed overflow is Undefined Behavior in C, so the
compiler is allowed to assume it can never happen (and thus let it happen, and
even remove non-kosher overflow checks).

~~~
jancsika
> The hack depends on numeric overflow in the caller that flips negative
> number into positive, so malloc can't possibly know it got the wrong size.

Look again at the hack-- Ted is sending a _negative_ number to malloc. If I
change malloc's interface to accept signed numbers, then I can check inside
the definition for negative numbers and report to the user that something bad
has happened.

------
munk-a
Is this bug exposed only on systems with 1 byte words? I'm struggling a bit to
understand the exposed bug and, while I understand the whole point is that
there are lots of issues of this kind that can hide in code pretty innocently
it'd be nice to actually comprehend this one.

~~~
jstimpfle
You can pass a negative number such that num * 64 will underflow and result in
a large positive number. (Technically it's UB, but in practice it will
probably underflow).

Or as someone else mentions, even without underflow, it will be converted to
unsigned since malloc takes a size_t.

------
crisnoble
I wonder if you could trick an auto linter git hook to do the malicious commit
for you.

------
baby
I’d be interested to see such an example with a more secure language like
Rust.

~~~
pornel
This specific case simply doesn't compile:

    
    
        if num > limit {}
                 ^^^^^ expected i32, found usize
    

The whole trick isn't really translatable to idiomatic Rust, because `malloc`
is not used beyond specific C interoperability needs. If you want to store
`num` elements, you use `Vec::with_capacity(num)` or such, instead of
allocating the hard way.

------
draw_down
The point is not really about the specific mechanism shown in the post, but
more about plausible deniability and how a malicious actor might introduce a
vulnerability like this in a way that makes their innocence seem
unimpeachable.

