
When ‘int’ is the new ‘short’ - noondip
http://googleprojectzero.blogspot.com/2015/07/when-int-is-new-short.html
======
yason
I've never really liked the short/int/long definitions. They're just out there
to confuse you, especially if you've written your share of assembly.

For as long as I can remember I've just used size_t, uintptr_t,
uint32_t/int32_t (or any of the 16/32/64 variants), exactly because I want to
be explicit about the machine word sizes I'll be dealing with. Before that, I
always used similar (u32/i32, LONG/ULONG, ...) platform-specific typedefs on
proprietary systems too.

For all practical purposes, int/unsigned int has been at least 32 bits since
the early '90s (well, on modern platforms), but why use those when you can
explicitly declare how many bits you actually need?

(I've bumped into a few archaic platforms where stdint headers weren't present
but it's easy to just add a few build-specific typedefs somewhere in that
case.)
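
Something like this works as the fallback (a sketch; HAVE_STDINT_H is a
hypothetical build flag, and the widths assume a typical platform where int
is 32 bits):

    // Sketch only: HAVE_STDINT_H is a made-up flag, and the typedefs
    // assume the usual 8/16/32-bit widths for char/short/int.
    #ifdef HAVE_STDINT_H
    #include <stdint.h>
    #else
    typedef signed char      int8_t;
    typedef unsigned char    uint8_t;
    typedef short            int16_t;
    typedef unsigned short   uint16_t;
    typedef int              int32_t;
    typedef unsigned int     uint32_t;
    #endif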

~~~
antimagic
Well, I know historically it was because there were potential performance
issues with using explicit sizes - int was basically the guarantee that you
would be working in the CPU's native word size, and hence at its most
efficient; no need to shift and mask bits to get the value you wanted.
Obviously this leads straight to a bunch of relatively subtle bugs, but I
guess for some applications the speed-vs-safety tradeoff was worth it.

~~~
Gibbon1
On 8-bit machines it's the reverse: an int, being 16 bits minimum, requires
more operations to handle than an 8-bit number. Pass an 'int' and you push
two things on the stack, etc. It makes your code size balloon noticeably.

~~~
revelation
Many 8-bit processors such as AVR have instructions that work with 16-bit
numbers (stored in register pairs). So that's not the case.

~~~
kevin_thibedeau
That only applies to adds, subtracts, and register moves. 16-bit Boolean
ops, shifts/rotates, multiplies, and loads/stores still need to be done with
multiple instructions.

~~~
Gibbon1
I did a little mucking about in some AVR code of mine. Sometimes going from
a uint8_t to a uint16_t saves a couple of bytes; sometimes it adds a dozen.

In one case, changing an index in a for loop to an int took the code from
34024 bytes to 34018 (saving six bytes). But changing uint8_t i, j, k; to
uint16_t i, j, k; made the code compile to 34068 bytes, a gain of 44 bytes.
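
For reference, a hypothetical loop of the kind being measured here; on AVR,
the width of the index type changes how many registers and instructions each
increment and compare needs:

    #include <stdint.h>

    // Hypothetical example, not the actual code measured above.
    uint16_t sum200(const uint8_t *buf) {
        uint16_t total = 0;
        for (uint8_t i = 0; i < 200; ++i)  // vs. "uint16_t i" or "int i"
            total += buf[i];
        return total;
    }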

------
pornel
Conversion to and from C's `int` is one place where I think Rust is failing
too :(

Rust allows silent truncation of values in numeric conversions and considers
it "safe", because other features for memory safety will catch buffer
overflows — but it doesn't care about cases where the program will do a
memory-safe logically-invalid thing (e.g. write to a wrong location within a
buffer).

That's because Rust has no integer size promotion at all, which means `len as
usize` and `len as c_int` are required _all over the place_ when interfacing
with C (and the `as` operator has no overflow checking by design).

~~~
aidenn0
I think integer size promotion would only make it worse.

I do agree that "as" should have range checking. Some of it can be done
statically (e.g. converting to a strictly larger type, or when the range of
possible values is known), and the rest can be done with minimal overhead if
the compiler emits the code right.

~~~
pcwalton
It's not minimal overhead in many cases. Think about vectorization, for
example.

------
ctz
It's kind of bizarre that Chrome goes to exceptional lengths to sandbox
things into lots of mutually untrusting processes, and then parses network
input outside of that protection.

------
chetanahuja
Many, many C/C++ programmers, even many of those employed by Google, have a
natural tendency to use the old C types like short/int/long for integral
types without thinking through cross-platform issues or API interactions
with other code.

Also, size_t is a frustrating beast. Its meaning depends on the platform;
the Single Unix spec only calls for size_t to be an unsigned integer type.
Now imagine you're writing code to compile across multiple mobile platforms
as well as on x86_64 on the server side. Can you tell me the largest value
that type can hold on each of them -- without getting into a long
google/stackoverflow session or hitting the compiler manuals for each of
those platforms? If you absolutely want to make sure that your type can
handle the values you expect it to handle, better to give it one of the
well-defined types provided by stdint.h (uint32_t is sooo much better than
just int or unsigned int or even size_t for this purpose).
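
A quick sketch of the difference (the sizes in the comments are the common
cases for today's ABIs, not guarantees):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    int main() {
        // size_t: 4 bytes on typical 32-bit mobile targets, 8 on x86_64.
        // uint32_t: exactly 4 bytes on every platform that provides it.
        std::printf("size_t=%zu uint32_t=%zu\n",
                    sizeof(std::size_t), sizeof(std::uint32_t));
    }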

Now granted, you'd need to interact with external libraries (including
libc/libc++) that'll want to use size_t etc. Not much you can do here but be
very careful when passing data back and forth between your code and the
library code. But that's been the lot of C coders since time began.

~~~
unwind
I disagree.

All you need to care about for cases like these, when you're talking about the
size of something, is that both malloc() and new[] handle allocation size
using size_t.

That, to me, says pretty clearly that "the proper type to express the size, in
bytes, of something you're going to store in memory is size_t".

It can't be too small, since that would break the core allocation
interfaces, which really doesn't seem likely.

You don't need to know how many bits are in size_t all that often, and
certainly not for the quoted code.

~~~
ww520
For cross-platform interoperability, an API with exact-size types helps
remove any ambiguity. Using size_t might be fine for intra-process usage,
but as soon as we are dealing with data across platforms, exact size type
definition is a must.

~~~
raverbashing
> Using size_t might be fine for intra-process usage, but as soon as we are
> dealing with data across platforms, exact size type definition is a must.

I don't know why you are downvoted, but this is very important.

Never send anything "on the wire" (or to a file) unless you know its exact
size and endianness.
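
In practice that means serializing field by field; a minimal sketch that
writes a length as four big-endian bytes instead of memcpy'ing a size_t of
platform-dependent width and byte order (put_u32be is a made-up helper
name):

    #include <cstdint>

    // Hypothetical helper: emit v as 4 bytes, most significant first.
    void put_u32be(std::uint8_t *out, std::uint32_t v) {
        out[0] = static_cast<std::uint8_t>(v >> 24);
        out[1] = static_cast<std::uint8_t>(v >> 16);
        out[2] = static_cast<std::uint8_t>(v >> 8);
        out[3] = static_cast<std::uint8_t>(v);
    }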

~~~
sharpneli
That is correct: for file formats, packets, etc. you must use exact sizes.

However, for cross-platform support, using size_t in an API (as in what is
exposed via a .dll or .so) is a must. It's exactly the correct way to write
cross-platform code.

------
nikic
I think the most interesting bit here is this:

> Now, the more astute reader will point out that I just sent over 4 gigabytes
> of data over the internet; and that this can’t really be all that
> interesting - but that argument is readily countered with gzip encoding,
> reducing the required data to a 4 megabyte payload.

This was pretty much my first thought on seeing the IOBuffer signature - "That
exploit payload is going to be huge". But things are not always as they seem
and using gzip to generate a large string on the client is something I had not
previously considered.
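
For a sense of the ratios, a hedged sketch using zlib's one-shot API:
deflate tops out around 1032:1, so a highly repetitive buffer shrinks by
roughly three orders of magnitude.

    #include <zlib.h>
    #include <vector>

    int main() {
        // Sketch: compress 100 MB of identical bytes with zlib.
        std::vector<unsigned char> big(100u * 1024 * 1024, 'A');
        uLongf outLen = compressBound(big.size());
        std::vector<unsigned char> out(outLen);
        compress(out.data(), &outLen, big.data(), big.size());
        // outLen is now on the order of 100 KB.
    }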

~~~
ceequof
You can blow up all sorts of things with gzip:
[https://en.wikipedia.org/wiki/Zip_bomb](https://en.wikipedia.org/wiki/Zip_bomb)

------
albinofrenchy
I don't entirely understand why 'int' is the new 'short' here; int hasn't
been a particularly good way to store sizes since C99.

Good spot, though; I kind of doubt that this was a conscious design decision
and probably just a slip-up.

~~~
dottrap
Agreed on both points. Plain int hasn't been very good for a long time. C99
<stdint.h> is really the way to go, but since this is C++ we're talking
about, both the C++ committee and Microsoft Visual Studio deserve most of
the blame for why people weren't using it, since neither recognized or
supported it for the longest time. (Visual Studio just _finally_ got stdint
and stdbool, about 15 years late.)

And agreed, good catch.

~~~
pbsd
Visual Studio has had stdint.h since its 2010 edition. Before that there were
readily-available substitutes (like
[https://code.google.com/p/msinttypes/](https://code.google.com/p/msinttypes/)),
or you could do it yourself by typedef'ing [unsigned] __intNN as [u]intNN_t.

~~~
KayEss
Somebody elsewhere pointed out to me that these will give you types that are
not aliases of the common ones, i.e. __int8 isn't the same as either
unsigned char or char. It probably won't make a difference in most places,
but what does?

~~~
pbsd
That is also the case in usual implementations of stdint.h, where int8_t is
defined to be `signed char`. In C and C++, `char`, `signed char`, and
`unsigned char` are different types, and `char` is not guaranteed to be signed
or unsigned---that's up to the implementation.
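
Easy to confirm; a small sketch (the last assertion reflects the usual
implementation choice, not a standard guarantee):

    #include <cstdint>
    #include <type_traits>

    static_assert(!std::is_same<char, signed char>::value, "distinct");
    static_assert(!std::is_same<char, unsigned char>::value, "distinct");
    // Usual, but not mandated: int8_t as an alias of signed char.
    static_assert(std::is_same<std::int8_t, signed char>::value, "common");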

EDIT: looking at the documentation, it appears that __int8 is supposed to
always be an alias for `char`, even as far back as 2003:
[https://msdn.microsoft.com/en-us/library/29dh1w7z(v=vs.71).a...](https://msdn.microsoft.com/en-us/library/29dh1w7z\(v=vs.71\).aspx).
However, the workaround found in msinttypes suggests that Visual Studio 6
does have this problem. I weep for those still using it.

------
dottrap

> Now; on x86_64, with gcc and clang an int is still a 32-bit integer type;

Minor nit: the size of int is typically defined by the platform's ABI; the
compiler just follows along. All the major/popular ones happen to define int
as 32-bit, so that's what you are seeing with gcc/clang. Maybe on Solaris
you might see it as 64-bit.
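
A one-liner makes the ABI's choice visible (the comments show the two common
64-bit data models):

    #include <cstdio>

    int main() {
        // LP64  (Linux/macOS x86_64): int=4 long=8 void*=8
        // LLP64 (64-bit Windows):     int=4 long=4 void*=8
        std::printf("int=%zu long=%zu void*=%zu\n",
                    sizeof(int), sizeof(long), sizeof(void *));
    }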

------
stinos
Can anyone shed some light on the design rationale for using signed integer
types (most commonly _int_, it seems, as here) followed by a check that the
number is not negative, when one could just use an unsigned type right away?

~~~
hthh
I disagree with it, but the Google C++ Style Guide offers the rationale
you're looking for (under "On Unsigned Integers"):
[https://google-styleguide.googlecode.com/svn/trunk/cppguide....](https://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types)

~~~
raverbashing
Yeah, this doesn't convince me.

The bug example there, sure, needs a signed type, so you can't blame the
type for its wrong usage.

I've had so many bugs come from using signed types that I'll never be
bothered by writing 'unsigned' ever again.

~~~
detrino
I disagree that a signed type is the correct solution to that; rather, you
need a while loop:

    std::size_t i = foo.size();
    while (i != 0) {
      --i;
      ...
    }

~~~
gohrt
The point is that C int types are an unsafe mess, so it's better to have one
simple rule than to memorize all the corner cases and address them all the
time.

~~~
detrino
I am addressing the specific example given by the Google style guide. The
code for counting down using a signed integer, as they do, has more corner
cases than the while loop I have shown. Their way, you have to remember to
subtract 1 from the size at the start and then use >= in the loop
conditional. My way is just the inverse of what you do while counting up.

It's also worth pointing out that the style of loop they give can't be used
at all if you are counting down over iterators or pointers instead of
numbers.
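
For contrast, a sketch of the signed-countdown form in question, with both
details it makes you remember:

    // Needs the cast, the "- 1", and ">=" in the condition:
    for (int i = static_cast<int>(foo.size()) - 1; i >= 0; --i) {
      ...
    }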

------
jhallenworld
I've recently tried to fix JOE (a UNIX portable program) so that it will
compile without warnings and with minimum casting with -Wconversion. This is
what I've found:

Chars: I hate that they are signed, because I like the convention of
promoting them to int and then using -1 as an error. It's easy to forget to
convert to unsigned first, and the compiler will not complain. In the past
I've used 'unsigned char' everywhere, but it's a mess because strings are
chars and all the library functions expect chars. My new strategy is to use
256 as the error code instead of -1. The only problem is that getchar() uses
-1, so it's weird. IMHO, it's a C-standard mistake that char is signed.
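
A sketch of the failure mode being described, assuming a platform where
plain char is signed:

    #include <cstdio>

    int main() {
        char c = static_cast<char>(0xFF);  // e.g. a raw byte in a char
        // Where char is signed, c promotes to the int -1, which is EOF:
        if (c == EOF)
            std::puts("0xFF byte mistaken for end-of-file");
    }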

I used to use int for indexes and long for file offsets. But these days, int
is too short on 64-bit systems and long is not large enough on 32-bit systems.

ptrdiff_t is the new int. I've switched to ptrdiff_t in place of int and
off_t in place of long. Ptrdiff_t is correct on every system except maybe
16-bit MS-DOS (where it's 32 bits, but I think it should be 16 bits). Off_t
is a long long if you have '#define _FILE_OFFSET_BITS 64'. Ptrdiff_t is ugly
and is defined in an odd include file, stddef.h. It's not used much by the C
library.

The C library likes to use size_t and ssize_t. The definition of ssize_t is
just crazy (it should just be the signed version of size_t, but it isn't).

I understand why size_t is unsigned, but I kind of wish it was signed. It's
rare that you have items larger than 2^(word size - 1), so signed is OK. You
are guaranteed to have -Wconversion warnings if you use size_t, because
ptrdiff_t is signed (even if you don't use ptrdiff_t, you still get a signed
result to pointer differences so you will have warnings). Anyway, to limit the
damage I make versions of malloc, strlen and sizeof which return or take
ptrdiff_t. They complain if the result is ever negative. Yes this is weird,
but I think it's better than having many explicit casts to fix warnings. Casts
are always dangerous.
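
A hedged sketch of one such wrapper (zlen is a made-up name): a strlen
variant that returns the signed ptrdiff_t and aborts if the length would not
fit:

    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <cstring>

    // Hypothetical wrapper of the kind described above.
    std::ptrdiff_t zlen(const char *s) {
        std::size_t n = std::strlen(s);
        if (n > static_cast<std::size_t>(PTRDIFF_MAX))
            std::abort();  // would come out negative as a ptrdiff_t
        return static_cast<std::ptrdiff_t>(n);
    }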

~~~
david-given
Chars are only signed on some platforms (x86, for one). On others they're
unsigned (ARM, for one).

One knock-on effect of this is that a naive char-by-char string compare
returns different results on the two platforms for UTF-8 strings (because
0xff > 32, but -1 < 32) -- strcmp() itself is required to compare as
unsigned char for exactly this reason...

Incidentally, I don't know if you know about intptr_t; it's an int large
enough to put a pointer in losslessly. It's dead handy. (My current project
involves a system with 16-bit ints, 32-bit long longs, and 20-bit pointers...)
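
A small sketch of the round-trip it guarantees:

    #include <cstdint>

    int main() {
        int x = 42;
        void *p = &x;
        std::intptr_t ip = reinterpret_cast<std::intptr_t>(p);
        void *q = reinterpret_cast<void *>(ip);  // round-trips losslessly
        return p == q ? 0 : 1;                   // always returns 0
    }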

~~~
jhallenworld
I did not know that chars were unsigned on ARM; interesting.

I try to be conservative with the definitions I use, so I'm worried that
intptr_t might be too new.

------
pixelbeat
In my experience we can't rely on manual handling of these integer overflow
issues, especially with changing compiler behavior over time.

I've noted some compile time and run time checking options at:

[http://www.pixelbeat.org/programming/gcc/integer_overflow.ht...](http://www.pixelbeat.org/programming/gcc/integer_overflow.html)
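
One of the run-time options, for the curious: newer GCC and Clang provide
checked-arithmetic builtins (with -ftrapv and
-fsanitize=signed-integer-overflow as blunter alternatives):

    #include <climits>
    #include <cstdio>

    int main() {
        int a = INT_MAX, b = 1, sum;
        if (__builtin_add_overflow(a, b, &sum))  // true on overflow
            std::puts("overflow caught");
    }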

------
mohawk
A signed type for a buffer size in Chrome. Sigh, C/C++ just wasn't made for
humans.

~~~
zvrba
Signed types are the sane ones and give you error-checking possibilities (a
negative size doesn't make sense; a huge positive size may be right or may
be an error). Unsigned types break trivial math like x < y => x-1 < y-1.
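
Concretely:

    unsigned x = 0, y = 1;           // x < y holds
    bool still = (x - 1) < (y - 1);  // false: x - 1 wrapped to UINT_MAX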

~~~
mattdw
Do signed types not also "break trivial math" like that, just at a different
boundary? Genuine question. (The 0 boundary is obviously going to be more
commonly hit than the 2^32 boundary, but nonetheless.)

~~~
mohawk
Signed types have the same "wraparound" problem; I think the OP meant that
they don't have it at the zero boundary.

~~~
mikeash
Signed types have a worse problem. They _typically_ wrap around, but the
behavior is undefined. That means the compiler can assume it never happens and
optimize your code accordingly, which can lead to all sorts of entertaining
misbehavior.
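
The classic demonstration: with optimization, GCC and Clang fold this check
to a constant because signed overflow "can't happen":

    bool will_not_overflow(int i) {
        return i + 1 > i;  // compiled to "return true" at -O2
    }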

------
fugyk
How can they hide the bug in an open-source project after fixing it? Isn't
the commit public?

------
imaginenore
How exactly are you going to serve a 4GB certificate? Who in the world is
going to wait for that to load?

~~~
umanwizard
The article mentions that it can be compressed to 4 MB...

