
Bounded Integer: Header-only C++ library replaces integers, adds explicit bounds - olvy0
https://bitbucket.org/davidstone/bounded_integer/src/master/
======
OskarS
At this point, when I see libraries like this all I can think is “oh good, my
compile times aren’t long enough, let’s make EVERY INTEGER a template”.

Also, while I can reasonably believe that most of the overhead goes away at
-O2 or -O3, this has to just trash performance for debug builds, which is not
unimportant.

~~~
banachtarski
I don't think you even looked at the library, but from what I can tell, debug
perf would not be impacted at all.

~~~
OskarS
Of course it will be impacted. Libraries like these are "zero-cost" only
because of very aggressive inlining (not generally enabled in debug builds);
otherwise all their operations happen in subroutines. Operator overloading is
the most obvious example: for regular integers, addition is basically a single
instruction even in debug, but with a library like this, it has to make a
subroutine call. A call buried under who knows how many layers of abstraction
that compiler optimizations haven't shaken off.

In fact, the README says as much: it has zero time/space overhead only
"assuming basic compiler optimizations like inlining".

And I did actually look at the code, but at a glance I couldn't get much sense
of it. The main header imports like two dozen other headers, and the "detail"
folder is filled to the brim with more headers. I tried cloning and using it,
but I couldn't get it to compile (which is probably mostly my fault, but I
really didn't want to spend too much time on it). I did run just the
preprocessor though, and "#include <bounded/integer.hpp>" expanded to 48000
lines. 48000 extra lines to compile in order to use integers. For _every file_
you use it in.

I don't want to be too harsh here, I'm sure it's an excellent library and it
does what it says brilliantly, and if you need bounded integers I'm sure it's
awesome. But language like "[integers in C++] are mostly unusable" rankles a
bit, and the implication in this file is that this is something you should
regularly use instead of integer types.

I personally feel that libraries like this are taking C++ in the wrong
direction ("ranges" is another obvious example) and making the language less
and less usable in my profession.

~~~
banachtarski
> but with a library like this, it has to make a subroutine call.

I mean, before I wrote my comment I checked and it's `constexpr` all the way
down to the add instruction so if it's going to make a call, I'm not seeing
it. There is definitely a lot of template machinery, but I can't be the judge
of that immediately.

> I personally feel that libraries like this are taking C++ in the wrong
> direction ("ranges" is another obvious example)

I think you're making an emotional argument based on some preconceived
twitter-verse sentiment. I'm in games too (graphics specifically) and people
are up in arms about the wrong thing. Maybe there are corner cases that are a
bit overly complex, but what's wrong with being able to write
`std::ranges::sort(container)` instead of `std::sort(container.begin(),
container.end())`?
There may be things we don't like, but I don't think we should resort to
hyperbole either.

Incidentally, I wouldn't use this library if only because for what I do,
having explicitly sized data types is important.

~~~
OskarS
constexpr functions are not inlined in debug builds. They can't be: the whole
point of debug builds is that you can attach a debugger and step through the
code.

As an illustration, compare the assembly for "foo1" and "foo2":
[https://godbolt.org/z/t9Zkx-](https://godbolt.org/z/t9Zkx-)
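
The comparison is along these lines (my reconstruction; the link has the real
thing):

        #include <cstdint>
        
        // foo2 reaches the add through a chain of constexpr helpers; foo1 is a
        // plain add. At -O0, foo1 is one add instruction, foo2 is nested calls.
        constexpr std::int64_t add_impl(std::int64_t a, std::int64_t b) { return a + b; }
        constexpr std::int64_t add(std::int64_t a, std::int64_t b) { return add_impl(a, b); }
        
        std::int64_t foo1(std::int64_t a, std::int64_t b) { return a + b; }
        std::int64_t foo2(std::int64_t a, std::int64_t b) { return add(a, b); }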

~~~
banachtarski
Ah ok, my mistake, I read "debug" as "-g", not "-O0". I generally run with
optimizations on but with symbols.

------
Animats
Oh, that's nice. I wanted that decades ago, when I was working on program
verification. Templates have made a lot of progress if this can be done
entirely in C++ templates. I once wrote, but never published, "Type Integer
Considered Harmful", back when there were still 16-bit integers in most C
programs. I wanted ranges on everything, like Ada. As a practical matter,
integer overflow became less of an issue with 32-bit.

Sizes of intermediates are a big issue. My thinking on this was that it's the
compiler's job to prevent overflow in intermediate values where the final
result will not overflow. When you write

    
    
        int32_t a, b, c, n;
        ...
        n = (a * b) * c;
    

that's legal to compute in 32-bit, but requires overflow checking on the
intermediates. If an overflow occurs, there will be an overflow in the result.
(Although, the case where some values are zero is an issue. Suppose a * b
overflows but c is zero so it doesn't matter. That's probably an error.)

Sometimes you have to use larger sized intermediates. For

    
    
        int32_t m, n, p;
        ...
        n = (m * n) / p;
    

how big is each part? Above, you'd have to compute (m * n) as a 64-bit
product, do a 64-bit divide, and only then check that the result fits in n.
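
In today's C++, that widening would look something like this sketch:

        #include <cassert>
        #include <cstdint>
        
        // Sketch of the widening the compiler would do for n = (m * n) / p.
        std::int32_t mul_div(std::int32_t m, std::int32_t n, std::int32_t p) {
            std::int64_t wide = static_cast<std::int64_t>(m) * n;  // 64-bit product: cannot overflow
            std::int64_t q = wide / p;                   // 64-bit divide (assumes p != 0)
            assert(q >= INT32_MIN && q <= INT32_MAX);    // only now check the result fits
            return static_cast<std::int32_t>(q);
        }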

To do this right, you need something in the compiler that can do basic
reasoning about machine arithmetic. Something that knows, for example, that

    
    
        uint16_t n;
        ...
        n = (n + 1) % 65536;
    

cannot really overflow and can be optimized down to a plain unsigned 16-bit
add.

If you try to do this through linguistic type analysis only, it's not going
to be satisfactory. You need to be able to prove inequalities.

------
TazeTSchnitzel
Maybe Ada's approach of every number type requiring explicit bounds is a good
one.

~~~
ken
Or Lisp’s approach of just using “big” ints by default (which is practically
the same but with an implicit default of “infinity”). And trust the compiler
to use fixnums internally when the programmer has declared it safe to do so.

~~~
electrograv
I just finished writing a custom FIFO allocator for an _extremely_
high-bandwidth, low-latency data processing (and UI) system written in C/C++
(even std::deque was doing WAY too many heap allocations, not to mention the
allocations within each object passing through the system, despite use of move
semantics to minimize redundancy).

Performance improved by _100x - 1000x_. And it was already blazingly fast
before, if measured against performance standards we’ve become accustomed to
from JavaScript and other GC languages.

In high performance systems (where the benefits of C/C++/etc. outweigh the
downsides), dynamic allocations always come back to bite you.

If you rely on them too heavily from the start (and don’t plan for custom
allocation schemes in the future), you can even get into bad situations where
it’s infeasible to refactor to custom memory management (without a total
rewrite), when you later need the performance gain.

Incorporating even the _possibility_ of heap allocations into a language’s
most fundamental data types will doom that language to being relegated to
performance-insensitive and latency-insensitive tasks, if only because it
requires that a heap exist (whereas C, Rust, etc. can run on embedded real-time
systems with no heap).

And that’s okay! It’s good that we have languages for that. But
C/C++/Rust/etc. are definitely not where you can tolerate such a thing in the
core language.

~~~
ncmncm
If you are doing high-throughput, and you ever allocate anything after
startup, you are Doing It Wrong. Any brand of FIFO loses.

What you need is a big-ass ring buffer, mmapped on a hugetlbfs, fed by a
process on a NOHZ isolcpu core. Readers are separate processes.
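
Minus the mmap/hugetlbfs/isolcpu plumbing, the core of that is just a
lock-free single-producer/single-consumer ring; a toy sketch:

        #include <atomic>
        #include <cstddef>
        #include <optional>
        
        // Toy SPSC ring: fixed capacity, nothing allocated after construction.
        template <typename T, std::size_t N>
        class SpscRing {
            static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
            T buf_[N];
            std::atomic<std::size_t> head_{0};  // advanced by the consumer
            std::atomic<std::size_t> tail_{0};  // advanced by the producer
        public:
            bool push(const T& v) {  // producer side
                std::size_t t = tail_.load(std::memory_order_relaxed);
                if (t - head_.load(std::memory_order_acquire) == N) return false;  // full
                buf_[t & (N - 1)] = v;
                tail_.store(t + 1, std::memory_order_release);
                return true;
            }
            std::optional<T> pop() {  // consumer side
                std::size_t h = head_.load(std::memory_order_relaxed);
                if (h == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
                T v = buf_[h & (N - 1)];
                head_.store(h + 1, std::memory_order_release);
                return v;
            }
        };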

~~~
CoolGuySteve
No. Thread local/core-specific FIFO is more cache efficient than a ring buffer
because the address about to be allocated is significantly more likely to be
in a high level cache.

With a ring buffer, you're constantly cycling out to L3 or worse and hoping
the prefetcher figures out what you intend. It's basically a LIFO allocator.

Even if you want to use a separate processing core, you get better latency
using a FIFO allocator between 2 threads on the same core complex and the code
is simpler, reducing instruction fetch overhead.

Frankly, I see architectures like yours all the time from firms like Hudson
River Trading and I think they suck. They incur tons of overhead, the process
separation adds a useless layer of abstraction that's annoying to transcend,
and you end up with this useless message protocol between cores that invokes
tons of copies and breaks compiler inlining features.

~~~
ncmncm
Ring buffers have a strictly sequential access pattern, which prefetchers are
specifically optimized for.

Anybody using a "message protocol" with ring buffers, or doing copies out of
them, is Doing It Wrong. I routinely get 10x performance by doing away with
FIFOs and buffer allocation and freeing.

Process separation means you can start and stop readers independently of any
other activity.

------
ur-whale
It takes quite some stamina to keep reading after "The built-in
integer types in C++ (int, unsigned, long long, etc.) are mostly unusable".

~~~
DubiousPusher
I can see why you'd say that that way but the author probably feels that way
because of the domain they work in. I've encountered research types who just
use bigint everywhere because it's safer and they don't worry about the perf.

When I was coming up as a game programmer no one would use boost shared_ptr
and weak_ptr because of perceived issues with it. Everyone rolled their own
handle system.

In fact, there were a lot of people who considered languages besides C/C++
untouchable due to perf. They couldn't fathom there were domains where that
perf just wasn't that important.

I now work a lot with Unity and find people feel the same about C# foreach
because it used to trigger an allocation.

You'll find people often make the mistake of thinking their domain specific
problem is a problem for everyone.

------
leni536
> bounded::integer uses built-in integers as the template parameter to
> determine its bounds. This means that it cannot store an integer larger than
> an unsigned 64-bit or unsigned 128-bit integer (depending on the platform)
> or smaller than a signed 64-bit or signed 128-bit integer. This restriction
> should be removed at some point in the future.

So when an overflow happens here, I assume it's a compile-time error (UB due
to integer overflow in a constexpr context must be diagnosed by the compiler).
A handful of multiplications can get you there easily.
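
For example, something along these lines is ill-formed because the overflow
happens in a constant expression:

        // Signed overflow in a constexpr context must be rejected, not wrapped:
        constexpr long long big = 4'000'000'000LL;  // about 2^32
        constexpr long long product = big * big;    // error: overflow in constant expression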

~~~
asveikau
I am not aware of any compiler or platform where int is 128 bits. The most
common these days, even on 64-bit platforms, is 32 bits, which they don't
even mention.

I am not sure why they chose to speculate about sizeof(int) in this way, vs.
expressing the limit in terms of something more concrete.

~~~
SomeOldThrow
How are you envisioning this library being used?

~~~
asveikau
What?

I am saying the author did not express the limitation well, and instead made
it sound like they don't understand sizeof(int). The explanation that leni536
offered makes a lot more sense, so maybe they should have put it that way
instead of using such an easily misunderstood term as "built-in integer" to
mean "one of several integer types that we may choose based on some ifdef".

~~~
SomeOldThrow
Why did you arrive at this confusion?

Why would you assume your observations are relevant to the use case?

~~~
asveikau
Did you read this part:

> bounded::integer uses built-in integers as the template parameter to
> determine its bounds. This means that it cannot store an integer larger than
> ...

They say the limitation arises from using "built-in integers" as template
parameters. It is reasonable to assume "built-in integer" means "int". It is
more of a stretch to assume "built-in integer" means any number of types
depending on the evaluation of several ifdefs [which is what I found in the
source].

There is nothing about a "use case" relevant to any of that quoted statement,
so I find your reply very confusing too.

~~~
SomeOldThrow
> It is reasonable to assume "built-in integer" means "int".

This strikes me as a little nuts for C but it definitely explains this thread.
I just assumed people had already switched over to e.g. sint32/size_t to avoid
this problem a long time ago. I couldn’t tell if this was a legitimate
complaint or someone’s language lawyering tendencies gone too far.

~~~
asveikau
First of all, we're not talking about C here. This is a C++ template
parameter; the feature doesn't exist in C.

Second, it is not "nuts" to use int; I would argue it's quite a bit crazier
to use rarely-supported 128-bit quantities when you don't need them. Your
suggestion of "sint32" (this is not a standardized typedef, you mean int32_t
perhaps) would _be the same as int_ on most compilers today, and size_t is
likewise _very often_ 32 bits. Your suggestion is basically a no-op in a lot
of places. Domain-specific areas like file formats or network protocols are a
different story, as those have to be explicit about specifying sizes.

Third:

> someone’s language lawyering tendencies

I suggest you avoid a passive-aggressive communication style and just say "you"
instead of "someone". If you're going to criticize or give feedback, be
direct. Don't vaguely tell me that "somebody" "might" have a problem.

Best regards to you, friend.

~~~
SomeOldThrow
Hey I was just trying to figure out if you were bringing up anything of value
:)

------
ssalazar
> The built-in integer types in C++ (int, unsigned, long long, etc.) are
> mostly unusable

Weird. I've been programming in C/C++ professionally for over a decade and
have been using built-in integer types mostly without issue. Might want to
qualify your hyperbole: is it worth reading on?

------
golemotron
> The built-in integer types in C++ (int, unsigned, long long, etc.) are
> mostly unusable because of the lax requirements on bounds.

This must be some new definition of the word 'unusable.'

~~~
vbezhenar
I agree with that statement. How can you use a type when you don't know its
range of values? I want to store a year, from 1850 to 2050. What type should
I choose? I want to store a colour value, from 0 to 2^24-1. What type should
I choose?

~~~
ori_b
int, long. The standard specifies their minimum sizes.
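
The guaranteed minimums already cover both of your examples:

        #include <climits>
        
        // Minimum ranges required by the standard on every implementation:
        static_assert(INT_MAX >= 32767, "int always holds 1850..2050");
        static_assert(LONG_MAX >= 2147483647L, "long always holds 0..2^24-1");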

~~~
anilakar
x86 has also had saturation arithmetic since MMX was introduced.

~~~
pjc50
... However, unless you use non-portable intrinsics, you can't make the
compiler use MMX. So in practice you get wrap on overflow.

Worse, the compiler is entitled to use saturation arithmetic if you've assumed
wrapping.

~~~
jstanley
> Worse, the compiler is entitled to use saturation arithmetic if you've
> assumed wrapping.

C89 (draft[1], because the actual spec is not public) section 3.1.2.5 states:

> ... a result that cannot be represented by the resulting unsigned integer
> type is reduced modulo the number that is one greater than the largest value
> that can be represented by the resulting unsigned integer type.

So, unless I'm misunderstanding, the compiler is not entitled to use
saturation arithmetic on unsigned integers.

[1]
[http://port70.net/~nsz/c/c89/c89-draft.html](http://port70.net/~nsz/c/c89/c89-draft.html)

~~~
ncmncm
Signed and unsigned have different semantics.

The compiler is allowed to do literally anything on signed overflow, including
launching the missiles, and not running the program. Or both.
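
That is (toy illustration; don't actually run the second one):

        #include <climits>
        
        void semantics() {
            unsigned int u = UINT_MAX;
            unsigned int w = u + 1;  // well-defined: wraps to 0 (reduced mod 2^N)
        
            int i = INT_MAX;
            int b = i + 1;           // undefined behavior: the compiler may assume
                                     // this line is never reached
            (void)w; (void)b;
        }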

------
DubiousPusher
Because of the field I work in, I feel this much more keenly with floats. I
have been bitten by their eccentricities so many times that I'd much rather
have a flexible fixed-point type that lets me choose how much precision I
want on either side of the decimal point for a given set of calculations.
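
Even a minimal Q-format type would get most of the way there; a sketch
(binary rather than COBOL-style decimal, and ignoring overflow for brevity):

        #include <cstdint>
        
        // Toy Q-format fixed point: Frac bits to the right of the binary point.
        template <int Frac>
        struct Fixed {
            std::int64_t raw;  // stores value * 2^Frac
        
            static Fixed from_double(double d) {
                return {static_cast<std::int64_t>(d * (1LL << Frac))};
            }
            double to_double() const {
                return static_cast<double>(raw) / (1LL << Frac);
            }
            friend Fixed operator+(Fixed a, Fixed b) { return {a.raw + b.raw}; }
            friend Fixed operator*(Fixed a, Fixed b) {
                return {(a.raw * b.raw) >> Frac};  // re-normalize after the multiply
            }
        };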

~~~
michaelcampbell
Doesn't COBOL do this?

------
duneroadrunner
At first glance it seems to be similar to boost::safe_numerics. Are there
significant differences?

[https://github.com/boostorg/safe_numerics](https://github.com/boostorg/safe_numerics)

~~~
proverbialbunny
Both libraries do the same thing. Boost lets you set your exception policy and
I'm unsure if this Bounded Integer library gives the same level of control.

~~~
davidstone
The bounded::integer types accept three template parameters: `integer<min,
max, overflow_policy>`. That final parameter can be bounded::throw_policy,
which is itself templated on the exception type thrown. The default policy
is "overflow is undefined behavior". The other two policies supported
out of the box are wrapping / modulo and saturation / clamping.
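
So declarations look roughly like this (the integer<min, max, policy> shape
and bounded::throw_policy are as described above; the rest of the spelling
here is approximate):

        #include <bounded/integer.hpp>
        #include <stdexcept>
        
        // Default policy: overflow is undefined behavior.
        using Year = bounded::integer<1850, 2050>;
        
        // Throwing policy, templated on the exception type thrown.
        using CheckedYear =
            bounded::integer<1850, 2050, bounded::throw_policy<std::range_error>>;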

------
tom_mellior
> you don't pay for what you don't use

> throwing an exception on overflow or clamping the value to the minimum or
> maximum are also possible by use of template policies (and those particular
> use cases are already built in to the library)

> Never perform a run-time check when a static check would work instead

... but _sometimes_ do run-time checks?

> Have no space or time overhead

This is a very confusing set of claims. You can't have all of these at the
same time. Dynamic checks will be needed more often than you'd think, even in
cases that "we" can tell won't overflow, so you'll definitely pay for stuff
that you use even though with a stronger system you wouldn't need to use it.

For example, I'm wondering if this system can eliminate dynamic checks in
cases like these:

    
    
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            sum += i;
        }
    

After this loop, sum's type should be integer<45, 45>, but I'd bet (without
having tried) that with this actual system it's inferred as integer<0,
infinity> and you'd get dynamic checks. Unless you use the "null policy", in
which case you don't need this library at all.

There is definitely a case to be made for a system that tries to trap
overflows but also tries to use static analysis to be smart about where to put
the dynamic checks. But I don't think this system is that.

------
anon91831837
Reminds me of Pascal having subrange types like

    
    
        var
          age: 18..130;
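
In the bounded::integer spelling from elsewhere in this thread, that would be
roughly:

        bounded::integer<18, 130> age;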

------
mister_hn
What about support for older gcc (say 6), older clang (say 5), and older
MSVC (say 2015/17)?

~~~
davidstone
Only very old versions of the library are supported by those older compilers.
My library targets the C++20 standard, so currently the only compiler that
can compile it is clang with the concepts branch.

------
FpUser
Ouch

