
Swift: Madness of Generic Integer - krzyzanowskim
http://blog.krzyzanowskim.com/2015/03/01/swift_madness_of_generic_integer
======
gilgoomesh
To summarize, the author has two problems:

1. Bit shift is not part of the IntegerType protocol, when it should be
(although the author could avoid the issue by accumulating the bytes in a
UIntMax instead of the generic type).

2. Construction from (and conversion to) a UIntMax bit pattern is not part of
the IntegerType protocol, when it should be (done correctly, this addresses
the author's sign _and_ construction complaints).

The author incorrectly claims/implies that these are problems with generics or
protocols or the diversity of integer types in Swift. They're really a problem
of omissions in the standard library protocols that are forcing some very
cumbersome workarounds. The necessary functionality exists, it just isn't part
of the relevant protocols. Submit these as bugs to Apple.

Edit:

As a follow up, here's a version that gets around the standard library
limitations using an unsafe pointer...

    
    
      func integerWithBytes<T: IntegerType>(bytes:[UInt8]) -> T? {
        if (bytes.count < sizeof(T)) {
            return nil
        }
        var i:UIntMax = 0
        for (var j = 0; j < sizeof(T); j++) {
            i = i | (UIntMax(bytes[j]) << UIntMax(j * 8))
        }
        return withUnsafePointer(&i) { ip -> T in
            return UnsafePointer<T>(ip).memory
        }
      }
    

Of course, at that point, why not simply reinterpret the array buffer
directly...

    
    
        func integerWithBytes<T: IntegerType>(bytes:[UInt8]) -> T? {
            if (bytes.count < sizeof(T)) {
                return nil
            }
            return bytes.withUnsafeBufferPointer() { bp -> T in
                return UnsafePointer<T>(bp.baseAddress).memory
            }
        }

~~~
barrkel
The more protocols that are added, the more concepts there are to scare people
away from what they think of as relatively simple primitives. 26 is already a
scary number of concepts to tie to simple whole numbers.

To be clear, the complexity is inherent in using numbers programmatically. The
only real way around it would be to reduce flexibility around overloading
operators, forcing people to implement bundles of related operators that are
all associated with a concept (protocol). This would decrease the utility of
overloading for implementing some algebras.

------
ChuckMcM
Wow that takes me back. Back to a conference room where we were talking about
Integers in Java. If you made it a class, an Integer could carry along all
this other stuff about how big it was, what the operators were, etc. But
generating code for it was painful because your code had to do all of these
checks when 99% of the time you probably just wanted the native integer
implementation of the CPU. And Booleans: were they their own type, or just a
1-bit Integer? And did that make an enum {foo, bar, baz, bletch, blech, barf,
bingo} just a 3-bit integer?

Integers as types can compile quickly, but then you need multiple types to
handle the multiple cases. Essentially you have pre-decoded the size by making
it into a type.

At one point you had class Number, subclasses Real, Cardinal, and Complex, and
within those a constructor which defined their precision. But I think everyone
agreed it wasn't going to replace Fortran.

The scripting languages get pretty close to making this a non-visible thing,
at the cost of some execution speed. Swift took it to an extreme, which I
understand, but I probably wouldn't have gone there myself. The old char,
short, long types seem so quaint now.

~~~
pjmlp
Just because they look like objects to the programmer doesn't mean they have to
be objects at the implementation level, as in the Pascal family of languages.

~~~
xamuel
This is only true until the programmer actually uses them like objects. If the
programmer, say, sticks the IntObj into a LinkedList<IntObj>, suddenly that
integer is going to need some sort of additional overhead associated with it.

~~~
pjmlp
Why?

It is a matter of how a specific language is implemented: whether tagged types
are used, how generic code is handled, and so on.

~~~
xamuel
Because adding/removing an object in a linked list requires some sort of
connection between the object and the list (e.g. a "next" field). That's true
regardless of whether the language abstracts it away --- no language can
perform magic.

(To pre-empt a possible misunderstanding: I'm talking about adding the integer
itself to the list, not a copy of a snapshot of its value.)

~~~
sukilot
What? Are you familiar with pointers? If you want a List<*int>, you can have
that. "Next" is part of the list cell, not the contained object.

~~~
xamuel
Yes, I'm a bit familiar with pointers ;)

You can store the "next" in a list cell if you want, but there still has to be
some way to figure out, given just the object, which cell it corresponds to.
Well, you could traverse the whole list to find it, but I sure hope you
understand why that's a bad idea. Sure, compute a hash---but that computation
is overhead.

Why do you think Java distinguishes Integer from int? If it were possible to
have integers that walked, talked, and quacked like an object, but without any
overhead, then we wouldn't use integers. We'd use those things instead.

~~~
pjmlp
> Why do you think Java distinguishes Integer from int?

Because the compiler writers didn't want to spend effort using tagged types or
compiler optimization techniques.

The original goal was to generate bytecode for simple execution in embedded
devices.

.NET for example, does not distinguish between Integer and int. One is the
alias for the other.

Eiffel INTEGER is mapped to a plain C int[0], when generating native code.

Smalltalk implementations usually use byte tagging to map primitive objects to
register sizes. Described in the blue book.

Ada Integer has quite a few pre-defined attributes and the language allows for
additional user defined attributes[1]. Additionally one can specify the number
of bits used for storage.

Any good CS compiler design course would cover such cases in detail.

[0] Eiffel is a pure OO language.

[1] Think methods.

------
Animats
I once, many years ago, wrote something titled "Type Integer Considered
Harmful". (This was way back during the 16-32 bit transition). My position was
that the user should declare integer types with ranges (as in Pascal and Ada),
and it was the compiler's job to ensure that intermediate values could not
overflow unless the user-defined ranges would also be violated. Overflowing a
user range would be an error. The goal was to get the same answer on all
platforms regardless of the underlying architecture.

The main implication is that expressions with more than one operator tend to
need larger intermediate temporary variables. (For the following examples,
assume all the variables are the same integer type.) For "a = b * c", the
expression "b * c" is limited by the size of "a", so you don't need a larger
intermediate. But "a = (b * c)/d" requires an temporary big enough to handle
"b * c", which may be bigger than "a". Compilers could impose some limit on
how big an intermediate they supported.
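
In today's Swift terms, the idea amounts to widening the intermediate before
narrowing back to the declared range; a rough sketch (the function name is
just illustrative):

    
    
      // Sketch: evaluate a = (b * c) / d for Int16 operands by widening the
      // intermediate to Int64, so "b * c" cannot overflow, then checking that
      // the result fits back into the declared Int16 range.
      func scaledDivide(_ b: Int16, _ c: Int16, by d: Int16) -> Int16? {
          guard d != 0 else { return nil }
          let wide = (Int64(b) * Int64(c)) / Int64(d)   // larger intermediate temporary
          if wide < Int64(Int16.min) || wide > Int64(Int16.max) {
              return nil                                // result violates the declared range
          }
          return Int16(wide)
      }
    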

This hides the underlying machine architecture and makes arithmetic behave
consistently. Either you get the right answer numerically, or you get an
overflow exception.

Because integer ranges weren't in C/C++, this approach didn't get much
traction. Dealing with word size differences became less of an issue when the
24-bit, 36-bit, 48-bit and 60-bit machines died off in favor of the 32/64 bit
standard. So this never became necessary. It's still a good way to think about
integer arithmetic.

~~~
deathanatos
Especially in higher-level languages, I've wished language designers would
move toward using variable-size/bignum integers instead of fixed size integers
(Python does this, for example). It eliminates overflow, and the need to
analyze each int to see if it will overflow the type you're sticking it into.

I wouldn't mind being able to have a "RangedInt<min, max>" type either, in
addition. If the bounds are tight enough, the compiler could just use the
next-bigger machine integer type (and do bounds-checking, please!). I think an
integral type that was always modulo the max would be useful in many
applications as well (i.e., unsigned, and overflow is well-defined to wrap,
but you explicitly opt-in to this behavior.) You can imagine,

    
    
      int: signed, bounded only by memory
      ranged_int<min, max>: integer type capable of holding anything in [min, max].
         Over/underflow is an error (exception? panic? [1])
      modulo_int<min, max>: unsigned, overflow wraps.
         (Mathematicians probably have a better name here… "ring"?)
      "usize" or "size_t": capable of holding any memory address, so useful for indexes.
      native::uint8, native::uint16, etc: whatever your hardware gives you, if you really need it.
    

The default type a new coder would grab for (int) won't overflow on them,
although there are questions about what some_array[int_index] does, esp. if it
overflows the index type.

[1] Rust has some interesting thoughts here, and I thought they did a good job
of detailing the consequences and their reasoning; see
[https://github.com/rust-lang/rfcs/blob/master/text/0560-integer-overflow.md](https://github.com/rust-lang/rfcs/blob/master/text/0560-integer-overflow.md).
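
Swift doesn't have value generic parameters, so ranged_int<min, max> can't be
spelled exactly like that today, but a rough sketch (names are mine) with the
bounds supplied through a protocol looks like:

    
    
      // Rough sketch of a range-checked integer; the bounds come from a protocol
      // because Swift generics take types, not values, as parameters.
      protocol IntBounds {
          static var minValue: Int { get }
          static var maxValue: Int { get }
      }
      struct RangedInt<Bounds: IntBounds> {
          let value: Int
          init?(_ value: Int) {
              // Reject anything outside [minValue, maxValue] instead of wrapping.
              guard value >= Bounds.minValue && value <= Bounds.maxValue else { return nil }
              self.value = value
          }
      }
      enum Percent: IntBounds {            // illustrative bounds type
          static let minValue = 0
          static let maxValue = 100
      }
      let ok = RangedInt<Percent>(42)      // non-nil
      let bad = RangedInt<Percent>(150)    // nil: out of range
    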

~~~
Animats
If you want wrapping, use the "mod" or "%" operator. The compiler should be
made to understand how to generate fast code for idioms such as "n =
(n+1)%65536;"

------
Filligree
For amusement value, the Haskell equivalent is:

    
    
      {-# LANGUAGE LambdaCase #-}
      import Data.Bits
      import Data.List (unfoldr)
    
      f :: (Num a, Bits a) => a -> [a]
      f = unfoldr $ \case
        0 -> Nothing
        n -> Just (n .&. 0xff, n `shift` (-8))

~~~
panic
That's the inverse of the function the author wants, which ought to have type
'[Word8] -> Maybe a' (where 'Nothing' is returned when the bytes exceed the
range of the type in question).

~~~
Filligree
My bad. Then I'll leave it as an exercise for the reader; the signature of the
correct function is the same.

~~~
poikniok
This is so typical in the Haskell world, comment with a short snippet of code
that does not produce the intended result, and when this is pointed out
dismiss it as a trivial implementation problem.

~~~
bkirwi
Here you go, then:

    
    
        -- little-endian sequence of bytes to arbitrary integral type
        import Data.Bits (Bits, shiftL)
        import Data.Word (Word8)
        integerWithBytes :: (Bits a, Integral a) => [Word8] -> a
        integerWithBytes = foldr (\byte acc -> (acc `shiftL` 8) + fromIntegral byte) 0
    

At least, I'm pretty sure that's what the article was trying to do...

I don't think this is some amazing showcase of Haskell; after all, the article
includes a (working?) implementation in C++. But it does serve as a nice
counterexample to the idea -- expressed elsewhere in the thread -- that being
generic is necessarily a hugely painful or complicated thing.

~~~
codygman
Yep this works:

    
    
        λ> import Data.Bits
        λ> let integerWithBytes = foldr (\byte acc -> (acc `shiftL` 8) + fromIntegral byte) 0
        λ> integerWithBytes [0xFF, 0xFF, 0xFF, 0xFF]
        4294967295

------
shurcooL
Generic code is like nerd sniping.

I look at this and think "why would you want to write generic code for all
those ints?"

The integer types may look similar, but they're different in more ways than
they're similar. They have different bit sizes and different signedness. The
CPU literally has to do different things depending on whether it's `uint8` or
`int64`. So why do you want or expect one piece of code that does it all?

It's just so much easier and faster to do it like Go: have non-generic
functions that do exactly what you want and, as a result, get meaningful work
done. It's faster to write (because you don't need to figure out how to do it
in a generic way), faster and easier to read, and possible to make changes to
one func but not others.
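
For one concrete width, the non-generic version is indeed short (a sketch; the
function name is just illustrative):

    
    
      // Non-generic sketch: little-endian bytes to UInt32, nothing shared with
      // the other widths. Write a sibling function per type you actually need.
      func uint32WithBytes(_ bytes: [UInt8]) -> UInt32? {
          if bytes.count < 4 { return nil }
          let value = UInt32(bytes[0]) |
                      (UInt32(bytes[1]) << 8) |
                      (UInt32(bytes[2]) << 16) |
                      (UInt32(bytes[3]) << 24)
          return value
      }
    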

~~~
jmgao
Because it should be trivial?

    
    
        template<typename T>
        T getValue(std::array<char, sizeof (T)> bytes) {
            return *reinterpret_cast<T *>(bytes.data());
        }

~~~
pcwalton
Well, it's not really fair to compare an implementation that throws out memory
safety to a type-safe language. Swift has pointer casts too.

~~~
jmgao
A memory safe implementation is only a bit longer:

    
    
        template<typename T>
        T getValue(std::array<char, sizeof (T)> bytes) {
            T result = 0;
            for (char byte : bytes) {
                result <<= 8;
                // cast so a signed char doesn't sign-extend into the high bits
                result |= static_cast<unsigned char>(byte);
            }
            return result;
        }

~~~
barrkel
The reason this works in C++ is because it doesn't have protocols. The
operations that T is required to support in order to be a valid type argument
to this function are not expressed symbolically; they need to be documented,
otherwise you end up with compiler error messages that quickly get unwieldy in
more complex scenarios.

In large part, that's the problem the OP is confronted with. When you make
these things symbolic and first-class, unless they're extremely complete, you
find holes in the system. And when they're very complete, you find yourself
overwhelmed by the number and apparent complexity of what should be simple.
There's an inherent conflict.

~~~
rian
C++ templates do have "protocols," they just aren't necessary. The result is
that the perpetrator of a template error is ambiguous.

check out type traits:
[http://en.cppreference.com/w/cpp/types](http://en.cppreference.com/w/cpp/types)
and std::enable_if:
[http://en.cppreference.com/w/cpp/types/enable_if](http://en.cppreference.com/w/cpp/types/enable_if)

"concepts lite" is a proposal to add syntactic sugar for type traits as well
as enhance them a bit.

in general this is how C++ does things now: first add library-level solutions
as far as possible, then add language-level syntactic sugar once the usage and
implementation are fully understood.

------
lectrick
I guess this sort of gets at the crux of the issue: Do you want it to be more
like a scripting language (which would basically give you the mathematical
equivalent of "integer" including unlimited size) at the cost of speed, or do
you want it to be closer to the implementation in the CPU, which entails
dealing with 8/16/32/64 bit limits and sign bits?

Why not have a way to do both? You can get an easy-to-use Int when speed is
less of a concern, and can deal with Int16's, Int32's, UInt32's and whatnot
when the job demands it.

~~~
iopq
If you have tagged integers you can have 31 bit ints that are super fast (one
shift away from the actual number). The performance cost (allocation) is only
when it overflows.
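
The mechanics are just a shift and a tag bit; very roughly (an illustration
only, not any particular VM's scheme):

    
    
      // Illustration of the tagging idea: the low bit marks "small integer",
      // leaving untagged words free to be object pointers. Real VMs do this in
      // the runtime, not in user code.
      func tagSmallInt(_ n: Int) -> UInt {
          return (UInt(bitPattern: n) << 1) | 1      // shift left, set the tag bit
      }
      func untagSmallInt(_ word: UInt) -> Int? {
          guard word & 1 == 1 else { return nil }    // tag bit clear: not a small int
          return Int(bitPattern: word) >> 1          // arithmetic shift restores the value
      }
    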

------
chvid
I think you all are being too nice to Apple.

I had a similar experience to the blog post author's. I spent many hours
battling generics and the huge forest of (undocumented) protocols to do
something seemingly trivial. I just gave up rather than try to pin down
exactly what was wrong in a long and detailed blog post.

The prevailing answer to everything seems to be: write a bug report to Apple
and use Objective-C (or Swift's UnsafePointer and related APIs).

This ignores what I think really is the issue here: Swift has an overly
complex type system. This picture:

[http://swiftdoc.org/type/Int/hierarchy/](http://swiftdoc.org/type/Int/hierarchy/)

It tells a lot. And this is from unofficial documentation that has been
generated from the Swift libraries. When you read the documentation Apple
provides, there is little explanation of this huge protocol hierarchy or the
rationale behind it.

It seems to me that Swift has been released in a rush, with bugs even in the
core language and compiler, lacking documentation, and with an even larger
number of bugs in the IDE support, debugging, etc.

Secondly: Swift battles the problem of easy-to-understand, type-safe generics
like so many other languages, only it has it much worse: it carries a lot of
stuff over from Objective-C and has to support easy interoperability. Plus it
has ideas like not allowing implicit conversion between number types (requiring
an integer added to a double to be explicitly converted to a double), causing
the big type system to rear its messy head again and again.

I really want to love Swift but it will take years for Swift to be as clean
and productive as Objective-C.

In my opinion, what Apple should have done was create the "CoffeeScript" of
Objective-C: a language that was essentially Objective-C in terms of language
features, but with a more concise syntax.

------
grimlck
How does [0xFF, 0xFF, 0xFF, 0xFF], interpreted as a UInt32, turn into
16777215?

I would have guessed 4294967295.

~~~
Hannan
I assume it's just a typo/incomplete edit; his graphic directly below has the
bit representation 00000000111111111111111111111111, i.e. [0x00, 0xFF, 0xFF,
0xFF], and 0x00FFFFFF is indeed 16777215.

------
thought_alarm
The Swift equivalent of his first `NSData` example is essentially this:

    
    
        func integerWithBytes<T:IntegerType>(bytes:[UInt8]) -> T
        {
            let valueFromArrayPointer = { (arrayPointer: UnsafePointer<UInt8>) in
                return unsafeBitCast(arrayPointer, UnsafePointer<T>.self).memory
            }
            return valueFromArrayPointer(bytes)
        }
    
        let bytes:[UInt8] = [0x00, 0x01, 0x00, 0x00]
        let result: UInt32 = integerWithBytes(bytes)
        assert(result == 256)

------
frozenport
"All problems in computer science can be solved by another level of
indirection, except of course for the problem of too many indirections."
~David John Wheeler

------
bluehex
I wouldn't expect a much better design from a language developed behind closed
doors with no community input.

Now all we can do is file bugs with Apple and hope they improve it. They chose
to release a new language after only three months of public beta, so they
obviously didn't care much about having their designs tested or about
incorporating feedback.

------
cjensen
> More or less, what I want to archive can be done with old world NSData:
> data.getBytes(&i, length: sizeofValue(i))

That doesn't work in C/C++ if you are using a modern optimizer.

C does not have the other Swift issues the author mentions, so shifting into
the largest int and casting from there does work.

------
rismay
I'm glad I'm not the only one having this issue.

------
IgorPartola
So no macros? Seriously, isn't that the way to solve this?

~~~
Sephiroth87
No macros in Swift.

