How SerenityOS declares ssize_t (awesomekling.github.io)
138 points by trashburger on April 4, 2023 | 89 comments



I would do it like this:

  #if SIZEOF_SIZE_T == 8
  typedef int64_t ssize_t;
  #elif SIZEOF_SIZE_T == 4
  typedef int32_t ssize_t;
  #elif SIZEOF_SIZE_T == 2 // LOL
  typedef int16_t ssize_t;
  #else
  #error port me!
  #endif
SIZEOF_SIZE_T can be obtained using a script which compiles a test program without executing it.

Over the years I've used more than one approach, settling on this one:

https://www.kylheku.com/cgit/txr/tree/configure?id=1f902ca63...

Here, in the test program, a DEC macro has been defined which, given a constant expression, produces two decimal digits as the initializer for a two-character array. For instance:

   DEC(42) -> { '4', '2' }  // not exactly: character constants are not used
With this trick we can use DEC(sizeof (size_t)) to get a value like { ' ', '8' } into a portion of some character data, which we can prefix with an identifying string to look for, like:

   SIZEOF_SIZE_T= 8
We can basically grep that out. In my configure script, this data is extracted, spaces are removed from it, and it's evaluated directly as shell assignments, so the values are then available in shell variables.
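
A stripped-down sketch of the idea (the real script avoids character constants and differs in other details, but the shape is the same):

    #include <stddef.h>

    /* Two comma-separated "digits": a space (or tens digit) and a ones digit. */
    #define DEC(n)  ((n) >= 10 ? '0' + ((n) / 10) % 10 : ' '), ('0' + (n) % 10)

    /* Not static, so the data survives into the object file even though
       nothing references it. */
    const char size_t_probe[] = {
        'S','I','Z','E','O','F','_','S','I','Z','E','_','T','=',
        DEC(sizeof (size_t)),   /* e.g. ' ', '8' on a typical 64-bit target */
        '\n', 0
    };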


> #elif SIZEOF_SIZE_T == 2 // LOL

Fun fact - for a while, this was true for the third most popular processor architecture in the world, one that most people haven't heard of.

CSR (the Bluetooth chip maker) was originally a spin-out from Cambridge Consultants, and their chips were for years based on a Cambridge Consultants processor design called XAP. As CSR had made billions of devices, XAP was likely right up there in terms of number of processors made.

XAP was unusual in a couple of ways (at least unusual compared to the fairly universal modern design). Here's a fun set of statements that all evaluate to true:

    sizeof(uint8_t) == 1     // So far so normal, this is required by the C standard
    sizeof(uint16_t) == 1    // Surprise! BTW, uint16_t is defined as "unsigned int"
    sizeof(void *) == 1      // Weird
    sizeof(void (*)(void)) == 2 // Yes, function pointers are not the same size as data pointers
This is all completely legal according to the C standard. Fun fact: the size of a byte isn't "8 bits", it's technically "the minimum addressable size on your platform"[1]. On every modern processor design I know of, that is an 8-bit quantity, but the XAP only allowed you to address 16 bits at a time. Presumably this was for efficiency - the XAP was very low gate count for a fairly capable processor (at the time). 16 bits was also the size the instruction set operated on, the "natural size for calculation" as described in the C spec, so that's the size of an int. This means that sizeof(int) == 1, which is a surprise to most people. It also means that you can't do tricks like "cast a uint16_t to an array of 2 uint8_ts" (which is non-portable for endianness reasons anyway).

The other unusual aspect is that it's a Harvard architecture, which separates the address spaces for code and data. The code space being large, this meant function pointers were 24-bit, whereas data pointers were 16-bit. Code that assumed it could cast a function pointer to "void *" to put it in a callback context could get a nasty surprise (usually a subtle one, because it only broke if the function was late in the address space).
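
A hypothetical sketch of the kind of code that bites you there (the names are illustrative, not from any real codebase):

    /* Data pointers are 16-bit, function pointers are 24-bit. */
    typedef void (*callback_fn)(void);

    struct event {
        void *context;              /* only as wide as a data pointer */
    };

    void schedule(struct event *ev, callback_fn fn)
    {
        /* Truncates the 24-bit code address down to 16 bits; "works" only
           while fn happens to live low enough in the code address space. */
        ev->context = (void *)fn;
    }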

This was what I worked on early in my career; it was a great education in writing C that was actually portable. If it ran on PC and XAP, then it would likely run on anything!

[1] This is why picky/anal engineers will refer to "octets" in a network protocol rather than bytes. Bytes don't have a meaning except in the context of execution, which obviously doesn't apply to a network protocol.


> sizeof(uint16_t) == 1

> This is all completely legal according to the C standard.

Is it? I don't have access to the standard, but from secondary sources[1] it seems not?

> unsigned integer type with width of exactly 8, 16, 32 and 64 bits respectively (provided if and only if the implementation directly supports the type)

If it is, it kinda defeats the whole purpose of uint16_t and friends.

[1]: https://en.cppreference.com/w/c/types/integer


That's interesting. I didn't actually know the definition of uintXX_t. I was doing this with C90 most of the time, using types we had defined ourselves (rather than types from stdint.h). Sounds like they were more equivalent to int_leastXX_t.

I realise now that when I've previously written out the above examples, I've done it with the basic types, so sizeof(char), sizeof(int), etc. Sounds like that would be more correct as well!


Yeah, checking C11 (FCD), “The typedef name uintN_t designates an unsigned integer type with width N and no padding bits.” My reading is that they should not have defined `uint8_t`. Probably they did so pragmatically (and maybe optionally) because so much code writes `intN_t` when they mean `int_leastN_t`, with 8 in particular. I don't think I've ever actually seen the `_least` or `_fast` versions used in real life.

(I've worked with multiple processors with MAU>8; I recall at least one with 24-bit ‘bytes’, though I've forgotten what domain drove that.)
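
For what it's worth, a minimal sketch of the distinction (nothing platform-specific assumed beyond stdint.h):

    #include <stdint.h>

    uint8_t        raw;    /* exactly 8 bits, no padding - optional, and
                              simply absent on a CHAR_BIT == 16 machine */
    uint_least8_t  small;  /* smallest type with at least 8 bits - always
                              provided, may well be 16 bits wide there */
    uint_fast16_t  index;  /* the "fastest" type with at least 16 bits */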


Ah yes, having char and short etc be the same size makes a lot more sense, while still being a massive trap for young players.


It has 16 bits but the units of sizeof are not bytes – they are ‘the minimum difference between two addressable things’ which on that architecture is two bytes.


I can't remember the precise wording in the standard, but sizeof() is effectively "how many of the minimum addressable thing do you need to store this thing".

The "minimum addressable thing" is called a byte, and on XAP that is a 16-bit value.

Storing a uint8_t wastes 8 bits, because there isn't anything smaller than an int to put it in.


Right, I was assuming the bytes were 8 bits[1] given sizeof(uint8_t) == 1, i.e. since it had uint8_t. Seems really weird to have uint8_t but have CHAR_BIT > 8.

[1]: https://en.cppreference.com/w/c/language/sizeof


Isn't that a byte?

So rather, a byte is 16 bits.


Sure. A byte is two octets. In cases like these it pays to be very precise.


What's also interesting is that the processor was actually a 32-bit processor, IIRC (at least the later ones). The 16-bit variant is purely the VM mode they run "application" code in, which emulates the environment of an old CPU (my guess anyway - not sure why the VM was limited to 16 bits).

The Kalimba was equally weird, with sizeof(int) == 1 (int being 24-bit) and long being 48-bit. I ported an ECDSA library to it for Pixel Buds 1 because the XAP was too slow. It was really, really hard to get that to work properly, and I ended up writing a simulated environment to make sure I emulated the math correctly with masking and whatnot.
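
For illustration, the kind of masking involved (a hypothetical sketch, not the actual ported code):

    /* Emulate 16-bit modular arithmetic on a machine whose int is 24 bits. */
    #define MASK16(x)  ((x) & 0xFFFFu)

    static unsigned add16(unsigned a, unsigned b)
    {
        return MASK16(a + b);   /* force wrap-around at 16 bits */
    }

    static unsigned mul16(unsigned a, unsigned b)
    {
        return MASK16(a * b);   /* same for multiplication */
    }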

It was such an annoying architecture to work with that all the senior SW leads for Pixel Buds fought extra hard to avoid Qualcomm's solution. Their evolution for the next set of chips to compete with Apple's W1 saw them go down a weird path where they doubled down on getting rid of GCC in favour of their homegrown compiler (none of their shit ran natively on Linux or Mac) and on similarly weird architecture decisions (I forget all the details now). Our job was made easier in that they couldn't actually deliver a W1 competitor. Their best was exposing each bud as a separate device, which would have been a terrible experience, and they could only improve that experience for Android if we mainlined their weird decisions into Android (sorry - no). By comparison, BESTech delivered a proper competitor design to the W1 (transparent sniffing and hand-off) and their SW architecture was totally sane (ARM chip + GCC for sure + FreeRTOS, if I recall correctly). Much better partners than Qualcomm.


> What's also interesting is that the processor was actually a 32-bit processor, IIRC (at least the later ones). The 16-bit variant is purely the VM mode they run "application" code in, which emulates the environment of an old CPU (my guess anyway - not sure why the VM was limited to 16 bits).

I was writing code natively for the processor - it was a XAP5 and natively 16 bit. Possibly CSR later moved to XAP6 (which was 32-bit) and kept application code portable using the VM.

Kalimba was my first experience writing hand-coded assembly - I was implementing sample rate conversion for a hearing aid manufacturer. It was good fun - a dedicated multiply accumulate that meant you could do filtering in a single instruction per sample.


Yeah, maybe it was the XAP6 (CSR8675, if I recall the exact model number correctly).

For the Kalimba I just used the Qualcomm C compiler. There may have been some assembly for audio-related things, although I can't recall. It was mostly C code, I think.

That’s actually what they tried to do for the new chips if I recall correctly - they put Kalimba everywhere. I was like - wtf are you doing Qualcomm.


> On every modern processor design I know of, that is an 8 bit quantity,

Serious throughput DSP chips don't sully themselves with a mere 8 bits; they're designed for 32-bit-and-wider FFT pipelines that modular-index multiple vectors, fetch, multiply, add, and store every clock cycle.

E.g. the Texas Instruments TMS320C54x has CHAR_BIT 16 (Table 7-1, page 192 [1]).

Other modern DSP family chips have CHAR_BIT 32 - they're for numerics, not ASCII text processing.

[1] https://www.ti.com.cn/cn/lit/ug/spru103g/spru103g.pdf


I also remember the 8051-based TI Bluetooth chips, which also have some "weird" things going on and a Harvard-like architecture, just with more memory types. I raise you three types: program memory is also 24-bit (using both an 8-bit and the 16-bit register), plus 16-bit "external" memory and 256 bytes of mapped registers.

I don't know if the compiler actually treats these as different pointer types, or how it handles them internally. Maybe someone can elaborate on that.


I have worked with such things myself; for instance, most recently, at Broadcom I worked on ARM-based VoIP phones which had a Ceva Teaklite DSP, which has 16-bit bytes (and memory access with a weird byte order that is neither big nor little endian).

If we had to be a bit more portable, to include such systems, we could test bit sizes instead of byte sizes, e.g.

  #if SIZEOF_SIZE_T * CHAR_BIT == 16
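
Spelled out fully - still assuming the configure script exports SIZEOF_SIZE_T as a byte count - that might look like:

    #include <limits.h>   /* CHAR_BIT */
    #include <stdint.h>

    #if SIZEOF_SIZE_T * CHAR_BIT == 64
    typedef int64_t ssize_t;
    #elif SIZEOF_SIZE_T * CHAR_BIT == 32
    typedef int32_t ssize_t;
    #elif SIZEOF_SIZE_T * CHAR_BIT == 16
    typedef int16_t ssize_t;
    #else
    #error port me!
    #endif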


Cool. So on that platform CHAR_BIT was 16. I still wonder whether those statements should have used the plain, non-typedef'd types, or if no such thing was available. I can live with

    sizeof(char) == 1
    sizeof(short) == 1
    sizeof(void *) == 1
all being true with CHAR_BIT equal to 16, but it seems pointless to support uint8_t if it's not truly 8 bits.

Too far from my language-lawyer mood now to dig deeper. :)


It turns out you're right, and that's how I've actually written these examples in the past. A sibling comment provided a citation showing that I should really have been writing uint_leastXX_t: https://en.cppreference.com/w/c/types/integer


It's also 16 bit for the AVR architecture used on Arduino devices.


Is it? Cool! I haven't done anything with an Arduino for a while, and only used larger AVRs.


Rather than a bunch of #if blocks, one could instead do:

    #define SIZE_T_BITS 64
    
    #define PASTE3(a,b,c) a##b##c
    #define TYPEDEF_UINT(bits,name) typedef PASTE3(uint,bits,_t) name
    #define TYPEDEF_INT(bits,name) typedef PASTE3(int,bits,_t) name
    
    TYPEDEF_UINT(SIZE_T_BITS,size_t);
    TYPEDEF_INT(SIZE_T_BITS,ssize_t);
Should work for any value of SIZE_T_BITS, even something weird like 128 or 36 (PDP-10 port?), so long as you already have [u]intN_t defined.

It does require SIZE_T_BITS to be in bits rather than bytes, as your SIZEOF_SIZE_T is. But surely if your script can calculate SIZEOF_SIZE_T, it can multiply the answer by 8? (Or by CHAR_BIT, if we want to support weird platforms without 8-bit bytes – Lars Brinkhoff's PDP-10 port of GCC 3.2 has CHAR_BIT==9.)


The main thing in my post is how we get that input value, in your case SIZE_T_BITS, at the preprocessing level where we can then use it to select alternative pieces of code. After that it's cosmetic.

The simple, dumb #if blocks have the virtue that they are not hostile to simple tooling, like Exuberant Ctags, Cscope, and whatnot. When we ask the editor to jump to the definition of ssize_t, it knows the three possible places where it is defined and serves them up.

This alternative is possible:

    #define PASTE3(a,b,c) a##b##c
    #define UINT_TYPE(bits) PASTE3(uint,bits,_t)
    #define INT_TYPE(bits) PASTE3(int,bits,_t)
    
    typedef UINT_TYPE(SIZE_T_BITS) usize_t;
    typedef INT_TYPE(SIZE_T_BITS) ssize_t;
I think in this form, there is a good chance the tools will grok the typedefs and index them, because we have not disguised the basic phrase structure.


Unrelated to the article per se, but relevant to the author: I really admire Andreas' positivity and his ability to spread such joy to his audience.

This guy is awesome, and his positivity is outstanding. Just watch a few of his YouTube videos and you'll understand what I mean (:


Redefining a keyword via macro violates the C++ standard:

> Nor shall such a translation unit define macros for names lexically identical to keywords.

https://stackoverflow.com/questions/9109377/is-it-legal-to-r...


Lucky for them, it's a fully custom OS that doesn't follow the standard :)


They don't make the compiler. GCC in very many places substitutes standard library names with intrinsics for the purposes of optimization or static analysis. I'm not aware of it doing so for ssize, but in principle this is a bad idea.


I'm not a C developer at all, but this looks like bragging about a clever trick; when it comes to these things (and most code for that matter), I'd avoid clever tricks like the plague. Just write it out logically and as simply as possible. You're not saving much time or effort over just writing it out, and clear is better than clever. Code size is never an issue.


It's fun though. It's nice to have "best practices" etc., but SerenityOS isn't for IBM calculators or NASA life-support systems; it's a project for enjoyment by people who enjoy writing software.


You’d understand it if you were a C developer.


> You’d understand it if you were a C developer.

Counterpoint: I'm a C developer and a big fan of Andreas and I prefer mildly clever code over longer explicit code. But even then I was still surprised by his choice to keep this specific hack. It feels very brittle to me.


Seriously, in what dimension is this a good choice?


Programming is meant to be fun. That sounds like it was fun.

For some projects and people, fun is more important than readability, speed, or any other dimension.

(edit: not to imply that hack isn't readable or fast or such! I think it's cute and very readable)


SerenityOS is not production software or used for anything serious. It's just for fun.


> I don’t recommend doing this in your codebase, but it has worked for us so far. :^)

The story of my life . . .


Haha, funny - I came here hoping that someone had found a better solution than mine, but you did the same thing. I think __SIZE_TYPE__ is not standard but a GCC extension (that Clang also supports). I don't care because my target platform is Linux, but you might.

I have doubts about the legality of this solution though. The user might have #defined unsigned, and this would break that. So far none of my users has been mad enough to do it, but I think they would be within their rights if they did.

If you only support "standard" platforms you can just typedef signed long ssize_t. However, some platforms (looking at you here, Windows!) define long as 32-bit even on 64-bit systems, and for those this will break. I'm not sure if __SIZE_TYPE__ is intrinsically declared on Windows in the first place. The C standard also allows platforms where pointers have more bits than integers, in which case long would not work.

Hey, I just had an epiphany. You could use __PTRDIFF_TYPE__!
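
Something like this, guarded for compilers that predefine it (GCC and Clang do; a sketch, not tested everywhere):

    #if defined(__PTRDIFF_TYPE__)
    typedef __PTRDIFF_TYPE__ ssize_t;   /* the compiler's signed pointer-sized type */
    #else
    typedef signed long ssize_t;        /* fallback for "standard" platforms */
    #endif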


This is a great illustration of why C is, on one hand, terrible and, on the other hand, staunchly practical.

Here the trivial, unthinking preprocessor lets you do a pretty crazy (though very understandable and predictable) thing, which acceptably solves a problem that would take years and a ton of effort to solve "properly" (with standardization and compiler support).


I agree on the practicality aspect. When programming in C, one learns to accept these preprocessor hacks.

When programming in C++, where limited compile-time and type metaprogramming exist, one is constantly hitting the limits, and it causes endless frustration. I go through a mini cycle of grief until I give in and use a macro, or an otherwise less elegant implementation.

You're right in that it has taken years (decades, even) to standardise some better alternatives to macros. But even now C++ lacks some of their power.

I have to wonder how much faster alternatives would have been implemented, if macros weren't "good enough" for so many use cases.


I wouldn’t call meta- and compile-time programming limited in modern C++ (C++17 and above).

With the addition of "constexpr" and "consteval", compile-time programming is the same as runtime programming for many cases. Templates are obtuse for metaprogramming but can usually get the job done.

The need for macros is much less common in modern code.


You can mostly avoid macros now, thankfully. I'm particularly thinking of things like compile-time reflection and some more complex type introspection - things that would let you get rid of some code generation, like D's mixins perhaps.

Existing C++ reflection has mostly been done with macros - you sacrifice readability by declaring your class with macros instead - and I believe it is often a runtime thing anyway. Complex type metaprogramming is possible, sure, but often so obtuse and illegible that I'd dare say the preprocessor is a better alternative if it works.


Most likely people would be using awk and sed for the same purposes.


That's cursed just the way I like it.


To expand further on that: POSIX only says that ssize_t needs to be able to store -1, i.e. the valid range is from -1 through 0 up to some implementation-defined maximum. (See the link in the article.)

A really cursed implementation could hack up compiler support for an asymmetric integer type that treats all-bits-set as -1 and everything else as a positive number, allowing ssize_t to hold all but the largest size_t value. While perfectly standards-compliant, I guess this might break a bunch of implicit assumptions across various existing programs.


There's a really neat way to express asymmetric numbers that generalizes from both unsigned numbers and standard 2's complement signed numbers.

It's easiest to explain in terms of how to interpret a particular bitpattern. As the first step, interpret the bitpattern as an unsigned int, u.

If u <= T, for some threshold T, then we are done; the final value is u. If u > T, then we interpret it as the negative number u - UINT_MAX - 1 (that is, u - 2^N for N-bit words).

T = UINT_MAX gives you the unsigned numbers, and T = INT_MAX gives you the two's complement.

Because of modular arithmetic (intense handwaving), it's also easy to do addition, subtraction and multiplication using this representation.
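
A little sketch of that decoding rule for 32-bit patterns (widening to 64 bits so the subtraction can't overflow):

    #include <stdint.h>
    #include <stdio.h>

    /* Values <= t are taken as-is; values above t wrap around to negatives
       (u - 2^32). t = UINT32_MAX gives plain unsigned, t = INT32_MAX gives
       ordinary two's complement. */
    static int64_t decode(uint32_t u, uint32_t t)
    {
        return (u <= t) ? (int64_t)u
                        : (int64_t)u - ((int64_t)UINT32_MAX + 1);
    }

    int main(void)
    {
        printf("%lld\n", (long long)decode(0xFFFFFFFFu, UINT32_MAX)); /* 4294967295 */
        printf("%lld\n", (long long)decode(0xFFFFFFFFu, INT32_MAX));  /* -1 */
        return 0;
    }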


This isn’t particularly uncommon or without precedent. This is just treating SIZE_MAX as the sentinel value of an otherwise unsigned type. This is what is done for std::dynamic_extent in C++ for instance.


That isn’t allowed by the C standard: signed integer types have to be two’s complement, one’s complement, or sign-magnitude; they can’t be wildly asymmetrical about 0.


Is there any real-world system on which defining ssize_t to be ptrdiff_t would break actual programs that exist? (i.e. I'm not asking for hypotheticals here)


Yes -- anything that uses fat pointers and defines ptrdiff_t to be as wide as a fat pointer, larger than size_t/ssize_t.


What actual system is currently like this?


I'm sure they're out there. The closest that comes to mind is the MSP430, but not quite -- although it has 20 bit pointers (with sizeof ptr being... 4, since they're padded to 4 bytes for storage) and has a 16 bit size_t, my recollection is that ptrdiff_t is also defined at 16 bits (which I think violates the C spec, which requires at least 17 bits?). I haven't worked with many other segmented architectures recently, but either there or capability machines are where I'd look.


> my recollection is that ptrdiff_t is also defined at 16 bits (which I think violates the C spec, which requires at least 17 bits?)

C2x applies a proposal to permit pre-C99 limits:

  N2808    Allow 16-bit ptrdiff_t
N2808: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2808.htm

Draft C2x: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf


The entire point of my question was that "I'm sure they're out there" is as close to an answer as I've ever found, so I was looking for an actual example, not a hypothetical one.


IBM i (formerly known as AS/400) has two types of pointers, fat 128-bit pointers and thin pointers (which are either 32-bit or 64-bit, depending on the addressing mode). The 128-bit pointers contain embedded information on the type of the object they point to, and security capabilities for accessing it – there are actually several different types of fat pointers, which constrain which type of object they point to, but there is a generic pointer type ("open pointer") which can contain any other type of pointer, and hence point to anything. By contrast, the thin pointers are just memory addresses. IBM's C compiler defines extension keywords to declare if a given pointer type is fat or thin. However, they chose to define size_t, ptrdiff_t, etc, in terms of the thin pointers only. So, even this isn't a case of what you are looking for. But, if IBM had made some slightly different choices (permitted by the standard) in the design of their C compiler, it would have been. Also, back in the late 1980s / early 1990s there was at least one third party C compiler for the AS/400 (and its System/38 predecessor), and I'm not sure what choices that compiler made.

If people are looking for examples, I'm wondering about C compilers for Burroughs Large Systems. Or C compilers for Lisp machines (Symbolics had one). Those are the kind of weird architectures on which you'd do this, if anyone ever did. Indeed, it is rather obvious that the C standards committee gave compiler developers these unusual options with those weird architectures in mind. But it can't force them to make use of them, even if they are on a platform in which they might make sense.


I mean, a C-spec-compatible compiler for the MSP430 would be an example (see paragraph 7.20.3); TI has just decided to deviate from the language standard here. It's an example of a platform that would meet the requirements but chooses not to offer a compliant C compiler.


But that's not a counterargument; if anything, it's kind of my point. The standard seems to be catering to a hypothetical machine that lacks real-world demand/usage/market/utility/etc.

If an abstraction is placed into a standard, its answer to "how many people are benefiting from this headache we're giving everybody" really ought to be noticeably greater than zero.


Sure. The C standard has always erred on the side of supporting odd platforms -- see ones complement, decimal floating point, sizeof function pointer != sizeof data pointer, etc. Most platforms today have converged around an approach that doesn't require these escape hatches. But if you're trying to e.g. bring up C on CHERI [1] or other platforms that don't make the assumption that memory is all one big flat address space, it's not only nice to have flexibility in the standard, it's nice that LLVM and other tools maintain that flexibility into their implementation.

[1] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf


> see ones complement

I'm glad you brought this up because it is kind of my point. C++ realized this was useless baggage and finally left it behind. [1] I don't see why ptrdiff_t is much different here. C just doesn't want to let things go, I guess. Literally any feature you put into a language will end up being used (or abused) by someone for something. That "it's nice" for someone in the future to be able to pick up a random shiny thing once in a while and twirl it around doesn't seem like a reason to keep it in the standard for decades and burden everyone else with it the whole time. (Not to mention there are much nicer things that C and C++ lack, and that would make people's lives easier rather than harder...)

[1] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...


It's a philosophy thing. There are plenty of languages that just solve for flat Von Neumann memory models. In the past, that did not describe all machines -- e.g. the pre-standard Borland C/C++ compilers for x86 real mode that used 32-bit ptrdiff_t and 16-bit size_t (which doesn't count as an answer because they were pre-standard in so many ways). In the future, it may or may not describe all machines -- it's possible that RISC wins forever and we continue to push all complexity into the software, or it's possible that something capabilities-based comes forward instead. Similarly for primitive types: perhaps some day we'll have unums.

If the future is still flat, then... we've paid the cost of an extra few paragraphs in the standard? Folks interested in writing non-portable code can ignore this and use implementation-defined behaviors; those who want to be fully portable to future machines can be more careful (although ptrdiff_t is basically a cursed type anyway, so I don't see this particular overhead mattering). Yes, getting rid of ones complement makes sense these days -- but if you were worried about the compatibility issues around supporting it properly and avoiding undefined behavior, you're probably spending exactly as much effort today dealing with the fact that INT_MIN and friends are cursed mathematically on twos complement machines.

Meanwhile, suppose that we actually break out of this local minimum. That's an interesting world, and an even more interesting one if we can carry most of our software forward with us.

We're not even a century into designing computers yet. I can't even begin to predict what architectures will look like in another five or ten centuries -- even assuming that transistor budgets continue to taper off. But I'll say that of the languages I use daily, C is one of the few that I'd still expect to be around and functional in that future, even if only as an archeological curiosity; it allows a decently high fidelity description of how an algorithm should be implemented across more than half a century of hardware.


C23 follows C++ in requiring signed integers to have two's complement representation. I think by this point it's pretty much settled that it's the "optimal" way to implement signed integers. Now if we change to non-binary architectures (or something radically different) things might change, but at that point quite a lot of the C standard will have to be thrown out as well.


I /mostly/ agree -- it's hard for me to think of a modern or future platform that would use ones complement integers... with one possible exception. Integers represented on top of IEEE floating point are inherently ones-complement (sign+magnitude), and I can definitely imagine potential future platforms that use 64-bit floats as their primary primitive, restricting them to integer representations for certain tasks. Thinking of designing something like a DSP-focused microcontroller in a world where onboard SRAM can significantly exceed 4GB, and where a desire for C compatibility and occasional tasks make it worthwhile to support function and data pointers, but supporting 53-bit pointers in a float ends up simpler than adding a 64-bit integer ALU that would rarely be used. In this case the associated types (ptrdiff_t, for example) might end up as 53-bit ones-complement integers stored in floats.


I know I'm being picky, but sign+magnitude and ones' complement are a good bit different from each other despite both having negative zero.


I don't understand how it's an example. It sounds like ssize_t and ptrdiff_t need the same number of bits on this architecture. And if ptrdiff_t is wrong, they'd probably make ssize_t wrong too. Also I don't think "we need more distinct constants because then if a compiler author deliberately makes one wrong they might leave the other one alone" is enough justification to have two.


ssize_t isn't standard C. It's just POSIX, which uses undefined behavior of the C standard to create an almost but not quite C language used on all Unix systems. POSIX C is not strictly conforming C, but it is conforming C. The same goes for GNU C, Visual C, etc.


I fail to see how their overly clever solution is in any shape or form better than the regular approach shown under "How others declare them" – except for being a "cute" "hack".


I find it neat that the code doesn't have to enumerate all the possible platforms it might be compiled on, and instead just takes something the compiler already provides and tweaks it.

But yes, the "cute" "hack" aspect is the primary endearing factor :^)


Well, he sorta says it at the end:

>Other C libraries typically use more careful techniques, such as wrapping the declarations in architecture-specific #ifdefs

They don't have to define it multiple times for different architectures. This is theoretically platform agnostic and saves a few lines. Not really that significant, but then again it's not like he's recommending people do it.


My thoughts exactly. What if __SIZE_TYPE__ stops being a macro someday? Also, double-underscore-prefixed symbols are reserved for the compiler, so another compiler could use the same symbol for an entirely different purpose; compiler guards are needed when using it.

I understand doing hacks when there's no other way around something, or when a hack is much cheaper than the proper solution, but in this case I think those guys have chosen a more complicated, more expensive, less functional hack over a cheap and more complete proper solution (the one in "how others are doing it").
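
For what it's worth, a guarded variant of the hack (as I understand the article's trick; a sketch only) would look like:

    #if defined(__SIZE_TYPE__)   /* GCC and Clang predefine this */
    /* __SIZE_TYPE__ expands to e.g. "long unsigned int"; the macro below
       rewrites that into the corresponding signed type. */
    #define unsigned signed
    typedef __SIZE_TYPE__ ssize_t;
    #undef unsigned
    #else
    /* fall back to the usual architecture-specific typedefs here */
    #endif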


It's a hobby OS. That makes cuteness a lot more okay than other contexts.


Well, obviously the advantage is that you don't have to enumerate all the architectures you support manually. Is it worth it? Ehhh, probably not, but the author isn't exactly advocating for it either.


Somewhat off-topic, but what's the :^) supposed to mean?

I've always parsed it as "the previous statement was intentionally wrong or irritating", sort of like the /s sarcasm tag except it can also denote trolling those who aren't in on the joke. I'm unsure whether it's used as a normal smiley here or whether there's something I'm missing.


My understanding is that :^) represents a more devious/cheeky smile. I think this is more common than a strictly trolling/sarcasm usage, though you could see how it could fit there as well under my definition.


In the SerenityOS community, it is intended as a normal smiley face instead of being sarcastic.


That's right! It's a callback to Slackware, which was the first Linux distribution I used as a kid. The release notes for Slackware 4.0 ended with "Have fun! :^)" which I always thought had a great vibe.


I don’t know about the OP, but generally speaking it indicates cheekiness.


<rant>

"__SIZE_TYPE__" is a builtin symbol in the preprocessor, seriously?

    echo __SIZE_TYPE__ | cpp -
    # 1 "<stdin>"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 31 "<command-line>"
    # 1 "/usr/include/stdc-predef.h" 1 3 4
    # 32 "<command-line>" 2
    # 1 "<stdin>"
    long unsigned int
Okay, so it's defined in a file apparently. But why is that file automatically included in every program, and not the more usual type definitions? And is there really no better way to define "an integer the size of a pointer" than by going through this rat's nest of text substitutions?

The C language seems almost designed on purpose to maximize ugliness. It is called "portable assembly language", but even in this day and age, when there are only two or three relevant processor architectures, which have been largely designed around being able to run C code efficiently (to the detriment of everything else), it falls short of that.

It is only portable at all because of include files that are included from other files, containing __MACROS__ calling on __OTHER__MACRO_S___ to the n-th degree.

The most advanced compiler algorithms are then applied to the problem of transforming the resulting ((void *)(__pile_of(cr*p))()) back into machine code that is at least not completely terrible. Follow every obscure rule of the standard, and they might be so nice as to not remove your carefully written null pointer check in the process!

And people worship this ugliness and needless complexity, even as it strangles the life out of every other technology like a cancer that has been growing for 50 years. Professionals and hobbyists both celebrate how clever they are, being able to work around its deficiencies, and think they are dealing with the fundamentals of computer science rather than the grotesque evolution of an operating system originally written to typeset documents and play SpaceWar.

Just once I'd like to see a new operating system - better yet, a new CPU! - that is not based on C and UNIX, one that is outright hostile to them at every level of abstraction. Not a single line of C code anywhere, different calling convention, different filesystem, user interface, networking etc.


Isn't magical development, where you do things just because they "work for me", building up technical debt?


Technical debt is like corporate debt: it's debt that you leverage to get to market faster and make more money.

You may, or may not, have to pay it back.

So some technical debt is good in many cases as long as it is properly managed.

If you have zero debt, it means you are not being efficient (the same goes in finance - for example, purchasing a home in cash when you could get a low, fixed-interest-rate loan).


I've never thought of it this way before. Great point and analogy!


It's definitely not technical debt. The worst that can happen is that you have to go back and do it the "proper" per-arch way at some point. It's just a neat hack.


The C standard macro must expand to _something_. My concern would be that it expands to something like __builtin_foo_nugget instead of unsigned long...


ssize_t isn't in the C standard. The closest substitute is ptrdiff_t.


I was referring to __SIZE_TYPE__, but that isn't in the standard either. Whoops.


This locks you into the compiler.

I used to do "cute" tricks like this. Then I learned better. Always write the least mysterious code unless you have a good reason...


Actually both GCC and Clang have this feature (so does Circle), and uhh there are plenty of other indispensable features that lock you into those. You need statement expressions and inline assembly to write this kind of software.


Most Unix-like systems inherit that type declaration from the kernel, for obvious reasons.

Why not do the same?


They're writing the kernel.


Well my breakfast is on my feet now


This seems to be about defining it, not merely declaring it. ("typedef" is short for "type definition", for example.)


Definition and declaration have specific meanings in C and C++. A typedef is part of a declaration.

Declarations can be definitions, but a typedef in particular is not a definition. https://en.cppreference.com/w/c/language/typedef



