"And if you have a line that is longer than
2GB, you only have yourself to blame."
Typing in machine code programs for the ZX81 and ZX Spectrum consisted of REM statements with thousands of zeros, into which you then had to POKE the machine code from DATA statements. Converting magic numbers to magic.
This is actually an unfortunate patch... it rearranged lines for no particular reason, and scrambled the comments without updating them: they were multi-line sentences and referenced each other's positions.
The original code:
struct line *b_dotp; /* Link to "." struct line structure */
short b_doto; /* Offset of "." in above struct line */
struct line *b_markp; /* The same as the above two, */
short b_marko; /* but for the "mark" */
struct line *b_linep; /* Link to the header struct line */
The new code:
struct line *b_dotp; /* Link to "." struct line structure */
struct line *b_markp; /* The same as the above two, */
struct line *b_linep; /* Link to the header struct line */
int b_doto; /* Offset of "." in above struct line */
int b_marko; /* but for the "mark" */
int b_mode; /* editor mode of this buffer */
Even though Linus wrote "it probably made sense 30 years ago as a way to save a tiny amount of memory", the rearrangement is probably meant to optimize the structure layout.
If a pointer is 8 bytes and an int 4, the old structure typically (this is compiler dependent) would be 40 bytes, the new one 36.
But yes, that should have been two commits, and he should have updated the comments.
This is also something of an argument for giving the compiler the freedom to choose field order, since the order that is optimal for memory usage isn't the best for human understanding.
> If a pointer is 8 bytes and an int 4, the old structure typically (this is compiler dependent) would be 40 bytes, the new one 36.
Ah: I actually did work out the alignment math, as I considered that as a reason to reorder the fields, but I was hoping that the goal would have been only to do that if the change was making the struct even larger (which it doesn't), and failed to think about a 64-bit computer :(. However, I personally don't consider "save four bytes per window" to be a particularly compelling reason to rearrange fields (which seem to have been put in that order more for semantic purposes than compression), and yeah: the real issue is that this is all in one patch and scrambled the comments :(. It is essentially, at best, an unrelated optimization that seemingly didn't even notice how it broke the comments. (BTW: an issue with having the compiler decide the field order is that you then make separate compilation and sharing structures between libraries really, really hard.)
It would be simple to either have compiler directives that control structure layout or (a bit more limited) some way to indicate that the compiler should keep your field order (just as, in Pascal, the 'PACKED' modifier to a struct directs the compiler to omit most padding).
Also, since Linus worked on kernel code a lot, that reordering was probably a knee reflex; it didn't involve his brain. His brain wrote that we shouldn't be bothered by this in an editor, but his spine made the edit anyway, and apparently his spine doesn't read comments.
You typically want to use the native integer size for something like this, for the code to be more portable. The native integer size should really be int, but it's not defined as such in the standard. My preference at the moment:
off_t for file offsets
ptrdiff_t for memory offsets (vs. size_t for unsigned)
int for return flags
A problem with this is that printf has no length modifier for off_t (you have to cast to intmax_t and use %jd); C99 does at least provide %td for ptrdiff_t and %zu for size_t.
I'm in the "never use fixed-size integers unless needed" camp. For example, if you write code that's supposed to run on an 8-bit computer (something embedded) as well as your 64-bit desktop, it makes no sense to limit yourself. Say you know your number will be between 1 and 200: you could use uint8_t, but why not use int? It will be the native width of the platform, and likely faster. Calculations on an 8-bit int can be slower than on 64-bit ints on your 64-bit CPU, because the CPU might have to mask out the relevant part of the register to operate on only your 8 bits. And you don't gain anything by writing uint8_t; you still occupy a register on your CPU (which will be 64-bit). The behavior of your program will be the same on both platforms.
In practice, how much of my code is going to be run on an 8-bit CPU? Zero. I'd rather have the clarity and consistent behavior of explicitly typed ints than pre-optimize for something that will almost certainly never happen.
There were so many bad programmers who assumed 32-bit machines would last forever that now we're stuck with compilers that don't default to 64-bit ints even when building for 64-bit targets.
This will bite you if your code depends on int being 32 bits (for example, if you depend on overflows or use asm). If your code explicitly uses int32_t, the assumption will always hold.
Silently increasing the size of an int is more likely to break things than to be helpful. If you really want an int to be 32 bits on a 32-bit machine and 64 bits on a 64-bit machine, make it explicit (e.g. intptr_t for pointer-sized integers).
I have yet to use a C compiler for an 8-bit platform where int was the native size: sizeof(int) must be at least two chars. Using int instead of a uint8_t where you know its values fall within those limits would be detrimental to performance in such cases.
That said, the code bases that target both 64- and 8-bit architectures probably aren't that many and both our points are insignificant in those other 99.9% of cases.
I think it has more to do with memory use. See for instance the "65535 interfaces ought to be enough" thread. If you allocate a bazillion of a certain structure, then using the shortest allocation unit for its fields can make a difference, especially if you have a lot of these "small" fields.
There are also int_least8_t (and the other sizes and signedness) and int_fast8_t: you explicitly state "I want at least 8 bits" or "I want at least 8 bits, but whatever is fastest".
Fixed sized integers are useful in two situations (that I can think of); if you're interfacing with a language that doesn't use the same integer size as your C compiler, or if you're relying on integers being a certain width (maybe you're casting between integers and non-integers, or writing a certain number of bytes to a binary file).
I doubt either of these situations come up when dealing with line sizes in a text editor.
And a third situation: Writing device drivers interfacing with hardware registers of specific sizes, although you should use `{u}int_least<N>_t` for that, to give the compiler some leeway with alignment.
There are many cases where using fixed size types can save a significant amount of memory (and your application is constrained by memory on some platform) or CPU time (design to minimize cache misses happens all the time in high performance applications like games).
Neither of these situations is happening in a text editor application though.
In my experience with languages that have both signed and unsigned ints, but have no overflow or underflow protection (which is pretty much all of them), you really want to use signed ints so that you can write assertions that numbers you expect to be positive are positive. You can't do that with unsigned ints since they are all by definition >= 0, so you always end up finding bugs where suddenly you've got MAX_INT - 34 floating around in your program horking things up with no easy ability to tell there's a problem early.
A uint that actually threw an exception or something if you tried to underflow it would be useful, but most unsigned ints nowadays aren't all that useful.
You still have to deal with wraparound all the way off the bottom of the negative end (which is absolutely possible: these kinds of massive additions and subtractions have been the basis of various exploits, often involving array math), and so you need to solve that problem anyway; but now you also have to constantly verify the number is positive, and it is extremely common to see checks in the wild which only verify that a sized index is less than some maximum :/.
You'll be comparing against a lot of other things like offsets that aren't naturally unsigned and having everything be of the same type just tends to reduce the number of potential corner cases.
And it will be a bit faster for certain weird architectures since the overflow behavior of an unsigned is prescribed but signed overflow is undefined. But I really doubt anybody cares about that here.
> And if you have a line that is longer than 2GB, you only have yourself to blame.
This is why I like languages like Lisp: by default, integers are only limited by the amount of memory available (as an example: (- (expt 2 1024) (expt 3 27)) → 179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356321998626652229).
For efficiency one can of course limit things to fixnums, and in a kernel one would want to be careful not to let reading a file eat all memory — but that's all doable in a Lisp.
For structs where there are only a dozen or so in memory at any one time, this makes sense.
For structs where there are thousands, or millions, in memory using the smallest type required to get the job done will improve performance because you will fit more data into the cache.
Only if the overall structure actually becomes smaller. A bunch of these shorts were surrounded by fields whose alignment means that changing them to int results in no increase in the overall size of the structure. In fact, I think some structures actually got smaller due to reordering the fields to be more alignment-friendly.
This reminds me of my first job (in ECAD/EDA), where microemacs was part of the design process. We had microemacs "programs" (like macros) which post-processed the output from other CAD tools. Microemacs had an internal C-like (interpreted) language, which, combined with the fact that it ran on DOS and Windows 3.1, made it sort of suitable for this. However, I really hope those have been replaced by now.