
Stop using 'short' for line and allocation sizes (2013) - vs2
https://git.kernel.org/cgit/editors/uemacs/uemacs.git/commit/?id=fa00fe882f719351fdf7a4c4100baf4f3eab4d61
======
Arkanosis
And then: “Don't use 'char' for number of lines” (2014:
[https://git.kernel.org/cgit/editors/uemacs/uemacs.git/commit...](https://git.kernel.org/cgit/editors/uemacs/uemacs.git/commit/?id=8841922689769960fa074fbb053cb8507f2f3ed9))

------
TeMPOraL
"32,767 characters in a line should be enough for anyone."

I guess there's one constant with humans and infrastructure - people will
always find ways to hit its limits.

~~~
zeristor
"And if you have a line that is longer than 2GB, you only have yourself to
blame."

Typing in machine code programmes for the ZX81 and ZX Spectrum meant writing
REM statements padded with thousands of zeros, into which you then had to POKE
the machine code from DATA statements. Converting magic numbers to magic.

------
saurik
This is actually an unfortunate patch... it rearranged lines for no particular
reason, and scrambled the comments without updating them: they were multi-line
sentences and referenced each other's positions.

The original code:

    
    
     	struct line *b_dotp;	/* Link to "." struct line structure   */
     	short b_doto;		/* Offset of "." in above struct line  */
     	struct line *b_markp;	/* The same as the above two,   */
     	short b_marko;		/* but for the "mark"           */
     	struct line *b_linep;	/* Link to the header struct line      */
    

The new code:

    
    
      	struct line *b_dotp;	/* Link to "." struct line structure   */
     	struct line *b_markp;	/* The same as the above two,   */
     	struct line *b_linep;	/* Link to the header struct line      */
     	int b_doto;		/* Offset of "." in above struct line  */
     	int b_marko;		/* but for the "mark"           */
     	int b_mode;		/* editor mode of this buffer   */

~~~
Someone
Even though Linus wrote _"it probably made sense 30 years ago as a way to
save a tiny amount of memory"_, the rearrangement is probably meant to optimize
structure layout.

If a pointer is 8 bytes and an int 4, the old structure typically (this is
compiler dependent) would be 40 bytes, the new one 36.

But yes, that should have been two commits, and he should have updated the
comments.

This is also somewhat of an argument for giving the compiler the freedom to
choose field order, as the order that is optimal for memory usage isn't the
best for human understanding.
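
To make the padding argument concrete, here is a minimal sketch that compares the two field orders. It assumes a cut-down struct holding only the five fields quoted above, with the offsets already widened to int; the names `struct interleaved`, `struct grouped` and `report_layouts` are made up for illustration and are not from the uemacs source.

```c
#include <stddef.h>
#include <stdio.h>

struct line;  /* opaque here; only pointers to it are needed */

/* Hypothetical cut-down layout: original field order, offsets as int. */
struct interleaved {
    struct line *b_dotp;
    int b_doto;
    struct line *b_markp;
    int b_marko;
    struct line *b_linep;
};

/* Hypothetical cut-down layout: pointers first, then the ints. */
struct grouped {
    struct line *b_dotp;
    struct line *b_markp;
    struct line *b_linep;
    int b_doto;
    int b_marko;
};

/* Print sizes and a few offsets so the padding on this platform is visible. */
void report_layouts(void)
{
    printf("interleaved: size %zu, b_markp at %zu, b_linep at %zu\n",
           sizeof(struct interleaved),
           offsetof(struct interleaved, b_markp),
           offsetof(struct interleaved, b_linep));
    printf("grouped:     size %zu, b_doto at %zu, b_marko at %zu\n",
           sizeof(struct grouped),
           offsetof(struct grouped, b_doto),
           offsetof(struct grouped, b_marko));
}
```

With 8-byte pointers and 4-byte ints I would expect this to report 40 and 32 bytes, but the exact figures are implementation-defined. Note that sizeof is always a multiple of the struct's alignment, so three pointers plus three ints would typically still report 40 there rather than 36.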

~~~
saurik
> If a pointer is 8 bytes and an int 4, the old structure typically (this is
> compiler dependent) would be 40 bytes, the new one 36.

Ah: I actually did work out the alignment math, as I considered that as a
reason to reorder the fields, but I was hoping that the goal would have been
only to do that if the change was making the struct even larger (which it
doesn't) and failed to think about a 64-bit computer :(. However, I personally
don't consider "save four bytes per window" to be a particularly compelling
reason to rearrange fields (which seem to have been put in that order more for
semantic purposes than compression), and yeah: the real issue is that this is
all in one patch and scrambled the comments :(. It is essentially at best an
unrelated optimization that seemingly didn't even notice how it broke the
comments. (BTW: an issue with having a compiler decide the field order is that
you then make separate compilation and sharing structures between libraries
really really hard.)

~~~
Someone
It would be simple to either have compiler directives that control structure
layout or (a bit more limited) some way to indicate that the compiler should
keep your field order (just as, in Pascal, the 'PACKED' modifier on a record
directs the compiler to omit most padding).

Also, since Linus worked on kernel code a lot, that reordering was probably a
knee-jerk reflex; it didn't involve his brain. His brain wrote that we
shouldn't be bothered by this in an editor, but his spine made the edit
anyway, and, apparently, his spine doesn't read comments.

------
marcoms
Why not fixed-size ints - uint8_t, etc. from stdint.h? I've not been using C
for a while, so I wonder what the viewpoints are regarding this.

~~~
jhallenworld
You typically want to use the native integer size for something like this, for
the code to be more portable. The native integer size should really be int,
but it's not defined as such in the standard. My preference at the moment:

off_t for file offsets

ptrdiff_t for memory offsets (vs. size_t for unsigned)

int for return flags

A problem with this is that printf does not have sizes to match these.
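
For what it's worth, C99 printf does have length modifiers for two of these: `t` for ptrdiff_t and `z` for size_t. off_t is a POSIX type with no modifier of its own, and a common workaround is to cast it to intmax_t and print with `%jd`. A minimal sketch, assuming a POSIX system; the helper name `format_offsets` is made up for illustration:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>  /* off_t comes from POSIX, not ISO C */

/* Format one value of each type into buf; returns whatever snprintf returns. */
int format_offsets(char *buf, size_t buflen,
                   off_t file_off, ptrdiff_t mem_off, size_t count)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "file=%jd mem=%td count=%zu",
                    (intmax_t)file_off,  /* no off_t modifier, so widen it */
                    mem_off,             /* 't' length modifier: ptrdiff_t */
                    count);              /* 'z' length modifier: size_t */
}
```

The cast relies on intmax_t being at least as wide as off_t, which holds on the platforms I know of but is not something ISO C promises about a POSIX type.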

~~~
omtose
But then you get different behavior depending on the platform, which doesn't
seem any more portable to me.

~~~
accatyyc
I'm in the "never use fixed size integers unless needed"-camp. For example, if
you write code that's supposed to run on an 8-bit computer (something
embedded) as well as your 64-bit desktop, it makes no sense to limit yourself.
For example, say that you know that your number will be between 1-200, you
could use uint8_t. But why not use int? It will be the native width of the
platform, and likely to be faster. Calculations on an 8-bit int can be slower
than 64-bit ints on your 64-bit CPU, because the CPU might have to mask out
the relevant part of the register to only operate on your 8 bits. And you
don't gain anything by writing uint8_t, you still need to occupy a register on
your CPU (which will be 64-bit). The behavior of your program will be the same
on both platforms.
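
A small sketch of why the narrow type mostly affects storage rather than arithmetic: C's integer promotions convert uint8_t operands to int before the addition happens. The function names are hypothetical; `uint_fast8_t` is the stdint.h type that asks for "at least 8 bits, whichever width is fastest here".

```c
#include <stdint.h>

/* Both operands are promoted to int before the '+', so nothing wraps at 255. */
int sum_bytes(uint8_t a, uint8_t b)
{
    return a + b;             /* 200 + 100 gives 300 */
}

/* Converting the result back to uint8_t is what truncates it. */
uint8_t sum_bytes_narrow(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);  /* 300 modulo 256 gives 44 */
}

/* uint_fast8_t holds at least 8 bits; the implementation picks the width. */
uint_fast8_t clamp_percent(uint_fast8_t v)
{
    return v > 100 ? 100 : v;
}
```

So a value known to fit in 1 to 200 can live in a uint8_t field when storage matters and still be computed on at int width.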

~~~
coldpie
In practice, how much of my code is going to be run on an 8-bit CPU? Zero. I'd
rather have the clarity and consistent behavior of explicitly typed ints than
pre-optimize for something that will almost certainly never happen.

~~~
prodigal_erik
There were so many bad programmers who assumed 32 bit machines would last
forever, that now we're stuck with compilers that don't default to 64 bit ints
even building for 64 bit targets.

~~~
emsy
This will bite you if your code depends on int being 32 bit (for example if
you depend on overflows or use asm). If your code explicitly uses an int32,
this will always be true.

Silently increasing the size of an int is more likely to break things than to
help. If you really want an int to be 32 bit on a 32 bit machine and 64 bit on
a 64 bit machine, make it explicit (e.g. intptr_t for pointers).

------
Kretiini
Stupid question: why not unsigned int?

~~~
jerf
In my experience with languages that have both signed and unsigned ints, but
have no overflow or underflow protection (which is pretty much all of them),
you really want to use signed ints so that you can write assertions that
numbers you expect to be positive are positive. You can't do that with
unsigned ints since they are all by definition >= 0, so you always end up
finding bugs where suddenly you've got MAX_INT - 34 floating around in your
program horking things up with no easy ability to tell there's a problem
early.

A uint that actually threw an exception or something if you tried to underflow
it would be useful, but most unsigned ints nowadays aren't all that useful.
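
A minimal sketch of that failure mode, with made-up function names. The unsigned subtraction wraps silently; the signed one goes negative, which an assertion can catch:

```c
#include <assert.h>
#include <limits.h>

/* Unsigned: subtracting past zero wraps around to a huge value. */
unsigned int remaining_unsigned(unsigned int have, unsigned int want)
{
    return have - want;   /* 0 - 34 becomes UINT_MAX - 33 */
}

/* Signed: the same mistake yields a negative number, which we can assert on. */
int remaining_signed(int have, int want)
{
    int left = have - want;
    assert(left >= 0);    /* fires on the bug instead of passing garbage along */
    return left;
}
```

This only covers the underflow case described here; signed overflow at the far ends is undefined behaviour in C and needs its own checks.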

~~~
saurik
You still have to deal with wraparound all the way off the bottom of the
negative end (which is absolutely possible: these kinds of massive additions
and subtractions have been the basis of various exploits, often involving
array math), and so you need to solve that problem anyway; but now you also
have to constantly verify the number is positive, and it is extremely common
to see checks in the wild which only verify that a sized index is less than
some maximum :/.
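
The upper-bound-only check looks roughly like this. It is a sketch with a hypothetical `TABLE_LEN` and made-up helper names, not code from any particular project:

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_LEN 16  /* hypothetical table size, for illustration only */

/* Buggy: a negative index passes the upper-bound-only check. */
int in_bounds_upper_only(int idx)
{
    return idx < TABLE_LEN;
}

/* Signed index: both ends have to be checked. */
int in_bounds_signed(int idx)
{
    return idx >= 0 && idx < TABLE_LEN;
}

/* Unsigned index: one comparison covers both ends, because a wrapped
 * "negative" value ends up far above TABLE_LEN. */
int in_bounds_unsigned(size_t idx)
{
    return idx < TABLE_LEN;
}

/* Example lookup guarded by the two-sided signed check. */
int table_get(const int table[TABLE_LEN], int idx)
{
    assert(in_bounds_signed(idx));
    return table[idx];
}
```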

------
zeveb
> And if you have a line that is longer than 2GB, you only have yourself to
> blame.

This is why I like languages like Lisp: by default, integers are only limited
by the amount of memory available (as an example:
(- (expt 2 1024) (expt 3 27)) →
179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356321998626652229).

For efficiency one can of course limit things to fixnums, and in a kernel one
would want to be careful not to let reading a file eat all memory — but that's
all doable in a Lisp.

------
andrewprock
For structs where there are only a dozen or so in memory at any one time, this
makes sense.

For structs where there are thousands, or millions, in memory, using the
smallest type required to get the job done will improve performance because
you will fit more data into the cache.

~~~
mikeash
Only if it actually makes the overall structure smaller. A bunch of these
shorts were surrounded by fields whose alignment means that changing to int
results in no increase in the overall size of the structure. In fact, I think
some of them actually got _smaller_ due to reordering the fields to be more
alignment-friendly.

~~~
andrewprock
Struct packing is something you can control at the compiler level. In gcc, you
can add __attribute__((__packed__)) to your structs.
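
A minimal sketch of what that attribute does, using a made-up two-field record. This is a GCC/Clang extension, not ISO C:

```c
#include <stddef.h>

/* Hypothetical record: a 1-byte tag followed by an int count. */
struct natural {
    char tag;
    int count;   /* the compiler normally pads before this for alignment */
};

/* Same fields with the padding removed; 'count' may now be misaligned. */
struct __attribute__((__packed__)) squeezed {
    char tag;
    int count;
};

/* Helpers so the two sizes can be compared at run time. */
size_t natural_size(void)  { return sizeof(struct natural); }
size_t squeezed_size(void) { return sizeof(struct squeezed); }
```

With a 4-byte int I would expect 8 bytes for the natural layout and 5 for the packed one, though only the packed figure is pinned down by the attribute.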

~~~
mikeash
That can add a performance penalty. If it didn't, compilers would always pack
tightly, after all.

In any case, these particular structs don't have any special packing applied.

------
grabcocque
I wonder if uemacs is now that rarest of beasts, a piece of software that is
actually _done_.

Edit: apparently it has no undo. The rules:

1) Linus does not make mistakes

2) In the event of Linus making a mistake, see rule 1

3) ...

4) git reset --hard HEAD

------
rwmj
This reminds me of my first job (in ECAD/EDA) where microemacs was part of the
design process. We had microemacs "programs" (like macros) which post-
processed the output from other CAD tools. Microemacs had an internal C-like
(interpreted) language, and combined with the fact that it ran on DOS and
Windows 3.1 made it sort of suitable for this. However I really hope those
have been replaced now.

------
mjs
So has Linus learnt another editor, as threatened??

~~~
mhd
He had one commit to uemacs after that, in 2014 (with a similar objective).
Either he gave up after that or that fixed all his problems so far...

------
xook
It was astounding to me when I watched that talk and he mentioned this issue.
Part of me still hopes he uses this same editor.

------
amelius
Well, it is called MicroEmacs for a reason ...

------
tomtoise
Original Title, for Posterity:

"Linus has been polishing a turd for two years"

~~~
ams6110
It's turds all the way down.

