
Tips for stable and portable software - begriffs
https://begriffs.com/posts/2020-08-31-portable-stable-software.html?hn=3
======
CJefferson
I'm currently involved with a system, written in C, which has been going for
30 years: GAP - [https://www.gap-system.org](https://www.gap-system.org)

While I write a lot of C, I immediately disagree with the idea that C has a
"simple (yet expressive) abstract machine model". Every so often we find a bug
which has been present for over a decade, because some new compiler has added
a (perfectly legal by the standard) optimisation which breaks some old code.

Picking one example: in the "old days", it was very common (and important for
efficiency) to freely cast memory between char, int, double, etc. For many
years this was fine, then all compilers started keeping better track of
aliasing and lots of old code broke.
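As an illustration of the kind of breakage described (a hypothetical minimal case, not GAP code): type-punning through a pointer cast violates C's strict-aliasing rules, while copying the bytes with memcpy is well-defined and compiles to the same machine code.

```c
#include <string.h>

/* Undefined behavior: reading a float's bytes through an unsigned* violates
   strict aliasing. Old compilers tolerated it; newer optimizers may reorder
   or drop the access. */
unsigned bits_via_cast(float f) {
    return *(unsigned *)&f;
}

/* Well-defined: copy the object representation instead. Modern compilers
   turn this memcpy into a single register move. */
unsigned bits_via_memcpy(float f) {
    unsigned u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an IEEE-754 target, `bits_via_memcpy(1.0f)` is `0x3f800000`; the cast version usually gives the same answer today, which is exactly why this kind of bug can stay latent for a decade.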

Also, while POSIX is a nice base, it stops you using Windows, and also almost
every non-trivial program ends up with a bunch of autoconf (which has to be
updated every so often) to handle differences between linux/BSD/Mac.

Also, definitely don't distribute code where you use flags like '-pedantic', as
it can lead to your code breaking on future compilers which tighten up the
rules.

~~~
pwdisswordfish4
-pedantic only enables warnings, it cannot change the meaning of code; not even on newer compilers.

~~~
CJefferson
You are right, I mis-remembered what the flag did, sorry.

I've seen projects with -pedantic -Werror, which are particularly annoying
(-Werror in general, to be honest; I understand why people might want it for CI,
of course).

~~~
carlmr
I like using -Wall -Wextra -Werror for code under my control, but I would
disable those flags if distributing it as a library to others.

------
ut6Ootho
This article certainly rings a bell, as I started rewriting my personal
projects to C in the last year, precisely because I wanted to make them
decades-proof. I still use the same vimscripts I wrote in the early 2000s; I want
the same thing for all my tooling and apps.

I'm not sure it makes sense professionally, though, as most codebases won't
survive a decade: after three years, the dev team will turn over, and the new
team will want to rewrite everything from scratch. Or start rewriting parts of
the existing system in a new language, until it ultimately eats it up. It may
be related to the kind of companies I work with, though (very early-stage
startups).

Regarding interfaces, I think the author could have gone a step further. There
is actually a standard and portable interface system: HTML/JS/CSS. If you
write a dependency-free web app using things like web components and other
standard techs, you know it will stand the test of time, and it actually matches
all the reasons why the author wants to use C: a standard and multiple
implementations.

~~~
user5994461
It's highly dependent on the domain.

If you're in a web startup, software won't last 3 years, the next team will
systematically rewrite.

If you're in the bank, logistics, defense sector, it's very likely the
software will go for a decade, as long as it's not killed the first or second
year for being a pet project (initial manager left) and having no customer.

~~~
RandoHolmes
> If you're in a web startup, software won't last 3 years, the next team will
> systematically rewrite.

I have an old man rant about that actually... that rewrite is typically
unnecessary if you actually use discipline when developing and learn how to
read code.

I once took on a CakePHP 2 app and another developer asked me how in the world
I got into, and understood, the framework so quickly. My secret? I read the
CakePHP 2 source code. So few developers ever learn to do that well.

~~~
phone8675309
> that rewrite is typically unnecessary if you actually use discipline when
> developing and learn how to read code

"But developers that can exercise discipline and know how to read (and modify)
code instead of rewriting cost so much money..." is what you'll typically hear
in response to this.

It's cheaper (and often faster) to have cheaper, less disciplined, less
experienced developers rewrite something multiple times than it is to have
more expensive, more disciplined, more experienced developers write something
and maintain it. It's also harder to keep the more experienced developers
because most developers I work with start looking for another job when their
project goes into maintenance.

The typical "we never have enough time/money to do it right the first time but
we always have to make the time/money to do it twice" situation.

~~~
tstrimple
> It's cheaper (and often faster) to have cheaper, less disciplined, less
> experienced developers rewrite something multiple times than it is to have
> more expensive, more disciplined, more experienced developers write
> something and maintain it.

I can't believe this. I've seen the sheer difference in speed and
maintainability a single solid web developer can deliver in a framework they
are familiar with versus teams of more Jr developers who spin their wheels for
weeks. Rewriting when you don't even understand the starting point is always a
waste of money.

> It's also harder to keep the more experience developers because most
> developers I work with start looking for another job when their project goes
> into maintenance.

This certainly resonates though. I've been that developer more than once.

~~~
RandoHolmes
> I can't believe this. I've seen the sheer difference in speed and
> maintainability a single solid web developer can deliver in a framework they
> are familiar with versus teams of more Jr developers who spin their wheels
> for weeks. Rewriting when you don't even understand the starting point is
> always a waste of money.

I agree completely.

It can be quite shocking just how much damage a poor developer can do to small
to medium companies. I know of 1 company that's holding on for dear life right
now because they lost their biggest client due to a very poor developer they
had employed. I told them 6 months before this all happened to get rid of him,
but they didn't. And Corona is just making it that much harder for them to
find new work.

------
kasperni
> Tips for stable and portable software

I think a more accurate title would be "Tips for stable and portable C
programs"

~~~
Cthulhu_
The author lists a number of languages considered stable, C being one of them
because of widespread support and portability. Java isn't portable for example
because it depends on the JVM (and I know GraalVM is a thing but will you
still be able to use it in ten years?).

~~~
kasperni
Show me a JavaScript developer that cares deeply about POSIX or the operating
system they are running on.

And what about Windows? It's still used on 80% of all computers. So why is
POSIX essential?

~~~
Shared404
Haven't done much work with servers, I take it.

Almost any OS running on a server is going to be POSIX, probably Linux or BSD.

------
ludocode
This is mostly good advice. I don't love configure scripts, I don't agree with
the heavy reliance on POSIX if you intend to be compatible with Windows, and I
don't love the fact that the author recommends third party data structure
libraries that they haven't actually used. For container libraries in C, you
really have to use them to get a feel for their usability (this sounds like a
tautology but it's not.)

I disagree strongly with one recommendation. This is just an example, but it
holds for larger API design in general:

> we could add a fallback to reading /dev/random. [...] However, in this case,
> the increased portability would require a change in interface. Since fopen()
> or fread() on /dev/random could fail, our function would need to return
> bool.

No, definitely not. It is dangerous to expect the application to sanely handle
the case of randomness being unavailable when it is never going to occur in
practice. On all POSIX platforms, /dev/random exists and will block until
sufficient entropy is available. Something would have to go seriously wrong
for this to fail. This is so rare that any error handling code for it will
never be tested. The most likely outcome of forcing the caller to handle it is
that the return value is ignored or improperly handled and the buffer is used
uninitialized, leading to a security vulnerability.

My recommendation instead would be to error check your fopen() and fread()
calls within get_random_bytes(), and print an error and abort() if they fail.
This way if someone's system is improperly configured and /dev/random doesn't
work the program will just crash. Same goes for macOS's SecRandomCopyBytes()
and Windows' half a dozen calls to use an HCRYPTPROV. This way you still
return void and there is no danger of callers improperly handling errors.

In general, unless you're writing safety-critical software, it's fine for your
code (or even library code) to abort() in these sorts of exceptional
situations when there is no reasonable or safe way to handle the error. If
someone truly wants to handle the error, they can just not use your API and do
it manually.
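A minimal sketch of the pattern being recommended (assuming the article's get_random_bytes() name; the error handling lives inside, so the signature stays void):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fill buf with n random bytes, or crash loudly. On a correctly configured
   POSIX system the failure path is unreachable; aborting beats handing the
   caller an error code that will be ignored and a buffer left uninitialized. */
void get_random_bytes(void *buf, size_t n) {
    FILE *f = fopen("/dev/random", "rb");
    if (!f || fread(buf, 1, n, f) != n) {
        fputs("fatal: cannot read /dev/random\n", stderr);
        abort();
    }
    fclose(f);
}
```

Usage is just `unsigned char key[16]; get_random_bytes(key, sizeof key);` — there is no return value for the caller to mishandle.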

------
chrisco255
From the title, I was hoping to hear about software systems that have powered
infrastructure for decades, but unfortunately it was more of a programming
language analysis strategy.

------
jankotek
Hm, decades is not that much; most enterprise code fits into that. But how
about 200 years?

It is about people. Documentation, paper trail why some decisions were made,
archiving build tools, VMs, dependency source code..

Also, C, POSIX and Motif are terrible choices because of their fragmentation.
Java is very boring, but compiling and debugging 20-year-old code is very
common.

------
Cthulhu_
I'm currently "betting" on Go for making a back-end (just a REST API + sqlite
database) that will last a decade; I'm betting on the tooling to stay
backwards compatible or with minimal changes in the codebase; I'm betting on
the readability of my own code for the next decade, and I'm betting on the
language + tools to continue to be developed whilst sticking to their original
goals.

Generics is going to be fun.

~~~
RandoHolmes
> I'm betting on the tooling to stay backwards compatible or with minimal
> changes in the codebase

This is actually why I'm pretty bullish on things like RoR, Laravel, et al.

The sheer speed at which they go to a new version that breaks BC is actively
making the web less secure. I've lost count of how many times I've found a new
client with this software that's been working for years but suddenly broke,
only to realize it's on an OS that's EOL, using a version of the framework
that's EOL and a version of the language that's EOL. And now it's my job to
bring it up to speed.

And typically the hardest part of that? The 3rd party dependencies that are
either abandoned and don't support the newer versions of anything, or have
moved onto Python 3 and no longer support Python 2.

It's why I vastly prefer something like asp.net core. I know in 5-10 years the
code will probably just work with the latest version, and if there's an
incompatibility, it's going to tend to be relatively small.

~~~
lukeramsden
> This is actually why I'm pretty bullish on things like RoR, Laravel, et al.

Do you mean bearish? I think you do, as I was confused for about half of your
comment before I realised.

~~~
RandoHolmes
Sorry, you're correct. I meant bearish.

That's what I get for posting pre-coffee :(

------
MaxBarraclough
Seems like good advice. I'd add another one that seems completely obvious, but
some sloppy developers ignore it: avoid undefined behaviour. If you're going
to work with C, you need to know about undefined behaviour and take it
seriously.

~~~
rini17
If it were so easy, a subset of C without undefined behavior would already have
been specified, and you would be able to automatically check your code against
it.

~~~
MaxBarraclough
My point was only that C programmers should be keenly aware of the pitfalls of
undefined behaviour, rather than blithely ignoring it. I've been surprised by
the sloppiness of some developers on this point.

> a subset of C without undefined behavior

There are various projects out there that let you produce C code guaranteed to
be free of undefined behaviour, but they're not 'quick fix' solutions, so
they're not widely used.

[https://www.eschertech.com/products/](https://www.eschertech.com/products/)

[https://github.com/zetzit/zz](https://github.com/zetzit/zz)

[https://blog.regehr.org/archives/1069](https://blog.regehr.org/archives/1069)
(ctrl-f for _actually_ )

~~~
rini17
What does "keenly aware" even mean? For example: any time I add or subtract
two signed ints, undefined behavior can happen. Now what. Must I pepper the
code with bounds checks (which are prone to UB too if not done carefully)?

Anyway, any complicated thing that can be easily ignored, inevitably will be.

~~~
MaxBarraclough
> What does "keenly aware" even mean?

Keeping the threat of undefined behaviour in mind, and taking steps
accordingly, rather than complacently ignoring it. C is a highly unsafe
language, and the programmer shouldn't forget this.

> any complicated thing that can be easily ignored, inevitably will be.

The demonstrable inability of C programmers to write correct code is a strong
argument against the widespread use of C. Even old languages like Ada show
that you can use a language _much_ safer than C and still achieve solid
performance. Languages like Rust are making further progress on having safety,
performance, and programmer-convenience, all at once.

If you use an ultra-safe language like verified SPARK Ada, the language
doesn't even _allow_ you to, say, forget to check whether a denominator is
zero, or to forget to protect against out-of-bounds array access.

> Must I pepper the code with bounds checks (which are prone to UB too if not
> done carefully)?

Not necessarily; a tool can help check for undefined behaviour. Static
analysers, GCC flags, and tools like Valgrind can automatically check for
out-of-bounds array access, divide-by-zero, or attempts to dereference NULL.
[0] Adding your own runtime assertions isn't a crazy idea though, especially
for dev builds. If this were the norm in C programming we'd have fewer
security vulnerabilities.

C lacks the kind of runtime checks that are 'always on' in languages like Java
and C# (out-of-bounds, divide-by-zero, etc). That's not because such checks
don't apply to C code, it's because of the minimalist C design philosophy. You
have the option to add your own checks, or use tools to do so automatically,
but if you develop without any checks anywhere you should expect to have more
bugs. Java added them for a reason.

The C++ language has a somewhat different design philosophy, but it's the same
reason its _std::array_ class-template has both a runtime-checked _at_ member-
function, and an unchecked _operator[]_. It would be against the design
philosophy to force you to pay the runtime overhead for checks, but it gives
you the option.

> which are prone to UB too if not done carefully

What kind of error do you have in mind here?

[0]
[https://stackoverflow.com/a/44820924/](https://stackoverflow.com/a/44820924/)

~~~
rini17
For example checking for signed overflow must be done carefully:

[https://stackoverflow.com/questions/3944505/detecting-signed-overflow-in-c-c](https://stackoverflow.com/questions/3944505/detecting-signed-overflow-in-c-c)
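For reference, the shape of a careful pre-check (the key point from that discussion: the check must itself avoid the overflowing addition; GCC and Clang also provide __builtin_add_overflow for this):

```c
#include <limits.h>
#include <stdbool.h>

/* True if a + b would overflow int. Only comparisons and subtractions that
   provably stay in range are used, so the check itself is free of UB. */
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;   /* INT_MAX - b cannot overflow here */
    if (b < 0) return a < INT_MIN - b;   /* INT_MIN - b cannot overflow here */
    return false;                        /* b == 0 never overflows */
}
```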

"Design philosophy"...oh please! C was designed for transistor- and memory-
scarce microcomputers. Nowadays there is defacto supercomputer in every phone
and runtime bounds checks are cheap. Moreover, allowing CPU to know the size
of memory chunk pointed to could enable optimization which would make the code
actually faster (not even talking about security benefits). But you C
programmers insist tooth an nail against that...

~~~
MaxBarraclough
> For example checking for signed overflow must be done carefully:

Right, but we're talking about a simple bounds check. There should be no need
for any arithmetic, just comparison.
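In code, the point is that an index check is a single comparison, with no arithmetic that could itself overflow (a sketch using an assert, as suggested for dev builds earlier in the thread):

```c
#include <assert.h>
#include <stddef.h>

/* A bounds check is one comparison of unsigned values; unlike overflow
   detection, the check itself cannot invoke UB. */
int get_checked(const int *arr, size_t len, size_t i) {
    assert(i < len);   /* dev builds trap out-of-bounds access here */
    return arr[i];
}
```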

> "Design philosophy"...oh please! C was designed for transistor- and memory-
> scarce microcomputers.

Right. Hence its design philosophy.

> Nowadays there is a de facto supercomputer in every phone and runtime bounds
> checks are cheap.

Cheap, but perhaps not cheap enough to dismiss entirely. Bounds checking costs
a few percent of performance [0], enough to put some people off in some
domains such as in the kernel.

It's a pity C makes it difficult to automate just about any kind of check.
Checking whether a pointer overruns a buffer that was returned by _malloc_ ,
for instance, requires quite a bit of cleverness, as the system has to track
the size of the allocated block.

You have to rely on optional compiler features, elaborate static analysis
tools (often proprietary and expensive), and dynamic analysis tools like
Valgrind. Ada on the other hand enables all sorts of runtime checks by
default, but it's easy to switch them all off if you're sure.

> CPU to know the size of memory chunk pointed to could enable optimization
> which would make the code actually faster (not even talking about security
> benefits)

What kind of optimisation do you have in mind? Pre-caching?

> But you C programmers fight tooth and nail against that...

'Fat pointers' of this sort have been tried with the C language [1] but I
can't see the committee adding them to the standard. Part of C's virtue is
that it's extremely slow moving.

I'm not advocating continued widespread use of C though. I hope safe-but-fast
languages like Rust do well. We all pay a price for the problems associated
with C and, perhaps to a lesser extent, C++. For what it's worth I haven't
written serious C or C++ code for a long time.

[0]
[https://doi.org/10.1145/1294325.1294343](https://doi.org/10.1145/1294325.1294343)
(An old source admittedly)

[1] [http://libcello.org/learn/a-fat-pointer-library](http://libcello.org/learn/a-fat-pointer-library)

------
jart
That forceinline definition is just the tip of the iceberg. It's so hard to define
in a way that works with different versions of GCC, -Werror, instrumentation,
MSVC, and profiling. If you care about portability, consider just not caring
and using static. Too much special casing code can actually make it harder for
people in weird environments to use your code, since something is going to
break it, and reading past the ifdef soup becomes the biggest obstacle.
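For context, a hypothetical (deliberately incomplete) version of the kind of definition being discussed, next to the plain-static alternative:

```c
/* The ifdef soup grows with every compiler, sanitizer, and profiler you try
   to support; this sketch covers only the two common compiler families. */
#if defined(_MSC_VER)
#  define forceinline __forceinline
#elif defined(__GNUC__)
#  define forceinline inline __attribute__((always_inline))
#else
#  define forceinline inline   /* hope the optimizer cooperates */
#endif

static forceinline int add_fast(int x, int y) { return x + y; }

/* The alternative: just use static and trust the optimizer, which inlines
   small functions anyway. */
static int add_simple(int x, int y) { return x + y; }
```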

------
iso8859-1
Really weird that he recommends Motif. Motif is not comparable to Web/Gtk/Qt
since it has only the most primitive widgets, and no 3D support.

I would propose doing a web-app if you really care so much about
compatibility. Web also allows for more custom widgets.

~~~
timw4mail
Is Motif actually available on modern Linux systems? And is there a Windows
port as well?

I find it difficult to believe that Motif is actually that portable.

Web apps are only as portable as the browser features they use, and the
browsers available for the platform. A primarily backend-rendered app with
minimal JavaScript is much more portable than the average SPA.

~~~
yjftsjthsd-h
> Is Motif actually available on modern Linux systems?

[https://www.archlinux.org/packages/community/x86_64/openmoti...](https://www.archlinux.org/packages/community/x86_64/openmotif/)
lists as being updated 2020-01-05, and
[https://sourceforge.net/p/cdesktopenv/wiki/SupportedPlatform...](https://sourceforge.net/p/cdesktopenv/wiki/SupportedPlatforms/)
claims that CDE supports rather a lot of platforms (which implies Motif),
although I'll grant that most of those probably haven't been tested in a
while.

------
fjfaase
One day, maybe when I am retired, I am going to develop a programming
language-agnostic algorithm specifying language with which you can generate
code for programming languages ;-). A kind of Mathematica, but for software.

~~~
MaxBarraclough
> programming language-agnostic algorithm specifying language with which you
> can generate code for programming languages

That's just a programming language tailored for transpilation, no?

Theoretical computer science shows us there is no 'one true representation'
for algorithms.

~~~
fjfaase
I have to admit that I was half joking about this. But I do think it is
possible to specify an implementation of how a complex operation can be
achieved by combining more primitive operations. With digital computers, data
is usually represented by some encoding of bits, and operations are defined on
those representations. Think for example of an operation for adding two
numbers in a certain representation, producing a result in a certain
representation. In computers, the most primitive adding operations are usually
addition modulo some power of 2. But with these, we can implement addition for
much larger numbers (also using other kinds of operations and/or intermediate
storage).
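The idea in the last sentence can be made concrete: word-sized addition modulo 2^32 plus manual carry propagation gives addition on arbitrarily large numbers (a minimal little-endian sketch):

```c
#include <stdint.h>

/* Add two multi-word little-endian numbers using only addition modulo 2^32
   (which unsigned arithmetic provides by definition), propagating the carry
   word by word. */
void bignum_add(uint32_t *r, const uint32_t *a, const uint32_t *b, int words) {
    uint32_t carry = 0;
    for (int i = 0; i < words; i++) {
        uint32_t s = a[i] + carry;
        uint32_t c = s < carry;      /* did adding the carry wrap around? */
        r[i] = s + b[i];
        carry = c + (r[i] < b[i]);   /* did adding b[i] wrap around? */
    }
}
```

For example, {0xFFFFFFFF, 0} + {1, 0} carries from the low word into the high word, yielding {0, 1}.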

------
divan
Obligatory mention of Ten Years Reproducibility Challenge
[https://github.com/ReScience/ten-years](https://github.com/ReScience/ten-years)

