
A guide to better embedded C++ - ingve
https://mklimenko.github.io/english/2018/05/13/a-guide-to-better-embedded/
======
95014_refugee
Almost nobody in their right mind would do this. (To be fair, many people
working with hardware are not in their right mind anymore. Try it, you'll
see.)

1: Don't use bitfields when interacting with registers. The language does not
guarantee the behavior you want. Or even the behavior you _think_ you want.

2: 'volatile' is the contract that we (language users) have hammered out with
the compiler industry over the last 20 years; it means "don't get smart with
this memory location". Use it.

3: For any SoC of more than trivial complexity, the vendor will supply headers
that describe the hardware (register names and addresses, field names, sizes,
offsets, enumerations, etc.). You will hate their naming convention. You will
hate the way they encode field sizes, offsets, masks, etc. Deal. Because the
alternative is you, or your intern, making dozens of mistakes attempting to
transcribe from the documentation. Here's a hint - the better vendors auto-
generate these headers from the VHDL. They are often right even when the
documentation isn't.

4: It's trivial to template most interactions with registers. See, for
example: [https://github.com/steffanw/laks](https://github.com/steffanw/laks)

5: A little investment in your abstractions, even if the lower levels look a
bit more complex or verbose, helps make your application logic easier to
follow.
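
Point 4, sketched (not the laks API itself, just a made-up `Register` template;
on target the parameter would name a memory-mapped address, here it points at
an ordinary variable so the sketch runs on a host):

```cpp
#include <cassert>
#include <cstdint>

// Minimal sketch of a templated register wrapper. On real hardware the
// template argument would be a memory-mapped peripheral word.
template <volatile std::uint32_t* Reg>
struct Register {
    static std::uint32_t read() { return *Reg; }
    static void write(std::uint32_t v) { *Reg = v; }
    static void set_bits(std::uint32_t mask) { *Reg = *Reg | mask; }
    static void clear_bits(std::uint32_t mask) { *Reg = *Reg & ~mask; }
};

volatile std::uint32_t fake_ctrl = 0;  // stand-in for a hardware register
using Ctrl = Register<&fake_ctrl>;
```

Every access goes through the volatile pointer, so point 2 is honored, and the
register's name and operations are checked by the compiler.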

The discussion here about re-ordering is largely moot; if you're having issues
with the core re-ordering your transactions, either your core is børked or you
haven't mapped your peripheral space correctly. The former is pleasantly rare,
the latter common enough that your compiler vendor is going to ask you about
it before they give you a bug number.

~~~
stevenhuang
Choosing between bitfields and masks for register twiddling was something I
looked into deeply. Like the article, I wanted the readability of bitfields,
but after reading about all the pitfalls and how loosely bitfields were
specified in the C89 standard, I decided to go with masks instead.

Here's a great summary I found, from a comment explaining the dangers of
bitfields.

Seeing it spelled out like this made it clear which to choose, lest you're a
fan of nasal demons :).

"The royal mistake is to use bit fields in the first place. The following is
not specified by the standard:

– whether bit field int is treated as signed or unsigned int

– the bit order (lsb is where?)

– whether the bit field can reach past the memory alignment of the CPU or not

– the alignment of non-bit field members of the struct

– the memory alignment of bit fields (to the left or to the right?)

– the endianness of bit fields larger than one byte

– whether plain int values assigned to them are interpreted as signed or
unsigned

– how bit fields are promoted implicitly by the integer promotions

– whether signed bit fields are one’s complement or two’s complement

– padding bytes

– padding bits

– values of padding bits.

– and so on."

Source: [https://embeddedgurus.com/stack-
overflow/2009/10/effective-c...](https://embeddedgurus.com/stack-
overflow/2009/10/effective-c-tip-6-creating-a-flags-variable/#comment-2390)

~~~
stinos
So do I understand it correctly that you are saying the OP's code (or e.g. the
one like in AceJohnny2) is not guaranteed to work correctly according to C89?
Any idea about C99?

~~~
Gibbon1
My thoughts

Probably will work with a specific compiler. If it fails, it'll likely fall
flat on its face. Which, with embedded, is the best type of failure.

Also, you can probably force the behavior you need with pragmas (not portable)
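
For instance, a GCC/Clang-only sketch of forcing struct layout (the attribute
spelling is compiler-specific, which is exactly the portability caveat above;
MSVC spells it `#pragma pack`):

```cpp
#include <cassert>
#include <cstdint>

struct Unpacked {
    std::uint8_t  a;
    std::uint32_t b;  // the compiler may insert padding bytes before this
};

// GCC/Clang extension: suppress padding so the layout matches a wire or
// register format byte-for-byte.
struct __attribute__((packed)) Packed {
    std::uint8_t  a;
    std::uint32_t b;  // no padding
};

static_assert(sizeof(Packed) == 5, "packed layout has no padding");
static_assert(sizeof(Unpacked) >= sizeof(Packed), "padding can only grow it");
```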

Portability is usually completely unimportant in embedded.

C89 just don't.

C++ in embedded. Lord god no, if you want to do that use rust instead. Or
embedded python or lua, anything but C++.

~~~
pjmlp
> C++ in embedded. Lord god no, if you want to do that use rust instead. Or
> embedded python or lua, anything but C++.

First Rust needs to reach C++'s tooling maturity on embedded space.

When there is High Integrity Rust certification, compiler backends for all
major CPUs, SDK support on OSes like INTEGRITY or Mbed, UI embedded tooling
for IoT displays, maybe we can start seeing production code there.

~~~
Gibbon1
> First Rust needs to reach C++'s tooling maturity on embedded space.

Yeah, and that's coming soon. Anyone starting an embedded project in C++ is
making a grave mistake.

> Mbed

Those guys are making a bad mistake and will pay for it.

~~~
pjmlp
Touching C is an even bigger one.

It took C++ about 20 years to reach maturity in this space, with help of
several OS and compiler vendors.

You might be willing to bet the house on Rust; most businesses aren't, not in
the current state of its toolchain.

In 10 years from now, yeah I can clearly see it.

~~~
v_lisivka
I use Rust for embedded right now. Part of the project is in C, part is in
C++, part is in Rust. NRF52 and i.MX6. Can you point me to the problems with
Rust or its toolchain that I may have missed, so I will be aware? Thank you in
advance.

~~~
pjmlp
Some ideas:

- LLVM backend is limited regarding the supported CPU architectures;

- Rust compiler is not certified for high-integrity domains, where human
lives might be endangered;

- Features like SIMD are still nightly-only;

- Lack of integration with closed-source toolchains from OEMs for mixed
debugging and other development workflows, e.g. Keil, Microchip.

These issues might be completely irrelevant, or showstoppers, depending on the
company.

~~~
steveklabnik
SIMD is stable in five weeks, incidentally.

~~~
pjmlp
From what I understood it is just phase I of SIMD support, right?

~~~
steveklabnik
Yes, but also the most foundational part. There’s already high level libraries
written on top.

~~~
pjmlp
Thanks, will have a look into it.

~~~
steveklabnik
Specifically what I'm referring to is [https://doc.rust-
lang.org/beta/std/arch/](https://doc.rust-lang.org/beta/std/arch/) (last two
modules are stable, note this is beta docs for the next release)

Next step is [https://doc.rust-
lang.org/beta/std/simd/index.html](https://doc.rust-
lang.org/beta/std/simd/index.html) but you can use
[https://crates.io/crates/stdsimd](https://crates.io/crates/stdsimd) in the
meantime to try it out, as they're the same.

Stuff like [https://crates.io/crates/faster](https://crates.io/crates/faster)
is already built on top of this and is just waiting for the initial intrinsic
stabilization, too.

------
AceJohnny2
> _Embedded is a wonderful versatile world which allows developers to create
> various interesting everyday devices (in collaboration with the hardware
> team)._

ROFL!

I've been working in embedded for over a decade, and it's always a quagmire of
buggy hardware patched over with C code, half of whose programmers have
actually transitioned over from hardware design and don't know what they're
doing.

With that in mind, I'm going to assume the author was sarcastic.

I love the field, for getting to work deep inside and very early on bleeding-
edge hardware, but by god most days feel like being in the coal mine of
computer systems.

~~~
keithnz
I've done embedded stuff for 25+ years. I find that it's always far better to
stick with simple C code, because it's the native language of the embedded
world, and the field has a lot of good advice on how to write robust C. If
needed, layer something like Lua on top, but for the most part, stick with
simple C and proven embedded techniques. The article mentions a HAL. That's
definitely a must, but don't get too fancy with it. Do I keep wanting to use
something else? Yes. Am I a fan of C? No. Is your embedded device really just
a scaled-down PC with an OS and an MMU and good amounts of memory? Use
whatever language you want.

Rust is promising, but will never be a good option on low level devices.

I have seen some nice stuff done with Forth.

~~~
Narishma
> Rust is promising, but will never be a good option on low level devices.

Why not?

~~~
kabdib
The last low-level device I shipped code on, we had 2K of program space and
about 1K of RAM. Started writing in vanilla C, crunching pieces into hand-
written assembly and rewriting fat code when space got tight. 6 bytes free in
the final code. I might be going out on a limb when I claim that Rust won't
touch this kind of system . . . but probably not.

The system before that was largely just a USB driver, an event loop and some
interrupt handlers, supporting a large amount of ported-over and mostly
already existing stuff in C that did the interesting and product-defining
algorithmic heavy lifting. I guess that we could have written the
infrastructure in Rust, but it's hard to see the win in using two languages
(the algorithmic stuff was not going to get a rewrite).

Rust isn't great for other reasons, too. Finding good embedded systems
programmers is hard enough; require something bleeding edge like Rust and the
population of candidates plummets to "well, we _might_ be able to find and
hire one person this year" territory.

~~~
steveklabnik
Rust can meet those requirements, though there may be other requirements you
didn’t mention that would disqualify it. Using similar techniques to the C,
the smallest Rust executable ever produced was 151 bytes.

The CPU architecture is more likely a problem. And, as you say, the stuff
that’s not purely a technical requirement. We’ll get there!

~~~
keithnz
smallest device I program in C has 20 bytes of RAM

~~~
steveklabnik
What device is that, and what do you do with it? Just curious.

~~~
keithnz
sorry, the device itself has 25 bytes :) But I was left with 20 for program
logic.

[http://ww1.microchip.com/downloads/en/DeviceDoc/41236D.pdf](http://ww1.microchip.com/downloads/en/DeviceDoc/41236D.pdf)

PIC12F508.

It did a number of things. It had logic for controlling power on a low-power
device, coordinating between a number of other micros. It also acted as a
last-ditch watchdog system that triggered a recovery of the device.

~~~
steveklabnik
Thanks, neat!

~~~
pjmlp
Those PICs can also be targeted with Basic and Pascal, so that you get an idea
of the competition. :)

[https://www.mikroe.com/mikropascal-pic](https://www.mikroe.com/mikropascal-
pic)

[https://www.mikroe.com/mikrobasic-pic](https://www.mikroe.com/mikrobasic-pic)

------
AceJohnny2
> _Unfortunately, a lot of embedded low-level libraries deal with registers
> this way:_
    
    
        *(volatile std::uint32_t*)reg_name = val;
    

Who still writes code like this!? (maybe the ST folks responsible for the
CubeMX BSP for STM32. Ugh...) Most places have finally moved on to using
structs and bitfields. GCC and Clang support them well and consistently on the
various architectures I've worked on (MIPS, some weird DSP, and a slew of ARM
variations).

Some proper C code will define the 32-bit register like this (usually in a
header provided by the hardware vendor, generated from the HDL):

    
    
        typedef union some_reg {
            uint32_t raw;
            struct {
                uint32_t bits1:4;
                uint32_t bits2:3;
                uint32_t rsvd:5;
                uint32_t bits3:20;
            };
        } some_reg_t;
    
    

Then use it either like:

    
    
        some_reg_t reg;
        reg.raw = READ_REG(reg_addr);
        reg.bits1 = 4;
        WRITE_REG(reg_addr, reg.raw);
    

Or:

    
    
        volatile some_reg_t *reg = (some_reg_t*)reg_addr;
        reg->bits1 = 4;
    

The latter assumes the register address is mappable, and that programmers are
aware that writing the field performs a Read-Modify-Write of the whole
register. Programmers being fallible, it's a more risky approach that I don't
see often, despite how concise it is.

~~~
kbumsik
> Who still writes code like this!?

Believe it or not, the majority of ARM MCU vendors do that with their CMSIS
headers, in my experience (ST, TI, NXP, Nordic, EFM32...). To be more precise,
the registers are accessed using mask and shift macros. I've worked on many
ARMs too, but only Atmel's header files use the struct bitfields you
mentioned.

But yeah, I've also seen that most non-ARM vendors use struct bitfields.
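
The mask-and-shift style looks roughly like this (the register and field names
are made up here, in the spirit of CMSIS's `_Pos`/`_Msk` naming convention):

```cpp
#include <cassert>
#include <cstdint>

// CMSIS-flavoured mask/shift constants for a hypothetical CTRL register.
// Real vendor headers generate names like these from the hardware description.
constexpr std::uint32_t CTRL_MODE_Pos = 4u;
constexpr std::uint32_t CTRL_MODE_Msk = 0x7u << CTRL_MODE_Pos;

// Read-modify-write of one field, leaving all other bits untouched.
inline std::uint32_t ctrl_set_mode(std::uint32_t reg, std::uint32_t mode) {
    return (reg & ~CTRL_MODE_Msk) | ((mode << CTRL_MODE_Pos) & CTRL_MODE_Msk);
}
```

It's noisier than a bitfield assignment, but every bus access and every bit is
exactly where the code says it is.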

~~~
AceJohnny2
Hm, is armcc still a thing? I don't remember what it guaranteed (or not) for
bitfields. If it doesn't work consistently, that would explain why they're
using masks and shifts.

That, and because the C standard leaves it implementation-defined.

~~~
monocasa
The ARM ABI defines a lot of useful semantics for bitfields (particularly when
combined with volatile), but GCC perennially regresses and breaks them so next
to no one uses them.

------
jhack
The author says setting a register like this is hard to read and maintain:

    
    
        *(volatile std::uint32_t*)reg_name = val;
    

And the author's solution is to replace that one line with this:

    
    
        struct DeviceSetup {
            enum class TableType : std::uint32_t {
                inphase = 0,
                quadrature,
                table
            };
    
            std::uint32_t input_source : 8;
            TableType table_type : 4;
            std::uint32_t reserved : 20;
        };
    
        volatile auto device_registers_ptr = reinterpret_cast<DeviceSetup*>(DeviceControlAddress);
    

I'm not really seeing the readability and maintainability advantages here.

~~~
tropo
Both methods are buggy, but the second is also needlessly complicated.

The first possible bug is aliasing. The types of reg_name and
DeviceControlAddress are not given, so it is possible there isn't a bug.

The second bug relates to memory ordering. Adding "volatile" will tell the
compiler to do things in order, but the compiler will not pass that
requirement on to the CPU. The CPU itself may reorder things. I saw this
affect an embedded system back in 1998, and the problem has only become more
common in the 2 decades since. Generally you will need assembly code to avoid
the bug.

~~~
magila
The re-ordering you were seeing may have been due to a lack of sequence points
between the ordered operations. For example: If you have two 32 bit registers
which correspond to a single 64 bit value and which must be read in a specific
order to ensure consistency, then you cannot write (code simplified for
clarity):

    
    
      int64_t val = ((int64_t)*reg0 << 32) | *reg1;
    

The compiler is allowed to re-order the register reads even if they are
volatile pointers. To guarantee a specific order you need something like:

    
    
      int64_t val = (int64_t)*reg0 << 32;
      val |= *reg1;
    

Sequence points are one of those dark corners of C/C++ which most people don't
know about because they rarely matter unless you are dealing directly with
hardware.

~~~
burfog
No, a problem with sequence points is at the compiler level, just the same as
"volatile". Assume the compiler doesn't reorder anything. Maybe you turn the
optimization off. You even inspect the assembly code, and all the operations
are there in the correct order.

You can still hit the problem. In my case, the code ran fine until we got an
upgraded CPU. (from MPC6xx series to MPC74xx series)

Suppose you store to registers at 0xf0000ffc, then 0xf0000104, then
0xf0000ff8. You need the stores to happen in that order. The instructions
execute in that order, creating 32-bit chunks of data headed out toward the
memory bus. There are multiple write buffers however, so they can go in
parallel, each getting a distinct write buffer. They then head out onto the
memory bus in some randomish order determined by timing issues internal to the
CPU.

In my case, I had to add "eieio" instructions between each pair of accesses
for which ordering mattered. FYI, that is a real instruction, supposedly
meaning "Enforce In-Order Execution of I/O".

~~~
comex
I found this surprising so I looked it up. According to the MPC7410 user
manual[1], assuming the register bank is mapped as caching-inhibited, a
sequence of stores is required to take effect in program order without needing
`eieio` between them. However, a store followed by a load to a different
address can be subject to reordering and does need `eieio`.

The ARM architecture does this more sanely. Page table entries have a flag
that lets you choose between regular "Device" memory or "Strongly-ordered"
memory; the latter performs all memory accesses in order without needing any
synchronization instructions, and is more convenient in simple situations.

[1] [https://www.nxp.com/docs/en/reference-
manual/MPC7410UM.pdf](https://www.nxp.com/docs/en/reference-
manual/MPC7410UM.pdf) \- Table 3-8

------
tcbawo
I was surprised not to see anything about RAII techniques
([http://en.cppreference.com/w/cpp/language/raii](http://en.cppreference.com/w/cpp/language/raii)).
Tightly-managing resources via object lifecycle is probably one of the killer
features that C++ brings to embedded.
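
A classic embedded example is an interrupt-disable guard. This is only a
sketch: `disable_irq`/`enable_irq` stand in for the real intrinsics (e.g.
`__disable_irq`/`__enable_irq` on Cortex-M) and merely count here so the code
runs on a host:

```cpp
#include <cassert>

// Host-side stand-ins for the real interrupt-control intrinsics.
static int irq_depth = 0;
inline void disable_irq() { ++irq_depth; }
inline void enable_irq()  { --irq_depth; }

// RAII guard: interrupts are re-enabled on every exit path, including early
// returns, with no cleanup code at the call sites.
class IrqGuard {
public:
    IrqGuard()  { disable_irq(); }
    ~IrqGuard() { enable_irq(); }
    IrqGuard(const IrqGuard&) = delete;
    IrqGuard& operator=(const IrqGuard&) = delete;
};

inline bool update_shared_state(bool early_out) {
    IrqGuard guard;               // critical section starts here
    if (early_out) return false;  // guard still re-enables interrupts
    // ... touch state shared with an ISR ...
    return true;
}
```

The equivalent C needs a matching enable call before every `return`, which is
exactly the kind of ceremony that gets forgotten.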

~~~
stevenhuang
In a lot of embedded systems, the penalties associated with RAII and other
runtime goodies like vtables are often prohibitive, so they're commonly
disabled. It's a factor in why C continues to be the lingua franca instead of
C++ in the embedded space--it's simpler and still "good enough" not to warrant
a switch-over to C++ (in addition to many other reasons).

The compile-time features of C++ are a great reason to switch over though, and
for devices with greater resources, C++ becomes an even better fit.

~~~
proverbialbunny
Speed is not the issue for embedded development.

The problem with C++ is that it has a few language features that are not
deterministic, making standard C++ not ideal for real-time code. Most of the
STL uses these features, so most vanilla C++ is disabled in many embedded
environments.

Also, there is a push for C++23 to have deterministic exception handling. If
this passes (and it likely will) most of the stl will become available for
embedded, and when that time comes C++ is going to become far more of a viable
option than it currently is.

If you're curious you can read about it here: [http://www.open-
std.org/jtc1/sc22/wg21/docs/papers/2018/p070...](http://www.open-
std.org/jtc1/sc22/wg21/docs/papers/2018/p0709r0.pdf)

~~~
aseipp
Nobody who uses C++ for embedded development uses or cares about the STL
though, and its absence isn't exactly surprising for a number of reasons...

The selling point is features like RAII and templates, which can greatly
reduce duplication and improve readability. Sure, you technically can't use
the STL for whatever reasons, but practically it doesn't actually matter at
all, and its absence comes with the territory for everyone involved.

Being able to use more of the STL in contexts like this would be nice for
future projects, perhaps.

~~~
BenFrantzDale
Keep in mind, the STL isn't limited to the containers; there are tons of
algorithms that are very useful for embedded programming that don't throw
exceptions, are vetted, and are as efficient as hand-written algorithms.
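
For example, `<algorithm>` and `<numeric>` work fine over a fixed-size
`std::array`, with no heap involved and, for trivial element types, no path
that throws (a host-runnable sketch with made-up data):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>

// Fixed-size, statically allocated storage: no heap, no exceptions.
constexpr std::array<std::uint16_t, 5> samples{12, 7, 43, 7, 19};

inline std::uint32_t sample_sum() {
    return std::accumulate(samples.begin(), samples.end(), 0u);
}

inline std::uint16_t sample_max() {
    return *std::max_element(samples.begin(), samples.end());
}
```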

------
tripletao
So maybe everyone but me knows this, but what's the behavior of bitfields on
the bus? Like, if r.x is the bottom 6 bits of a 32-bit register, then does the
compiler generate an 8-bit access or a 32-bit access?

What if it's bits 13 through 18? An unaligned 16-bit access? Two 8-bit? A
32-bit? Where is this specified?

Depending how your peripheral is implemented, this absolutely can matter.
Nothing stops you from building devices where the low byte of

    
    
       *((uint32_t *)x)
    

doesn't work like

    
    
       *((uint8_t *)x)
    

and people do. I always just code the 32-bit access and pull the bits out
myself, because I know what will happen.
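
That approach is easy to keep honest: one explicit 32-bit read, then the field
pulled out by hand (the positions and widths below are arbitrary examples):

```cpp
#include <cassert>
#include <cstdint>

// Extract `width` bits starting at bit `pos` from an already-read 32-bit
// word, so the bus sees exactly the one access you wrote. Valid for
// width < 32.
inline std::uint32_t extract_field(std::uint32_t word,
                                   unsigned pos, unsigned width) {
    return (word >> pos) & ((1u << width) - 1u);
}
```

Here "bits 13 through 18" is simply `extract_field(word, 13, 6)`, with no
question about what access size the compiler chose.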

This is even before considering stuff like a GPIO output implemented with two
write-only registers, where 0 leaves the bit unchanged and 1 sets/clears it.
The result of layering a bitfield on that does not seem intuitive to me.

The article also says that you should build abstraction layers for your
peripherals, instead of making all your code write to registers everywhere.
That seems (a) pretty well-settled as true, and (b) entirely independent of
whether you do the register accesses with bitfields or explicit shifts and
masks.

~~~
JoachimSchipper
Yes, what bitfields do on the bus is very undefined.

Admittedly, a compiler is free to implement a dereference of uint32_t by
loading four separate bytes, too, hence the imprecise "very undefined" in the
first line...

~~~
tripletao
I just tried this with IAR for Cortex-M3. I didn't check thoroughly, but so
far it looks like right-aligned bitfields of eight bits or less generate an
LDRB (8-bit). All other bitfields generate an LDR (32-bit), even if they lie
entirely within a byte.

I see the makings of a fun trick to play on a colleague. I don't see anything
I'd be too inclined to use in production code.

------
mschwaig
I picked C over C++ in a quasi-embedded environment, given we would have a
team with highly variable C/C++ skills.

I was worried people would get too fancy with C++. It probably would have been
fine if we had taken away most of the language.

It turns out the problem with C in that setting was that good C code seems to
require a lot of ceremony and conventions for things like error handling and
modularity.

I hope that Zig ([https://ziglang.org](https://ziglang.org)) will become a
viable alternative in that space. It's quite a small language, more comparable
to C than to C++ or Rust. It's meant to result in similar LLVM bitcode as
well-written C, while being highly compatible with C and giving you some
extra safety guarantees over C as well.

~~~
youdontknowtho
Help me out... Why not use a C or Algol derivative syntax?

~~~
mschwaig
I think the syntax is not too far off from C.

I am not affiliated with the project, only supporting it on Patreon, so I
cannot speak to the designer's intentions with Zig specifically.

Generally in language design, if something works the same way as in other
languages, that's great: you can re-use the keywords/syntax and build on
people's prior knowledge. However, if you have some feature that actually
works quite a bit differently from other languages, the syntax should
probably be different too. Otherwise people might feel at home right away on a
superficial level, but later get bitten by the surprising ways in which the
new thing works differently from the one they are used to.

------
yason
The article could do without any references to C++. Surely writing long,
imperative lists of register writes makes the reader's mind boggle. C has more
than enough tools to encode all that into something that is maintainable and
readable. Because if you just take C++ to implement some handy, convenient
constructs you also invite the closet full of demons that come with the
language.

And quite frankly, the abstractions given in the article are quite trivial.
Any decent programmer would create similar abstractions as part of the normal
course of work, because decent programmers know they are forgetful and want to
spend their modest number of brain cycles on higher levels of thinking
instead.

~~~
billforsternz
Superb comment. Good programming is about building effective layers of
abstraction. Bit twiddling is fine, but encapsulate it. Mass setting of
registers should be presented as a readable table (possibly an instantiated
structure) with no special executable code in sight.

------
bigcheesegs
> N.B. Accessing registers modified by the hardware may be treated as a multi-
> threaded application. Therefore, it is worth considering using
> std::atomic<T> instead of volatile T.

This is incorrect. Compilers are allowed to reorder and combine atomics as
long as they follow the rules. See: [https://github.com/jfbastien/no-sane-
compiler](https://github.com/jfbastien/no-sane-compiler)

~~~
tripletao
If I understand correctly, that makes the article grossly wrong. Like, if you
have a serial port transmit register and execute

    
    
       SERIAL_TX_DATA = 0x01;
       SERIAL_TX_DATA = 0x02;
    

to enqueue two bytes to transmit, then the compiler is allowed to skip the
first write entirely.

~~~
Gibbon1
A naive implementation of the SERIAL_TX_DATA macro will totally give you
behavior like that.

~~~
tripletao

       #define SERIAL_TX_DATA (*((volatile uint32_t *)<addr>))
    

is pretty naive, and correct. The article's advice seems not to be.

~~~
Gibbon1
A friend smarter than me ran into a bug where, when DMA was active on a
multi-core uP, you needed to do a read op between two writes to an IO address
or the first write wouldn't happen.

I think on an ARM processor, if it's normal memory, you can't guarantee
ordering. But ARM supports a device memory type which disables caching and
hardware reordering of memory accesses.

~~~
tripletao
I think ARM's "strongly-ordered memory" concept is supposed to solve this? Was
your friend's problem intended behavior, or a silicon bug?

[http://infocenter.arm.com/help/topic/com.arm.doc.dai0321a/DA...](http://infocenter.arm.com/help/topic/com.arm.doc.dai0321a/DAI0321A_programming_guide_memory_barriers_for_m_profile.pdf)

~~~
Gibbon1
I think the DMA accesses were confusing the memory controller. He said that
sort of thing was common on internally developed silicon: they won't do a
re-spin if there is a workaround. That matches my experience as well. I've had
to deal with peripherals with an async clock where access timing becomes
important.

Observation: CS appears to teach students that code with side effects is
evil, i.e. hide your side effects behind OS calls. With embedded code, side
effects are important.

------
berti
There's nothing in the presented article that can't be done just as (or more)
easily in C. The article really just presents the absolute basics from an
intro embedded systems class.

~~~
minipci1321
yes, I somehow expected to see something like Kvasir:

[https://github.com/kvasir-io/Kvasir](https://github.com/kvasir-io/Kvasir)

------
Khoth
I'm not a fan of anything that writes to a register just by assigning a value
to some variable.

I much prefer having some kind of READ_REG/WRITE_REG inline function (or macro
at a pinch). Sooner or later you're going to find yourself wanting to log all
register accesses, or run the firmware against a model of the hardware, or
something, and when you do it's a great help to have to handle it in only one
place.
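
A minimal sketch of that single choke point (the names and address below are
made up; on target `write_reg` would be a volatile store, on the host it feeds
a log or hardware model):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// All register traffic funnels through one function. On the target this is a
// volatile store; on the host it records into a log, so the same driver code
// can run against a model of the hardware.
std::vector<std::pair<std::uintptr_t, std::uint32_t>> write_log;

inline void write_reg(std::uintptr_t addr, std::uint32_t val) {
#ifdef TARGET_BUILD
    *reinterpret_cast<volatile std::uint32_t*>(addr) = val;
#else
    write_log.emplace_back(addr, val);
#endif
}

// Driver code doesn't know or care which implementation it got.
inline void uart_send(std::uint8_t byte) {
    constexpr std::uintptr_t kUartTxAddr = 0x40000000u;  // made-up address
    write_reg(kUartTxAddr, byte);
}
```

Swapping the `#else` branch for a richer hardware model is then a link-time or
build-flag decision, not a rewrite of every driver.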

------
youdontknowtho
It would be great if everyone who knows better than the author would write
articles instead of just critiquing them. I'm not trying to be facetious. It
would be awesome if more of the advanced developers here would write things
like this and share their knowledge.

