Embedded C: Struct and Union (Part 2) (atadiat.com)
59 points by YTusername on April 30, 2018 | 51 comments



No. Please.

Struct bitfields may be useful for saving memory (e.g. 1 bit for a boolean-like variable), but don't rely on them when accessing register data or implementing protocols.

The biggest issue with struct bitfields is that their layout is ABI-specific, meaning it is not even compiler-specific. For example, his "Application #2: Implementing Protocols" example is not portable across architectures, because the order of the bits is swapped when compiling for a big-endian CPU. We should use bitwise operations for this purpose.
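
For illustration, a minimal shift-and-mask sketch (the field layout here is made up, not taken from the article):

    #include <stdint.h>

    /* A hypothetical 8-bit header: 2-bit type in the top bits, 6-bit
       length in the bottom bits. Shifts and masks give the same layout
       on every compiler and CPU. */
    #define HDR_TYPE_SHIFT 6u
    #define HDR_TYPE_MASK  (0x3u << HDR_TYPE_SHIFT)
    #define HDR_LEN_MASK   0x3Fu

    static inline uint8_t hdr_pack(uint8_t type, uint8_t len)
    {
        return (uint8_t)(((type << HDR_TYPE_SHIFT) & HDR_TYPE_MASK)
                       | (len & HDR_LEN_MASK));
    }

    static inline uint8_t hdr_type(uint8_t hdr)
    {
        return (uint8_t)((hdr & HDR_TYPE_MASK) >> HDR_TYPE_SHIFT);
    }

    static inline uint8_t hdr_len(uint8_t hdr)
    {
        return hdr & HDR_LEN_MASK;
    }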

Look at his last example, "Application #3: Access to MCU Registers". The SDK source code he mentions does not use bitfields but bitwise operations to access register bits.


But for embedded software, being ABI-specific isn't (usually) an issue. I have never seen an embedded project that changed either its architecture or its compiler in the middle. In embedded, once you start a project, you're almost always using the same tooling until it dies. (Even moving to the latest rev of a compiler is rarely done, in my experience.)

So you can use bitfields; however, they have to be defined to work with your compiler/architecture/hardware, and you can expect them to continue to work for the duration. (Whether bitfields are a bad idea for other reasons is outside the scope of my comment.)


Bitfields are too useful a feature not to use. The solution is simple: add a unit test to verify that setting one bitfield to all 1's results in the overall 32-bit or 64-bit word having the expected value.

Add a comment explaining this too.

That will catch nonportable use in the future.
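
A sketch of such a test (field names are hypothetical; the asserted constant assumes the LSB-first allocation used on most little-endian targets, which is exactly what the test pins down):

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* hypothetical control word; field names are made up */
    typedef union {
        struct {
            uint32_t mode  : 2;
            uint32_t speed : 3;
            uint32_t       : 27;   /* pad to a full 32-bit word */
        } f;
        uint32_t word;
    } ctrl_t;

    void test_bitfield_layout(void)
    {
        ctrl_t c;
        memset(&c, 0, sizeof c);
        c.f.speed = 0x7;              /* all 1's in one field... */
        assert(c.word == 0x1Cu);      /* ...must land on bits 2..4 */
    }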


ABI-specific means you can count on the behavior as long as you know the target hardware and OS. If you’re twiddling registers, it’s highly likely you know the target.


They're perfectly usable between different architectures, languages, operating systems and compilers, if precautions are taken. I did this many moons ago between big- and little-endian machines, both in C for various Unixes and in Delphi under Windows, to exchange data over a network. Of course we used the ntoh() and hton() functions extensively, plus #pragmas to adapt data alignment and similar features under Delphi. The use of unions allowed us to use a single buffer for each packet received, going to the relevant part after matching the header, which described what was being sent. All worked perfectly for years on SCO/Unix, IBM AIX, Linux and Windows.


"ABI-specific" should be broader, more portable than "compiler specific"; different compilers must implement the same ABI on a given system (or else not interoperate). Of course bitfields are endian-specific, but two different compilers should agree on them for a given endian.


I'm not sure this guy really knows what he is talking about. Bit fields are almost universally hated and not allowed in style guides. The Arm guide he links to states:

> Please note that peripheral locations should not be accessed using __packed structs (where unaligned members are allowed and there is no internal padding), or using C bitfields.

Using bit fields for protocols is even more ill-advised, because you often may wish to implement the same code across different devices using the same protocol, but the compilers will arrange the bit fields differently.

The only valid example he gives is the last one, which is the correct way to do it: a struct made up of 32-bit values. This makes sure that everything is 4-byte aligned and the processor can access those variables in a single instruction.

If you wish to use the dense packing of bit fields but without the compiler ambiguity, then you should use explicit bit shifting and bit masking (again, shown in his last example).
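
A sketch of that style (the macro names are illustrative, not from the SDK):

    #include <stdint.h>

    /* CMSIS-style position/mask macros for a hypothetical register field */
    #define CTRL_DIV_Pos 4u
    #define CTRL_DIV_Msk (0xFu << CTRL_DIV_Pos)

    static inline uint32_t ctrl_set_div(uint32_t reg, uint32_t div)
    {
        /* read-modify-write: clear the field, then insert the new value */
        return (reg & ~CTRL_DIV_Msk) | ((div << CTRL_DIV_Pos) & CTRL_DIV_Msk);
    }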


Style guides are universally written for systems programming where RAM is in abundance. For embedded, memory usage matters and packing data in is a thing. It is ill-advised because of the portability traps, but portability is low on the priority chain if you just have to make it work. Any decent compiler will do what you want if you pad out the unused bits with a reserved field to guarantee alignment.

Reminds me of the C# jockey who complained endlessly about the bitfields (using manual masking) I put into an IoT data structure that was supposed to be kept as small as possible to minimize hosting costs. He had no clue about how to use bitwise operations to extract bits. They managed to bloat that into a horrendous monstrosity but I wasn't going to play their game. If they had their way, a half dozen bits packed into a 32-bit word would have consumed 20x more space with unnecessary schema boilerplate.


There's a distinction to be made between packing bits in a structure and using bitfields to access packed bits (as opposed to doing the shifts and masks manually).

From a compiler writer's perspective, bitfields are one of the worst things ever proposed for a standard. The nonportability of which way bits are laid out is the least of their problems. The logic of how you actually pack bits in the struct is surprisingly nontrivial. The underlying semantics of bitfields are so weird pretty much every time they could be used, they need to be special-cased--both in the standard sense and in the implementation sense. This means that using bitfields requires navigating a field of landmines of potential compiler (or, worse, spec) bugs.


The solution is just to make sure you have padding fields that bring the overall size to a multiple of the word size, e.g. 32 or 64 bits. Use static_assert() to enforce the size.
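
For example (field names are hypothetical):

    #include <stdint.h>

    typedef struct {
        uint32_t ready    : 1;    /* hypothetical status register */
        uint32_t error    : 1;
        uint32_t count    : 6;
        uint32_t reserved : 24;   /* pad out to a full word */
    } status_t;

    /* C11 _Static_assert; <assert.h> also provides static_assert */
    _Static_assert(sizeof(status_t) == sizeof(uint32_t),
                   "status_t must be exactly one 32-bit word");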

See my earlier comment about unit testing too.


Well, to have a good chance of avoiding the landmines, you need to do the following (a short sketch follows the list):

* Pack the bitfields in a struct all by themselves. Don't mix bitfields and non-bitfields in the same struct. (This means that you limit the scope of insanity to other bitfields, not potentially expanding it to other regular fields).

* Make sure that all the bitfields within the struct have the same underlying type. And make it unsigned.

* Pad the bitfields to the appropriate size of the type.

* Only access the bitfields by copying them to/from the appropriate local variable (you don't want to trip code up on the bitfields-aren't-quite-the-type-they-say-they-are logic).
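
Following those rules, a minimal sketch (the names are mine):

    #include <stdint.h>

    /* bitfields only, one unsigned underlying type, padded to 32 bits */
    typedef struct {
        uint32_t mode : 2;
        uint32_t div  : 4;
        uint32_t      : 26;
    } cfg_t;

    uint32_t next_div(const cfg_t *cfg)
    {
        /* rule 4: copy into a real uint32_t before doing arithmetic, so
           the bitfield's "not quite its declared type" semantics never
           leak into the rest of the expression */
        uint32_t div = cfg->div;
        return (div + 1u) & 0xFu;
    }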


I've only used bitfields for register access. I get the first 3 points you make (I use uint32_t for all fields). What is the danger in #4 if you've followed #1-3?

You seem overly paranoid. Is there something from your experience that makes you nervous about bitfields?


I work on compilers for a living, which means I have a first-hand impression of the fallout of "the semantics seem clear until you actually think about all of the corner cases." I also have an appreciation for compilers blindly doing the wrong thing when you start mixing weird constructs together, and bitfields are very much a weird construct.

The last rule is a recognition that bitfields are weird constructs, and their interactions with other weird constructs are likely to cause problems. One example: try using a bitfield in a ternary lvalue operation, especially with clang.


You mean you didn’t use JSON???

The horror...


Agree. For embedded coding suggestions, you have to know what you are talking about really well; otherwise it could just be misleading. Bit fields should be avoided even for resource-restricted MCUs, period.


This article has some issues: GCC only compiles the first struct in Example #1 to a size of 2 if it has the packed attribute. Other compilers may behave differently, which is exactly why bit-fields should be avoided. The behavior of bit-fields even changes between compiler versions, for example:

"""Packed bit-fields of type char were not properly bit-packed on many targets prior to GCC 4.4. On these targets, the fix in GCC 4.4 causes an ABI change. For example there is no longer a 4-bit padding between field a and b in this structure:

    struct foo
    {
      char a:4;
      char b:8;
    } __attribute__ ((packed));
There is a new warning to help identify fields that are affected:

    foo.c:5: note: Offset of packed bit-field 'b' has changed in GCC 4.4
""" (https://gcc.gnu.org/gcc-4.4/changes.html)

Also, the code in Application #3 isn't using bit-fields at all (the definition of GPIO_P_TypeDef is here: https://github.com/SiliconLabs/Gecko_SDK/blob/master/platfor...)


Many people are talking about the portability of bit fields, but:

The assumed behaviour of bit fields is well defined between gcc and clang with the proper pragmas. It is therefore probably a non-issue. You even get better aliasing optimisations* using union/bit fields over just using bit shifts.

*) This means that if your union contains a way to read the whole field as e.g. uint32_t, as well as each field individually, and also to write the struct as a u32 via pointer, the compiler can properly detect which actions invalidate each other's reads and writes, instead of having to assume everything is volatile-like.
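
A rough sketch of that shape (the names are mine):

    #include <stdint.h>

    typedef union {
        uint32_t u32;              /* whole-register view */
        struct {
            uint32_t en  : 1;      /* per-field view */
            uint32_t div : 4;
            uint32_t     : 27;
        } f;
    } reg_t;

    void set_fields(volatile reg_t *r)
    {
        reg_t copy = { .u32 = r->u32 };   /* one 32-bit read */
        copy.f.en  = 1;
        copy.f.div = 3;
        r->u32 = copy.u32;                /* one 32-bit write back */
    }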

Use case 1: https://gist.github.com/donkeybonks/8749545

Use case 2 (with asm showing better codegen with union): https://gist.github.com/donkeybonks/11103152

nb these gists are ~4 years old because I don’t much care about micro-optimisation anymore.

nb2 it’s a really good idea to static assert the size of all your structs below their declaration for confidence and splash a few unit tests for ordering.


> The assumed behaviour of bit fields is well defined between gcc and clang with the proper pragmas.

The GCC documentation [1] says:

> Determined by ABI.

This means that it may be different when compiling for different architectures. How can you say it is well-defined? Here is a good example [2] of the compatibility issues that arise when using bit fields.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumera...

[2]: http://mjfrazer.org/mjfrazer/bitfields/


> This means that it may be different when compiling for different architectures. How can you say it is well-defined?

In the immense majority of cases you won't be serializing or exchanging your bitfields over the network; this is entirely a non-problem.


Yeah, this is a case where it matters. But you can also see the workaround is pretty straightforward here.


I was wondering recently whether I could depend on bitfields being packed in the "obvious" way. Few people would pass judgement on it, but the desktop targets I tried all did it the same way. Any common setups that pack them differently?


It depends on the ABI, but for x86, x64 and ARM targeting Cortex, I got the same results on all of them using gcc-4.8.


I also got the same result for vc++/x64.


> You even get better aliasing optimizations* using union/bit fields over just using bit shifts.

My own experience with bit field optimizations was that they were worse in general for GCC and ARM compilers from 4+ years ago; LLVM was much better. For example, if you tried to set a few fields that fit within a machine word to a constant value, it should be possible to set them all in a single opcode, but only LLVM takes the liberty of doing this.


The behavior of bitfields had better work across gcc and clang without using pragmas too!!!


Generally ST Micro did some fairly abhorrent stuff around mapping their registers into bit-field-packed structures for the Cortex M line. That's one of the reasons I don't use their library when coding for them.

The author's use of counting lines of assembly language as a metric of goodness for these choices is also problematic. If they coded that stuff for a VAX, they would be treated to variable-length bit field addressing (see [1] below, page 8-6). It's easy to implement very complex bit manipulation in just a few assembly language statements :-).

A better measure is CPU clocks + Memory Accesses (typically measured in kilocoreseconds [another VMS joke])

[1] http://www.ece.lsu.edu/ee4720/doc/vax.pdf


Mapping registers into bit-field-packed structures is actually okay, because most ARM Cortex-M processors have bit banding enabled on all the peripherals. Unfortunately, not all peripheral libraries use that functionality, and GCC doesn't support it well.
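
For reference, a sketch of the M3/M4 peripheral bit-band alias computation (the register address below is hypothetical):

    #include <stdint.h>

    /* Cortex-M3/M4 peripheral bit-band alias: every bit in the 1 MB
       region at 0x40000000 maps to its own word at a computed address,
       so single-bit writes need no read-modify-write */
    #define BITBAND_PERIPH(addr, bit) \
        (*(volatile uint32_t *)(0x42000000u \
            + (((uint32_t)(addr) - 0x40000000u) * 32u) \
            + ((uint32_t)(bit) * 4u)))

    void example(void)
    {
        /* set bit 5 of a hypothetical register at 0x40010000 */
        BITBAND_PERIPH(0x40010000u, 5u) = 1u;
    }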


Ya, but the newest Cortex-M chips (M7, M23, and M33) do not even support bit-banding, so I wouldn't rely on it being around forever.

Not quite sure why ARM removed it. Maybe there were performance costs to all instructions due to the extra hw/decode, while bit-banding only helped a small number of instructions.

If anyone has more info, would love to hear!


I remember a discussion on the Linux Kernel Mailing List regarding memory corruption on the ia64 architecture [1] because of a strange way GCC handled bit-fields. While oftentimes packed bit-fields are a decent way of representing/serializing data for memory-constrained devices or even network applications, it's always a good reminder of how they can wreak havoc in scenarios where portability is important.

1: https://lwn.net/Articles/478657/


Wasn't that bug triggered in the case of adjacent non-bitfield 32-bit fields too?


In TXR Lisp's FFI, I developed detailed support for bitfields, alignment and packing. You can comply with any kind of struct type no matter how it's declared.

In the reference manual, I added a section on what I believe are the bitfield allocation rules used by GCC, as I empirically reverse engineered them.

https://www.nongnu.org/txr/txr-manpage.html#N-027D075C

Hope someone finds it useful.

Endian considerations are not covered in that section because that topic is documented elsewhere in the document.

In a nutshell, on big endian, bitfields are packed from the most significant end of the storage word, and on little endian, from the least significant end of the storage word. Thus the lowest addressed byte of the storage unit is filled first, and so on.
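
Concretely, for a pair of 4-bit fields:

    struct nibbles {
        unsigned a : 4;
        unsigned b : 4;
    };
    /* little endian (LSB-first): a is bits 0..3, b is bits 4..7,
       so the first byte reads (b << 4) | a.
       big endian (MSB-first):   a is the top nibble, b the next,
       so the first byte reads (a << 4) | b. */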


The only issue with C bit-fields is, as K&R put it: "Almost everything about fields is implementation-dependent" (bit order, endianness, padding size, etc.).


An example of how not to use bitfields: https://lwn.net/Articles/478657/

I am not sure whether the underlying spinlock_t struct is something in memory or mapped to a hardware register. But it seems like this was being used to save space rather than to access hardware.


Line 20 in Application #1 has an unfortunate typo.


:D

Also, I think the comments are off in lines 5-6 of Application #2: "0 .. 1" should be "1 .. 2", "2 .. 7" should be "3 .. 7"


Oh my oh! It's my bad. It's fixed now ;)

That was distracting for sure!


Interesting stuff.

I have been a C coder (I'm not so much any more) for about 12 years of my career, and never came across bitfields in all that time, even when doing embedded work.

One style question - why not use the stdint.h types all the way through?


> I have been a C coder (I'm not so much any more) for about 12 years of my career, and never came across bitfields in all that time, even when doing embedded work

These days it shows up in database work: you might have e.g. one of these flags per row in your database (or imagine keeping track of used/unused fields in an in-memory block of data), which can add up to millions of booleans. Packing them at a ratio of 64 per word is a big improvement in space, and also in the time it takes to process that data, because the denser the data is, the more of it you can read per second and the more of it fits in CPU caches. You can then also use SIMD instructions to process it quicker, etc.
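
The packing itself is just index arithmetic, e.g. a minimal sketch:

    #include <stddef.h>
    #include <stdint.h>

    /* packed booleans: 64 flags per word, addressed by flag index */
    static inline void bitset_set(uint64_t *words, size_t i)
    {
        words[i / 64] |= (uint64_t)1 << (i % 64);
    }

    static inline int bitset_test(const uint64_t *words, size_t i)
    {
        return (int)((words[i / 64] >> (i % 64)) & 1u);
    }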


I've been a C++ programmer for almost as long, with some work in C. I discovered bitfields while writing emulators, in particular, building representations of registers. I'd suspect that if you haven't run into them in embedded work, that maybe they used macros and bitmasking instead.


I believe you are correct. Lots of that.


Weird... most embedded platforms do something like the following (taken from an actual lib header, which is full of the same). The entire interface to the hardware is basically bitfields.

  typedef union {
    struct {
        unsigned REC                    :8;
    };
    struct {
        unsigned REC0                   :1;
        unsigned REC1                   :1;
        unsigned REC2                   :1;
        unsigned REC3                   :1;
        unsigned REC4                   :1;
        unsigned REC5                   :1;
        unsigned REC6                   :1;
        unsigned REC7                   :1;
    };
  } RXERRCNTbits_t;


I've seen this a lot in Texas Instruments' bare-metal BSPs for their processors that are generated using HalCoGen. Their own proprietary compiler handles them as you'd expect. Other BSPs typically follow the coding style for their operating system. I've never seen QNX BSP drivers use bitfields. I've also never seen them in Linux drivers. QNX and Linux typically use variants of GCC to compile their drivers unless you pay for another toolchain.


Did you forget to include field width specifiers?


One place you can come across them is in IP header processing. The very first byte of an IPv4 header is defined as:

  struct ip {
  #if BYTE_ORDER == LITTLE_ENDIAN
        u_char  ip_hl:4,                /* header length */
                ip_v:4;                 /* version */
  #endif
  #if BYTE_ORDER == BIG_ENDIAN
        u_char  ip_v:4,                 /* version */
                ip_hl:4;                /* header length */
  #endif
        /* ...rest of the header... */
  };
As you can already see, they're kind of a pain to work with, so most people avoid them when possible.
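
The usual alternative is to skip the bitfields and read the byte directly, e.g.:

    #include <stdint.h>

    /* the shift-and-mask alternative: reads the first header byte
       directly and works the same on any endianness */
    static inline unsigned ip_version(const uint8_t *pkt)
    {
        return pkt[0] >> 4;       /* version is the high nibble */
    }

    static inline unsigned ip_hdr_len(const uint8_t *pkt)
    {
        return pkt[0] & 0x0Fu;    /* header length is the low nibble */
    }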


Isn't the ordering of bitfields undefined, and therefore useless when trying to map out register bitflags?

I just end up creating configuration structs and mapping them to/from register values in get/set functions as needed.
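
E.g. a sketch of that pattern (the register layout here is made up):

    #include <stdint.h>

    /* plain struct for the configuration; hardware widths noted in
       comments */
    typedef struct {
        uint8_t mode;   /* 2 bits in hardware */
        uint8_t div;    /* 4 bits in hardware */
    } cfg_t;

    static uint32_t cfg_to_reg(const cfg_t *c)
    {
        return ((uint32_t)(c->mode & 0x3u))
             | ((uint32_t)(c->div  & 0xFu) << 2);
    }

    static cfg_t reg_to_cfg(uint32_t reg)
    {
        cfg_t c = {
            .mode = (uint8_t)(reg & 0x3u),
            .div  = (uint8_t)((reg >> 2) & 0xFu),
        };
        return c;
    }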


No, the ordering of bitfields is implementation-defined rather than undefined. So if you figure out a bitfield pattern that works on your compiler for your architecture, it will continue to work for you. It is not very portable, but for embedded applications that often isn't a problem.


>So if you figure out a bitfield pattern that works on your compiler

More like: if your compiler is documented to implement bitfields a certain way.


It is also not portable to new versions of the same compiler, or possibly different flags passed to the same compiler and version.


Also, given this is "Embedded" C, the compilers often provide libs targeting micros with bitfields, so the layout has a known definition.


Pretty sure "nipples" is supposed to be "nibbles" in the code.


It's fixed now ;)



