So you want to build an embedded Linux system? (jaycarlson.net)
535 points by cushychicken on Oct 16, 2020 | 111 comments

> Yocto is totally the opposite. Buildroot was created as a scrappy project by the BusyBox/uClibc folks. Yocto is a giant industry-sponsored project with tons of different moving parts. You will see this build system referred to as Yocto, OpenEmbedded, and Poky, and I did some reading before publishing this article because I never really understood the relationship. I think the first is the overall head project, the second is the set of base packages, and the third is the… nope, I still don’t know. Someone complain in the comments and clarify, please.

Bitbake is the generic build system around which OpenEmbedded recipes (for particular packages) were implemented. Anyone could create a distro using those recipes; the project dates to the early-to-mid 2000s.

Yocto was/is a project of the Linux Foundation, basically a working group, where they looked at the state of embedded linux and said, "We want to contribute to this project with documentation and more recipes", starting externally but with hopes to get it mainlined. Poky was/is their reference distro for this effort.

Nowadays all the packages contributed by the Yocto project have been consolidated into openembedded, but poky remains the reference distro.

tl;dr: Yocto is first and foremost an organization of people. Bitbake is the build system. OpenEmbedded is a community of distro-agnostic build system recipes. Poky is a distro maintained by the Yocto organization utilizing OpenEmbedded recipes.

> But here’s where Yocto falls flat for me as a hardware person: it has absolutely no interest in helping you build images for the shiny new custom board you just made. It is not a tool for quickly hacking together a kernel/U-Boot/rootfs during the early stages of prototyping (say, during this entire blog project).

Let me suggest looking into the `devtool` utility. It's a Yocto utility that enables the kind of on-the-fly work the author enjoyed with Buildroot. For instance, running `devtool modify virtual/kernel` will place the configured kernel source in a workspace directory where you can grep, modify, and patch to your heart's content; on a new board, I might work in this state for weeks during bring-up as I patch drivers or develop patches that apply out-of-tree code over the mainline kernel. When I'm happy with my changes, I add them back into my recipe and test it by disabling the temporary workspace with `devtool reset virtual/kernel` and building my recipe from scratch again.

Yocto has other amenities that ease iteration on existing boards. For one, it is straightforward to cross-compile my python3 extension modules in a recipe in one base layer for my product family. Later, when I'm spinning up a derivative project, I can set up a product-specific layer to override the CPP flags, configurations, or patch the source to better target my board.

The yocto learning curve may be steeper, but the benefits of proper dependency tracking and layers far outweigh the drawbacks. At this point, if I use a board from a vendor that ships a buildroot BSP, I'll take a day to port it to yocto before moving further.

I agree completely. Yocto/OE gets a bad reputation as overly complicated, especially because a lot of people are doing this with hobbyist boards where they just want blinkenlichten. Yocto is definitely not easier for quick spinup on weekend projects.

However, if you're doing this full-time, and you want to do anything remotely complicated (you will), and especially if you have multiple products (you will), you start yearning for OpenEmbedded. It takes some time to learn, and the abstractions are hard to understand outside-looking-in, but it's well worth the effort.

And if you're working in userspace, use the SDK!

bitbake -c populate_sdk <image-name>

This will output a self-extracting archive in your deploy directory. You can install this mostly self-contained SDK to a directory of your choosing. Then source the environment file it installed to get the correct build variables set up (CC, CXX, CFLAGS, LD, LDFLAGS, PATH, etc). From there, the variables will point at your embedded image's sysroot. I do almost all of my userspace development using the SDK. If you're using CMake to build your native code, it will pick up the variables for you.

There are some gotchas with it; in particular, your configuration might need some modifications to get the SDK to include more things relevant to your build. Probably the most notable is that static libraries are not output into the SDK by default.

Echo the devtool advice.

I've used bare crosstools, buildroot and now yocto in production projects spanning 14 years. Personally I find it a lot faster to move with yocto once you grasp its principles and structure: especially if you have several related product lines that can be expressed as combinations of layers.

I've used Buildroot for the last 8 years and Yocto/OE for the last year.

There is a significantly steeper learning curve for Yocto when compared to Buildroot. Buildroot is faster for the initial build, but often slower than Yocto after the initial build.

Here's what I like about Yocto:

1. It forces you to be organized, everything has a home and things can't conflict with each other.

2. By using Yocto's shared state cache, you can have one central build server and automatically fetch pre-built artifacts on other computers. With this I can get a developer hooked up with everything that they need to build a full system image on their computer in just a few minutes -- and completely build the image at that time.

3. I am confident that incremental changes are built correctly. If you change a build-time parameter of a package in Buildroot, things which depend on that package are not rebuilt. This is not the case with Yocto. This can also result in unfortunate rebuilds of many packages just because of a minor change to, say, glibc. I know that they do not need to be rebuilt but Yocto does not.

4. Buildroot puts built items in a single staging directory. Package install order differences mean that you can overwrite files accidentally. Consider /usr/include/debug.h in two different packages, or something like that.

If you are not explicit with dependencies, the build may actually succeed but it may not be deterministic. If package A happens to be built before package B, you're golden. This does not always happen, and sometimes this is not found until you do a clean and a rebuild. Yocto forces you to be explicit -- the build tree only includes artifacts for recipes which have explicitly been defined.

5. Yocto can use the same tree and shared state cache to build multiple images for a given product without having to clean the world.
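
To make points 2, 3 and 5 concrete, here is a toy model of signature-based rebuilds. It is only a sketch and not how BitBake actually computes its hashes: the recipe names, versions, and the `RECIPES` table are made up, and a real build hashes far more than one string per recipe. The idea is that each recipe's signature covers its own inputs plus the signatures of everything it depends on, so a cache keyed by signature can be shared between machines, and a change low in the tree invalidates everything above it (including things you might know don't really need rebuilding).

```python
import hashlib

# Hypothetical recipes: own inputs plus direct dependencies.
RECIPES = {
    "glibc":   {"inputs": "glibc-2.32 CFLAGS=-O2", "deps": []},
    "zlib":    {"inputs": "zlib-1.2.11", "deps": ["glibc"]},
    "openssl": {"inputs": "openssl-1.1.1", "deps": ["glibc", "zlib"]},
    "busybox": {"inputs": "busybox-1.32", "deps": ["glibc"]},
    "docs":    {"inputs": "docs-1.0", "deps": []},
}

def signature(name, recipes, memo):
    """Hash a recipe's own inputs together with its dependencies' signatures."""
    if name not in memo:
        h = hashlib.sha256(recipes[name]["inputs"].encode())
        for dep in sorted(recipes[name]["deps"]):
            h.update(signature(dep, recipes, memo).encode())
        memo[name] = h.hexdigest()
    return memo[name]

def all_signatures(recipes):
    """Compute the signature of every recipe once."""
    memo = {}
    for name in recipes:
        signature(name, recipes, memo)
    return memo

def needs_rebuild(recipes, cache):
    """Return the recipes whose signature is not already in the shared cache."""
    sigs = all_signatures(recipes)
    return sorted(name for name, sig in sigs.items() if sig not in cache)

# A build server populates the cache; another machine with identical
# inputs finds every signature present and has nothing to rebuild.
cache = set(all_signatures(RECIPES).values())
fresh = needs_rebuild(RECIPES, cache)
print(fresh)  # []

# Change one build-time parameter of glibc: glibc and everything that
# depends on it (directly or transitively) gets a new signature.
changed = {name: dict(recipe) for name, recipe in RECIPES.items()}
changed["glibc"]["inputs"] = "glibc-2.32 CFLAGS=-Os"
stale = needs_rebuild(changed, cache)
print(stale)  # ['busybox', 'glibc', 'openssl', 'zlib']
```

Note that `docs` stays cached because nothing it depends on changed, which is the same reason several images for one product can share a single tree and cache.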

I loved buildroot -- it was fast, nimble, and easy to use. It also lets you cut corners and find yourself in situations where builds would unexpectedly fail after a clean. I am also very happy that I took the time to learn how to effectively use Yocto.
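
The "fails after a clean" surprise from point 4 is easy to reproduce in miniature. This sketch assumes two made-up packages, `libfoo` and `libbar`, that both install `/usr/include/debug.h`. With one shared staging area the result depends on install order; with a per-recipe view built only from declared dependencies it cannot. It models the idea only, not Buildroot's or Yocto's actual directory handling.

```python
# Hypothetical packages and the files each one installs.
PACKAGES = {
    "libfoo": {"/usr/include/debug.h": "foo's debug.h", "/usr/lib/libfoo.so": "foo"},
    "libbar": {"/usr/include/debug.h": "bar's debug.h", "/usr/lib/libbar.so": "bar"},
}

def install_shared(order, packages):
    """Install everything into one staging area; later packages silently win."""
    staging, owner, clobbered = {}, {}, []
    for pkg in order:
        for path, content in packages[pkg].items():
            if path in owner:
                clobbered.append((path, owner[path], pkg))
            staging[path] = content
            owner[path] = pkg
    return staging, clobbered

def per_recipe_view(deps, packages):
    """Build a private staging area from explicitly declared dependencies only."""
    view = {}
    for dep in deps:
        view.update(packages[dep])
    return view

a, conflicts_a = install_shared(["libfoo", "libbar"], PACKAGES)
b, conflicts_b = install_shared(["libbar", "libfoo"], PACKAGES)
print(a["/usr/include/debug.h"])  # bar's debug.h
print(b["/usr/include/debug.h"])  # foo's debug.h
print(conflicts_a)  # [('/usr/include/debug.h', 'libfoo', 'libbar')]

# A recipe that only declares libfoo never sees libbar's files at all.
view = per_recipe_view(["libfoo"], PACKAGES)
print("/usr/lib/libbar.so" in view)  # False
```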

These are all excellent points, it just saddens me that embedded has still not moved past the "recursive Makefile" phase.

Part of that fault lies with the hardware manufacturers. They are invariably hardware companies that don't value software. They pick an open-source project like OpenWRT or Buildroot and literally hack at it until the resulting monster can build an image for a reference system that can stay up just long enough to pass a few end-to-end tests. And the damage is incredible. Nothing is spared mutilation at the hands of their incompetent developers; the entire software stack, from u-boot, over to Linux, across essential system services and concepts, all the way up to, say, the LuCI interface OpenWRT ships, is modified, mostly haphazardly, to support one specific configuration. The resulting garbage is frozen in time, zipped up and thrown over the fence to their partner companies trying to turn their hardware into a portfolio of consumer products increasingly defined by software first. It's hard to describe the level of stupidity; they will base their shitty proprietary Linux modules on LTS versions of the kernel, then never update anyways! They adopt "standardized" upstream things like nl80211, then require you use all the proprietary interfaces they previously had and just stuffed into some side-channel.

The other problem is using something like OpenWRT or Buildroot in the first place. This is not to disparage these projects; obviously these are mostly driven by hobbyists who are free to use their time however they want. But there is certainly a tendency in these projects toward adherence to arbitrary, mostly terribly old and shitty Linux standards and 'practices' grossly unfit for what you would want in a reliable embedded system. There is a focus on breadth, expansion and freedom instead of relying on robust building blocks. They try to collect the entire history of open-source software and bend their build systems to make and package the original .tar.gz downloaded from some FTP server. Shell scripts rule supreme, not just in the build but often on the resulting firmware images. A lot of these choices are supremely unfit for the purpose of making long-term supported firmware for embedded devices.

Lots of praise here for Android. Sure, they started with the same recursive Makefile stuff in their original startup roots. But they iterated. They saw the problems. A monumental achievement in the field to have a build system that will first draw up a plan, then go about executing it with considerable chance of success instead of failing randomly in the middle of some lazily recursed Makefile. They critically look at all the pieces that build and end up running on the device; they standardized on the Clang toolchain, they don't try to give you a choice of three compilers and four standard libraries. They didn't shy away from the long haul of pushing that singular toolchain across the entire stack; being able to compile the Linux kernel with Clang is the result of foundational Android work. They revolted at the sight of glibc or uclibc and built and maintain their in-house libc, on a tight feature leash. Their focus with bionic isn't to be truthful to some obscure corner of a POSIX standard circa 1983, it's to enable things like a safe allocator or uniform crash handling and report generation across all of userspace. Any sort of shell is intentionally hamstrung and scripts absent. No patience for oldschool crap like SysV init here.

Just as a data point. Google WiFi is built with Qualcomm WiFi radios, but it uses none of Qualcomm's proprietary software. They preferred to use the open-source upstream drivers. Zero confidence in any of Qualcomm's "software".

> They revolted at the sight of glibc or uclibc

I thought the Android team wrote bionic just to avoid the LGPL. Do you know for sure that there's more to it than that?

Bitbake is a bit further along than 'recursive makefile'; it's much more along the lines of a package manager like nix or portage (though it's less well designed in most aspects: the build file syntax is insane and debugging it is a nightmare. I think it has grown the necessary features instead of stepping back and understanding the problem). And it's important to realise it's mostly focused on building packages like a linux distribution, even if the systems are rarely managed by installing/uninstalling packages. This is where the whole idea of taking an upstream tar or repo and patching it together comes from, and it makes perfect sense when 90% of your workflow is 'take someone else's code and modify it slightly to integrate into your system', especially when that code gets updated (it's still not painless, but you have some hope of doing it). When you're Google and can afford to rewrite huge parts of the system and have no need for compatibility then you can make a more nicely integrated system, but most embedded applications cannot afford this.

IMHO with modern architectures these days, and from the last couple of years in particular, if you start feeling the need for something like Yocto I'd rather use a full-blown distribution like Debian or Arch Linux if it's available for your platform.

I have tried Yocto through the years and deployed a bunch of projects with it. Although it gives you the sense that it has a more coherent environment than Buildroot, I find that it is difficult to maintain since it has too much unnecessary complexity for embedded targets, and sooner or later that ends up getting in the way and biting you back. Not enough visibility, which is crucial for embedded systems. More so if the systems have network connectivity and need to be properly secured.

It could be that I am too accustomed to Buildroot's pretty flat, spartan, no-frills architecture. With highly constrained devices that is an advantage. Automating configurations takes little effort, they build quite fast, and you can maintain testing and deployments easily with your own bash or python scripts. There are few places to fiddle with variables and you can easily assemble images on the fly or through overlays. Many times you just need to modify the configuration file and rebuild.

Over the last few years I have been progressively using Buildroot along with docker. They complement each other well and you can create base systems and then grow them tailored to your application taking advantage of the docker incremental build system. I regularly maintain cross-compilation and qemu-based docker image toolchains for my target architectures that way. They can be recreated and tracked reliably, 100% of the time. I use them to build the application images or just augment them with a buildroot overlay with the extra files and then also deploy them remotely through the docker facilities.

Using a slow-release distro such as Debian (either directly or as a reference) also has the benefit of compatible versions, i.e. no need to figure out how to get both $A and $B to compile against a common library.

FWIW, I still use ltib (originally a Freescale open-source build environment) and it's mostly great. Fewer layers of abstraction, builds all the source modules using essentially the "linux-from-scratch" system, with .rpm intermediaries that get installed to a rootfs.

I played with Bitbake, but the learning curve seemed much worse than ltib.

Edit: s/yocto/Bitbake/

Does anyone have experience using either Buildroot or Yocto to build a virtual appliance? That is, a VM image that runs on a typical x64-based hypervisor. I'd be particularly interested in experience building an immutable image with an A/B setup for updates, in the style of Chromium OS or the old CoreOS (now Flatcar Linux).

I was able to load up Yocto's standard VMDKs into Hyper-V on my Windows desktop, but AWS's import tool barfed on it. I build an embedded Linux and wanted to use AWS as a hypervisor for larger-scale testing. It was fairly easy to set up a custom machine config in Yocto (require tune-core2.inc and you're off to the races). I use the variable overrides system to tweak build arguments on some packages to have them stub out hardware in the virtual images.

I've found, though, that making your own image classes is the most direct way of formatting your image if you need something exotic. The image classes system is kind of fun - you can keep stacking on more reusable classes that act as a kind of pipeline.

To get it working on AWS I ended up doing what I needed to do by spawning a VM with an off the shelf AMI, attaching a second disk, dd-ing my image on, and then capturing that as an AMI. (I'm kind of annoyed that I couldn't find a proper API for this, but maybe I'm not looking in the right place. Their import tools all want to understand things about your image and mess with it, and I just wanted to send them a flat disk image.) This process was an annoying enough slow down for testing that I ended up making an image that would boot up just enough to get networking up, then fetch the real image from S3 based on directions in user-data, over-write itself, and reboot. If going with newer AWS instance types, don't forget to include their kernel modules, and have fun debugging when it doesn't boot :)

Yes, Buildroot comes with a bunch of configs you can use to build images that run on x86_64 and qemu.

'make list-defconfigs' will list everything that is available

The A/B update thing typically works by having two disk partitions and then using your boot loader to switch between them and only updating one or the other. You can probably write a u-boot script to track failed boots and switch between images or something like that though I've never ventured down that road.
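
As a rough illustration of the boot-tracking idea, here is the slot-switching logic modelled in plain Python. Everything in it is an assumption made for the sketch: the `BootState` fields, the `MAX_ATTEMPTS` limit, and the function names are hypothetical, and on a real device this state would live somewhere the bootloader can read and write (U-Boot's boot-count limit support is one place to look). The bootloader counts each attempt, userspace clears the counter once it is healthy, and too many unconfirmed boots flips back to the other slot.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3  # assumed limit; pick whatever suits the product

@dataclass
class BootState:
    active: str = "A"   # slot the bootloader will try next
    attempts: int = 0   # boots tried since the last confirmed-good boot
    log: list = field(default_factory=list)

def other(slot):
    """Return the name of the opposite slot."""
    return "B" if slot == "A" else "A"

def bootloader_pick(state):
    """Run at every boot: fall back after too many failures, then count this attempt."""
    if state.attempts >= MAX_ATTEMPTS:
        state.active = other(state.active)
        state.attempts = 0
        state.log.append(f"fallback to {state.active}")
    state.attempts += 1
    return state.active

def mark_boot_ok(state):
    """Run by userspace once the system has come up healthy."""
    state.attempts = 0

def install_update(state):
    """Write the new image to the inactive slot (elided here), then switch to it."""
    target = other(state.active)
    state.active = target
    state.attempts = 0
    return target

# Running from A, install a broken update into B and watch it fall back.
state = BootState()
install_update(state)
tried = [bootloader_pick(state) for _ in range(4)]  # no mark_boot_ok: boots keep failing
print(tried)      # ['B', 'B', 'B', 'A']
mark_boot_ok(state)  # slot A comes up fine
print(state.log)  # ['fallback to A']
```

The running slot is never written during an update, which is what makes the fallback safe.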

buildroot has defconfigs for building uefi or bios images capable of booting on most any hypervisor. you can easily extend them.

you can also easily strip them down and get itty bitty tarballs suitable for import as docker container images


Standing ovation! This is a decade's worth of knowledge in a single article. I'll be forwarding this one to coworkers and acquaintances when they ask about how to do this kind of stuff.

I can understand why the author focuses on cheap entry-level parts. It's cheap and it's fun. He also kind of dismisses the major SOMs for this reason so maybe there can be a more comprehensive review of those (I build almost exclusively with these), but that's probably out of his view.

Going with a mystery AllWinner part is really compelling, but I can't go into production (with a 5+ year expected run lifetime) on that.

Just out of curiosity, in your opinion, which are the major SOMs that you would consider for a 5+ year expected run lifetime?

If product lifetime is a concern, it's important to pick a vendor that guarantees product availability through a certain date. Either that, or you need to be prepared to do a lifetime buy of all of the parts you'll ever need when the EOL is announced.

Companies like NXP will guarantee certain parts for at least 10 years after launch date. Search for 'i.MX' on this page to see their date guarantees: https://www.nxp.com/products/product-information/product-lon...

It’s two-pronged. You need to find a chip line that has manufacturer guarantees for a long span, then a SOM integrator that will also guarantee a time span using that part.

I’ve been happy with NXP (Freescale) for the former, obviously being a large automotive supplier has a benefit.

For the SOM I’ve recently been using Toradex. They do a great job of making their SOMs comply with a common signal layout at the connector. I also recommend Boundary Devices. TechNexion is also interesting if you need a smaller footprint. They’re the group that also ships as WandBoard.


I'm working on a project that is using Nvidia Jetson modules. The Jetson TX2i is supported to 2028.

This is an absurdly impressive writeup, and it's great that it shows that there are SiPs out there at such low prices (sub-$5 for an IC which includes a decent few MB of DRAM is great, albeit single core).

Many years back, I also did a writeup (much less intense) that goes more into what it took for me to get Linux running on an i.MX6-based SoC on the software end, including how I found a bug in the USB clock tree in the kernel for the i.MX6, how I debugged it, and how I mainlined it (my first and only kernel commit to mainline). https://brainyv2.hak8or.com/

Such projects are incredibly satisfying to work on; they make you feel much more competent when working with embedded systems. There is much less "black magic" going on.

Jay also did a great rundown of $1 microcontrollers a few years ago:


This embedded Linux article is a really worthy followup!

Thanks for the link on microcontrollers. Embedded writeups and microcontrollers are super fascinating.

I have been doing embedded for a number of years. I'm the only fulltime SWE at my company and it just gets exhausting managing 5 different products on completely different code bases/technologies. Plus a web portal that does device management and reporting.

So here's a big thing people should consider when building something from scratch: am I making something I can use again when I make my next product? It's a much higher mental load if you have no commonality between products. Also, it can cause problems for sales, support and management when they have to remember nitty-gritty details about each of your very divergent products.

Absolutely true. One of the things I learned decades ago as a solo entrepreneur was to force for as much commonality as possible across everything I was doing.

When you wear many hats you have to mentally task switch between domains and this time can sometimes be measured in weeks (example: you've been doing mechanical design and you now have to switch to FPGA design after months of not touching the codebase).

This is why jumping on the latest and greatest trends isn't always the best idea. Languages, tools, frameworks, new chips, new FPGA or CPU architectures, etc. Things that don't mesh at a certain level impose a cognitive load that can really slow you down.

The simplest example of this I have from years back would be using C and Verilog on embedded systems. While Verilog is not software (it's hardware), the "C" feel of the language, as opposed to VHDL, makes for a much easier transition between hardware and software development, at least for me.

Sounds like my day job of the last 10 years or so.

Although I'm mostly focused on the app code, I spent about four months this year debugging various wifi stacks (one for each generation of product). Only to have most of the problems fixed by moving all products to a 5.7 kernel base. Which was no easy project in itself, but mostly not my work.

In my experience, the processor, memory, flash, and all other things combined are trivial compared to getting reliable wifi.

That's good that 5.7 fixes some of the wifi issues. I actually ended up forking the driver for our particular usb wifi device and hacking whole sections of code out of it. There were some optional things that would just randomly crash the kernel (4.x).

Sounds like a nightmare.

At a casual glance it looks like your products are mostly [processor] connected to a network of CAN and/or 4-20mA devices. Not possible to back-port newer control modules to older product lines?

That said, impressive that you are handling all of it on your own. Hire somebody!

I have a couple contractors I bring in part time when I need some heavy lifting on the embedded or web side.

I have back-ported some things, but really a large chunk of the products had terrible technical decisions before I started. I spent the first couple years just dragging the company back from the technical and financial brink. I'm slowly eliminating old products from our sales sheet in favor of ones I engineered. So it will eventually be ok.

So true. Switching from one vendor to another is a huge deal. Documentation, code bases, tool chains, etc. require a huge investment in time. It's easy to do a quickie hobby project on a new micro but for a product you really want to internalize the details.

The fact that almost everything is ARM now makes it a lot easier but the on-chip peripherals and tools are usually still very idiosyncratic.

> To this end, I designed a dev board from scratch for each application processor reviewed. Well, actually, many dev boards for each processor: roughly 25 different designs in total.

This is an astonishing amount of effort!

The hardware design effort aside: one article to demonstrate why the ARM ecosystem is such a garbage fire. UEFI, anyone?

What about it?

Good read!

Some others that I have enjoyed in the embedded Linux realm are the posts by George Hilliard: the Mastering Embedded Linux series and Designing my Linux-powered Business Card.



thanks for sharing

Wow, what a great write up – thank you, Jay!

I have seen this exact confusion with many friends and colleagues in the past who don’t understand just how different a microcontroller is from an application processor, and what a difference it makes to even simple things (ha ha, simple) like circuit board layout.

And application programmers are so used to things like virtual memory, and to thinking or hoping the system (never mind an application garbage collector) will manage memory, that when you suddenly tell them their stack is limited to a few kilobytes and please, please limit your heap usage too, they’re … cast adrift!

If you want Wi-Fi, FCC certification is hard and expensive. With an off-the-shelf SBC you don’t have to do it yourself; you only need to place a sticker with the FCC ID of the SBC.

When embedding an SBC, you can fork the firmware from whatever Debian/Android/etc. variant is best supported by that SBC. Stripping down a working and well-tested Linux image is much easier than building a new OS image from scratch.

There are precertified laydown modules that can also do this. Laird (LSResearch), Telit, Redpine, and Murata are big producers of these.

I especially liked the section on DDR memory PCB routing and how it is less complicated than people make it out to be. At least for this application. Makes one want to try something like this

Huh. I almost missed that, but can confirm.

On a past project we had two DDR RAMs and a NOR Flash on the same bus. There was one hardware engineer, and he laid out the memory subsystem in about a day. (The power supplies were dramatically more complicated.)

It almost worked first time. I noticed about one memory error per day per device, which I repro'd with repeated `md5sum /tmp/largefile`.
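
For anyone wanting to run the same kind of soak test, a minimal sketch follows. It assumes the file being hashed lives in RAM (as `/tmp` often does on embedded targets), so that every pass re-reads the same data through the DRAM; the file size, round count, and function names here are arbitrary choices for illustration. On healthy hardware every pass should agree with the first.

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def soak(path, rounds):
    """Hash the same file repeatedly; return the rounds that disagree with the first hash."""
    reference = file_md5(path)
    return [i for i in range(1, rounds + 1) if file_md5(path) != reference]

# Create a throwaway file of random data, then hash it several times.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))  # 4 MiB; use something much larger on a target
    path = tmp.name
try:
    mismatches = soak(path, rounds=5)
    print(mismatches)  # an empty list means every pass matched
finally:
    os.remove(path)
```

At one error per day per device you would leave this looping for a long time, and log a timestamp with each mismatch.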

We just dropped the clock rate in U-Boot.

Failed devices? Heat it up with a hot air gun for a while. Fixed 90% of them. Still doesn't work? Throw it away.

There are no mysteries, just stuff you don't know yet.

The important caveat of that claim: it's less complicated than you think, for point to point topologies. I strongly prefer point to point designs just for their simplicity. It makes it way easier to fix any SI or timing issues through write leveling/launch calibration adjustments.

Once you get into flyby or T-routing topologies - good luck!

The part not discussed is how many PCB layers (signal, ground, more signal, more ground, etc) you need when laying down DDR.

If you are playing with a 2x2" EVK then 8 or 10 layers isn't a budgetary problem. When the board starts getting larger, it's a huge cost adder compared to a low-speed 2-layer PCB.

This is why designers switch to doing it with SOMs. The SOM can be a high-speed part and the carrier PCB is cheap and slow.

He mentioned all the PCB designs were using 4 layer, which is pretty cheap even in prototype quantities.

Choosing a four layer stackup was a really instructive point for the purposes of his blog post, but kind of a bad rule of thumb to carry forward on a professional level.

PCBs are built in a sandwich stack. The middle dielectric layer, or "prepreg", is way thicker than the dielectric layers in the outer layers. This doesn't seem like a big deal, but for the speeds that a DDR interface runs at, it creates a higher impedance return loop for whatever you end up routing on layer 3. (Typically DQs, in my experience.)

You can avoid this entirely by building with a six layer stackup, and routing your DRAM signals on layers 1/2/3, keeping layer 2 as an unbroken ground plane. The result: nice, low impedance return tracks.

Yes, Jay is right - most of the time, this doesn't create problems. But when it does, it's pretty challenging to fix without a board spin, and when you're doing this professionally, that loses you time, creates risk, and is generally hella stressful.

Well said. And we're not even mentioning what happens when you start adding other criss-crossing traces for parallel LCDs and cameras into the mix. Those usually creep into the 30-40 MHz range, 18-24 lines per device with no differential lines. And I haven't even thought about USB-C yet.

And you still need to pass radiated emissions testing. Even those right-angle DIMM connectors for the SOMs spray RF noise everywhere when you put video on them.

cough cough METAL ENCLOSURE cough cough


(injection molded ABS with thin-film deposition aluminum)


Why no mention of Gentoo (or Funtoo)? I concede very little experience of Yocto, etc., but I’ve a fairly complex system I maintain for both x86 and amd64 custom boards, all with quite heavily patched packages. I was challenged to add an imx6 board to the mix, but I concede it was rather straightforward... for kicks I also switched out uClibc-ng for musl and switched up binutils versions, but it was all fairly painless...??

If you are familiar with gentoo it has a package manager called portage. You type “emerge packagename” and it installs it. Prepend ROOT=blah and it installs it in that dir instead!

So your build scripts are pretty trivial. Just write a bunch of emerge lines and you are done. Install in your new root, then package that ROOT. Simples!

Building from source can be slow, so cache binaries once built. Subsequent installs will use the cached binaries if you add “-k” to the emerge command above.

Profiles allow you to build target versions of software, e.g. customising compile flags or software versions. I use both a libc and a processor profile. The arch profile is also split between host and target, so for example static libs might be installed on host but not in target.

I used an overlayfs system to trim the delivery.

I think the big difference is I can rely on this huge distribution. Yet if I need some platform patch I need only drop it in my patches directory and hit rebuild. Put the patches dir in git and I’ve got a trackable distribution!


Sounds like gentoo has improved their cross-compilation story. I had several attempts at getting a cross compilation toolchain running on gentoo and they all failed. yocto was actually the first time I succeeded at building a cross compiler in any form, but this was a fair few years ago at this point.

To be honest, bitbake mostly seems to be a less principled/polished version of portage or nix. I would much prefer an embedded toolchain based on either of them than dealing with bitbake's syntax and quirks. The one thing that seems relatively unique to bitbake is layers, which essentially mean you can avoid making modifications to the base layer while still being able to patch/tweak just about anything in the build. For embedded stuff, where you're basically almost always working off of someone else's patches to an upstream project which you are then patching yourself, and all of them can update independently, it's the only way of keeping somewhat sane (but the action-at-a-distance can make for difficult debugging, which is something I would wish bitbake was better at: there's no way to query e.g. 'what statement in what file added this compiler flag?', and the set of possible files can be huge).
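
To show what layering buys you, and the kind of "who added this flag?" query being wished for here, a toy model follows. It is not how BitBake stores or parses variables; the `LayeredConfig` class and the layer and file names are invented for the example. Each layer sets or appends to a variable without touching the layers below it, and the store keeps a history so a flag can be traced back to its source.

```python
from collections import defaultdict

class LayeredConfig:
    """Toy variable store that remembers which file set or appended each value."""

    def __init__(self):
        self.values = {}
        self.history = defaultdict(list)

    def set(self, var, value, source):
        """Replace the variable's value, recording where the assignment came from."""
        self.values[var] = value
        self.history[var].append(("set", value, source))

    def append(self, var, value, source):
        """Append to the variable's value, recording where the append came from."""
        self.values[var] = (self.values.get(var, "") + " " + value).strip()
        self.history[var].append(("append", value, source))

    def who_added(self, var, token):
        """Return the sources that contributed `token` since the last plain set."""
        ops = self.history[var]
        last_set = max((i for i, op in enumerate(ops) if op[0] == "set"), default=0)
        return [src for _, val, src in ops[last_set:] if token in val.split()]

cfg = LayeredConfig()
# Hypothetical layers, applied in priority order; none edits another's files.
cfg.set("CFLAGS", "-O2", "meta-base/conf/distro.conf")
cfg.append("CFLAGS", "-mfpu=neon", "meta-vendor-bsp/conf/machine/board.conf")
cfg.append("CFLAGS", "-DPRODUCT_X", "meta-product/recipes-app/app.bbappend")

print(cfg.values["CFLAGS"])                   # -O2 -mfpu=neon -DPRODUCT_X
print(cfg.who_added("CFLAGS", "-mfpu=neon"))  # ['meta-vendor-bsp/conf/machine/board.conf']
```

The vendor layer can update independently of the product layer, and the history makes the action-at-a-distance at least traceable.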

I just started a job working on embedded Linux devices, coming from a microcontroller world. This was extremely helpful, what a great writeup. Thank you!!

The article really seems to be more about "so you want to design a custom PCB for a given embedded Linux capable application processor". While it is full of useful information and candid findings, there is usually negative business value in this approach, as Jay admits lower down the article: "I don’t think most people design PCBs around these raw parts anyway -- the off-the-shelf SOMs are so ridiculously cheap (even ones that plausibly are FCC certified) that even relatively high-volume products I've seen in the wild use the solder-down modules." It is, however, necessary sometimes.

Zooming out to a broader management context, non-trivial systems usually comprise multiple processors. In these circumstances production volumes are generally lower and subsystem iteration more likely, therefore it makes more sense for managers to worry about overall architecture (eg. bus selection, microcontroller vs application processor, application processor architecture) and consider factors such as speed of prototyping, availability of talent, longevity/market-stability of vendor's platform, feature-set and nature of software toolchain(s), etc. rather than worry about optimizing specific processor part selection.

The management priority generally only flips closer to the article's fine-grained approach in the event that you are designing a consumer electronics widget or otherwise very high production run product with a low change expectation, where individual processor selection and BOM optimization become greater concerns.

Conclusion: For most people, for most products, most of the time, you don't want to start by worrying about specific processor part selection. Further, prototyping should be done on third party dev boards/SOMs and when production volumes justify it, final PCB design can be handed to an outside EE. This is not always a viable strategy (eg. due to form factor or other application constraints), but it's fast, cheap and hard to go too far wrong.

Please don't quote me out of context: "these raw parts" referred to the MediaTek parts specifically, not all the parts in this review.

The only thing that SOMs provide is a processor + DRAM + PMIC. If you practice and become proficient at designing around application processors, it should take you no longer than 3-4 hours to get this component of the system (the processor, DRAM, and PMIC) laid out when working with these entry-level parts.

SOMs aren't some magical remedy to all the problems. It's still up to you to design the actual system, which takes hundreds of hours. The difference between using a SOM or a raw-chip design is negligible at this point.

I have no problem prototyping on EVKs --- in fact, I link to EVKs for each platform in my review. But a lot of these evaluation boards are pretty crummy to prototype with; some don't have all the pins of the CPU brought out, others use proprietary connectors that are a hassle to adapt to your hardware. You shouldn't be afraid to spend an 8-hour day designing a little breakout board for a part if you're interested in using it in a product that's going to span 6-months' worth of development time.

Of course there are caveats. I'm entirely focused on entry-level parts; if you need a Cortex-A72 with a 128-bit-wide dual-rank DRAM bus, sure, go buy a SOM. Also, it should go without saying that it completely depends on you and your company's core competencies. This article is aimed at embedded designers who are usually working on hardware and software for microcontroller-based platforms. If you work at a pure software shop with no in-house EE talent then this article is likely not relevant to you.

I think there's a lot of companies scared to roll out application processors instead of microcontrollers. I wonder what your thoughts are on if it can save development time due to Linux dev environment Vs baremetal.

Obviously in lots of applications it would be impossible to use a 264 BGA, but I've been there with projects with 3 CAN buses and Ethernet plus flash memory and thought maybe we should be using a Cortex-A7.

I'm sure you know but file storage on microcontrollers can drive you insane ha ha and the same with graphics+TCP IP

Hi Jay, that wasn't my intent. Thanks for clarifying your scoped comment. Your post is very interesting. However, I believe the core point regarding the potential commercial questionability of embarking on EE design work with a processor part-centric mentality still stands. As you point out, it is sometimes justified. Few people outside of Asia have competent "in-house EE talent" idling in their organization.

Just out of curiosity, what are some of the off-the-shelf SOMs that you would tend to reach for/recommend?

Can someone more knowledgeable than me elaborate on this statement?

> when compared to application processors, MMUless microcontrollers are horribly expensive, power-hungry, and slow.

What is it about the lack of an MMU that causes a hunger for power?

"MMUless" was intended as a nonrestrictive modifier --- I'll remove it to avoid ambiguity. Microcontrollers tend to be built on larger processes than microprocessors are, so they eat through more active-mode power. The point I'm trying to get across is while you can run Linux on a microcontroller (which doesn't have an MMU), there's not a lot of good reasons to do so. Thanks for the feedback!

I might be wrong, but the key point between a microcontroller and an application processor is deterministic execution. When controlling a motor, say, it might be vital that your interrupt handler finishes in less than 100 clock cycles.

A microcontroller usually[1] doesn't have fancy out-of-order execution, fancy caches etc as that would make the execution less deterministic in time. An MMU would as well.

Lacking these features also makes microcontrollers a lot slower, and I'm guessing he's thinking about cost/watts per MIPS or something like that. Yes the application processors draw more power overall, but (I assume) they are so much faster it more than makes up for it in dollars per MIPS or watts per MIPS.

So it appears more of a symptom than a cause. But again, I might be wrong.

[1]: https://en.wikipedia.org/wiki/ARM_Cortex-M#Silicon_customiza...
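
To put the 100-cycle figure above in wall-clock terms, here is a small arithmetic sketch. The clock frequencies (16 MHz and 100 MHz) are assumptions picked for illustration, not numbers from the article or this thread, and the function name is hypothetical.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle budget into microseconds at a given core clock."""
    return cycles * 1e6 / clock_hz


# Assumed clock rates, for illustration only.
budget_slow_us = cycles_to_microseconds(100, 16_000_000)    # 6.25 us at 16 MHz
budget_fast_us = cycles_to_microseconds(100, 100_000_000)   # 1.0 us at 100 MHz
```

The budget only means something if the worst-case cycle count is predictable, which is the point being made about caches and MMUs adding timing variation.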

None of these application processors I reviewed have out-of-order execution. Cortex-M7 microcontrollers have icache/dcache. You can run bare-metal code on any of these application processors and it will behave more or less like a microcontroller. The lines are really pretty blurry, but the MMU is the big dividing line in my opinion (but it's obviously open for discussion).

> None of these application processors I reviewed have out-of-order execution.

You're right, I was thinking of the A8, A9 and similar.

> the MMU is the big dividing line in my opinion

I'm no expert, but seems like a reasonable line in the sand to me.

How does he find the time? I thought maybe it was very surface evaluation given the number of parts but I scrolled down to the part I have recent experience with (STM32MP1) and his summary was spot on.


Guessing from his update frequency: in fits and spurts, over many, many months.

He's posted some stuff on twitter that alluded to this work - lots of debugging DRAM and swapped bit signals.

The section on "Why not just use an RPi?" is a model of clarity and economy of writing. Incredibly clear, amazingly authoritative. So much to learn here.

HOW DO WE THROW MONEY AT JAY for this epic write-up?

Not sure if this is right place but thought I could ask here with regards to embedded development.

1. I once had a temperature/humidity sensor, but never figured out if it was analog or digital (it had three prongs; someone in the field thought it could be digital). What would be some ways/tools you could use to independently test this out?

2. For I/O on embedded boards, is there any suggested way to capture reads (or writes), for example as raw data? What about the other way around?

3. Which board here might be good for practicing (or something else) a hello world in writing device drivers? Something on the easy, learning side and possibly practical that interacts with a real device.


Those questions are all too open-ended to lead to specific answers, but in general, if you don't actually need a serious OS in your project, I'd suggest getting your feet wet with Arduino and then moving on to ARM-based platforms such as Teensy when you need more power (which you eventually will.)

The advantage of Arduino is that it will give you a good introduction to embedded development and I/O principles. Working with Arduino, you'll find that the CPU is reasonably capable but that the programming abstractions are clunky, and that'll give you the excuse you need to get comfortable with direct I/O port access. What you learn along the way will apply on more sophisticated platforms.

As for the temperature sensor, it's most likely digital (I2C or similar interface). Without the part number it may be difficult to get it working. If you don't have the part number, it's best to chuck it and order one from Adafruit or Sparkfun that comes with the necessary documentation and support code. Reusing salvaged parts isn't the cost/productivity win that it used to be.

If you need more horsepower for your project than an Arduino or Teensy-class board can provide, then you would want to look at Raspberry Pi or Beaglebone Black. If you aren't already comfortable with *nix development, you have a massive learning curve ahead, for better or worse, and you're going to be spending a lot of time in Professor Google's classroom.

> Because Linux-capable application processors have a memory management unit, *alloc() calls execute swiftly and reliably. Physical memory is only reserved (faulted in) when you actually access a memory location. Memory fragmentation is much less an issue since Linux frees and reorganizes pages behind the scenes.

This paragraph is confusing to me.

- How would an MMU help malloc execute "swiftly and reliably"?

- What does "actually access a memory location" imply exactly?

- "since Linux frees and reorganizes pages behind the scenes": Is this not what people dislike about malloc? That it's non-deterministic?

The MMU allows Linux to provide a "flat" memory space to each userspace process. This memory space is assembled from 4KB pages that can be randomly dispersed throughout physical memory. Say you want to malloc() 1MB of memory on a system that's been up for some time. Physical memory may not have a fitting contiguous chunk of memory, but the virtual memory of a newly created process will always have one, provided that there are enough free pages available.

In other words, without an MMU all programs share the same memory space, and if it gets fragmented, generally you'd have to reboot. With an MMU fragmentation can still be an issue, but it's greatly reduced, since each process has its own memory map. And if the memory of the process gets fragmented, you can just restart it. If the runtime supports object compaction/relocation, then fragmentation may not be an issue at all.

Re access to memory locations -- mostly any direct access to instruction or data memory in userspace program automatically goes through MMU remapping.
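
A toy model may make the 1MB example above concrete. This is only a sketch of the idea, not how the kernel implements it: the function names, the 512-frame memory, and the "every other frame is in use" pattern are all assumptions made up for illustration.

```python
def map_virtual_range(free_frames, num_pages):
    """Build a page table for num_pages contiguous virtual pages.

    free_frames: physical frame numbers that are currently free (any order).
    Returns {virtual_page: physical_frame}, or None if too few frames are free.
    """
    if len(free_frames) < num_pages:
        return None
    return {vpage: free_frames[vpage] for vpage in range(num_pages)}


def largest_contiguous_run(free_frames):
    """Length of the longest run of consecutive free physical frames."""
    best = run = 0
    previous = None
    for frame in sorted(free_frames):
        run = run + 1 if previous is not None and frame == previous + 1 else 1
        best = max(best, run)
        previous = frame
    return best


# Hypothetical fragmented physical memory: every other 4 KB frame is in use.
free_frames = list(range(0, 512, 2))       # 256 free frames, no two adjacent
pages_needed = (1024 * 1024) // 4096       # a 1 MB request is 256 pages

# Without an MMU the request needs 256 adjacent frames; the best run here is 1.
physical_run = largest_contiguous_run(free_frames)

# With an MMU, virtual pages 0..255 are contiguous even though the frames are not.
page_table = map_virtual_range(free_frames, pages_needed)
```

In this model the allocation fails only when the total number of free frames is too small, not when they are scattered.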

Thanks. I realized that there are several types of MMU. I was thinking of the simple MMU (on 32bit MCU) that limit memory access. Not the one that provides the illusion of virtual addressing.

To be precise, that'd be called MPU -- memory protection unit.

If you don't have time to read the whole article, the conclusion is worth reading on its own. Has some great advice on the importance of practice for improving skills.

Shouldn't you avoid malloc and free (after the startup phase of your application) completely in long running applications?

If your system is basically one long-running process, why would you need Linux? Maybe something lighter weight, like FreeRTOS, would be a better fit?

Well, the article was mentioning long running. But I agree, I should use FreeRTOS.

> Because C doesn’t have good programming constructs for asynchronous calls or exceptions, code tends to contain either a lot of weird state machines or tons of nested branches. It’s horrible to debug problems that occur.

Linux is in C though. So, what is really going on?

Linux uses a lot of kernel threads (as well as a lot of code which runs on application threads). It's pretty synchronous in style for the most part.

Great writeup Jay. Been working in embedded since 8031 and one of these days I will find a need for a project that is not just main(){ sleep(); } with interrupts and when I do this will be my go-to document, already saved it off-line just in case :-)

Great article.

He sings the praises of Linux memory management and virtual memory, which are unavailable on a microcontroller. Would be interested in comments on the downsides of swapping in a system with a small amount of onboard Flash that may wear out.

Swapping to EEPROM is a bad idea - not just on wear but it's a massive I/O bottleneck.

If you're out of RAM on an embedded project, rethink your design or add more RAM (if time and budget allow)

Why would you enable swap?

I assume he sees "virtual memory", and jumps to thinking "swap". Which is rather common, but arguably a mistake.

Virtual memory is what allows for copy on write pages when forking. It is the preferred basis for implementing memory mapped files. It allows for sharing a single copy of shared library in ram, without causing everything to break if one program decides it wants to patch "its" copy of the library in memory.

Yeah sure it allows for paging out anonymous memory too, but that is hardly its sole function.
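
For a feel of memory-mapped files and copy-on-write without any swap involved, here is a minimal sketch using Python's standard `mmap` module. `mmap.ACCESS_COPY` requests a private copy-on-write mapping; the function name and the file contents are made up for the example.

```python
import mmap
import os
import tempfile


def cow_mapping_demo():
    """Map a file copy-on-write, modify the mapping, return (mapped, on_disk)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        os.close(fd)
        with open(path, "r+b") as f:
            # ACCESS_COPY: writes touch this process's private pages only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[0:5] = b"HELLO"
                mapped = m[:]
        with open(path, "rb") as f:
            on_disk = f.read()
        return mapped, on_disk
    finally:
        os.remove(path)


mapped, on_disk = cow_mapping_demo()
# mapped  == b"HELLO world"  (the process sees its own modified copy)
# on_disk == b"hello world"  (the underlying file is untouched)
```

This is the same property that lets one program patch "its" copy of a shared library without affecting anyone else.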

Correct, this was my (apparently very widespread) misconception that I just corrected with a bit o' googlin'

Wow, what a resource. Thanks so much!

Does anyone have suggestions if one wanted a super fast single core? i.e. to make something for a software synth where the bottleneck is likely to be speed of a single core?

Very impressive article with lots of knowledge on the metal and OS. Good to know for people who want to work between the OS and the hardware.

Same here! I am forwarding this article to all my coworkers!

I am currently using the Allwinner V3s, and the review is covering this part accurately.

No, I don't. I want to use a mature distro, with thousands of packages, with known bugs fixed, with fresh software, like Fedora.

This comment doesn't make any sense to me - it's like responding to an article titled "So, you want to build a transistor radio from scratch?" with "No, I want the latest iPod Touch".

There are lots of embedded systems beefy enough that Fedora is an appropriate choice.

Not really. Fedora IoT only supports x86_64, aarch64, and armhfp, which has some intersection with this space, but not a lot.

Regardless, that wasn't really the point of the article.

There is lots of embedded that uses x86 or arm. Lots of embedded isn't significantly power or size constrained and volumes are very low so software development costs dominate over hardware costs.

But agreed, those aren't the point of the article.

"Embedded" is a big space.

Something that is not power nor size constrained doesn't really fit most people's definition of "Embedded."

Example: the computer in a CNC machine.

Just because you can doesn't mean you should

The article title doesn't contain "from scratch", so I will rewrite your example as "So, you want to build an AM radio?" and my response will be "No, I want to build an FM radio, because there are tons of FM radio stations available to listen to".

The article isn't even close to being about building a linux distribution. It's about embedded electronics engineers moving from microcontrollers with custom embedded firmware to SBCs using minimal linux for embedded systems. So accepting your revision, the response would be "no, I want to buy a stereo system".

Well you can have that, those are the repositories mostly, which you have access to when building your linux. So you have a mature source of code, but what makes a distro 'mature' isn't really of any value in the embedded space. Kernel, core utils, init system, some packages and your custom code, and you're done. You now have your own distro, code is mature, boots really fast and is probably more stable and secure than the mature distro which is trying to be everything under the sun. When you only need the linux to do one specific thing you can make something that is really rock solid and actually have a sense of everything that is going on under the hood. Having some guy whose, honestly mostly uneducated, solution is to just throw fedora on there is going to create a crap system.

Um .... OpenWRT?

Yep. Good choice for systems with 16-64 MB of RAM.

You want "to use mature distro, with thousands of packages, with known bugs fixed, with fresh software, like Fedora." on an embedded system. Really?

Yep. I used Fedora and Yocto on iMX6 SoloLite. Both use rpm&dnf. Fedora is much better. In Fedora, I feel like I'm at a desktop: everything is the same, zero issues. In Yocto, I feel like I went back in time about 20 years: same f*ing bugs, which I reported or fixed years ago, again and again.

It really depends on your hardware, there are many systems with limited storage and memory where Fedora is impossible to fit. I do believe Yocto is the Gentoo for embedded linux, which is over-engineered to hell. The good news is that, unless you have some very special requirement, you can just use buildroot, which is 1/100 of the complexity and gets the job done well. KISS.

Using a package manager on an embedded system is just asking for trouble. Or read-write rootfs, for that matter.

I usually have a r/o rootfs as squashfs with overlayfs for changes. It makes it easy to wipe things back to factory defaults.
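
The lookup order behind that setup can be modelled in a few lines with `collections.ChainMap`. This is only a toy sketch of the layering idea: the paths and values are invented, and real overlayfs also handles deletions (whiteouts), directories, and metadata, which this ignores.

```python
from collections import ChainMap

# Read-only base image (stands in for the squashfs rootfs); contents are made up.
lower = {"/etc/hostname": "factory-device", "/etc/motd": "welcome"}
# Writable layer (stands in for the overlayfs upper directory).
upper = {}

# Lookups try the upper layer first, then fall through to the lower one.
rootfs = ChainMap(upper, lower)

rootfs["/etc/hostname"] = "my-device"    # writes land only in the upper layer
changed = rootfs["/etc/hostname"]        # "my-device"

upper.clear()                            # "factory reset": discard the upper layer
restored = rootfs["/etc/hostname"]       # "factory-device"
```

Because the base layer is never written, wiping the upper layer is enough to get back to the shipped state.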

Then you want a small desktop system, not an embedded one.

What's wrong with that? My first Linux (SuSE) system had 16MB of RAM. I successfully compiled X on it in just a few days. Why should I use outdated software on an embedded chip with, say, 256MB of LPDDR?

What does it have to do with outdated software? You still use kernel version and userspace libraries of your choice - arguably more flexibly and up to date than on any mainstream distro.
