Raspberry Pi 3 Fastboot – Less Than 2 Seconds (furkantokac.com)
455 points by solarized 3 months ago | 115 comments

This is a common technique in embedded products, but I should point out that it's not useful for general purpose Raspberry Pi work.

The author disables several important kernel features to boot faster, including network support and USB support. Obviously, if you plan to use network or USB then it's not an option to disable those. You can disable some of the other debugging features and little extras for a slight boost, but it's going to be negligible (<1s, in my experience).

The biggest change comes from removing the init system entirely and replacing it with the user-facing app. This is the fastest way to get your app started, but now you're responsible for doing everything that was previously handled by the init system. The author can do this because they don't have any network, timekeeping, or other system functions normally handled by the init system. They only need to mount filesystems, which is easy to replicate inside the app.

Anyone planning to ship a Raspberry Pi based product should read this article carefully as a great starting point. Anyone who wants to use their Raspberry Pi for general-purpose work, or even just with network connectivity or USB devices, will be disappointed if they try these techniques. Great article, but it's a narrow use case.
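A minimal sketch of the "app as init" idea, assuming the image ships a Python interpreter and a hypothetical application binary at `/opt/app/main` (neither comes from the article). It does the one job the comment says is left over, mounting the usual pseudo-filesystems with the standard `mount -t TYPE SOURCE TARGET` command, and then replaces itself with the app via `os.execv`. The command runner and exec function are injectable so the plan can be checked without root.

```python
import os
import subprocess
from typing import Callable, List, Sequence

# Pseudo-filesystems a minimal init usually mounts first: (fstype, source, target).
# If the kernel was built with CONFIG_DEVTMPFS_MOUNT, /dev may already be
# mounted for you; drop that entry in that case.
ESSENTIAL_MOUNTS = [
    ("proc", "proc", "/proc"),
    ("sysfs", "sysfs", "/sys"),
    ("devtmpfs", "devtmpfs", "/dev"),
]

# Hypothetical application path; substitute your own binary.
APP_ARGV = ["/opt/app/main"]


def mount_commands() -> List[List[str]]:
    """Build one `mount -t TYPE SOURCE TARGET` argv list per essential mount."""
    return [["mount", "-t", fstype, source, target]
            for fstype, source, target in ESSENTIAL_MOUNTS]


def boot(app_argv: Sequence[str],
         run: Callable[..., object] = subprocess.run,
         exec_fn: Callable[[str, Sequence[str]], object] = os.execv) -> None:
    """Do the init system's remaining job (mounting), then become the app."""
    for cmd in mount_commands():
        # check=True: a failed mount stops the boot instead of starting the
        # app against a half-initialized system.
        run(cmd, check=True)
    # execv replaces the current process, so the app keeps PID 1.
    exec_fn(app_argv[0], list(app_argv))
```

In a real image this script would be the kernel's `init=` target (or `/init` inside an initramfs) and would finish by calling `boot(APP_ARGV)`. A compiled init would start faster than a Python one; Python is used here only for readability.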

A few years ago, there was an article on getting a fully functional Linux booted on an rpi in 3-4s which has a few other, maybe more generally useful tips for speeding up startup times: http://himeshp.blogspot.com/2018/08/fast-boot-with-raspberry...

The Bootlin training slides on embedded Linux boot time optimisation are also quite informative: https://bootlin.com/doc/training/boot-time/boot-time-slides....

Great points, a tangential thought:

macOS and Windows have figured out that showing a GUI ASAP satisfies the user, even if in the background it's still negotiating networks, initializing subsystems, etc.

I wonder why Linux has not yet taken that approach? That is, move the GUI far up in the init queue and let it initialize networks, avahi, resolved, firewalld, etc. while the GUI is showing the login screen...

> satisfies the user

Does it, though? Has this ever really satisfied you … that you get to log in and then circle your mouse for a few minutes while you wait for things to kick off?

Nah, all or nothing, please. I don’t want to be teased by an is-it-isn’t-it experience. I want my boot to take its time, let me know what’s going on, and when it’s all there, let me enjoy a responsive experience.

It’s just a cheap trick and it probably doesn’t matter as much as they think it does.

Cheap tricks can create a much more responsive experience.

Whenever a user-interactable element will be ready in under ~100 ms, you can display it as ready now, because that is faster than human reaction time. That makes all interactions with the software faster, smoother, and more responsive than if you used a loading indicator. It's a free win.

Whenever you know in advance that you need to display something to the user and also need to load or process something that takes much longer than 100 ms, you can do the loading in the background while displaying whatever needs to be displayed. If the user spends longer on that screen than the loading takes (the common case), the next interaction will be instant. If the user is faster, you can still fall back to a loading screen, but it will be on screen for a much shorter time.
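A small sketch of those two ideas, assuming Python's standard `concurrent.futures` and a hypothetical `load_next` callable standing in for the slow work: skip the spinner when the work should finish within the ~100 ms threshold, and otherwise start loading the next screen's data the moment the current screen is shown.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Tuple

# Rough human-perception threshold from the comment (~100 ms).
INSTANT_THRESHOLD_S = 0.100


def needs_spinner(expected_seconds: float) -> bool:
    """Show a loading indicator only if the wait is likely to be noticeable."""
    return expected_seconds > INSTANT_THRESHOLD_S


class Screen:
    """Displays immediately and prefetches the next screen's data in the background."""

    def __init__(self, executor: ThreadPoolExecutor, load_next: Callable[[], Any]):
        # Start the slow work as soon as this screen is displayed.
        self._pending: Future = executor.submit(load_next)

    def advance(self) -> Tuple[Any, bool]:
        """Return (data, showed_loading_screen) when the user moves on."""
        if self._pending.done():
            # Common case: the user took longer than the load, so this is instant.
            return self._pending.result(), False
        # The user was faster: fall back to a (shorter) loading screen and wait.
        return self._pending.result(), True
```

The second element of the returned tuple just reports which path was taken; a real UI would show its loading screen in that branch before blocking on `result()`.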

> Whenever you know in advance

Knowing things in advance is not a cheap trick. I’m not talking about a well-structured boot. I’m talking about the far more common approach of just showing a moving mouse pointer and a few icons and “pretending” the computer is ready to use, a notion you are quickly disabused of as soon as you try to get to work. Even then, you have no way to know when it “is” actually ready.

Do you believe that you can reason about a computer? I'm going to assume yes. If you'd rather your computer do the countable number of things you know it needs to do before it draws to the display, because that's faster, that's only because you know those things exist and how long they should reasonably take. Most users can't count them, nor do they trust their computer. The loading screen does satisfy them, because they know the computer hasn't randomly crapped out... *again*. People who don't understand how to fix computers don't trust them, and rightfully so: computers generally don't do what their owners/users want, and will randomly break, costing more money, for seemingly no reason. Those loading screens prevent some of the anxiety of using something you need when you don't understand any part of it.

Today's laptops and the majority of desktops boot more than 10x faster than they did a decade ago, so for all but the youngest users I feel like this point is moot. There was an age when you had to go get a coffee while your hard drive spun up; I remember kids starting their MacBooks before they got in the car for school so they'd be booted by the time they were in class...

When I was a kid, kids weren't issued MacBooks for class. There was a single, lone Apple II in the back we played Oregon Trail on -- and it took a fair few minutes to load from floppy.

A floppy drive? Get off my lawn

On my C64 I'd start loading a game from cassette tape before dinner so I could play it afterwards.

Cassette? Pah!

I had to type the game's source code in from a magazine for my Acorn Electron.

Cassette tape?

Punched tape[0]: Hold my beer

[0] https://en.wikipedia.org/wiki/Punched_tape

I don't think punched tape or punched card was ever used for a personal computer though.

The Altair launched with a cassette tape interface anyway.


Fair enough, but would it be fair to argue that "personal computer" is a pricing distinction rather than a technology distinction?

In my mental model, the technology distinction is more between single-user and multi-user systems. And arguably that is more a software differentiator than a hardware difference.

Floppy drive?

On my 9 track tape I would be lucky if the thing loaded and was ready to use 72 hours later [1].

[1] https://en.wikipedia.org/wiki/9_track_tape

I … think I agree with you …

Yes, it does matter. If Microsoft switched things up to provide users with a responsive desktop just after logging in, by loading more services before the login screen comes up, existing Windows users would perceive it as slower. If Linux distributions delayed the loading of services at the expense of a less responsive desktop just after logging in, existing Linux users would perceive it as slower. Simply put, each user base uses different metrics for boot time, and those metrics are based on their prior experiences.

> for a few minutes

When was the last time you cold-booted a modern Apple MacBook, for example? Minutes is an absolute exaggeration.

> I wonder why Linux has not yet taken that approach? That is, move the GUI far up in the init queue and let it initialize networks, avahi, resolved, firewalld, etc. while the GUI is showing the login screen...

Because it boots faster than a Windows system in that scenario. I personally like to be able to use my system as soon as the desktop is visible.

Also this is the way. When your system boots, it boots.

I just timed my Windows 11 vs my Ubuntu. Windows 11 booted faster to a usable GUI by a factor of about 2:1

> Windows 11 booted faster to a usable GUI by a factor of about 2:1

Probably used fast boot and returned from an image. :)

From a fully fresh start on an NVMe drive, Windows 10 starts up about as fast as a brand-new Debian install.

With the added benefit of not having X11 setting modes 5 times in the meantime.

This is quite surprising. My experience has been quite different. Have you run systemd-analyze to try to locate what's causing the slower boot?

Surprising indeed. Windows 10 boots quite quickly (don't know about Windows 11) but the boot of a fresh install of the latest version of Debian feels instantaneous. In a systemd container, it boots under 2 seconds even with a few extra services.

> but the boot of a fresh install of the latest version of Debian feels instantaneous.

Starting with the latest iterations of CentOS 7 and Debian 10, booting Linux is almost instantaneous. We have quite a few VMs, and after restarting one (i.e. reboot <enter>), I just need to sip a little coffee to be able to SSH into a fully booted system.

On the other hand, some of the higher-end servers we have can fit a meaningful chat into the booting process (in fast boot mode, no less).

This means that most of the overhead is hardware initialization waits, IO and GRUB.

BTW, X11 is not modesetting 5 times on me. Maybe the first modeset you see is kernel's terminal modeset?

Windows has a trick for that. It actually hibernates the OS, GUI, drivers, etc. when you shut down, just without any apps running. This way it can come back up in seconds.

You'll notice that after you install a new device driver it'll boot much slower. This is because it invalidates the hibernate image and does an actual boot next time.

It's also why it won't let you fiddle with the BIOS settings before it does a quick boot. Because the hibernate image presumes the exact same hardware configuration.
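To illustrate why a driver install forces a slow boot, here is a toy model, assuming the saved image is tagged with a fingerprint of the device list and driver versions. The function names and data layout are hypothetical; this is not how Windows actually stores or validates its hibernation file.

```python
import hashlib
import json
from typing import Dict, List, Optional


def config_fingerprint(devices: List[str], drivers: Dict[str, str]) -> str:
    """Hash the hardware/driver configuration the saved image was taken under."""
    # Sort devices and keys so the same configuration always hashes the same.
    canonical = json.dumps({"devices": sorted(devices), "drivers": drivers},
                           sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def choose_boot_path(saved_fingerprint: Optional[str],
                     devices: List[str], drivers: Dict[str, str]) -> str:
    """Resume from the image only if nothing has changed since it was written."""
    if saved_fingerprint is None:
        return "cold boot"  # no image saved yet
    if saved_fingerprint != config_fingerprint(devices, drivers):
        return "cold boot"  # e.g. a new driver was installed: image is stale
    return "resume image"
```

The same reasoning explains the BIOS restriction: any change the fingerprint would not capture could leave the resumed image out of sync with the hardware.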

And it doesn't unmount partitions when you shut down, so if you use Windows one day and Linux the next, you have to boot up Windows first and then reboot into Linux so that Windows will unmount its partitions; otherwise Linux can't mount them.

Oh yeah good point. I forgot about that.

Linux updates also wreck BitLocker if the Windows partition is on the same drive (which I have to use for work), so I left the whole dual-boot thing behind.

I have to say I agree. Presenting a login screen early is a great idea, as most people take a short while to log in, and in that time system can finish initialising - as long as there is enough free CPU and interrupts to service the keyboard and mouse during that time.

I even do this in my web apps. The login screen shows immediately, and while the user is filling it out, the WebGL app and assets are loading in the background. The login button doesn't become enabled until the audio system is loaded, so that the click on the login button can be used as the User Gesture to activate the AudioContext.

why do you need WebGL and an AudioContext to click a login button??

I don't? Why would you think I did? My app is a WebXR social app. You need to login first, and I use the login process as an opportunity to make a better user experience.

I have rewritten my init system in the past to start xorg before e.g. DHCP initializes.

It takes about 5 seconds to log in anyway, so DHCP is usually done before my browser wants the network.

If you're using systemd and NetworkManager, this is already what happens. Services that depend on network are delayed until a connection is made, that generally only happens after login.

If you're using something else, then yes, network services are started during boot. But it can be argued that if you're not using NetworkManager, you are a server and wouldn't benefit from a delayed connection.

Because you would have to assume too many things.

What if the GUI bring-up required mounting filesystems that need a kernel module that hasn't loaded yet?

What if the filesystem is on a network?

What if the network isn't up yet?

What if someone's xinit script needs access to the bluetooth stack which isn't up yet?

And even if all the "correct" assumptions were made for a default install, doing anything different from that would require unraveling a mess to fix.

Because Xorg is just a UNIX app ported to Linux and not an integral part?

If only the init system was lazy, i.e. running parts of the boot sequence only when required.

It's very possible in linux. Modules can be loaded and unloaded for some kinds of hardware. I don't know if that's true for rpi though.

systemd fully supports that.

I believe it was meant to be an ironic comment, pointing at systemd.
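The "lazy init" idea can be sketched as starting a service, and its dependencies, only on first request. This is a toy model with hypothetical service names; systemd achieves on-demand startup through its own mechanisms (such as socket and D-Bus activation), which this does not reproduce. It also assumes the dependency graph has no cycles.

```python
from typing import Callable, Dict, List, Set


class LazyInit:
    """Toy init that starts a service (and its dependencies) only when required."""

    def __init__(self, deps: Dict[str, List[str]], start: Callable[[str], None]):
        self._deps = deps          # service -> services it needs first
        self._start = start        # hypothetical hook that actually launches it
        self._running: Set[str] = set()

    def require(self, name: str) -> None:
        if name in self._running:
            return  # already up: nothing to do
        for dep in self._deps.get(name, []):
            self.require(dep)  # bring dependencies up first (no cycle check)
        self._start(name)
        self._running.add(name)
```

Anything never requested, such as Bluetooth on a device that never pairs, simply never costs boot time.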

Doesn't disabling USB cut you off from most of the hardware on a Raspberry Pi?

Come to think of it, doesn't the SD card slot sit off of the USB bus?

I have at least a few Raspberry Pis that are programmed to simply boot and hit the aux out. This could be the difference between my car's white noise system coming online before I even shift the car out of park, vs. when I'm in the roundabout a few hundred feet away. Lifechanging.

Curious: why does your car have a white noise system?

> Come to think of it, doesn't the SD card slot sit off of the USB bus?

It doesn't.

You're probably thinking of the ethernet controller, which is often cited for the Pi 3 and earlier.

Theoretically (I have no idea about the Pi specifically), it's entirely possible for the bootloader to speak USB, load the kernel + initramfs from the device into memory, and then boot a kernel which knows nothing about USB.

This can be done with (for instance) u-boot.

If it did it wouldn't be able to boot, right? The OS/bootloader is on the SD.

The bootloader would still support SD, but the kernel would come up just fine without supporting it, as by the time the kernel is booting, it's already in memory. You just wouldn't be able to mount any filesystems from that kernel, so you'd be stuck with whatever userland you packed into your initramfs.

Most of the drivers can be built as modules (.ko) and loaded later on, such as after the GUI is up. That should give the impression of a fast boot.

Talking about optimizations for the Raspberry Pi, does anyone know if there is any low-power mode configuration available? I've always been fascinated by how smartphones can stay idle for one or two days while a Raspberry Pi can't last 10 hours on a similar battery.

The Raspberry Pi is based on an SOC from Broadcom that was derived from SOCs designed for Set top box and Over The Top box products (i.e. Roku, AppleTV, etc.).

The BeagleBone is based on an SOC from TI that was designed for (at the time, high-end) smartphones. Therefore, it supports much lower power consumption and idle modes.

The BCM2835 used in the PiZero was used in the Nokia 808 PureView as a coprocessor for graphics and the camera: https://en.wikipedia.org/wiki/Nokia_808_PureView

The BCM2763 and BCM2835 are the same silicon in a different package: https://raspberrypi.stackexchange.com/questions/840/why-is-t...

Didn't know that. That makes sense. As a discrete coprocessor to the primary Smartphone SOC for a very high end smartphone, for power management the discrete coprocessor chip can be completely powered down when going into idle mode.

The telltale sign that the OMAP SOCs and most mobile CPUs were designed as primary smartphone SOCs is the existence of deep sleep and RAM retention modes. Additionally, every GPIO pin is configurable for pull-up/pull-down to prevent current leakage.

I haven't looked extensively but I don't believe the Broadcom SOCs support this mode.

Actually, after a more extensive review, I think the story is probably that Broadcom attempted to make a smartphone processor from Alphamosaic's video coprocessor, but didn't adequately support the low-power modes and didn't get any smartphone design wins.

They then pivoted and found traction in the Over the Top TV space.


There aren't a whole lot of power management features available in the hardware, the SoC just wasn't designed for mobile/battery use. About all you can do is disable USB/ethernet/HDMI if not needed for your use case, or drop the CPU/GPU clocks.

It is relatively low power, in the ballpark of a mobile phone. Running headless the Pi3 consumes 4-5W with an out of the box Ubuntu install, about 9W with the 7" screen, and after shutdown it remains on at about 2W. The iPhone SE takes about 9W while charging and 4W fully charged with the screen on. Numbers aren't exact, read out from a Jackery display and it's been a few months. I think the Pi4 8GB was about 1W more than the Pi3.

My Lenovo T460 draws 4-5 W while idle with Wi-Fi and the LCD going. 4-5 W is a decent draw for a headless board.

Do you know why it still consumes about half of the power (2 W out of the 4-5 W) after shutdown?

The power circuit in the Pi isn't super-efficient.

tl;dr you can drop power consumption to between 80-100 mA at idle if you don't need to use HDMI. And like 75-80 mA if you also disable WiFi/BT completely, as well as the tiny onboard power/activity LED.

But honestly, if you're going to that length and really need good battery life, maybe consider a fast microcontroller ;)
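The battery-life comparisons in this subthread come down to simple arithmetic. A sketch, assuming a hypothetical 10,000 mAh power bank with 3.7 V cells (37 Wh) and an assumed 85% boost-converter efficiency; both numbers are illustrative, not measurements.

```python
def usb_power_w(current_ma: float, volts: float = 5.0) -> float:
    """Convert a current draw measured on the 5 V USB supply into watts."""
    return volts * current_ma / 1000.0


def runtime_hours(battery_wh: float, load_w: float, efficiency: float = 0.85) -> float:
    """Estimate runtime from stored energy and average load.

    `efficiency` models conversion losses between the cells and the 5 V rail
    (85% is an assumption; real power banks vary).
    """
    if load_w <= 0:
        raise ValueError("load must be positive")
    return battery_wh * efficiency / load_w
```

Under those assumptions, a 4.5 W headless load gives roughly 7 hours from 37 Wh, while the 90 mA idle figure (0.45 W at 5 V) gives roughly 70 hours.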

The RPi is horrific for power usage -- I've never understood why they didn't release a version to address this. I get better performance out of all the FriendlyElec stuff....

Certainly one idea was to use the RPi as a battery-powered, pocket-sized, general purpose computer, like a "smartphone". Solutions for this exist but are still awkward, IMO.

Another idea is to use an open source "smartphone" like the PinePhone as a battery-powered, pocket-sized, general purpose computer, like the RPi. (Despite whatever usage the Pine64 folks and "app" contributors envisage.)

The PinePhone boots from SD card, like the RPi. It will run Linux. What remains is further OS support. It will boot NetBSD, which is a start. The RPi battery problem has been solved.

For the avoidance of doubt, "general purpose" means, among other things, that "cellular calling" is not a prerequisite to usefulness. Similarly, "pocket-sized" does not imply that the computer is always operated by removing it from a pocket and holding it in the hand; it refers to the form factor, not the usage.

Further, I am not seeking to stimulate discussion of "average users". This is "Hacker News" not "Average User News". The idea has come up a few times on https://forum.osdev.org but I have not seen it discussed on HN.


ARM11 is what the Raspberry Pi 4's BCM module uses, from what I could tell. Looks like your best bet is to set the CPU frequency down. I didn't see anything useful in the Broadcom BCM2711 datasheet.

No, it isn't. ARM11 cores are ARMv6 CPUs, while the Raspberry Pi 4 is ARMv8: two generations of difference, and that's just the instruction set. ARM11/ARMv6 says nothing about power management; it's up to the SoC. (ARMv8 does say a few things about power management, but Broadcom/Raspberry Pi 4 chose to ignore it.)

Ah, well. I did allude to not being certain; that's what I could tell from 5 minutes of sleuthing. Either way, the conclusion is the same: you don't have good control to reduce power consumption other than clocking down.

Great job! I am building a CNC router in my spare time, and I ended up using a Raspberry Pi 3 since I want to cram in a ton of features and ESP32s and STM32s are simply not cutting it. Cutting the boot time is something I have spent stupid amounts of time on. The best I was able to achieve was just under 8 seconds. I'll have to dig into this and see if there is something more I could do.

Are you using the PREEMPT_RT kernel? Driving steppers or servos directly from the Raspberry Pi?

If you're driving steppers, especially on something like a mill, I'd think it'd make sense to use something like a BeagleBoard with its PRUs, or an auxiliary hard-realtime microcontroller.

You might be able to shave a little bit of time off, but if you need network, USB, or an init system then be warned that this article removes all of those.

Interesting. I gave up using the Raspberry Pi 4 on a project because its crap proprietary bootloader took something like 4 seconds to start the kernel. Maybe it was faster on the Pi 3. I ended up buying a board with an Allwinner SoC that works with U-Boot.

That's interesting. I was thinking of going down the same route. Which Allwinner board? And how fast did it boot?

I went with the Orange Pi 3. I would need to measure exactly, but with U-Boot compiled without unnecessary things (USB etc.), it starts loading the kernel pretty much immediately. In total, boot to my graphical application, which uses KMS/OpenGL, was something around 2 seconds.

I used to work for an automotive manufacturer, and one of the challenges we faced while exploring system design was the startup time to get from 0 to a backup camera.

It was this kind of thing we considered (this was all a proof of concept), but it comes with the cost of still having to bring up the rest of the system.

This is a great write-up and was really interesting to read. My question is: why are the proposed changes not already included upstream? Isn't a faster boot what everybody wants?

> Isn't a faster boot what everybody wants?

Yes, but glancing quickly at the article some solutions are not generally acceptable, such as moving filesystem processes to the application code. If you have a RPi project where you plan to run a single Qt app then you can do stuff like this, but that is generally speaking not the use case.

> My question is: why are the proposed changes not already included upstream?

Because these changes remove networking, USB support, sound, debugging support, and even the entire init system. It's useful if you need to boot into a single, simple app without networking or USB as fast as possible, but it's not useful for general purpose computing.

I guess there's an obvious follow-on question: Can USB and networking support be implemented in such a way that startup is deferred or parallelized?
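One way to see why deferring or parallelizing helps is to treat boot as a dependency graph. The sketch below uses hypothetical task names and durations and compares running everything in sequence with letting independent tasks overlap, in which case each task only waits for its own dependency chain. It assumes an acyclic graph and that tasks never compete for the CPU (plausible when boot is mostly waiting on I/O and hardware).

```python
from typing import Dict, List, Tuple

# name -> (duration in seconds, names of tasks that must finish first)
Tasks = Dict[str, Tuple[float, List[str]]]


def serial_time(tasks: Tasks) -> float:
    """Boot time if every task runs one after another."""
    return sum(duration for duration, _ in tasks.values())


def finish_times(tasks: Tasks) -> Dict[str, float]:
    """Earliest finish time of each task when independent tasks run in parallel."""
    memo: Dict[str, float] = {}

    def finish(name: str) -> float:
        if name not in memo:
            duration, deps = tasks[name]
            # A task can start once its slowest dependency has finished.
            memo[name] = duration + max((finish(d) for d in deps), default=0.0)
        return memo[name]

    return {name: finish(name) for name in tasks}
```

For example, with `kernel` at 1.0 s and `usb` (1.5 s), `network` (2.0 s), and `app` (0.5 s) each depending only on `kernel`, the app is ready at 1.5 s and everything is done at 3.0 s, versus 5.0 s when run serially.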

Can they be loaded as modules on demand? Sounds like it should be possible …

So my computer has booted but I can’t ping it or use a keyboard. Great.

I’d rather my computer booted and then signalled when everything I want was ready, rather than hide the init time after claiming to be ready.

Lazy troll is lazy. I like that my Linux-based head-unit comes ready 3 seconds after starting the car, so the volume control works, and if I was listening to a simple input like the broadcast tuner at the last shutdown, it returns to that and I've got music right away. It takes a little longer for maps to render, and longer still for the Bluetooth module to load, but that's okay. Delaying simple functions until all the most complex stuff had loaded would be a terrible UX.

That’s not a general purpose computer

That's what systemd does for you.

> Isn't a faster boot what everybody wants?

I’d rather have a slow boot and proper UEFI support so I can boot any vanilla ARM64 Linux distro (Debian proper), instead of images/distros which have been crafted to be device-specific (Raspbian).

I boot this thing once every second month at most. I honestly couldn’t care less about boot-times.

Luckily for me, there are solutions to my problem too ;)


https://github.com/pftf/RPi4 for Pi4 users.

Incidentally, I have used this for the ESXi Fling for the Pi when benchmarking it against KVM performance (for those curious, KVM far outperformed ESXi), but I heard that it doesn't work as well for Linux distros (some hardware was broken last I heard, a few months ago).

But yeah, I agree that getting UEFI support and standardising the ARM boot procedure is very useful for all of us.

If you have an immutable system partition I wonder if you could boot, and then take a snapshot of memory (excluding caches etc) and save that to disk. On a reboot you would 'simply' load your RAM image from disk and everything would be immediately at steady state?

Caches aren't a problem, you can just persist memory. The problem is the non-RAM internal state of everything. You can preserve CPU register state and various operating modes, but persisting the state of every hardware device you use is a big, undocumented challenge.

Virtualization systems like qemu-system or VMware support suspend/resume, which is essentially the same problem.

I only mentioned caches to reduce the amount of storage that would be read as storage isn't always very fast.

That said the internal state is a valid concern, you'd need the involvement of hardware drivers as it's essentially hibernation...

With what mass storage driver?

I've been booting Void Linux on a raspberry pi 2 and it comes up in like 5s. If you really need to shave boot time down, like you're cutting power to the machine except when it's necessary and need it to come up instantly, something like this may be helpful though you might be in the realm of wanting an RTOS rather than Linux.

For most Linux use cases, it really comes down to distro (and especially init) choice.

So, it seems like the main speedups were from removing a ton of kernel modules, replacing the init system with the target application, and then some optimization of the target application. That's pretty neat!

But let's go further - while we're trading flexibility for performance, why couldn't the entire kernel and userspace be precompiled+loaded into a single image file - like a Smalltalk/Lisp image - and loaded directly into memory, modulo some really-truly-has-to-be-done-upon-power-up initialization?

Sure, there's hardware that has to be brought up - so, right after initializing basic access to the SD card and RAM, this hypothetical bootloader could initialize another core and use that to bring up the random SoC subsystems while the kernel image is being copied into RAM.

And, yes, this would mean that you would need to re-compile that image every time you updated the kernel (or any of the userspace stuff inside) - but, again, we're sacrificing flexibility for performance.

Is this possible? Has anyone done something like it before?

At that point, why not just get rid of the OS?

That's what small embedded systems basically are: there's no bootloader or any other extraneous complexity; the MCU starts up from the reset vector directly into the "application code".

Using Linux or some other full-blown OS is why a lot of newer embedded stuff like "smart" TVs, set-top boxes, etc. need to "boot" before they're usable. Regardless of how much you try to optimise the process, it won't beat a simple MCU or the "old school" non-computerised systems from power-on until usable.

Because the thing I described is still more flexible than having no OS at all.

If you bundle some of the userspace into the kernel image, then you trade-off some security, and needing to have to rebuild this large image periodically, for much faster startup times, and (optionally) load times for some userspace applications - but you can still install and load other userspace tools, they'll just start up "normally".

I'm not super familiar with the concept, but isn't this sort-of what a "unikernel" is?

Sounds like installing the application as init inside the kernel’s initramfs.

"bootcode.bin: This is the 2nd Stage Bootloader, which is run by the 1st Stage Bootloader which is embedded into the RPI by the manufacturer. Runs on the GPU. Activates the RAM. Its purpose is to run start.elf (3rd Stage Bootloader)."

That is Fastboot without network support (and a plethora of other features disabled). The trade-off appears to be worth it for this particular application, but it is hardly a general solution.

Learned a lot, thanks for the write up!

FWIW, my Pi 4 running Nerves/Elixir boots to the full BEAM environment in less than ten seconds.

This is a quad core machine that can do billions of cycles per second. That seems unnecessary and wasteful by at least two, possibly 3 orders of magnitude.

So much of this is just tradition.

I have server machines that take minutes after power on before they even get to the kernel. I know they aren't designed to be rebooted often but this burns tons of time for no reason whenever they need to be serviced for whatever reason. Testing bios changes or doing network reconfiguration takes 30-60 minutes instead of 5, it sucks.

That's because of the POST procedure allowing time to initialize and run the ROMs in all PCIe devices. Also, HBAs take time to do staggered initialization of disk, etc. There are usually some tweaks you can do in the BIOS to skip certain things but unless you're seeing >5min boot times, I wouldn't bother.

I understand the excuses, but the "time to initialize" thing is also just outdated tradition. These things (even the HBA stuff) are modern chips that can do zillions of operations per second.

It takes a long time because we have simply accepted the tradition of "initialization takes a long time".

There are certain bottlenecks like loading boot images over relatively slow SPI, but the synchronization margins between components are like 10x bigger than they need to be simply out of poor engineering.

Aren’t most initializations done with the CPUs in real-time mode? In that case, they are probably only able to run at MHz speeds, not GHz.

> This is a quad core machine that can do billions of cycles per second. That seems unnecessary and wasteful by at least two, possibly 3 orders of magnitude.

Booting a system goes beyond just the CPU. Most of the time isn't spent computing things. It's spent waiting for data transfers, initializing peripherals, and so on.

It's not just "tradition" to make it slow.

Did I read correctly that one of the things he disabled was the initialisation of entropy at boot (which needs time to collect enough unrelated, asynchronous external events for one boot to be 'unlike' another)?

Can't seem to get ahold of _any_ embedded SBC these days. RPi 4s are being sold at 3x their normal price, and alternatives like the RockPi are nowhere to be found :/

ESPs are still easy to get, though, and are powerful enough for a surprising number of tasks.

How difficult would it be to make this work with the Raspberry Pi 400?

Really nice effort and info. Thanks for sharing!

Neat! I had no idea the RPi's SoC used a closed bootloader. Are there no open-source hardware alternatives?

Anyone know of something like this for a beaglebone black, with Qt/QML as well?

Does anyone know how to get hold of a couple of Raspberry Pis? They're always a drag to get hold of.

Go to the Raspberry Pi website, find the page for the product you want, then scroll down to the bottom to find a list of dealers. Availability probably varies by country, but I was able to find several configurations of the Pi 4B in stock in my country and pricing was in line with their recent price increase (sigh).

I usually search for suppliers using Google. Did you try that yet?
