
Booting embedded Linux in 0.37 seconds on an ARMv7-A CPU at 528 MHz - eerimoq
https://github.com/eerimoq/monolinux-jiffy#boot-time
======
nimish
One interesting reason there's been a bunch of progress in this space is
that automotive systems are required to show the rear-view camera within a
certain amount of time. Progress driven by the oddest of things.

~~~
abdulmuhaimin
My car takes around 10 seconds (maybe more, I'm not sure) for the reverse
camera to work after starting the engine. It's annoying having to wait, that's
for sure.

~~~
PudgePacket
Is it pulling a fresh docker image each time?

~~~
saltedonion
Gold

~~~
sushshshsh
I've been deep in the interview cycle lately and I just had a dream last night
that I was asked, in a non-technical interview with a product manager, "what
are the 5 ways to dockerize an application?"

I then said that I could only think of one way, and he responded "how do you
not know docker if you are applying for a java developer job?"

Thankfully I woke up

~~~
merb
well, most of the time my company deals with Docker as well, but my coworkers
have nothing to do with it; it's just fully automated. the only problem we
had was when introducing long-running jobs that can be run by clicking a
button inside our UI, which runs a k8s job. that was hairy for my coworker,
but with enough shell scripts it got easier and easier.

~~~
sushshshsh
That's cool :) I mean, it's fine to not know the answer to everything. Usually
it's not a deal breaker, especially if you provide a general answer of "here's
how I think it works"

But what's always hilarious to me is 3 seconds after they ask the docker
question, they'll then ask "ok so tell me a bit more about how
CompletableFutures, Consumers, and Threadpools work together and why you
would want to use them"

or my personal favorite, the predictable trifecta of

"what's the difference between an abstract class and an interface"

"ok tell me how garbage collection works"

"ok and what's the difference between final, finalize, and finally?"

~~~
merb
well, Docker should not be your problem; the latter are things a Java developer
should've heard about. though to be fair, CompletableFuture is relatively new
in Java, not in other languages of course (e.g. Scala, C#, etc.)

------
ndesaulniers
I'm impressed. I don't know of too much work that's going on upstream for
optimizing boot times, other than some of the clear linux stuff:
[https://www.phoronix.com/scan.php?page=news_item&px=Clear-Li...](https://www.phoronix.com/scan.php?page=news_item&px=Clear-Linux-Ubuntu-Eoan-Boot)

There are folks looking into improving boot times on Android; turns out init
and kernel drivers are a tangly mess of {dependency} spaghetti. Loading kernel
modules can induce delays in processing relocations.

The kernel patches disable a bunch of stuff, including ethernet it looks like?
Most of the kernel changes comment out blocks of code, or trade long delays
for shorter delays with more iterations.

~~~
yjftsjthsd-h
Yeah, this appears to take advantage of the fact that if you _know_ exactly
what hardware you're working with, you can skip a whole lot of detection and
general support, which helps quite a bit for embedded but less so on, e.g., your
laptop. Honestly, while there may well be dependency issues, I was under the
_impression_ that at least the kernel side of things generally _is_ pretty
well optimized, and in most cases you're paying for flexibility.

~~~
Twirrim
dracut on RHEL7+ builds the initramfs in "host_only" mode, which attempts to
strip it down to just the kernel modules you need for boot time.

Which is also somewhat annoying because it completely trashes portability,
which can be really irritating in cloud environments.

Ubuntu has similar capability (and I'm guessing Debian upstream?), but you
have to specifically enable it, by default it ships the full modules set in
the initramfs.

I would imagine that outside the embedded world and maybe microVMs, this
stuff isn't that valuable in most places.
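
For reference, the knobs in question look roughly like this; the paths and
option names are taken from dracut and initramfs-tools, so treat it as a
sketch rather than a drop-in config:

```shell
# RHEL/Fedora (dracut): build the initramfs for this host's hardware only.
# Persistently, in /etc/dracut.conf.d/hostonly.conf:
#   hostonly="yes"
# Or one-off on the command line:
dracut --hostonly --force

# Debian/Ubuntu (initramfs-tools): in /etc/initramfs-tools/initramfs.conf,
# change MODULES=most (the default) to:
#   MODULES=dep    # include only modules the current root device needs
# then rebuild:
update-initramfs -u
```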

~~~
dvdkhlng
Slight correction: by default Debian/Ubuntu put all modules _potentially
required for initial boot_ into the initramfs. That's still a very small
percentage of all modules. I.e. you only need those modules that will get you
so far into the boot process that you have access to the root partition with
all the remaining modules.

E.g. if you want to do network boot over wifi, you'll have to add an initramfs
hook script to add the wifi modules for your hardware into the initramfs [1].
They are not included by default.

[1] [http://www.marcfargas.com/posts/enable-wireless-debian-initr...](http://www.marcfargas.com/posts/enable-wireless-debian-initramfs/)
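
The hook from [1] boils down to something like this sketch; `brcmfmac` is
only an example driver name, and `manual_add_modules` is the helper that
initramfs-tools provides to hook scripts:

```shell
#!/bin/sh
# Save as e.g. /etc/initramfs-tools/hooks/wifi (name is arbitrary),
# make it executable, then run: update-initramfs -u
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac

# Pull in the initramfs-tools helper functions.
. /usr/share/initramfs-tools/hook-functions

# Copy the wifi driver (and its module dependencies and firmware) into the
# initramfs; substitute the module for your wifi chipset.
manual_add_modules brcmfmac
```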

~~~
Twirrim
You're right, I misinterpreted what "most" meant in the mkinitramfs config.
Interesting. I've not seen any difficulties with porting Ubuntu between
different hardware configurations, so it seems to include a reasonable amount
of them.

Every now and then I'm tempted to try "dep" instead of "most", but then I
realise there just isn't enough benefit!

------
Polylactic_acid
Does anyone know why x86 systems are so slow to start up? I got an X570 mobo
recently and it still takes about 5 seconds to get to GRUB.

~~~
gsliepen
I've seen wildly varying start-up times on x86 hardware. I had a Samsung Ativ
Book 9+ running stock Debian that, from a cold start, had fully started X and
showed the login prompt before the backlight turned on (about 1 second). I've
also had the "pleasure" of managing some Dell servers that took an impressive 2
minutes just to get past the BIOS.

(Well I did turn off the 5 second delay in GRUB to make that laptop boot time
possible.)

~~~
close04
The reason it varies wildly is that the hardware and the possible settings
supported vary wildly on a DIY PC. A few things I noticed a few years ago
while troubleshooting slow POST:

- Having an HDD connected to the additional SATA ports provided by an onboard
controller (not the chipset) incurred a delay because the controller was
initialized later (slower?), the HDD would power up later, and the POST
sequence wouldn't finish until the HDD finished spinning up. Checking SMART on
boot just made it worse.

- Switching between the 2 GPUs I had available at the time (one Nvidia, one
AMD) consistently made a couple of seconds of difference.

- Using XMP made POST _much_ slower too (don't remember by how much).

- Updating the FW on my SSD shaved a bit of time.

- Devices connected to USB during POST also increased POST time.

And as a side note, my cheaper, simpler mobos would always POST faster.
Gaming/OC mobos these days are _loaded_ and it all adds up to what the PC has
to do to initialize in POST.

Bottom line is it's easier to optimize for a short POST when your config is
locked in place (like a phone) than on a machine that could have any number of
possible permutations of hardware and settings. Today that's x86.

------
WatchDog
> Networking takes by far the longest time to get ready. The main reason is
> that Ethernet auto-negotiation takes a significant amount of time, about 1
> to 3 seconds.

Is this a fundamental limitation of how auto-negotiation works? Is there a way
to speed it up?

~~~
mschuster91
Set fixed values for the speed (i.e. force 1 Gbit/s if you _know_ there will
be a capable cable). I'm not sure whether the feature that detects crossover
cables can be switched off in userspace.

As for the timing: as long as you don't fix the speed / disable crossover
detection, you will always have some sort of physical link training and
negotiation...
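
With Linux's ethtool that looks something like the following; the interface
name is an assumption, and note that 1000BASE-T formally requires
autonegotiation, so forcing gigabit is not portable (100 Mbit/s full duplex is
the safe forced mode):

```shell
# Force 100 Mbit/s full duplex with autonegotiation off. The link peer must
# be configured identically, otherwise you get a duplex mismatch.
ethtool -s eth0 speed 100 duplex full autoneg off

# Disable automatic crossover (auto-MDI-X) detection, where the driver
# supports controlling it:
ethtool -s eth0 mdix off
```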

~~~
derefr
> Physical link training and negotiation

That still shouldn't take _that_ long, though, should it? 3s sounds like some
O(N^2) process is happening.

Keep in mind that this stuff is happening close to the metal, on a nowadays-
unshared medium (no Ethernet hubs around any more), with negligible speed-of-
light delays because the nearest switch is probably ~100ft away at most. If
some high-level protocol like Steam Link can have no perceivable latency, then
certainly PHY negotiation shouldn't.

My naive guess would be that the medium is speed-tested in order, first seeing
if it works at 1Mbps, then 10Mbps, then 100Mbps, and finally 1Gbps; and
_alternating in_ the crossover-cable versions of those tests; satisficing with
the last-achieved line rate when the next up-clocking fails.

If that's the case, then I have a feeling that modern hardware could get a bit
of an advantage just from doing things in the opposite order: 1.
optimistically assuming everything is set up for 1Gbps, and then, if not,
ratcheting down the link-speed until the link _starts_ working; and 2. only
doing the crossover-cable tests after _all_ the non-crossover tests fail.

You'd still have the same worst-case performance (3s) as before, but now that
worst-case would be for old 1Mbps crossover cables: not a common case!

~~~
PinguTS
There is a thing called compatibility.

Even when you don't have a hub anymore and Ethernet is not shared anymore, it
is just an "anymore", which means it still needs to respect those old things
and test for them.

BTW, even the claim that Ethernet is not a shared medium anymore is wrong. In
industrial and automotive Ethernet we are back to SPE (Single Pair Ethernet)
and a working shared medium, because switched Ethernet is way too expensive.

~~~
derefr
You don’t _test_ for the medium being shared/unshared; Ethernet is just a
protocol that _assumes_ a shared medium, and does
[https://en.wikipedia.org/wiki/Carrier-sense_multiple_access](https://en.wikipedia.org/wiki/Carrier-sense_multiple_access), even when there’s no benefit to it.

The reason that Ethernet can afford to do that even in entirely switched
deployments, though, is that Ethernet’s CSMA is very aggressive/optimistic,
meaning that there’s almost no overhead to it in the case that there really is
nothing else sharing the medium. In fact, Ethernet’s “1-persistent” CSMA is
effectively _designed_ for low contention, falling over at high [100+ TXers]
contention—which is why we don’t just use Ethernet over shared-medium WANs
like a cable ISP’s (pre-fibre-backhaul) coax, but instead protocols like
[https://en.wikipedia.org/wiki/Asynchronous_transfer_mode](https://en.wikipedia.org/wiki/Asynchronous_transfer_mode).

My point with bringing up the low contention of modern media wasn’t that
modern devices could somehow skip CSMA sense-idle altogether; but rather that,
due to the aggressive nature of Ethernet’s CSMA, Ethernet when _on_ a low-
contention or no-contention media _should_ have basically zero sense-idle
overhead, which means one less thing standing in the way of fast Ethernet PHY
autonegotiation in an archetypal modern deployment; and so one less reason to
privilege the hypothesis of “it’s the laws of physics making PHY
autonegotiation slow” over “Ethernet controllers are doing something dumb.”

Here’s something to chew on: USB is _also_ a shared-medium PHY with many
layers of legacy compatibility. And yet, on every OS I know of, a USB3 analog
input device (e.g. a microphone) can go from “off/unplugged” to “negotiated,
registered, driver up, and transmitting data to the host, that the host has an
open buffer for such that it will acknowledge and process the data within the
soft-real-time window”, all with 0.5s of delay or less.

Heck, the _entire Bluetooth stack plus connections to pre-paired devices_ can
come up faster than Ethernet—and Bluetooth sits _on top of_ USB! Bluetooth
comes up fast enough, that Apple bothers on its desktops to bring up the
Bluetooth stack _within EFI_ , finishing quickly enough that Bluetooth
peripherals can be used to signal an interrupt to the EFI boot process within
its ~1s interaction window. (We all know, meanwhile, what EFI under Ethernet
control looks like: the modern server mainboard’s 6+ second “IPMI autoconfig”
delay.)

~~~
PinguTS
Comparing USB to Ethernet is difficult. Any USB device talks USB 1.1 at
startup, so negotiation is basically transferring a data packet with the
capabilities at a pre-defined data rate.

Ethernet, on the other hand, has to negotiate the number of wires, full duplex
vs. half duplex (which depends on the number of wires), and the line code,
like Manchester vs. 4B5B.

The main difference is that in USB the host decides what to talk; in Ethernet
there is no such instance.

------
yjftsjthsd-h
> A reboot is even faster, only 0.26 seconds from issuing the reboot to
> entering user space.

Curious; it doesn't say how that works. Could be kexec, but if it's a real
reboot then I'd be interested to know why it's faster. Can you still skip some
hardware initialization somehow?

~~~
eerimoq
I'm curious as well, and yes, by reboot I mean calling "reboot(RB_AUTOBOOT)".
I was under the impression that the ROM code has to start over from the
beginning, which I think it does. Maybe it can skip some initialization since
the SoC is already up and running. Maybe there is some initial hardware setup
that is only done at power on. It's hard to tell as this part of the boot
sequence runs closed source NXP code.

------
dmitrygr
> Start with MMC clock frequency at 52 MHz instead of 400 kHz.

Whoa there!

Spec violation. Not guaranteed to work. Might work some days but not others.

~~~
eerimoq
The goal is to not configure the MMC at all in Linux, but to rely on the
bootloader having already configured it.

Btw, where can I find the spec? And where in the spec can I read about this?
Is this true even if we know the MMC supports 52 MHz?

~~~
OnACoffeeBreak
[https://www.jedec.org/sites/default/files/docs/JESD84-B451.p...](https://www.jedec.org/sites/default/files/docs/JESD84-B451.pdf)

You can use
[http://bugmenot.com/view/jedec.org](http://bugmenot.com/view/jedec.org) to
get free-to-use credentials.

The section in point is "A.6.1 Bus initialization".

I'd say that if your MMC works fine going full speed out of the gate and the
IC on the board supports this speed, then you're unlikely to encounter any
issues.

~~~
eerimoq
Thank you very much. Very helpful answer. I'll continue to use 52 MHz until I
encounter problems (if any).

------
m463
I have an Intel system that takes 20x as long just to beep that there's no
keyboard.

------
Yanu-3452
Are super-quick boots vulnerable to having a (presumably?) lower entropy pool
exploited, or do the steps taken to mitigate low entropy across freshly minted
cloud images also help here?

~~~
HPsquared
Isn't that more a question of how repeatable the process is?

------
pcdoodle
The hardware looks awesome! It looks like it hasn't been updated in 2 years
though; has anyone produced these PCBs based on the Jiffy?

Is that an eMMC socket on there?

------
eximius
Why can't network boot up be done asynchronously?

~~~
eerimoq
You are right, it probably could. I should try compiling the fec driver into
the kernel and making it asynchronous. That should save a couple of
milliseconds, and hopefully not delay entering user space.
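
For what it's worth, the kernel already has knobs for asynchronous probing;
assuming the driver name is "fec" (the i.MX Fast Ethernet Controller), it
would look something like:

```shell
# For a built-in driver, add to the kernel command line (documented in
# Documentation/admin-guide/kernel-parameters.txt):
#   driver_async_probe=fec

# If the driver is built as a loadable module instead, every module accepts
# a generic async_probe parameter:
modprobe fec async_probe=1
```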

~~~
mindentropy
Isn't network bring-up done asynchronously in distros these days? I see my
network interface not up and running even after the system is fully booted.

~~~
eerimoq
I'm just referring to the fec driver kernel module, not the user space
software. But I might of course be wrong. I've done lots of iterations trying
to optimize the boot time on this tiny embedded system, and it's not always
easy to remember all details. It could be that the fec driver is
asynchronously probed. I guess I have to try it again at some point. =)

