Playing with a Raspberry Pi 4 64-bit (cloudkernels.net)
116 points by _ananos_ on July 12, 2019 | 67 comments

>Lightweight virtualization is a natural fit for low power devices

No it's not. The pi is punchy enough for it sure, but the above doesn't follow in my mind. Low power = limited resources, so ideally you want it to run on bare metal to minimise overheads.

sure! bare-metal would be awesome. But isn't it a shame not to take advantage of all the virtualization/containerization goodies out there? After all, the virtualization overhead nowadays is limited to I/O (network & storage). CPU/MEM are being virtualized using hardware extensions at (near-)native speeds.

>CPU/MEM are being virtualized using hardware extensions at (near-)native speeds.

Is that really true for rasp level gear? Haven't tested it but my gut feel tells me the hit is sizable.

Anyway - don't let that dissuade you from the mission. Container goodies on a rasp is a grand idea...just don't think it's quite as hit free as the article suggests ;)

definitely! that's our initial goal. Quantify the penalty and examine the trade-offs. Clearly, virtualizing workloads on such devices with standard VMMs/hypervisors isn't ideal. And we're working towards this direction; playing with the systems stack is what makes us tick, so, it will be a fun and (hopefully) useful adventure :D

>Quantify the penalty and examine the trade-offs.

If you do have numbers available I'd love to see a write-up on what the real world hit is.

I used to run a rasp3b for home server but that just wasn't punchy enough (crappy fake gigabit etc). So my old gaming laptop became a server w/ docker etc.

But itching to justify a rasp 4.

Anyway...thanks for exploring 64...thought the 32 on rasp 3 was unfortunate.

I have a raspberry pi 4. That thing runs HOT. 64c idle (44c above ambient) without any case, but with a PoE hat and disabled fan.
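For anyone who wants to compare against figures like the 64c idle reading above, here is a minimal sketch of reading the SoC temperature. It assumes the Linux thermal sysfs node `/sys/class/thermal/thermal_zone0/temp`, which reports millidegrees Celsius on typical Raspberry Pi kernels. The zone index and the helper names are assumptions for illustration, not something taken from this thread.

```python
from pathlib import Path

# Assumed sysfs node; the zone index can differ between boards and kernels.
THERMAL_NODE = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw: str) -> float:
    """Convert the sysfs value (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def delta_above_ambient(soc_c: float, ambient_c: float) -> float:
    """How far the SoC sits above room temperature."""
    return soc_c - ambient_c


if __name__ == "__main__":
    if THERMAL_NODE.exists():
        soc = parse_millidegrees(THERMAL_NODE.read_text())
        print(f"SoC temperature: {soc:.1f} C")
    else:
        print("thermal node not found on this machine")
```

With the numbers quoted above, `delta_above_ambient(64.0, 20.0)` gives the 44 degree gap, which implies a room at roughly 20c.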

The PoE hat makes attaching a proper heat sink (or using the flirc case) impossible, so I have now rolled my own using copper coins.

I am not comfy with letting my pi run that hot 24/7, and I don't want active cooling. I bought a rock pi 4 which has a PoE hat that can be combined with a HUGE heat sink, and there I have no heat issues at all.

There are cases you can use when you don't have a PoE hat, but the original pi 4 case almost throttles the pi at idle.

I have been testing a rpi4 and comparing it to the rpi3. The RPi4 is actually the first Pi that is usable as a desktop computer, but I have seen thermal issues like you. Many of the early reviews only did short tests with the cover off. The thermal CPU throttling on the RPI4 is huge: the RPI3 might slow things down 30-50%, while the RPI4 will slow down to 100%. In the case I tested, the RPI4 was running close to RPI3 speeds. I don't think a heatsink or CPU fan is needed; there's just not enough heat being generated to warrant the expense. I have had good luck running the RPI with just the cover off and some sort of air circulation in the room.

I have been trying to figure out the cheapest way to cool the RPI4. I am thinking a single slow 5V fan blowing across the card would be the best solution. This would also cool off the VLI, and take up less room on the card. I am just hoping someone will make a case and fan combination that works at a good price point. If things get too pricey then other SBCs look very interesting.

I am really disappointed the thermal issues were not considered. A simple fan header like on UDOO boards, or an official case with space and air flow so a heat-sink could actually work would make me feel a lot better.

If you don't use a hat, you can probably fit the pi with a real heat sink or with something like this: https://geekworm.com/collections/new-arrivals/products/raspb...

For my rock pi, that is more than enough.

It looks like a great heat sink, I wish I could order one now.

My biggest concern with that is the blocking of the camera connector.

It is also 4 times more expensive than a cheap 5V fan.

but without noise! My significant other is allergic to noise. Either I install a fan that is too big and run it on the lowest setting, or I do it passively. I don't like moving parts, so passively it is :D

I just got my rock pi 4. It is undocumented and is probably full of hacks (someone mentioned 5 different partitions just to get it to boot), but it has an official huge heatsink for $8 that can be combined with a poe hat. My rpi4+poe hat now has a heat sink consisting of 5 cent euro coins and one of those very very small heat sinks that does nothing :D

I am running a Noctua 40mm 5V fan on top of my Raspberry Pi. It cannot be heard. I've been tempted to undervolt it to 3V, since it has been keeping my RPi4 at 50C under load without a heatsink.

I’m assuming you updated the firmware? There was a fix that lowered the temperature significantly.

Only 3-4c in my case, with some losses in USB data transfer rate.

I haven't installed the latest firmware. Is there a way to back up my current firmware so I can compare them easily?

Try this one: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=2435...

I believe the old binary is there and you can revert using the same mechanism

will do -- that was only a high-level run to make sure there's enough compatibility with our framework & tools to move forward.

stay tuned ;)

No it's not... Container "goodies" serve a purpose, but they also bring a big number of disadvantages to the table! It's not all fun and games in container world. We definitely need better build tools, but I don't think containers are the solution here, or at least not the only solution.

agree, I'm just talking about the flexibility to build & deploy a workload in any device (cloud / edge). Think of it like "building your own buildroot image" vs. "docker build" based on busybox or alpine. In the first case you'll end up with exactly what you want, but you'll spend quite some time building it; in the latter case you'll have to mix & match various other stuff, but you'll end up executing the actual application a lot faster.

Currently u-boot does not support the RPi4 properly (some hardcoded registers and clock numbers), but patches are pending. So EFI support as required by Arch, openSUSE and some others is not available yet.

Progress for actual mainline support can be followed here: https://github.com/lategoodbye/rpi-zero/tree/bcm2838-initial

There seems to be some work-in-progress with supporting AArch64 for the rPi.

This looks like a fun FOSS subject to contribute to once I get my hands on a Pi4. I like the low-level stuff.

Does anyone here have recommendations on where to contribute?

By the way, the 3B+ is armv7 by default, but also works fine as aarch64. That's how I use mine under nixos.

Very impressive. In your RPi4 experiments would you benefit from having more DRAM if you didn't have the 1GB limit? I'm wondering how much difference the GIC makes. Do you think there is a way to break down the performance uplift?

More RAM could help us with a scale test, or maybe with a storage-related benchmark.

Regarding the GIC, first I think we should spend some time examining the benefits from the A72 upgrade. Then, breaking down the time spent on each step of the VM lifecycle should be fairly straightforward (annotate EL changes, capture numbers with perf or something similar).

I have two 4 GB RPi4s, and can run any tests if you like.

Or is this about some 1GB barrier DMA limitation?

I did notice VideoCore can only access maximum 1GB of RAM. Related to that?

the issue about the available RAM is this: https://github.com/raspberrypi/linux/commit/cdb78ce891f6c636...

probably some kind of address mapping, but I'm no expert on this stuff ;-)

Saw that, but that doesn't really explain why, or how much effort is required to solve this. Maybe the SD stuff is on the VideoCore side, and it simply can't DMA above the 1 GB limit, so it needs DMA buffers below 1 GB plus a copy for transfers to upper memory. Maybe.

I hope they're going to figure out AArch64 support for the Raspberry Pi soon, especially since they have launched the models with larger amounts of RAM. Having a true AArch64 Raspberry Pi would be really nice.

I hope we can get true AArch64 support as well, for many different reasons. One is that Dolphin-emu has an AArch64 JIT but not a 32 bit ARM one. While it’s probably not terribly hard to get the Pi4 booting to 64 bit, which is actually probably good enough if you’re just interested in virtualization, I have no idea how to get the VideoCore 6 drivers in 64 bit and if I had to guess, I’d guess shit out of luck for now.

(I am not sure Dolphin-emu would run with considerable performance, but damn if I’m not curious.)

VC4 works great in 64-bit, I've had WebGL running on some browser under Weston on ArchLinuxARM on a Pi3. I would expect V3D (VideoCore 5/6) to work out of the box on 64-bit too. Mesa very rarely introduces any unportable code – e.g. RadeonSI pretty much just works on FreeBSD/aarch64 :)

Wait - is the VC4/V3D driver open source? I feel like I missed that bit.

So basically, I should just cross compile RPi Linux and Mesa and it should work?

To be honest, I have been confused about the VideoCore driver situation from day 1, but I thought it was a blob.

VC4 and V3D are open source Mesa drivers for VideoCore 4 and VideoCore 5/6 respectively. I don't remember what the proprietary driver is called.

> ...AArch64 support for the Raspberry Pi soon, especially since they have launched the models with larger amounts of RAM...

1 TB RAM supported by 32-bit ARM is not enough? I don't think increasing process virtual address size is the most pressing need at this point. Although it would help with things like WebAssembly, and anything else that benefits from mapping a lot of stuff in the address space.

The myth that you need to have 64-bit to access more than 4 GB RAM seems to still be very strong, despite not being true for a long time on modern x86 (since Pentium Pro in 1995) and ARM platforms (since LPAE designs, like 2013 Cortex A7/A15, etc.).
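The arithmetic behind the "1 TB" figure can be sketched as follows, assuming the usual address widths: 32-bit virtual addresses per process, 40-bit physical addresses with ARM LPAE, and 36-bit physical addresses with classic x86 PAE. The helper name is made up for illustration.

```python
GIB = 2**30
TIB = 2**40


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a flat address of the given width."""
    return 2**address_bits


# 32-bit virtual addresses: what a single process can map.
per_process = addressable_bytes(32)
# 40-bit physical addresses with ARM LPAE.
lpae_physical = addressable_bytes(40)
# 36-bit physical addresses with classic x86 PAE.
pae_physical = addressable_bytes(36)

print(per_process // GIB, "GiB per-process virtual space")
print(lpae_physical // TIB, "TiB physical with LPAE")
print(pae_physical // GIB, "GiB physical with x86 PAE")
```

So the machine as a whole can hold far more than 4 GB, while each individual process is still limited to a 4 GiB virtual address space.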

Of course I'd also like to get AArch64 on RPi4, but for entirely different reasons: to support double precision NEON SIMD and to get more registers to help the Cortex A72 core's out-of-order engine keep the execution units busy. More registers help to reduce dependencies, giving OoO more freedom to rearrange instructions for optimal execution.

I'm no Linus Torvalds fan, but his rant (and follow-up) on PAE explains why this is actually a terrible idea:

1) https://www.realworldtech.com/forum/?threadid=76912&curposti... 2) https://www.realworldtech.com/forum/?threadid=76912&curposti...

So cute. The times before Meltdown.

Anyways, Linus is being... Linus. He is of course right in some ways, from kernel point of view it does suck, when you can't map all of the caches and buffers simultaneously. But PAE is pretty good for a bunch of large processes.

actually, the host OS for this post is a ubuntu 18.04.2 server image.

root@pi:~# cat /etc/issue.net

Ubuntu 18.04.2 LTS

root@pi:~# uname -a

Linux pi 4.19.57-v8+ #2 SMP PREEMPT Tue Jul 9 20:31:37 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux

root@pi:~# file /bin/ls

/bin/ls: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-, for GNU/Linux 3.7.0, BuildID[sha1]=de05fcef79d88af9cf9a71ed38e73af0b179bfb2, stripped
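The `file /bin/ls` check above can be approximated by reading the ELF header directly. This is a rough sketch, not what `file` itself does: it only looks at the class byte and the `e_machine` field, using the ELF constants for 32-bit ARM (40) and AArch64 (183). The function name is hypothetical.

```python
import struct

EM_ARM = 40        # 32-bit ARM
EM_AARCH64 = 183   # 64-bit ARM


def describe_elf(header: bytes) -> str:
    """Classify an ELF file from its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_CLASS: 1 = 32-bit objects, 2 = 64-bit objects.
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    # EI_DATA: 1 = little endian, 2 = big endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    arch = {EM_ARM: "ARM", EM_AARCH64: "aarch64"}.get(machine, f"machine {machine}")
    return f"{bits} {arch}"


if __name__ == "__main__":
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "/bin/ls"
    try:
        with open(path, "rb") as f:
            print(path, "->", describe_elf(f.read(20)))
    except (OSError, ValueError) as exc:
        print(path, "->", exc)
```

On a true 64-bit userland like the one shown above, the result for `/bin/ls` should be the 64-bit aarch64 case rather than 32-bit ARM.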

Wait - what did it take to get 18.04.2 server running?

Is there an SD image that works?

baking one right now -- it is based on http://cdimage.ubuntu.com/ubuntu/releases/18.04.2/release/ub... but slightly modified (no root password, KVM setup, docker installed etc.)

will provide a link once I manage to upload it somewhere

there you go: https://cloudkernels.net/rpi4-64-bit-kvm-docker.img.xz sha1sum: 1b96a6be5256182eaceb5894fb993c8ffce8c2a2
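For checking the download against the posted sha1sum, a minimal sketch using Python's standard `hashlib`. The local filename in the usage note is an assumption (wherever you saved the image), and the helper names are made up.

```python
import hashlib


def sha1_of_chunks(chunks) -> str:
    """Feed an iterable of byte chunks into SHA-1 and return the hex digest."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a multi-GB image never sits in memory."""
    with open(path, "rb") as f:
        return sha1_of_chunks(iter(lambda: f.read(chunk_size), b""))


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-1 with a published checksum string."""
    return sha1_of_file(path) == expected_hex.strip().lower()
```

Usage would look like `matches("rpi4-64-bit-kvm-docker.img.xz", "1b96a6be5256182eaceb5894fb993c8ffce8c2a2")`, run against the compressed file exactly as downloaded.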

it's quite beefy, 2.5GB -- apologies for that. We should be able to craft a much smaller image based on the ubuntu server release. Apparently, something was off about mounting the root partition on the first boot, so we went with our original image.

Ok, so I must be a bit thicker than the rest, but what's the login? I tried ubuntu with no password, as well as root with no password, as well as root/root, and root/admin, and the default pi/raspberry, but to no avail.

What am I missing here?

hmm I'm pretty sure it's root with no password... you can edit /etc/shadow and remove the 2nd field on root and ubuntu and try again just in case
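The "remove the 2nd field" step can be sketched like this. It assumes the standard colon-separated shadow format, where the second field holds the password hash, and that you are editing a copy of the file from the SD card's root partition mounted on another machine. The function names are hypothetical, and the sketch only transforms text; writing the result back is left to you.

```python
from typing import Iterable


def clear_password_field(shadow_line: str) -> str:
    """Blank the 2nd colon-separated field (the password hash) of one line."""
    fields = shadow_line.rstrip("\n").split(":")
    if len(fields) < 2:
        return shadow_line  # not a shadow entry; leave untouched
    fields[1] = ""
    newline = "\n" if shadow_line.endswith("\n") else ""
    return ":".join(fields) + newline


def clear_passwords(shadow_text: str, users: Iterable[str]) -> str:
    """Blank the password field only for the named users."""
    wanted = set(users)
    out = []
    for line in shadow_text.splitlines(keepends=True):
        name = line.split(":", 1)[0]
        out.append(clear_password_field(line) if name in wanted else line)
    return "".join(out)
```

Calling `clear_passwords(text, ["root", "ubuntu"])` leaves every other account's entry byte-for-byte unchanged.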

I just tried root with no password, didn't work for me. I am getting bounced. Rewriting the SD card one more time...

Ok, so I am deeply confused. Downloaded the image, the SHA1sum is correct. Unpacked the image, wrote it on a 32GB SD card from Samsung. I am booting from an RPI4 with 4GB of RAM. Are the 4GBs the issue? I can ssh to it, it will send me its public key, and I can give the login and the password, but providing "root" as login and hitting enter at the password prompt gives me this:

C:\Windows\System32>ssh -l root
root@'s password:
Permission denied, please try again.
root@'s password:
Permission denied, please try again.
root@'s password:
root@ Permission denied (publickey,password).

root w/o password won't work for me. How is anyone else connecting? I am using SSH.

password-less SSH is not supposed to work, use a console cable or just inject your SSH key into /root/.ssh/authorized_keys and it should work.

built a stock ubuntu (cloud-init ready) image: https://cloudkernels.net/ubuntu-18.04.2-preinstalled-server-... sha1sum: 0b1d8b72ea5410fb7928925fd76dd0218b4f7a94

The behaviour should be identical to installing on a RPi3 (user/pass ubuntu/ubuntu, ethernet networking setup correctly etc.)

Ubuntu Mate has an image that works on the Pi3.

That's what the article is about. Also openSUSE (and SLES) run in 64 bit mode on a RPi 3 by default without issues, using a mainline kernel and FOSS userspace (no proprietary /opt/vc stuff).

Shortly after the Pi 4 was released I looked into it and found that, for the Pi 3 at least, the foundation weren't really interested. It essentially adds another distro for them to maintain. It's a shame.

It's not like anyone is forced to use their distro. It's newbie-oriented and extremely annoying for Unix geeks anyway (Debian base with its always-outdated packages, etc)

You don't even have to run Linux at all. FreeBSD/aarch64 works great on a Pi3 (as a headless server at least, VC4 is not ported)

How is Debian annoying for Unix geeks? If outdated packages are an issue, you can always install stable backports, or stuff from testing/unstable.

True but the newbies will slowly drift away from all the projects that don't want to split focus from aarch64.

I thought the whole point of things like docker is they gave native performance? Isn't docker just a combination of cgroups, network namespaces, pid namespaces, fancy filesystem mounts, etc.?

The overhead should be zero by design.

What am I missing?

there are a number of factors to consider when comparing these kinds of technologies.

Linux containers provide native performance for almost all applications, true; but there are tons of implications when it comes to multi-tenancy (security, QoS, trusted execution etc.).

Moreover, spawning an application as a docker container can incur significant overhead on startup time, fs setup, FS access etc.

So in theory yes, containers provide native performance. But do we really want to run apps on multi-tenant edge devices as containers ? I would argue not necessarily ;)

Very interesting! When running KVM on the Pi, does that use virtualisation extensions if the processor has them? (I don't know anything about ARM and virtualisation).

yeap, that's the case with the older models too. The A53 also has virtualisation extensions. The main difference with the Pi4 is the GIC, which removes the need for emulating interrupt handling.

Other than that, the A72 handles VMEnters/VMExits by trapping in/out of EL0/EL1/EL2.

I know that this is for 64-bit on the new Raspberry Pi specifically, but would the Pi 4 be able to run a Playstation 2 emulator?

I'm assuming you'd want to have playable framerate.

Maybe. With a herculean effort, way surpassing tricks and techniques used by x86 PCSX2 and other PS2 emulators.

Pure CPU computational performance wise yes. You'd need to have very clever ways to emulate the "Emotion Engine", like VPU0 and VPU1 vector processors. A single RPi4 core would far exceed those, but using it in an emulator might be very hard, because you'll probably need to synchronize execution and to emulate PS2 internal memory model. Emulating PS2 300 MHz MIPS core would be a walk in the park compared with the other issues.

But memory bandwidth might be the true showstopper. I think RPi4 has just one 32-bit DDR4 channel at 2400 MHz, which yields just 9.6 GB/s theoretical maximum bandwidth. More than what PS2 got, but the margins for emulation are uncomfortably low.
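The 9.6 GB/s figure follows from bus width times transfer rate. A small sketch of that calculation, taking the comment's numbers at face value (one 32-bit channel, 2400 million transfers per second) and treating both as assumptions rather than confirmed specs:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_second: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


# One 32-bit channel at 2400 million transfers per second.
print(peak_bandwidth_gb_s(32, 2400e6))
```

This is an upper bound only; sustained bandwidth is lower, and on the Pi it is shared between the CPU cores and the GPU.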

So while it's "possible", I don't think it'll happen at playable framerates. It'd be easier to reverse engineer the rendering engines in popular PS2 games and to simply port them.

> It'd be easier to reverse engineer the rendering engines in popular PS2 games and to simply port them.

What do you mean by that?

That the technical challenges and amount of work might be less to port the games compared to simply emulating the games on this platform.

Interesting question.

Emulating the PS1 is a lot easier on modern hardware than emulating e.g. the SNES, because most of what it does is push polygons, and PS1 games just don’t push very many of them; so even the most anemic mobile GPU core is more than enough to handle the load.

I imagine this is still true of the PS2 to an extent. It’s even more just a polygon pusher than the PS1; and the 3D is still generations-old.

There’s this forum post linked to on the PCSX2 website: https://forums.pcsx2.net/Thread-Sticky-Will-PCSX2-run-fast-o..., that says the recommended specs are something like:

• Intel Core 2 Duo / Core i3 @ 3.2Ghz or faster

• Nvidia Geforce 9600GT / 8800GT or better

Probably all you need here, to answer this question in theory, is to compare raw benchmark scores between those components and the Pi4’s CPU/GPU.

Or, for another angle, here’s a forum post about Play! (an emulator that has aarch64 support) running on the Nintendo Switch’s Tegra X1: https://gbatemp.net/threads/play-ps2-emulator-is-running-on-...

Spoilers: it gets 10FPS. And the Pi4’s GPU is no Maxwell. (And, while Play! isn’t the most optimized of emulators, it’s the one you’d have to use on aarch64. Even if you optimized it 6x, so that the Tegra got 60FPS, the Pi4’s Videocore GPU wouldn’t be pulling nearly that.)

It's about how accurate the emulator is. ePSXE used to run fine on a 700mhz PIII with 128gb of ram. There were SNES emulators that also worked fine.

However they were inaccurate and lots of games weren't exactly great. MGS for example didn't work well on ePSXe for years because it used many more hardware tricks than, say, the original Ridge Racer. Ridge Racer R4 doesn't even work properly under a PS2's PS1 backwards compatibility because the PS1 on a chip isn't perfect.

Like I said, PS1 emulators like ePSXe run fine on everything. They would run fine on an Apple Watch. Modern GPUs (and even a PIII’s IGPU is “modern”) are scary compared to back then.

Accuracy makes things slower, sure, but the PS2 emulators in question (like Play!) aren’t striving for accuracy. They’re striving just to get the darn thing to run. But there’s only so much accuracy you can trade off when the only thing you’re really being asked to do is push polys. What are you going to do; drop some on the floor?

But actually, something like that might be possible. You know how the mobile port of FFXV looks—same game, lower-detail textures, all the models replaced with re-rendered low-poly versions? It’d totally be possible to build a PS4 (or any other 3D console) emulator that can achieve that effect “automatically.” We already have “HD texture packs” for emulators like Dolphin and Citra. We could have “LQ texture+model packs”, to reduce PS4/3/2-era graphics down to PS1-era graphics. That’d take most of the workload off the emulator, and let basically anything run these games. (And, in theory, you could compute such a pack automatically. For the textures, you’re just downsampling. For the models, 3D-geometry simplification algorithms exist, although they’re not realtime. But you could run them over a whole game’s worth of assets in a couple hours; and so, with disc images in hand, you could stand up a server and work your way through the whole console library over a few months.)
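The "for the textures, you're just downsampling" step can be sketched as a 2x box filter. This is a toy version that works on a grayscale image stored as a list of rows of integers. A real pack builder would presumably use an image library and handle colour channels, and the model simplification part is a much harder problem that this does not touch.

```python
def downsample_2x(pixels):
    """Halve width and height by averaging each 2x2 block of pixels."""
    height, width = len(pixels), len(pixels[0])
    if height % 2 or width % 2:
        raise ValueError("dimensions must be even")
    return [
        [
            (pixels[y][x] + pixels[y][x + 1]
             + pixels[y + 1][x] + pixels[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


texture = [
    [0, 4, 8, 8],
    [4, 8, 8, 8],
]
print(downsample_2x(texture))
```

Each pass cuts the texel count to a quarter, so a couple of passes already takes a lot of memory and bandwidth pressure off the GPU.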

Oh okay. I think I misunderstood what you were saying slightly.

Some of the SNES/NES emulators require a lot of resources (relatively) to run accurately. I wondered if it was the same situation. So yeah it appears the GPU isn't capable.

I do wonder what the performance numbers of the Nintendo Switch vs the RPi 4 would be.

>128gb ram

I think you mean 128mb. Otherwise, that must have been some Pentium system!


Not sure, but maybe-useful point of comparison:

I used an RPi2 as a Retroarch (Lakka) box maybe... oh, two years ago or so. At that time, it managed the handful of PSX games I threw at it just fine, which kinda surprised me, but almost no N64 games ran at a playable framerate—only one that really worked was Mario 64, which I gather is one of the best-optimized and lightest-weight games on the system. I didn't try PS2 because PSX was OK but from its performance and how the N64 was doing I figured it didn't have a chance.

16-bit consoles were pretty good. Couldn't run the full-accuracy cores for SNES but that's not surprising. Everything before that 100% fine. Filters (say, for mimicking a CRT) weren't feasible on the hardware, beyond the simplest ones.

The software's gotten a bit better—I think the Rarch team has done some major work pushing more of the computation of later emulator cores, like N64 and up, onto the graphics chip instead of the CPU—so I'm not sure how the RPi2 would fare now.

The only benchmark I can find compares the RPi4 to the RPi3, not the 2, and the 4's single-core performance measures 3-5x better than the 3 (wow!) so I'm going to answer your question with a very confident probably, pending actually trying it.

My circa-2011 "gaming" laptop with a quad core i7 CPU, 4GB RAM, and discrete GPU ran PS2 games via emulator at 10 FPS or so.

The state of emulation has come a long way since then, but still, that was a "desktop replacement" rig back in the day. I don't know how well a Pi can pull that off.

The PS2 architecture was famously a pain in the butt to program for, to boot.
