I helped write parallelknoppix when I was an undergrad - our university's 2nd cluster ended up being a bunch of laptops with broken displays running it. Took me a whole summer.
Then the next semester I was denied the ability to take a parallel computing class because it was for graduate students only and the prof. would not accept a waiver, even though the class was being taught on the cluster a buddy and I had built.
That I still had root on.
So I added a script that would renice the prof.'s jobs to be as slow as possible.
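Not the original script, obviously, but the idea was roughly this kind of thing; a minimal sketch, with "prof" as a placeholder username:

```python
import subprocess

TARGET_USER = "prof"  # placeholder, not the real account name

def deprioritize(user: str) -> None:
    # List every PID owned by the user, then push each one to the lowest
    # scheduling priority. Lowering another user's priority requires root.
    pids = subprocess.run(["pgrep", "-u", user],
                          capture_output=True, text=True).stdout.split()
    for pid in pids:
        subprocess.run(["renice", "-n", "19", "-p", pid])

if __name__ == "__main__":
    deprioritize(TARGET_USER)
```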
The school I went to had similar but even more insane policies:
* I frequently took computer science graduate courses and received only undergrad credit because they could not offer the undergrad course.
* Other majors were prohibited by default from taking computer science courses under the guise of a shortage of places in classes, even when those majors required a computer science course to graduate.
I would like to point out that 300 and 400 level courses in the CS program usually had no more than 8 students. I distinctly remember meeting in a closet for one of my classes, because we had so few students they couldn't justify giving us a classroom.
Contrast that with the math department where I wanted to take some courses in parallel rather than serial. After a short conversation with the professor he said "ok sure, seems alright to me".
> Other majors were prohibited by default from taking computer science courses under the guise of a shortage of places in classes, even when those majors required a computer science course to graduate.
I went to an institution that did the opposite; seats were reserved for non-cs majors despite a shortage of sections. This resulted in CS undergrads waiting for courses just so they could graduate. It was frustrating because it felt like the department was taking care of outsiders over its own.
Part of that is because “average class size” is used in metrics for the university ranking systems and every university out there wants to game those rankings.
My guess is they wanted you to pay the higher CS tuition to take the CS classes.
Math departments also tend to be a lot more lax in my experience. Case in point: I got sign-off to take a 300-level pure math class without its 300-level prereq, and was also allowed to simply swap one required course for another in order to graduate.
I think I've read this kind of reply before, and I was not wrong [1]. Nice story to tell. Too bad my uni didn't have those kinds of facilities and opportunities.
Almost certainly user level libraries. Coordinating at the kernel level is either a single system image[1] or a distributed operating system[2], which are very uncommon in scientific computing.
Interesting. It's likely the point of constraining the graduate course to grad students had nothing to do with their maturity. But you see what you think is a sign of immaturity, and turn the constraint into a maturity filter.
I'm quite grown, and I wonder about the ownership/control of the cluster and why he didn't simply lock the professor out entirely, contingent on the approval of his waiver.
If anything, doing something as small as lowering the priority of his jobs instead of brazenly stonewalling him might be the sign of immaturity.
The computers were mostly the school's, the carpentry was ours (mine and a friend's), the cabling and network switches were ours (scavenged, eventually they bought a nice big switch), the labor was ours.
It's not really much of a prank war if you do a small prank and the other guy tries like hell to pretend he didn't notice and also tries like hell to pretend he didn't see you derping around in the building... escalating would have been evil on my part :)
I always wanted something like this for various "plumbing" services (DHCP/DNS/wifi controller, etc.), but the lack of ECC and OOB management kinda disqualifies it for anything serious.
>He's running forty Blades in 2U. That's:
>
> 160 ARM cores
> 320 GB of RAM
> (up to) 320 terabytes of flash storage
>
>...in 2U of rackspace.
Yay, that's like... almost as much as a normal 1U server can do.
We were hanging out in the garage of a mutual friend, chatting. Got to the "what do you do" section of the conversation, and he says he works in massively parallel stuff at XYZ corp. Something something, GPUs.
I make the obvious "can you make a Beowulf cluster?" joke, to which he responds (after a pregnant pause), "you... do know who I am?"
Yep. Donald Becker. A slightly awkward moment, I'll cherish forever.
I like user big.ears' speculation on what someone could possibly do with that much parallel compute:
> I don't think there's any theoretical reason someone couldn't build a fairly realistic highly-complex "brain" using, say, 100,000,000 simplified neural units (I've heard of a guy in Japan who is doing such a thing), but I don't really know what it would do, or if it would teach us anything that is interesting.
Lame, that just gives the opportunity to release the Neural Unit Pro, Neural Unit Max, and Neural Unit ProMax for a higher price. You consider it lame because your acumen in business nuance is lame ;-)
You and me both. The funny thing is, I wound up writing a program that would benefit from clustering, and felt my way around setting up MPICH on my zoo. I laughed out loud when I realized that, after all these years, I'd built an impromptu Beowulf cluster, even though the machines are scattered around the house.
Installing MPICH from source instead of from your distribution is best if you can't have all your cluster members running the same version of the same distro and/or have multiple architectures to contend with. But it takes forever to compile, even on a fast machine.
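Once MPICH (or any MPI) is working across the nodes, a quick sanity check is a tiny mpi4py script like the sketch below (assuming mpi4py is installed on every machine; the hostnames in the launch line are placeholders):

```python
from mpi4py import MPI  # assumes mpi4py is built against your MPICH

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
host = MPI.Get_processor_name()

# Every rank reports where it landed; rank 0 collects and prints the list.
placements = comm.gather((rank, host), root=0)
if rank == 0:
    for r, h in sorted(placements):
        print(f"rank {r} running on {h}")
```

Launched with something like `mpiexec -n 8 -hosts node1,node2 python hello.py`, it confirms the ranks really are spread across the zoo.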
A cluster of workstations (COW) is usually opportunistic exploiting existing systems, and lower density than a dedicated (usually rack-based or datacentre-based) cluster.
In practice, COWs usually turn out to be not especially useful, though there are exceptions.
Then add ArgoCD for deployment and istio for a service mesh!
While you are at it, also setup Longhorn for storage. With that solved, you might as well start hosting Gitea and DroneCI on the cluster, plus an extra helm- and docker repo for good measure. And in no time you will have a full modern CI/CD setup to do nothing but updates on! :-)
Seriously, though, you will learn a lot of things in the process and get a bottom up view of current stacks, which is definitely helpful.
It's too bad Apple never bought Gerry Popek's LOCUS (https://en.wikipedia.org/wiki/LOCUS), which could do process migration between heterogeneous hardware!
I'm not sure that Plan 9 does process migration out of the box. It does have complete "containerization" by default, i.e. user-controlled namespacing of all OS resources - so snapshotting and migration could be a feasible addition to it.
Distributed shared memory is another intriguing possibility, particularly since large address spaces are now basically ubiquitous. It would allow users to seamlessly extend multi-threaded workloads to run on a cluster; the OS would essentially have to implement memory-coherence protocols over the network.
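Nothing like that exists off the shelf for commodity clusters, but you can get a faint, user-space taste of the idea with Python's multiprocessing managers: one node serves an object over TCP and other nodes mutate it through a proxy. A toy sketch under made-up hostname/port/authkey values, and emphatically not page-level coherence:

```python
from multiprocessing.managers import BaseManager

# Toy, user-space flavour of "memory shared over the network": one node
# serves a plain dict over TCP, other nodes get a proxy to it.
class MemManager(BaseManager):
    pass

# On another node you'd run roughly:
#   MemManager.register("pages")
#   m = MemManager(address=("node0", 50000), authkey=b"demo"); m.connect()
#   shared = m.pages()
#   shared.update({"x": 42})   # write
#   print(shared.get("x"))     # read

if __name__ == "__main__":
    pages = {}  # stands in for the shared address space
    MemManager.register("pages", callable=lambda: pages)
    MemManager(address=("", 50000), authkey=b"demo").get_server().serve_forever()
```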
Sure do! It is interesting that these technologies evolve more slowly than it seems, sometimes.
On the graybearding of the cohort, here’s a weird one to me. These days, I mention slashdot and get more of a response from peers than mentioning digg!
In 2005, I totally thought digg would be around forever as the slashdot successor, but it’s almost like it never happened (to software professionals… er, graybeards)
Sterling and his Goddard colleague Donald J. Becker connected 16 PCs, each containing an Intel 486 microprocessor, using Linux and a standard Ethernet network. For scientific applications, the PC cluster delivered sustained performance of 70 megaflops--that is, 70 million floating-point operations per second. Though modest by today's standards, this speed was not much lower than that of some smaller commercial supercomputers available at the time. And the cluster was built for only $40,000, or about one tenth the price of a comparable commercial machine in 1994.
NASA researchers named their cluster Beowulf, after the lean, mean hero of medieval legend who defeated the giant monster Grendel by ripping off one of the creature's arms. Since then, the name has been widely adopted to refer to any low-cost cluster constructed from commercially available PCs.
Yeah, I wanted to replicate something like that by proposing it to a hardware vendor who visited my uni decades ago. It didn't go anywhere because I was intimidated by the red tape.
I do think this is sort of fool's gold in terms of actual performance. Even though the core count and RAM size is impressive, those cores are talking over ethernet rather than system bus.
Latency and bandwidth are atrocious in comparison, and you're going to run into problems like no individual memory allocation being able to exceed 8 GB.
For running a hundred truly independent jobs, sure, maybe you'll get equivalent performance, but that's a niche scenario that's rare in the real world.
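Back-of-envelope numbers make the gap concrete (these are round ballpark assumptions, not measurements):

```python
ram_bw_gb_s = 25.0    # assumed desktop DDR4 throughput, GB/s
eth_bw_gb_s = 0.125   # 1 GbE line rate, GB/s
ram_lat_s   = 100e-9  # ~100 ns to local RAM (assumption)
eth_lat_s   = 100e-6  # ~100 us round trip over a small switch (assumption)

payload_gb = 1.0
print(f"1 GB via local RAM : {payload_gb / ram_bw_gb_s:.2f} s")
print(f"1 GB via 1 GbE     : {payload_gb / eth_bw_gb_s:.0f} s")
print(f"latency penalty    : ~{eth_lat_s / ram_lat_s:.0f}x")
```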
I built such a toy cluster once to see for myself, and gave up. It is too slow to do anything serious.
You can be much better off just buying an older post-lease server.
Sure, it will consume more power, but you will also finish more tasks in less time, so the advantage of using ARM in that case may be negligible.
If it was Apple's M1 or M2, that would have been a different story though.
RPi4 and clones are not there yet.
Overall, I think people tend to underestimate the overhead of clustering. It's always significantly faster to run a computation on one machine than to spread it over N machines, each with 1/N the power.
That's not always a viable option because of hardware costs, and sometimes you want redundancy, but those concerns are on an orthogonal axis to performance.
Even the fastest practical interconnects are roughly 1/10th the speed of local RAM. Because of that, if you use an interconnect, you don't use it for remote RAM (through virtual memory).
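A toy model of that overhead (the numbers are arbitrary assumptions, chosen only to show the shape of the curve): compute shrinks with node count while communication grows with it, so speedup flattens and then reverses.

```python
def runtime_s(n_nodes, compute_s=100.0, comm_s_per_node=2.0):
    # perfectly divisible compute, plus an exchange that grows with node count
    return compute_s / n_nodes + comm_s_per_node * n_nodes

for n in (1, 2, 4, 8, 16, 32):
    print(f"{n:2d} nodes: {runtime_s(n):6.1f} s")
```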
I don't think anybody in the HPC business really pursued mega-SMP after SGI because it was not cost-effective for the gains.
Both Single System Image and giant NUMA machines were and still are pursued, because not everything scales well in shared-nothing message passing (some stuff straddles it by doing distributed shared memory over MPI but using it mostly for synchronisation).
It's just that there's a range of very well-paying problems that scale quite well on message-passing systems, which means that even if your problem scales very badly on them, you might have an easier time brute-forcing the task on a larger but inefficient supercomputer than getting funding for a smaller, more efficient one that fits your problem better.
Cray did some vector machines that were globally addressed but not coherent. That’s an interesting direction. So is latency hiding.
The really important thing is that the big 'single machine' you're talking about already has NUMA latency problems. Sharing a chassis doesn't actually save you from needing to tackle latency at scale.
Well, a complete M1 board, which is basically about as large as half an iPhone mini, is fast enough. It's also super efficient. So I'm still waiting for Apple to announce their cloud.
They're currently putting Mx chips in every device they have, even the monitors. It'll be the base system for any electric device. I'm sure we'll see more specialized devices for different applications, because at this point, the hardware is compact, fast, and secure enough for anything, as well as the software stack.
Tangential, but it is so funny to me that “TFA” has become a totally polite and normal way to refer to the linked article on this site. Expanding that acronym would really change the tone!
I'm not sure it is 'totally polite'? I usually read it as having a 'did you even open it' implication that 'OP' or 'the submission' doesn't. Maybe that's just me.
Maybe it isn’t totally polite, but IMO in this case it reads more like a slight correction than “In the fucking article,” which would be pretty aggressive, haha.
Unless they need something Pi specific I don't understand why this would be preferable versus just virtualizing instances on a "big ARM" server. I'm sure those exist.
It probably lends itself to tasks where CPU time is much greater than network round trip. Maybe scientific problems that are massively parallel. Way back in the 90s I worked with plasma physics guys that used a parallel system on "slow" Sun boxes. I can't remember the name of the software though.
It's actually 3U since the 2U of 40 pis will need almost an entire 1U 48 port PoE switch instead of plugging into the TOR. The switch will use 35-100W for itself depending on features and conversion losses. If each pi uses more than 8-9W or so under load, then you might actually need a second PoE switch.
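Rough numbers behind that (the per-blade draw and the switch budget are assumptions, not measured figures):

```python
blades          = 40
watts_per_blade = 9     # assumed worst-case draw per blade under load
switch_budget_w = 370   # assumed total PoE+ budget of a typical 48-port switch

load_w = blades * watts_per_blade
print(f"{load_w} W of PoE load vs ~{switch_budget_w} W of budget -> "
      f"{'fits, barely' if load_w <= switch_budget_w else 'needs a second switch'}")
```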
If you are building full racks, it probably makes more sense to use ordinary systems, but if you want to have a lot of actual hardware isolation at a smaller scale, it could make sense.
In some colos, they don't give you enough power to fill up your racks, so the low energy density wouldn't be such a bummer there.
I agree but can't you get the same effect with VMWare ESXi? If I just wanted to "have fun" managing scores of tiny computers, and I emphasize that this sounds like the least amount of fun anyone could have, I can have as many virtual machines as I want.
I can understand why some people want something physical/tangible while testing or playing in their hobby environment. I'm still a fan of virtualization - PassMark scores for an RPi4 (entire SoC, quad core) are 21 times lower than a per-single-core comparison in a 14-core i5-13600K (as a point of reference, my current system), and while I am running 64GB RAM, I can easily upgrade to 128GB or more on a single DDR4 node.
Hard to see an advantage given the obvious limitations, although it may make it more fun to work within latency and memory constraints, I guess.
Haha, Jellyfin would eat through all your memory and CPU time transcoding or remuxing on an SBC. I'm seriously thinking of getting another home server just to run that.
There are Intel Atom CPUs that support ECC. I had a Supermicro motherboard with a quad core part like that and I used it as a NAS. It was not that fast, but the power consumption was very low.
>Corsair SF450 PSU
>ASRock Rack X570D4U w/BMC
>AMD Ryzen 7 Pro 5750GE (8C 3.2/4.6 GHz)
>128GB DDR4-2666 ECC
>Intel XL710-DA1 (40Gbps)
>LSI/Broadcom 9500-8i HBA
>64GB SuperMicro SATA DOM
>2 SK Hynix Gold P31, 2TB NVMe SSD
>8 Hitachi 7200rpm, 16TB HDD
>3 80mm fans, 2 40mm fans, CPU cooler
That was a then-modern “Zen 3” (using Zen 2 cores) system on an X570 chipset. The CPU mostly goes in 1L ultra-SFF systems. TDP is 35W, and under stress testing the CPU tops out around 38.8-39W. The onboard BMC accounts for about 3.2-3.3W of power consumption by itself.
Most data ingest and reads comes from the SSD cache, with that being more around 60W for high throughput. Under very high loads (saturating the 40Gbps link) with all disks going, only hits about 110-120W.
By comparison, a 6-bay Synology was over double that idle power consumption, and couldn’t come close to that throughput.
Thanks for the parts list, especially because I think ASRock Rack paired with a Ryzen Pro offers better performance than a Supermicro in the same price range.
I could drop a few more watts if ASRock could put together a decent BIOS where disabling things actually disables things.
SuperMicro costs what it does for a reason.
If you’re looking for a chassis, I’m using a SilverStone RM21-308, with a Noctua NH-L9a-AM4 cooler, and cut some SilverStone sound deadening foam for the top panel of the 2U chassis.
Aside from the disks clicking, it’s silent and runs hilariously cool (I 3D printed chipset and HBA fan mounts at a local library). It’s also more usable storage, higher performance (saturates 40Gbps trivially), and lower power consumption than anything any YouTuber has come remotely close to. That server basically lets everything else in my rack not care much about storage, because the storage server handles it like a champ. I really considered doing a video series on it, but I’m too old to want to deal with the peanut gallery of YouTube comments.
If you don't mind me asking, how do your other workloads access the storage on it, NFS? The stumbling block for NFS for me is identity and access management.
Wow I just picked up an ASRock Rack X570D4U and put my 5950X into it.
Do you know how to make the BMC not a laggy mess when using the “H5Viewer”? I’m getting basically unusable latency when the system is two yards away, compared to an RDP server 1,000 miles away.
That's impressively low, considering the amount of storage capacity and the performance potential for the time you need it. It goes a long way towards paying for itself if you replace some old Xeon server with it.
I think it was idling at something like 30-40W with four HDDs and a UPS. I didn't have an especially efficient PSU and the UPS must have taken some power too. The motherboard alone would draw as little as 15W, I suppose.
Most AMD desktop platforms support ECC, and if you don't use overclocking facilities, they are pretty efficient (though their chiplet architecture causes idle power draw to be a good fraction of active power draw, still much less than 50W though).
How much RAM is that with? My home server idles at ~25-27W, but that's with only 16GB (ECC DDR4). However, throwing in an extra 16GB as a test didn't measurably change the reading.
Intel also has now up to 480 cores in an 8-socket server with 60 cores per socket, though Sapphire Rapids is handicapped in comparison with AMD Genoa by much lower clock frequencies and cache memory sizes.
However, while the high-core-count CPUs have excellent performance per occupied volume and per watt, they all have extremely low performance per dollar, unless you are able to negotiate huge discounts by buying them by the thousands.
Using multiple servers with Ryzen 9 7950X can provide a performance per dollar many times higher than that of any current server CPU, i.e. six 16-core 7950X with a total of 384 GB of unbuffered ECC DDR5-4800 will be both much faster and much cheaper than one 96-core Genoa with 384 GB of buffered ECC DDR5-4800.
Nevertheless, the variant with multiple 7950X is limited for many applications by either the relatively low amount of memory per node or by the higher communication latency between nodes.
Still, for a small business it can provide much more bang for the buck, when the applications are suitable for being distributed over multiple nodes (e.g. code compilation).
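A rough worked version of that comparison, using launch list prices as assumptions (about $700 for a Ryzen 9 7950X, about $11,800 for a 96-core EPYC 9654) and ignoring motherboards, RAM, and chassis, so treat it as illustrating the gap rather than a TCO figure:

```python
nodes = 6
cores_total    = nodes * 16     # 96 cores, same count as one Genoa part
ryzen_cpu_cost = nodes * 700    # assumed ~$700 per 7950X
genoa_cpu_cost = 11_800         # assumed list price for a 96-core Genoa

print(f"{cores_total} cores: ${ryzen_cpu_cost:,} in 7950X CPUs "
      f"vs ${genoa_cpu_cost:,} for one 96-core Genoa")
```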
This is the exact space I’m in, high cpu low network. By my estimates it’s about 1/4 the cost per CPU operation to use consumer hardware instead of enterprise. The extra computers allow for application level redundancy so the other components can be cheaper as well.
One problem with 480 cores in a single node: 480 cores is a shitload of cores; who needs more than a single node at this point? The MPI programmer inside me is having an existential breakdown.
It's a nice hobby project, but of course a commercial blade system will have far higher compute density. Supermicro can do 20 EPYC nodes in 8U, which at 64 cores per node is 1280 cores in 8U, or 160 per 1U, so double the core density, with far more powerful cores, and therefore way higher effective compute density.
Also not noted: 320 TB in 40 M.2 drives will be extremely expensive. Newegg doesn't have any 8 TB M.2 SSDs under $1000. $0.12/GB is about twice as expensive as more normally-sized drives, to say nothing of the price of spinning rust.
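That figure follows directly from the drive math (assuming roughly $1000 per 8 TB M.2 drive, per the Newegg observation above):

```python
drives, tb_per_drive, usd_per_drive = 40, 8, 1000

total_usd = drives * usd_per_drive          # $40,000 just in flash
total_gb  = drives * tb_per_drive * 1000    # 320,000 GB
print(f"${total_usd:,} for {total_gb:,} GB -> ${total_usd / total_gb:.3f}/GB")
```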
What about them? 1U servers from vendors are reliable and efficient - people use them in production for years. As for the cost, those hobby-style boards are very expensive in dollars/performance. Indeed, I'm not getting why one would want a cluster of expensive, low-spec nodes?
Just the Pi’s are $35 a pop, right? So that’s $1400 of Pi’s, on top of whatever the rest of the stuff costs. Wonder how it compares to, I guess, whatever the price-equivalent AMD workstation chip is…
ECC RAM is more robust (fewer crashes due to random bit flips), and OOB management means if a server has issues, you can remotely view it as if you were jacked in, and force reboot, among other things (like installing an OS remotely).
The Pi's will be using those 200Watts at near full tilt. The main use here would be larger computational tasks that you can easily split up among the blades. Or you run a very hardware-failure tolerant software service on top.
Hrm, interesting to see how the TDP of those 8358s drives overall power consumption. I'm looking at the iDRAC consoles of a couple of R720XDs with 12 3.5" HDDs, >128GB RAM, and two E5-2665s each, and they're all currently sipping ~150W at < 1 load average. The E5s have a TDP of 115W to the 8358's 250W, so I assume that accounts for most of it. I admittedly do some special IPMI fan stuff, but that only shaves off tens of watts.
Umm, I'm not sure I can afford the electricity to run kit like that :)
I'm currently awaiting delivery of an Asus PN41 (w/ Celeron N5100) to use as yet another home server, after a recommendation from a friend. Be interesting to see how much it draws at idle!
I feel that way about the ClockworkPi consoles [1]
There's a 5% chance that I fall madly in love with this thing and go tinker on some project in a coffee shop every weekend... but it's much more likely that I end up almost never using it :|
Exactly that. I used to thumb through the back pages of Personal Computer World[0] under the covers as a kid, looking at the palmtops. I think it's mostly nostalgia.
Yeah that’s my thought. The main benefit to this is High Availability. You’re not going to get compelling scale-out performance, but you can protect yourself from local hardware failures.
Of course, then you have to ask if you need the density. There are lots of ways to put an RPi in a rack... and this approach gives up HAT compatibility for density.
For example, I’m considering a rack of Rpi with hifi berry DACs for a multi-zone audio system. This wouldn’t help me there.
I don't feel like I have zero actual use for them. The amount of Docker containers I have running on my NAS is only ever going up. These could make for a nice, expandable Kubernetes cluster.
Whether that's a good use case is a whole other thing.
That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back, and connect them to a jelly-bean switch chip (8-port GbE chips are like $8 in quantity). Signal integrity on, at most, 4" of PCB trace should not be a problem. You could bring the network "port status" lines to the front if you're interested in seeing the blinky lights of network traffic.
The big win here would be that all of the network wiring is "built in" and compact, and blade replacement is trivial.
Have your fans blow up from the bottom and stagger "slots" on each row, and if you do 32 slots per row, you could probably build a kilocore cluster in a 6U box.
Ah, the fun I would have with a lab with a nice budget.
> That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back
That stuck out to me too. They are making custom boards and a custom chassis; surely it would be cleaner to route the networking and power through a backplane instead of having a gazillion tiny patch cables and a random switch just hanging in there. They could also avoid the need for PoE by just having power buses in the backplane.
Overall, IMHO the point of blades is that some stuff gets offloaded to the chassis, but here the chassis doesn't seem to be doing much at all.
Couldn't you do 1ki cores /4U with just Epyc CPUs in normal servers? At that point surely for cheaper, also significantly easier to build, and faster since the cores don't talk over Ethernet?
The Chinese made ones are even cheaper, open up a TP-Link "desktop 8 port Gigabit Switch" and you will find the current "leader" in that market. Those datasheets though will be in Chinese so it helps to be able to read Chinese. (various translate apps are not well suited to datasheets in my experience)
Yeah, I found the ones on mouser and digikey. $20 is a bit much (not for a one off, but if you are aggregating low end processors you will need a lot of them).
I'd love something like a 12-20 port 1Ge with a 10Ge uplink. If you find a super cheap 1Ge switch chip and docs (I suppose you could just reverse engineer the pcb from a tp-link switch), please post it.
No worries. The key, though, is cross-section bandwidth. The "super cheap" GbE switch chips can have as little as 2.5 Gbps of cross-section bandwidth, which makes them ill-suited for cluster operations.
Well for one, I'd build a system architecture I first imagined back at Sun in the early 90's which is a NUMA fabric attached compute/storage/io/memory scalable compute node.
Then I'd take a shared nothing cluster (typical network attached Linux cluster) and refactor a couple of algorithms that can "only" run on super computers and have them run faster on a complex that costs 1/10th as much. That would be based on an idea that was generated by listening to IBM and Google talk about their quantum computers and explaining how they were going to be so great. Imagine replacing every branch in a program with an assert that aborts the program on fail. You send 10,000 copies of the program to 10,000 cores with the asserts set uniquely on each copy. The core that completes kicks off the next round.
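A toy sketch of that branch-enumeration idea (with a made-up two-branch "program"): every data-dependent branch becomes an assert pinned to one guessed outcome, each guess vector is a separate copy, and only the copy whose guesses match the input runs to completion. Here the copies are just a loop; in the real scheme each would go to its own core.

```python
from itertools import product

def program(x, guesses):
    # branch 1, pinned to a guess
    cond1 = x % 2 == 0
    assert cond1 == guesses[0], "wrong guess - this copy aborts"
    y = x // 2 if cond1 else 3 * x + 1
    # branch 2, pinned to a guess
    cond2 = y > 10
    assert cond2 == guesses[1], "wrong guess - this copy aborts"
    return y - 10 if cond2 else y

x = 26
for guesses in product([False, True], repeat=2):  # 2**branches copies
    try:
        print(guesses, "->", program(x, guesses))  # exactly one copy survives
    except AssertionError:
        pass
```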
These would be awesome for build servers, and testing.
I really like Graviton from AWS, and Apple Silicon is great, I really hope we move towards ARM64 more. ArchLinux has https://archlinuxarm.org , I would love to use these to build and test arm64 packages (without needing to use qemu hackery, awesome though that it is).
Multiple server vendors now have Ampere offerings. In 2U, you can have:
* 4 Ampere Altra Max processors (in 2 or 4 servers), so about 512 cores, and much faster than anything those Raspberry Pi have.
* lots of RAM, probably about 4TB ?
* ~92TB of flash storage (or more ?)
Edit : I didn't want to disparage the compute blade, it looks like a very fun project. It's not even the same use case as the server hardware (and probably the best solution if you need actual raspberry pis), the only common thread is the 2U and rack use.
An open secret of the server hardware market: public prices mean nothing and you can get big discounts, even at low volume.
But of course the config I talked about is maxed-out and would probably be more expensive than 20k. It would be interesting to compare the TCO with an equivalent config, and I wouldn't be surprised to see the server hardware still win.
Wait this is pretty sick! What's the full build on that? How do you even get started on finding good cases that aren't just massive racks for a home build?
The case is "LZMOD A24 V3" - found it on caseend.com - there are smaller ITX cases, but I wanted to fit in standard components only, and not to mess with custom PSUs (for example).
The rest of the components are:
Board: AsRock Rack X570D4I-2T (2x 10GbE and IPMI!)
NVMe: 2TB Transcend TS2TMTE220S TLC
SSD: 2x 8TB Samsung 870 QVO
PSU: Seasonic SSP-300SUB (overkill, went for longevity)
CPU Cooling: Thermalright AXP-100 Series All-Copper Heatsink with Noctua NF-A12x15 PWM
Exhaust fans: 2x INEX AK-FN076 Slimfan 80mm PWM
On the air intake side, there's a filter sheet that I replace (or vacuum) once in a blue moon - the insides are still pristine after running for over a year now.
Interesting thing about cooling: one of those cases has a PSU with custom-made cabling (reduced cables by about 90%). I was hoping it would reduce the temperatures a bit. Surprisingly, there was basically no change. At full load, all of them keep running at around 70°C.
Important: in such a small case, if you want silence you'd better disable AMD's "Core Performance Boost". This will make the CPU run at its nominal frequency, 3.4GHz for the 5950X; otherwise it'll keep jumping to its max potential, 4.9GHz for the 5950X, which will result in more heat and more fan noise.
I bought a retired dual-socket Xeon HP 1U server with 128GB of ECC RAM for like $50 on ebay a while back. It only had one CPU, but upgrading it to two would be very cheap.
Sure, it's a hulking, obsolete, and very loud beast, but it's hard to beat the price-to-performance ratio there... just make sure you don't put anything super valuable on it, because HP's old ProLiant firmware likely has a ton of unpatched critical vulnerabilities (and you'd need an HP support plan to download patches even if they exist).
I picked up an HP 705 G4 mini on backmarket for $80 shipped the other day to run Home Assistant and some other small local containers. 500GB of storage, a Ryzen 5 2400GE, and 8GB DDR4, with a valid Windows license.
Sure, it's not as small or silent, but there's no way to beat the prices of these few-year-old enterprise mini PCs.
I spent some time last week tinkering with a SOQuartz board and ended up getting it working with a Pine-focused distro called Plebian[1].
Took a while to land on it though. Before that I tried all of the other distros on Pine64's "SOQuartz Software Releases"[2] page without any luck. The only one on that page that booted was the linked "Armbian Ubuntu Jammy with kernel 5.19.7", but it failed to boot again after an apt upgrade.
So there's at least one working OS, as of last week. But it's definitely quite finicky and would probably need some work to build a proper device tree for any carrier board that's not the RPi CM4 carrier board.
You can usually get an image that functions at least partially, but it's up to you to determine whether the amount it functions is enough for your use case. A K3s setup is usually good to go without some features like display output.
I've asked about that. There's a small possibility, but the earliest it would be able to happen (a batch of CM4 to offer as add-ons) would be summer, most likely :(
I don’t think these boards are meant for the way people are trying to use them. Mainline Linux support is actually great on RK3566 chips, but you have to build your own images with buildroot or something like that.
I have this cycle every 10 years where my home infra gets to enterprise level complexity (virtualisation/redundancy/HA) until the maintenance is more work than the joy it brings. Then, after some outage that took me way too long to fix, I decide it is over and I reduce everything down to a single modem/router and WiFi AP.
I feel the pull to buy this, create a glorious heap of complexity to run my doorbell on, and be disappointed. Can't wait.
If you want "hyperscale" in your homelab, the bare metal hypervisor needs to be x86-64 because unless you literally work for Amazon or a few others you are unlikely to be able to purchase other competitively priced and speedy arm based servers.
There is still near-zero availability in the mass market for CPUs you can stick into motherboards from one of the top ten Taiwanese vendors of serious server-class motherboards.
And don't even get me started on the lack of ability to actually buy a Raspberry Pi of your desired configuration, at a reasonable price, in stock and ready to add to cart.
Supermicro launched a whole lineup of ARM-based servers last fall. They seem to mostly offer complete systems for now, but as far as I understand that's mostly because there's still some minor issues to iron out in terms of broader support.
I’ve been getting good price/perf just using the top AMD consumer CPUs. I wish someone would make an AM5-platform motherboard with out-of-band / remote console management; that really is a must if you have a bunch of boxes and have them somewhere else. The per-core speeds are high on these, and 16 cores / 32 threads per box gets you enough for a fair bit.
It’s a fantastic board spec. Timing with them on availability can take longer. If they could get it VMware compatible, that would be great. Because they have dual 10G, you need no network card in most cases, and AM5 integrated graphics allows bring-up / troubleshooting with no additional card (if the remote console isn't working well). For my use case I’d trade an extra M.2 slot for a PCI slot, but I can see their approach. These boards fit in nice compact setups as a result, because you can run setups with no PCI cards and no hard drives.
I’ve built a small Raspberry Pi k3s cluster with Pi 4s and SSDs. It works fine, but you can ultimately still feel that they are quite weak. Put differently, deploying something on k3s still ends up deploying on a single node in most cases, so you get single-node performance under most circumstances.
I've been running a cluster like that for some years now and definitely felt that, but it was easy to fix by adding AMD64 nodes to it.
Modifying the services I'm working on to build multi-arch container images was not as straightforward as I imagined, but now I can take advantage of both ARM and AMD64 nodes on my cluster (plus I learned to do that, which is priceless)
It's amazing to see how far these systems have come since my coverage from The Verge in 2014, where I built a multi-node Parallella cluster. The main problem I had then was that there was no off-the-shelf GPU-friendly library to run on it, so I ended up working with the Cray Chapel project to get some distributed vectorization support. Of course, that's all changed now.
It's not clear to me how to build a business based on RPi availability. And the clones don't seem to be really in the game. Are Raspberry Pis becoming more readily available? I don't see that.
I really want something like NVidia's upcoming Grace CPU in blade format, but something where I can provision a chunk of SSD storage off a SAN via some sort of PCI-E backplane. Same form factor like the linked project.
I'm noticing that our JVM workloads execute _significantly_ faster on ARM. Just looking at the execution times, our lowly first-gen M1 MacBooks are significantly better than some of the best Intel or AMD hardware we have racked. I'm guessing it all has to do with memory bandwidth.
I have a few Armada 8040 boards and a couple of Raspberry Pi's, but let's be real...
They're not going to get maximum performance from an NVMe disk, the CPUs are too slow, and gigabit isn't going to cut it for high-throughput applications.
Until manufacturers start shipping boards with ~32 cores clocked faster than 2GHz and multiple 10Gbit connections, they're nothing more than a fun nerd toy.
I would, however, say that while I'm in the general target audience, I won't do crowdfunded hardware. If it isn't actually being produced, I won't buy it. The road between prototype and production is a long one for hardware.
(Still waiting for a very cool bit of hardware, 3+ years later - suspecting that project is just *dead*)
$60 per unit sounds pretty good. Does anyone have experience cross compiling to x86 from a cluster of Pis and can say how well it performs? A cheap and lower-power build farm sounds like an awesome thing to have in my house.
Yeah, I remember compiling for the Pi with QEMU on my amd64 box was infinitely faster than compiling on the Pi itself. I think people don't understand a few things:
- Running self-hosted services for 3.5 users doesn't take many resources, and a Pi can often handle multiple services.
- Compilation is a CPU-heavy and I/O-heavy operation, and the more memory you have on a single machine, the better.
I use a Pi as my on-the-go computer in places where I can't ssh to my home server. Sometimes I can't even get projects indexed without the language server being killed by the OOM killer 20 minutes later (on my PC it takes <20 seconds to index).
I think these are fantastic, but I really wish it had a BMC so one could do remote management. I'd love for version 2 to have it so I could buy a bunch for my datacenter.
Love it, however, I'm skeptical of Raspberry Pi Foundation's claims that the CM4 supply will improve during 2023. It might improve for some, but as more novel solutions like these come up, the supply will never be enough.
Ok, still it would be nice to have a line that says this system can do X1 threads of X2 GFLOP/s and has a memory bandwidth of X3 MB/s, or something like that.
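Something like the one-liner below would do it; the per-node numbers here are placeholders to be replaced with measured HPL/STREAM results for whatever board is actually used, not vendor specs:

```python
nodes           = 40
cores_per_node  = 4
gflops_per_node = 5.0   # assumed HPL-style result for one node (measure it!)
membw_per_node  = 4.0   # assumed STREAM-style GB/s for one node (measure it!)

print(f"{nodes * cores_per_node} threads, "
      f"~{nodes * gflops_per_node:.0f} GFLOP/s aggregate, "
      f"~{nodes * membw_per_node:.0f} GB/s aggregate memory bandwidth")
```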
The clones based on the RK3588 are approaching last-Gen Qualcomm speeds, so they're not as much of a let down as the 2016-era chips the Pi is based on.
And efficiency is much better than the Intel or AMD chips you could get in a used system around the same price.
That's the speed of the NIC built into the CM4. If you want 2.5 or 5 Gbps, you'd have to add a PCIe switch, adding a lot more cost and complexity—and that would also remove the ability to boot off NVMe drives :(
Hopefully the next generation Pi has more PCIe lanes or at least a faster lane.
> Then the next semester I was denied the ability to take a parallel computing class because it was for graduate students only and the prof. would not accept a waiver, even though the class was being taught on the cluster a buddy and I had built.
> That I still had root on.
> So I added a script that would renice the prof.'s jobs to be as slow as possible.
BOFH moment :)