Hyperscale in your Homelab: The Compute Blade arrives (jeffgeerling.com)
136 points by mikece on Jan 24, 2023 | 224 comments



I helped write parallelknoppix when I was an undergrad - our university's 2nd cluster ended up being a bunch of laptops with broken displays running it. Took me a whole summer.

Then the next semester I was denied the ability to take a parallel computing class because it was for graduate students only and the prof. would not accept a waiver, even though the class was being taught on the cluster a buddy and I built.

That I still had root on.

So I added a script that would renice the prof.'s jobs to be as slow as possible.
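(For the curious, a minimal Python sketch of what such a script could look like; this is a hypothetical reconstruction, not the actual script, and the username is a placeholder:)

    # Hypothetical reconstruction, not the original script: find every
    # process owned by a given user and renice it to the weakest priority.
    # Needs root to renice other users' processes; psutil is third-party.
    import os
    import psutil

    TARGET_USER = "prof"  # placeholder username

    for proc in psutil.process_iter(["username", "pid"]):
        if proc.info["username"] == TARGET_USER:
            os.setpriority(os.PRIO_PROCESS, proc.info["pid"], 19)  # 19 = slowest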

BOFH moment :)


The school I went to had similar but more insane policies

* I frequently took computer science graduate courses and received only undergrad. credit because they could not offer the undergrad course

* Other majors were prohibited by default from taking computer science courses under the guise of a shortage of places in classes, even when those majors required a computer science course to graduate

I would like to point out that 300 and 400 level courses in the CS program usually had no more than 8 students. I distinctly remember meeting in a closet for one of my classes, because we had so few students they couldn't justify giving us a classroom.

Contrast that with the math department where I wanted to take some courses in parallel rather than serial. After a short conversation with the professor he said "ok sure, seems alright to me".


    Other majors were prohibited by default from taking computer science courses under the guise of a shortage of places in classes, even when those majors required a computer science course to graduate
I went to an institution that did the opposite; seats were reserved for non-cs majors despite a shortage of sections. This resulted in CS undergrads waiting for courses just so they could graduate. It was frustrating because it felt like the department was taking care of outsiders over its own.


I might be jumping to conclusions, but orgs which value revenue often prioritize new customers over existing (perhaps captive) customers.


Part of that is because “average class size” is used in metrics for the university ranking systems and every university out there wants to game those rankings.


Why do you guys think such things happen?


My guess is they wanted you to pay the higher CS tuition to take the CS classes.

Math departments also tend to be a lot more lax in my experience. Case in point: I got sign-off to take a 300-level pure math class without its 300-level prereq, as well as to just swap one required course for another to graduate.


Students aren't the business most universities think they are in.


Academia being inelastic and refusing to adapt in the face of market forces.


I think I've read this kind of reply before, and I was not wrong [1]. Nice story to tell. Too bad my uni didn't have those kinds of facilities and opportunities.

[1] https://news.ycombinator.com/item?id=34197024


ParallelKnoppix sounds cool. Did the OSes on each machine coordinate in any way at the kernel level? Or was it all user-level libs/apps/services?


Almost certainly user level libraries. Coordinating at the kernel level is either a single system image[1] or a distributed operating system[2], which are very uncommon in scientific computing.

[1] https://en.wikipedia.org/wiki/Single_system_image [2] https://en.wikipedia.org/wiki/Distributed_operating_system


user level... tbh I'm surprised the whole thing worked at all!


Prof might’ve done you a favor. Seems like you didn’t need that class anyway.


I wanted the credit hours though :)


A “Snoop on to them, as they snoop on to us” moment. Good for you


That was nice of you!


One might even say that it had a very high nice value;)


Abusing admin privileges like this confirms that it was probably a good idea to deny you access to the course; you were obviously too young.


Interesting. It's likely the point of constraining the graduate course to grad students had nothing to do with their maturity. But you see what you think is a sign of immaturity, and turn the constraint into a maturity filter.

I'm quite grown, and I wonder about the ownership/control of the cluster and why he didn't simply lock the professor out entirely, contingent on the approval of his waiver.

If anything, doing something as small as lowering the priority of his jobs instead of brazenly stonewalling him might be the sign of immaturity.


The computers were mostly the school's, the carpentry was ours (mine and a friend's), the cabling and network switches were ours (scavenged, eventually they bought a nice big switch), the labor was ours.

It's not really much of a prank war if you do a small prank and the other guy tries like hell to pretend he didn't notice and also tries like hell to pretend he didn't see you derping around in the building... escalating would have been evil on my part :)


Machiavellian... evil... semantics.


I always wanted such a thing for various "plumbing" services (DHCP/DNS/wifi controller etc.) but the lack of ECC and OOB management kinda disqualifies it for anything serious.

> He's running forty Blades in 2U. That's:
>
> 160 ARM cores
> 320 GB of RAM
> (up to) 320 terabytes of flash storage
>
> ...in 2U of rackspace.

Yay, that's like... almost as much as a normal 1U server can do

Edit: I give up, HN formatting is idiotic


But does anyone remember the Beowulf trope(1) from Slashdot? Am I a greybeard now?

(1) https://hardware.slashdot.org/story/01/07/14/0748215/can-you...


So, I once met a guy named Don.

We were hanging out in the garage of a mutual friend, chatting. Got to the "what do you do" section of the conversation, and he says he works in massively parallel stuff at XYZ corp. Something something, GPUs.

I make the obvious "can you make a Beowulf cluster?" joke, to which he responds (after a pregnant pause), "you... do know who I am?"

Yep. Donald Becker. A slightly awkward moment, I'll cherish forever.


Ew. "Do you have any idea who I am?" is never a good look.


I like user big.ears' speculation on what someone could possibly do with that much parallel compute:

  I don't think there's any theoretical reason someone couldn't build a fairly realistic highly-complex "brain" using, say, 100,000,000 simplified neural units (I've heard of a guy in Japan who is doing such a thing), but I don't really know what it would do, or if it would teach us anything that is interesting.


Simplified neural unit? Less capacity than a human brain? Lame.


Lame? That just gives the opportunity to release the Neural Unit Pro, Neural Unit Max, and Neural Unit ProMax at a higher price. You consider it lame because your acumen in business nuance is lame ;-)


You and me both. The funny thing is, I wound up writing a program that would benefit from clustering, and felt my way around setting up MPICH on my zoo. I laughed out loud when I realized that, after all these years, I'd built an impromptu Beowulf cluster, even though the machines are scattered around the house.

Installing MPICH from source instead of from your distribution is best if you can't have all your cluster members running the same version of the same distro and/or have multiple architectures to contend with. But it takes forever to compile, even on a fast machine.
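(If anyone wants a quick smoke test of such a setup, here's a minimal sketch using the mpi4py bindings rather than raw C; it assumes MPICH and mpi4py are installed on every node:)

    # hello_mpi.py - run with: mpiexec -n 4 python3 hello_mpi.py
    # Prints one line per rank, confirming the launcher reaches every node.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()           # this process's ID within the job
    size = comm.Get_size()           # total number of ranks launched
    node = MPI.Get_processor_name()  # hostname, handy for scattered machines

    print(f"rank {rank} of {size} on {node}")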


That's more a COW than a (Beo)wulf, no?

<https://www.oreilly.com/library/view/high-performance-linux/...>

A cluster of workstations (COW) is usually opportunistic, exploiting existing systems, and lower density than a dedicated (usually rack-based or datacentre-based) cluster.

In practice, COWs usually turn out to be not especially useful, though there are exceptions.


Well, since you already have a cluster, you could distcc the compilation. I remember doing that on a bunch of machines when we were building Gentoo.


To go along with "Imagine a Beowulf cluster of those!", don't forget "Take my money!"


You can get off my lawn now.


I remember building a 4-node Beowulf cluster out of discarded Compaq desktops and then having no idea what to do with it.


75 MHz, yeah! Stacked on top of each other! With 10 Mbit Ethernet! I think we even got OpenMosix going.

But then 5 years later I was working on them for a living in HPC, but they were no longer called Beowulf Clusters then.


Did kinda the same thing but with Raspberry Pis. Neat, a cluster of r-pi's... now what?


If you want to continue the chain of specific goals in service of no specific purpose: run Kubernetes on it.


Then add ArgoCD for deployment and Istio for a service mesh!

While you are at it, also set up Longhorn for storage. With that solved, you might as well start hosting Gitea and DroneCI on the cluster, plus an extra Helm and Docker repo for good measure. And in no time you will have a full modern CI/CD setup to do nothing but updates on! :-)

Seriously, though, you will learn a lot of things in the process and get a bottom up view of current stacks, which is definitely helpful.


I did this. I am still dead inside. Thank goodness all my production shit has a managed control plane and network.


Alternatively, if you don't like the Kubes, learn yourself some Erlang and make a super fault-tolerant application.


Step 1: run a page ranking algorithm using Naive Bayes on a bastardised HPC framework

Step 2: add advertising

Step 3: make more money than God.


Natalie Portman says yes, and instructs you to put some hot grits down your pants.


I do and you are. I’m also imagining one covered in hot grits…


Beowulf clusters were those lame things that didn’t have wireless, and had less space than a Nomad, right?


Don't forget the Hot Grits & Natalie Portman.


I do but in all fairness, I have an entirely grey beard.


Grey! Mine is white.


I do. And cool research OSes that did process migration.


It's too bad Apple never bought Gerry Popek's LOCUS (https://en.wikipedia.org/wiki/LOCUS), which could do process migration between heterogeneous hardware!


Ah, Plan 9.


I'm not sure that Plan 9 does process migration out of the box. It does have complete "containerization" by default, i.e. user-controlled namespacing of all OS resources - so snapshotting and migration could be a feasible addition to it.

Distributed shared memory is another intriguing possibility, particularly since large address spaces are now basically ubiquitous. It would allow users to seamlessly extend multi-threaded workloads to run on a cluster; the OS would essentially have to implement memory-coherence protocols over the network.


If not Plan 9, then likely Inferno. (A pretty different system, of course.)


And God help us, OS/2 Warp.


With much better tooling for OO ABI than COM/WinRT will ever get (SOM).


Sure do! It is interesting that these technologies evolve more slowly than it seems, sometimes.

On the graybearding of the cohort, here’s a weird one to me. These days, I mention slashdot and get more of a response from peers than mentioning digg!

In 2005, I totally thought digg would be around forever as the slashdot successor, but it’s almost like it never happened (to software professionals… er, graybeards)


New to me - found the source article in the Wayback Machine:

https://web.archive.org/web/20010715201416/http://www.scient...


Found the definition:

Sterling and his Goddard colleague Donald J. Becker connected 16 PCs, each containing an Intel 486 microprocessor, using Linux and a standard Ethernet network. For scientific applications, the PC cluster delivered sustained performance of 70 megaflops--that is, 70 million floating-point operations per second. Though modest by today's standards, this speed was not much lower than that of some smaller commercial supercomputers available at the time. And the cluster was built for only $40,000, or about one tenth the price of a comparable commercial machine in 1994.

NASA researchers named their cluster Beowulf, after the lean, mean hero of medieval legend who defeated the giant monster Grendel by ripping off one of the creature's arms. Since then, the name has been widely adopted to refer to any low-cost cluster constructed from commercially available PCs.


Yeah, I wanted to replicate something like that by proposing it to a hardware vendor who visited my uni decades ago. It went nowhere because I was intimidated by the red tape.


But does it run Doom?


Crysis


What does Natalie Portman need to imagine a Beowulf cluster of Dooms running Crysis? Grits?


I do think this is sort of fool's gold in terms of actual performance. Even though the core count and RAM size is impressive, those cores are talking over ethernet rather than system bus.

Latency and bandwidth are atrocious in comparison, and you're going to run into problems like no individual memory allocation being able to exceed 8 GB.

Like for running a hundred truly independent jobs, sure, maybe you'll get equivalent performance, but that's a niche scenario that is rare in the real world.


I built such a toy cluster once to see for myself and gave up. It is too slow to do anything serious. You are much better off just buying an older post-lease server. Sure it will consume more power, but conversely you will finish more tasks in a shorter time, so the advantage of using ARM in that case may be negligible. If it were Apple's M1 or M2, that would be a different story though. RPi4 and clones are not there yet.


I overall think people tend to underestimate the overhead of clustering. It's always significantly faster to run a computation on one machine than spread over N machines with hardware of (1/N) power.

That's not always a viable option because of hardware costs, and sometimes you want redundancy, but those concerns are on an orthogonal axis to performance.
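To put the parent's point in rough numbers, here's a toy model with made-up figures that ignores everything except a single communication term:

    # Toy model: one machine of speed 1 vs N machines of speed 1/N.
    # The raw compute time is identical; the cluster only adds overhead.
    work = 1000.0        # arbitrary units of compute
    n = 10               # number of cluster nodes
    comm_per_node = 5.0  # made-up per-node synchronization cost

    t_single = work / 1.0                                    # 1000.0
    t_cluster = (work / n) / (1.0 / n) + comm_per_node * n   # 1050.0

    print(t_single, t_cluster)  # the cluster can only lose in this model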


Lines get blurred when you are on a supercomputer interconnect with a global address space or even RDMA.


The fastest practical interconnects are roughly 1/10th the speed of local RAM. Because of that, if you use the interconnect, you don't use it for remote RAM (through virtual memory).

I don't think anybody in the HPC business really pursued mega-SMP after SGI because it was not cost-effective for the gains.


Both Single System Image and giant NUMA machines were and are still pursued, because not everything scales well in shared-nothing message passing (some stuff straddles it by doing distributed shared memory over MPI but using it mostly for synchronisation).

It's just that there's a range of very well-paying problems that scale quite well in message-passing systems, and this means that even if your problem scales very badly on them, you might have an easier time brute-forcing the task on a larger but inefficient supercomputer rather than getting funding for a smaller, more efficient one that fits your problem better.


Cray did some vector machines that were globally addressed but not coherent. That’s an interesting direction. So is latency hiding.

The really important thing is that the big 'single machine' you're talking about already has NUMA latency problems. Sharing a chassis doesn't actually save you from needing to tackle latency at scale.


Well, a complete M1 board, which is basically about as large as half an iPhone mini, is fast enough. It's also super efficient. So I'm still waiting for Apple to announce their cloud.

They're currently putting Mx chips in every device they have, even the monitors. It'll be the base system for any electronic device. I'm sure we'll see more specialized devices for different applications, because at this point the hardware is compact, fast, and secure enough for anything, as is the software stack.

Hello Apple Fridge


Fast enough for what?


>I do think this is sort of fool's gold in terms of actual performance.

It’s a fun toy for learning (and clicks, let’s be honest).

It’s not a serious attempt at a high performance cluster or an exercise in building an optimal computing platform.

Enjoy the experiment and the uniqueness of it. Nobody is going to be choosing this as their serious compute platform.


In TFA, isn't JetBrains using it as a CI system?


Tangential, but it is so funny to me that “TFA” has become a totally polite and normal way to refer to the linked article on this site. Expanding that acronym would really change the tone!


I'm not sure it is 'totally polite'? I usually read it as having a 'did you even open it' implication that 'OP' or 'the submission' doesn't. Maybe that's just me.


Maybe it isn’t totally polite, but it IMO it reads in this case more like slight correction than “In the fucking article,” which would be pretty aggressive, haha.


I always thought it meant “the featured article”; I must be a lot more wholesome than previously indicated!


Unless they need something Pi specific I don't understand why this would be preferable versus just virtualizing instances on a "big ARM" server. I'm sure those exist.


It probably lends itself to tasks where CPU time is much greater than network round trip. Maybe scientific problems that are massively parallel. Way back in the 90s I worked with plasma physics guys that used a parallel system on "slow" Sun boxes. I can't remember the name of the software though.


It's actually 3U since the 2U of 40 pis will need almost an entire 1U 48 port PoE switch instead of plugging into the TOR. The switch will use 35-100W for itself depending on features and conversion losses. If each pi uses more than 8-9W or so under load, then you might actually need a second PoE switch.
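The budget math, with assumed figures (a typical 48-port PoE+ switch is often rated somewhere around 370 W total):

    blades = 40
    watts_per_blade = 9    # assumed per-blade draw under load
    switch_budget_w = 370  # assumed total PoE+ budget of a 48-port switch

    demand_w = blades * watts_per_blade
    print(f"{demand_w} W demand vs {switch_budget_w} W budget")  # 360 vs 370
    # Much past ~9 W per blade and one switch no longer covers the cluster.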

If you are building full racks, it probably makes more sense to use ordinary systems, but if you want to have a lot of actual hardware isolation at a smaller scale, it could make sense.

In some colos, they don't give you enough power to fill up your racks, so the low energy density wouldn't be such a bummer there.


> Yay that's like... almost as much as normal 1U server can do

Hyperscale in your Homelab. Something to hack on, learn, host things like Jellyfin, and have fun with.


I agree, but can't you get the same effect with VMware ESXi? If I just wanted to "have fun" managing scores of tiny computers, and I emphasize that this sounds like the least amount of fun anyone could have, I could have as many virtual machines as I want.


I can understand why some people want something physical/tangible while testing or playing in their hobby environment. I'm still a fan of virtualization - passmark scores for an RPi4 (entire SoC/quad core) are 21 times less than a per-single-core comparison in a 14-core i5-13600K (as a point of reference, my current system), and while I am running 64GB RAM, I can easily upgrade to 128GB or more on a single DDR4 node.

Hard to see an advantage given the obvious limitations, although it may make it more fun to work within latency and memory constraints, I guess.


Haha, Jellyfin would eat through all your memory and CPU time transcoding or remuxing on an SBC. I'm seriously thinking of getting another home server just to run that.


It's ~~four~~ two spaces to get the "code block" style.

    like
    this
and asterisk for italics (I don't think there is a 'quote' available, and I'm not sure how they play together.)

* does this work? * Edit: No! Haha

    *how*
    *about*
    *this*
Edit: No, no joy there either.

I agree, it's not the most intuitive formatting syntax I've come across :)

I guess we're stuck with BEGIN_QUOTE and END_QUOTE blocks!



The code block was a workaround.

I wanted to do this

    > * list element 1
    > * list element 2
without the indent.

I don't get why it just doesn't use CommonMark or something; it's just some inept, half-assed clone of it anyway.


> but lack of ECC and OOB management kinda disqualifies it for anything serious.

> Yay that's like... almost as much as normal 1U server can do

It’s a fun toy. Obviously it isn’t the best or most efficient way to get any job done. That’s not the point.

Enjoy it for the fun experiment that it is.


> I always wanted such thing for various "plumbing" services (DHCP/DNS/wifi controller etc)

You don't need a cluster for that, even a 1st gen Pi can run those services without any problem.


I can only speak for Raspi 3B+, but I agree.

I have multiple services running on it (including pihole, qbittorrent, vpn) and it's at about 40% mem usage right now.


I need multiple nodes for redundancy, not because it can't fit on one. And preferably at least 3 for quorum too.


Indeed, that box here next to my desk draws 50W of electricity continuously despite being mostly idle. Why? Because it has ECC.

Having some affordable low power device with ECC would be a game changer for me.

I added affordable to exclude expensive (and noisy) workstation class laptops with ECC RAM.


There are Intel Atom CPUs that support ECC. I had a Supermicro motherboard with a quad core part like that and I used it as a NAS. It was not that fast, but the power consumption was very low.


Do you remember how many Watts it was using with idle disks?


I personally am at 43-45W idle…

    >Corsair SF450 PSU
    >ASRock Rack X570D4U w/BMC
    >AMD Ryzen 7 Pro 5750GE (8C 3.2/4.6 GHz)
    >128GB DDR4-2666 ECC
    >Intel XL710-DA1 (40Gbps)
    >LSI/Broadcom 9500-8i HBA
    >64GB SuperMicro SATA DOM
    >2 SK Hynix Gold P31, 2TB NVMe SSD
    >8 Hitachi 7200rpm, 16TB HDD
    >3 80mm fans, 2 40mm fans, CPU cooler
That was an at-the-time modern "Zen 3" (using Zen 2 cores) system on an X570 chipset. The CPU mostly goes in 1L ultra-SFF systems. TDP is 35W, and under stress testing the CPU tops out around 38.8-39W. The onboard BMC is about 3.2-3.3W of power consumption itself.

Most data ingest and reads come from the SSD cache, with that being more around 60W for high throughput. Under very high loads (saturating the 40Gbps link) with all disks going, it only hits about 110-120W.

By comparison, a 6-bay Synology was over double that idle power consumption, and couldn’t come close to that throughput.


Thanks for the parts list, especially because I think ASRock Rack paired with a Ryzen Pro offers better performance than a Supermicro in the same price range.


There are reasons for that though.

I could drop a few more watts if ASRock could put together a decent BIOS where disabling things actually disables things.

SuperMicro costs what it does for a reason.

---

If you’re looking for a chassis, I’m using a SilverStone RM21-308, with a Noctua NH-L9a-AM4 cooler, and cut some SilverStone sound deadening foam for the top panel of the 2U chassis.

Aside from disks clicking, it’s silent, runs hilariously cool (I 3D printed chipset and HBA fan mounts at a local library) and it’s more usable storage, higher performance (saturates 40Gbps trivially) and lower power consumption than anything any YouTuber has come remotely close to. That server basically lets me have everything else in my rack not care much about storage, because the storage server handles it like a champ. I really considered doing a video series on it, but I’m too old to want to deal with the peanut gallery of YouTube comments.


If you don't mind me asking, how do your other workloads access the storage on it, NFS? The stumbling block for NFS for me is identity and access management.


Wow I just picked up an ASRock Rack X570D4U and put my 5950X into it.

Do you know how to make the BMC not a laggy mess when using the “H5Viewer”? I’m getting basically unusable latency when the system is two yards away, compared to an RDP server 1,000 miles away.


That's impressively low, considering the amount of storage capacity and the performance potential for the time you need it. It goes a long way towards paying for itself if you replace some old Xeon server with it.


It was this board: https://www.supermicro.com/en/products/motherboard/a2sdi-2c-...

I think it was idling at something like 30-40W with four HDDs and a UPS. I didn't have an especially efficient PSU and the UPS must have taken some power too. The motherboard alone would draw as little as 15W, I suppose.


Most AMD desktop platforms support ECC, and if you don't use overclocking facilities, they are pretty efficient (though their chiplet architecture causes idle power draw to be a good fraction of active power draw, still much less than 50W though).


> Why? Because it has ECC

Sorry if I am missing the obvious here, but why would ECC consume so much power?


It's not that ECC consumes power, it's that systems that support ECC tend to consume more idle power (because they're larger etc.)


How much RAM is that with? My home server idles at ~25-27W, but that's with only 16GB (ECC DDR4). However, throwing in an extra 16GB as a test didn't measurably change the reading.


Xeon-D series?


Epyc Embedded and possibly some Ryzen Embedded devices.


Didn't AMD announce a 96 core processor with dual socket support?

As usual this is either done for entertainment value or to simulate physical networks (not clusters).


Intel also now has up to 480 cores in an 8-socket server with 60 cores per socket, though Sapphire Rapids is handicapped in comparison with AMD Genoa by much lower clock frequencies and cache memory sizes.

However, while the high-core-count CPUs have excellent performance per occupied volume and per watt, they all have extremely low performance per dollar, unless you are able to negotiate huge discounts when buying them by the thousands.

Using multiple servers with the Ryzen 9 7950X can provide performance per dollar many times higher than that of any current server CPU, e.g. six 16-core 7950Xs with a total of 384 GB of unbuffered ECC DDR5-4800 will be both much faster and much cheaper than one 96-core Genoa with 384 GB of buffered ECC DDR5-4800.

Nevertheless, the variant with multiple 7950X is limited for many applications by either the relatively low amount of memory per node or by the higher communication latency between nodes.

Still, for a small business it can provide much more bang for the buck, when the applications are suitable for being distributed over multiple nodes (e.g. code compilation).
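Rough CPU-only arithmetic behind that claim, with assumed prices (real street prices vary a lot):

    # Six 16-core Ryzen 9 7950X boxes vs one 96-core EPYC Genoa, CPUs only.
    r7950x_price = 700     # assumed ~launch price of a 7950X, USD
    genoa_price = 11_800   # assumed ~list price of a 96-core EPYC 9654, USD

    cluster_cpu_cost = 6 * r7950x_price    # 4,200 USD for 96 cores total
    print(genoa_price / cluster_cpu_cost)  # ~2.8x cheaper per core, before
                                           # boards, RAM, PSUs, and networking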


This is the exact space I’m in, high cpu low network. By my estimates it’s about 1/4 the cost per CPU operation to use consumer hardware instead of enterprise. The extra computers allow for application level redundancy so the other components can be cheaper as well.


One problem with 480 cores in a single node: 480 cores is a shitload of cores; who needs more than a single node at this point? The MPI programmer inside me is having an existential breakdown.


I think the hardware isolation would be a selling point in some cases. Granted, it's niche.


In most cases it is the wrong choice, unless your business is selling Raspberry Pi hosting I guess.


> Yay that's like... almost as much as normal 1U server can do

...but the normal server is much cheaper.



It's a nice hobby project, but of course a commercial blade system will have far higher compute density. Supermicro can do 20 EPYC nodes in 8U, which at 64 cores per node is 1280 cores in 8U, or 160 per 1U, so double the core density, and far more powerful cores, so way higher effective compute density.


Also not noted: 320 TB in 40 M.2 drives will be extremely expensive. Newegg doesn't have any 8 TB M.2 SSDs under $1000. $0.12/GB is about twice as expensive as more normally-sized drives, to say nothing of the price of spinning rust.


> Yay that's like... almost as much as normal 1U server can do

What about cost, and other metrics around cost (power usage, reliability)? If space is the only factor we care about then it seems like a loss.


What about them? 1U servers from vendors are reliable and efficient; people use them in production for years. As for the cost, those hobby-style boards are very expensive in dollars/performance. Indeed, I'm not getting why one would want a cluster of expensive, low-spec nodes.


Just the Pi’s are $35 a pop, right? So that’s $1400 of Pi’s, on top of whatever the rest of the stuff costs. Wonder how it compares to, I guess, whatever the price-equivalent AMD workstation chip is…


It seems they're the ones with 8 GB of ram, so probably closer to $75 each.


I’d be interested to see if anyone had any application other than CI for Raspberry Pi programs, I really can’t see one.


What are you talking about "lack of ECC?" The Pi4 has ECC.


Didn't know that! Would still prefer one that actually reports the errors tho.


That would be 40x (Rpi4 8GB $75 + 8TB nvme $1200 + psu and others) ~ $51000.


> lack of ECC and OOB management kinda disqualifies it

Can you expand on this please?


ECC RAM is more robust (fewer crashes due to random bit flips), and OOB management means if a server has issues, you can remotely view it as if you were jacked in, and force reboot, among other things (like installing an OS remotely).


The 1U server, however, is likely to use more than the 200 watts of power that the 40-Blade 2U setup would use.


> The 1U server, however, is likely to use more than the 200 watts of power

Q: Why would a 1U server need more than 200W if you're doing nothing more than basic network services?

I have mini tower servers that draw a fraction of that at idle.


The Pis will be using those 200 watts at near full tilt. The main use here would be larger computational tasks that you can easily split up among the blades. Or you run a very hardware-failure-tolerant software service on top.


I have some idle Dell R650's that draw 384W. A couple drives, buncha RAM, two power supplies, 2 CPU's (Xeon 8358)


Hrm, interesting to see how the TDP of those 8358s drives overall power consumption. I'm looking at the idrac consoles of a couple R720XDs with 12 3.5" hdds, >128gb ram, two E5-2665s per, and they're all currently sipping ~150W at < 1 load average. The E5s have a TDP of 115W to the 8358's 250W, so I assume that's what's most of it. I admittedly do some special IPMI fan stuff, but that only shaves off tens of watts.


> Dell R650's that draw 384W

Umm, I'm not sure I can afford the electricity to run kit like that :)

I'm currently awaiting delivery of an Asus PN41 (w/ Celeron N5100) to use as yet another home server, after a recommendation from a friend. Be interesting to see how much it draws at idle!


Ah, do you feel it too? That need to own some of these, even though you have zero actual use for them.


Nothing generates that feeling for me like seeing these things:

https://store.planetcom.co.uk/products/gemini-pda-1

I absolutely can't imagine what I'd use it for, and yet, my finger has hovered over "buy" many many times over the last few years


I feel that way about the ClockworkPi consoles [1]

There's a 5% chance that I fall madly in love with this thing and go tinker on some project in a coffee shop every weekend... but it's much more likely that I end up almost never using it :|

[1] https://www.clockworkpi.com/shop


Reminds me of the Psion Series 5 which I owned more than twenty years ago... and even then, had little use for. :^) https://en.wikipedia.org/wiki/Psion_Series_5


Exactly that. I used to thumb through the back pages of Personal Computer World[0] under the covers as a kid looking at the palmtops. I think it's mostly nostalgia

0: https://en.wikipedia.org/wiki/Personal_Computer_World


Good times, good times.



Doesn't look too far off from the PinePhone with its keyboard.


I think I could justify the world's most secure and reliable Home Assistant cluster with automatic failover...


Frankly, the bar for that is pretty low...


Yeah that’s my thought. The main benefit to this is High Availability. You’re not going to get compelling scale-out performance, but you can protect yourself from local hardware failures.

Of course, then you have to ask if you need the density. There are lots of ways to put Rpi in a rack.. and this approach gives up Hat compatibility for density.

For example, I’m considering a rack of Rpi with hifi berry DACs for a multi-zone audio system. This wouldn’t help me there.


I don't feel like I have zero actual use for them. The amount of Docker containers I have running on my NAS is only ever going up. These could make for a nice, expandable Kubernetes cluster.

Whether that's a good use case is a whole other thing.


That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back. Connect them to a jelly-bean switch chip (8-port GbE are like $8 in qty). Signal integrity on at most 4" of PCB trace should not be a problem. You could bring the network "port status" lines to the front if you're interested in seeing the blinky lights of network traffic.

The big win here would be that all of the network wiring is "built in" and compact. Blade replacement is trivial.

Have your fans blow up from the bottom, stagger the "slots" on each row, and if you do 32 slots per row, you could probably build a kilocore cluster in a 6U box.

Ah, the fun I would have with a lab with a nice budget.


> That is a neat setup. I wish someone would do this but just run RMII out to an edge connector on the back

That stuck out to me too; they are making custom boards and a custom chassis, so surely it would be cleaner to route the networking and power through a backplane instead of having a gazillion tiny patch cables and a random switch just hanging in there. They could also avoid the need for PoE by just having power buses in the backplane.

Overall imho the point of blades is that some stuff gets offloaded to the chassis, but here the chassis doesn't seem to be doing much at all.


Couldn't you do 1ki cores /4U with just Epyc CPUs in normal servers? At that point surely for cheaper, also significantly easier to build, and faster since the cores don't talk over Ethernet?


> jelly bean switch chip

What do you have in mind? I couldn't find this part. Really am asking.


Exemplar: https://octopart.com/ksz9477stxi-microchip-80980651

The Chinese made ones are even cheaper, open up a TP-Link "desktop 8 port Gigabit Switch" and you will find the current "leader" in that market. Those datasheets though will be in Chinese so it helps to be able to read Chinese. (various translate apps are not well suited to datasheets in my experience)


Thank you.

Yeah, I found the ones on mouser and digikey. $20 is a bit much (not for a one off, but if you are aggregating low end processors you will need a lot of them).

I'd love something like a 12-20 port 1Ge with a 10Ge uplink. If you find a super cheap 1Ge switch chip and docs (I suppose you could just reverse engineer the pcb from a tp-link switch), please post it.


No worries; the key though is cross-section bandwidth. The "super cheap" GbE switch chips can have as little as 2.5 Gbps of cross-section bandwidth, which makes them ill-suited for cluster operations.


What kind of fun might that be?


Well, for one, I'd build a system architecture I first imagined back at Sun in the early 90s: a NUMA-fabric-attached compute/storage/IO/memory scalable compute node.

Then I'd take a shared-nothing cluster (a typical network-attached Linux cluster) and refactor a couple of algorithms that can "only" run on supercomputers and have them run faster on a complex that costs 1/10th as much. That would be based on an idea generated by listening to IBM and Google talk about their quantum computers and explain how they were going to be so great. Imagine replacing every branch in a program with an assert that aborts the program on fail. You send 10,000 copies of the program to 10,000 cores with the asserts set uniquely on each copy. The core that completes kicks off the next round.
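A toy sketch of that assert idea (hypothetical code, scaled down to a handful of local processes instead of 10,000 cores):

    # Each worker runs the same program with one fixed guess for every
    # branch outcome; a wrong guess trips an assert and that copy aborts.
    # The one copy whose guesses were all correct completes.
    import itertools
    from multiprocessing import Pool

    N_BRANCHES = 4  # 2**4 = 16 speculative copies

    def program(guesses):
        data = 13  # stand-in input driving the branches
        for i, guess in enumerate(guesses):
            actual = bool((data >> i) & 1)  # what the branch would really take
            assert guess == actual          # wrong guess: abort this copy
        return f"completed with guesses {guesses}"

    def run(guesses):
        try:
            return program(guesses)
        except AssertionError:
            return None

    if __name__ == "__main__":
        with Pool() as pool:
            combos = itertools.product([False, True], repeat=N_BRANCHES)
            for result in pool.imap_unordered(run, combos):
                if result:
                    print(result)  # the survivor kicks off the next round
                    break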


These would be awesome for build servers, and testing.

I really like Graviton from AWS, and Apple Silicon is great; I really hope we move towards ARM64 more. ArchLinux has https://archlinuxarm.org , and I would love to use these to build and test arm64 packages (without needing to use qemu hackery, awesome though it is).


Multiple server vendors now have Ampere offerings. In 2U, you can have:

* 4 Ampere Altra Max processors (in 2 or 4 servers), so about 512 cores, and much faster than anything those Raspberry Pis have.

* lots of RAM, probably about 4TB ?

* ~92TB of flash storage (or more ?)

Edit: I didn't want to disparage the Compute Blade; it looks like a very fun project. It's not even the same use case as the server hardware (and probably the best solution if you need actual Raspberry Pis); the only common thread is the 2U and rack use.


Those things are insanely expensive though; I priced a 2-core machine at 20,000 EUR without much RAM or SSDs.

I'm keeping my eyes open though.


An open secret of the server hardware market: public prices mean nothing and you can get big discounts, even at low volume.

But of course the config I talked about is maxed-out and would probably be more expensive than 20k. It would be interesting to compare the TCO with an equivalent config, and I wouldn't be surprised to see the server hardware still win.


Try the HPE RL300, should be more reasonably priced but I couldn't get a quote because availability seems to be problematic at the moment.


This looks very promising. I basically could print an enclosure to specifically fit my home space. And easily print a new one when I move.

More efficient use of space compared to my current silent mini-home lab -- also about 2U worth of space, but stacked semi-vertically [1].

That's 4 servers each with AMD 5950x, 128GB ECC, 2TB NVMe, 2x8TB SSD (64c/512GB/72TB total).

[1] https://ibb.co/Jm1SX7d


Wait this is pretty sick! What's the full build on that? How do you even get started on finding good cases that aren't just massive racks for a home build?


The case is "LZMOD A24 V3" - found it on caseend.com - there are smaller ITX cases, but I wanted to fit in standard components only, and not to mess with custom PSUs (for example).

The rest of the components are:

Board: AsRock Rack X570D4I-2T (2x 10GBe and IPMI!)

NVMe: 2TB Transcend TS2TMTE220S TLC

SSD: 2x 8TB Samsung 870 QVO

PSU: Seasonic SSP-300SUB (overkill, went for longevity)

CPU Cooling: Thermalright AXP-100 Series All-Copper Heatsink with Noctua NF-A12x15 PWM

Exhaust fans: 2x INEX AK-FN076 Slimfan 80mm PWM

On the air intake side, there's a filter sheet that I replace (or vacuum) once in a blue moon - the insides are still pristine after running for over a year now.

Interesting thing about cooling: one of those cases has a PSU with custom-made cabling (reduced cables by about 90%). I was hoping it would reduce the temperatures a bit. Surprisingly, there was basically no change. At full load all keep running at around 70 Celsius.

Important: in such a small case, if you want silence you'd better disable AMD's "Core Performance Boost". This will make the CPU run at its nominal frequency, 3.4GHz for the 5950X; otherwise it'll keep jumping to its max potential, 4.9GHz for the 5950X, which will result in more heat and more fan noise.


Not all racks are massive, I use this 6U one for home build. Can be mounted or put on wheels: https://www.amazon.com/gp/product/B01K1JJHTO


The blade has arrived, but can you get a compute unit to go in it? The non-availability of the whole Pi ecosystem has done a lot of damage.


The Rock5B is whipping the Pi on compute power and availability. Only use a Pi if you absolutely have to.


At $150+ I would just buy an old small-form-factor Dell/HP from eBay and have a whole machine.


I bought a retired dual-socket Xeon HP 1U server with 128GB of ECC RAM for like $50 on ebay a while back. It only had one CPU, but upgrading it to two would be very cheap.

Sure, it's a hulking, obsolete, and very loud beast, but it's hard to beat the price-to-performance ratio there... just make sure you don't put anything super valuable on it, because HP's old ProLiant firmware likely has a ton of unpatched critical vulnerabilities (and you'd need an HP support plan to download patches even if they exist)


100% this.

I picked up an HP 705 G4 mini on Back Market for $80 shipped the other day to run Home Assistant and some other small local containers. 500GB of storage, a Ryzen 5 2400GE, 8GB DDR4, with a valid Windows license.

Sure it's not as small or silent, but there's no way to beat the prices of these few-years-old enterprise mini-PCs.


There are other CM-compatible SoMs.

Like the Pine64 SOQUARTZ


Geerling covers this in the accompanying video for this post. He couldn't get it running due to no working OS images being obtainable.


I spent some time last week tinkering with a SOQuartz board and ended up getting it working with a Pine-focused distro called Plebian[1].

Took a while to land on it though. Before that I tried all of the other distros on Pine64's "SOQuartz Software Releases"[2] page without any luck. The only one on that page that booted was the linked "Armbian Ubuntu Jammy with kernel 5.19.7", but it failed to boot again after an apt upgrade.

So there's at least one working OS, as of last week. But it's definitely quite finicky and would probably need some work to build a proper device tree for any carrier board that's not the RPi CM4 Carrier Board.

[1] https://github.com/Plebian-Linux/quartz64-images

[2] https://wiki.pine64.org/wiki/SOQuartz_Software_Releases


So the compute units are obtainable, but a functioning image remains unobtainium. What a mess.


You can usually get an image that functions at least partially, but it's up to you to determine whether the amount it functions is enough for your use case. A K3s setup is usually good to go without some features like display output.


I like to tinker, but there is a limit.

The killer feature for their crowdfunding campaign would be if they sourced a batch of Pi compute modules ...


I've asked about that. There's a small possibility, but the earliest it would be able to happen (a batch of CM4 to offer as add-ons) would be summer, most likely :(


I don’t think these boards are meant for the way people are trying to use them. Mainline Linux support is actually great on RK3566 chips, but you have to build your own images with buildroot or something like that.


I have this cycle every 10 years where my home infra gets to enterprise-level complexity (virtualisation/redundancy/HA) until the maintenance is more work than the joy it brings. Then, after some outage that took me way too long to fix, I decide it is over and I reduce everything down to a single modem/router and WiFi AP. I feel the pull to buy this and create a glorious heap of complexity to run my doorbell on and be disappointed; can't wait.


I love the form factor. But please. For the love of god. We need something with wide availability that supports at least ARMv8.2.

At this rate I have so little hope in other vendors that we'll probably just have to wait for the RPi5.


If you want "hyperscale" in your homelab, the bare metal hypervisor needs to be x86-64 because unless you literally work for Amazon or a few others you are unlikely to be able to purchase other competitively priced and speedy arm based servers.

There is still near-zero availability in the mass market for CPUs you can stick into motherboards from one of the top ten Taiwanese vendors of serious server-class motherboards.

And don't even get me started on the lack of ability to actually buy raspberry pi of your desired configuration at a reasonable price and in stock to hit add to cart.


Supermicro launched a whole lineup of ARM-based servers last fall. They seem to mostly offer complete systems for now, but as far as I understand that's mostly because there's still some minor issues to iron out in terms of broader support.


I’ve been getting good price/perf just doing the top AMD consumer CPUs. Wish someone would make an AM5-platform motherboard with out-of-band / remote console management; that really is a must if you have a bunch of boxes and have them somewhere else. The per-core speeds are high on these. 16 cores / 32 threads per box gets you enough for a fair bit.


Have you taken a look at any of AsrockRack's offerings? They've got some prelim 650 mATX boards: https://www.asrockrack.com/general/productdetail.asp?Model=B...


It’s a fantastic board spec. Timing with them on availability can take longer. If they can get it VMware compatible, that would be great. Because they have dual 10G, you need no network card in most cases. AM5 integrated graphics allows bring-up/troubleshooting with no additional card (if remote console isn't working well). For my use case I’d trade an extra M.2 slot for a PCI slot, but I can see their approach. These boards fit in nice compact setups as a result, because you can run no-PCI and no-HD setups.


I’ve built a small Raspberry Pi k3s cluster with Pi 4s and SSDs. It works fine, but one can ultimately still feel that they are quite weak. Put differently, deploying something on k3s still ends up deploying on a single node in most cases, so you get single-node performance under most circumstances.


I've been running a cluster like that for some years and definitely felt that, but it was easy to fix by adding AMD64 nodes to it.

Modifying the services I'm working on to build multi-arch container images was not as straightforward as I imagined, but now I can take advantage of both ARM and AMD64 nodes on my cluster (plus I learned to do that, which is priceless)


It's amazing to see how far these systems have come since my coverage in The Verge in 2014, where I built a multi-node Parallella cluster. The main problem I had then was that there was no off-the-shelf GPU-friendly library to run on it, so I ended up working with the Cray Chapel project to get some distributed vectorization support. Of course, that's all changed now.

https://www.theverge.com/2014/6/4/5779468/twitter-engineer-b...


It's not clear to me how to build a business based on RPi availability. And the clones don't seem to be really in the game. Are Raspberry Pis becoming more readily available? I don't see that.


Businesses and consumers don't see the same availability, apparently. And yes, they are very slowly becoming more available. But still no Pi 4 about.


Correct. These are for hobbyists and there is no market.


I really want something like Nvidia's upcoming Grace CPU in blade format, but something where I can provision a chunk of SSD storage off a SAN via some sort of PCIe backplane. Same form factor as the linked project.

I'm noticing that our JVM workloads execute _significantly_ faster on ARM. Just looking at the execution times, our lowly first-gen M1 MacBooks are significantly better than some of the best Intel or AMD hardware we have racked. I'm guessing it all has to do with memory bandwidth.


Apple should go with a blade design for the Mac Pro. Just stick in as many M2 Ultra blades as you need to up the compute and memory.

Will need to deal with NUMA issues on the software side.


I would be all over any server like form factor for M-series chips. The efficiency numbers for the CPU are great.


I have a few Armada 8040 boards and a couple Raspberry Pis, but let's be real...

They're not going to get maximum performance from an NVMe disk, the CPUs are too slow, and gigabit isn't going to cut it for high-throughput applications.

Until manufacturers start shipping boards with ~32 cores clocked faster than 2GHz and multiple 10Gbit connections, they're nothing more than a fun nerd toy.


Been waiting for this for over a year, was the first person to buy a pre-purchase sample. Planning to set up a PXE k3s cluster.


This looks cool!

I would, however, say that while I'm in the general target audience, I won't do crowdfunded hardware. If it isn't actually being produced, I won't buy it. The road between prototype and production is a long one for hardware.

(Still waiting for a very cool bit of hardware, 3+ years later - suspecting that project is just *dead*)


> 160 ARM cores

> 320 GB of RAM

Depending on how you feel about hyperthreading, there are commodity dual-CPU Xeon setups that can do this as well.


$60 per unit sounds pretty good. Does anyone have experience cross compiling to x86 from a cluster of Pis and can say how well it performs? A cheap and lower-power build farm sounds like an awesome thing to have in my house.


Most likely, your amd64 CPU is much faster than all those tiny Pi cores. Add to that network latency...


Especially for x86 cross compiling from ARM. Typically people do the reverse, because outside of Graviton and M-series, X86 is generally a lot faster.


Yeah, I remember compiling for Pi with QEMU on my amd64 box being infinitely faster than compiling on the Pi itself. I think people don't understand a few things:

- running self-hosted services for 3.5 users doesn't take many resources, and a Pi can often handle multiple services

- compilation is a CPU-heavy and I/O-heavy operation; the more memory you have on a single machine, the better

I use a Pi as my on-the-go computer in places where I can't ssh to my home server. Sometimes I can't even get projects indexed without the language server being killed by the OOM killer 20 minutes later (on my PC it takes <20 seconds to index).


I want to build a fan out of them.

Probably with one of these in the middle: http://www.mercotac.com/html/830.html


This is a lot cheaper, quieter, and smaller:

http://move.rupy.se/file/cluster_client.png


I think these are fantastic, but I really wish it had a BMC so one could do remote management. I'd love for version 2 to have it so I could buy a bunch for my datacenter.


There's no backplane - all power and communication goes through a front-facing ethernet port. Kind of defeats the purpose of a blade form factor IMO.


It's too bad ARM boards are so expensive, it makes them nearly pointless for projects unless you need the GPIO.


This is cool. But it's super hard to compete with a computer you bought off craigslist for $25.


I'd like to buy a laptop that's also a fault tolerant cluster.


Love it, however, I'm skeptical of Raspberry Pi Foundation's claims that the CM4 supply will improve during 2023. It might improve for some, but as more novel solutions like these come up, the supply will never be enough.


How do we measure the performance of these kinds of systems?


Not super fast but efficiency is okay: https://github.com/geerlingguy/top500-benchmark#results


The Blade is just a carrier for a Raspberry Pi CM4, so the performance will be that of a normal CM4.


Ok, still it would be nice to have a line that says this system can do X1 threads of X2 GFLOP/s and has a memory bandwidth of X3 MB/s, or something like that.


Unfortunately, if you are asking that question, the answer for all the Pis and clones is "Not enough, by more than an order of magnitude".


The clones based on the RK3588 are approaching last-gen Qualcomm speeds, so they're not as much of a letdown as the 2016-era chips the Pi is based on.

And efficiency is much better than the Intel or AMD chips you could get in a used system around the same price.


"last-Gen Qualcomm speeds" that's a pretty low bar...


Heh... but at least it's a lot higher than "current-Gen Pi"


You can look at benchmarks for the rpi cm4 for that



These are pi's right? No hardware AES :/


Why only 1 Gbps ethernet?


That's the speed of the NIC built into the CM4. If you want 2.5 or 5 Gbps, you'd have to add a PCIe switch, adding a lot more cost and complexity—and that would also remove the ability to boot off NVMe drives :(

Hopefully the next generation Pi has more PCIe lanes or at least a faster lane.


Yes, this, more of this!







