
Ask HN: Suggestions for low cost homebrew “HPC” beast? - amuresan
Over the past few years I've been playing around with the idea of building large number-crunching applications for fun and personal research. While the applications I had in mind aren't trivially parallel, they should map well to a parallel architecture.

There are generally two options for running a big parallel application:
1. going to a commercial provider, which is prohibitively expensive;
2. affiliation with a research lab, which is not the case for me and usually requires a research project to get compute time, with no playing around with toy applications.

I was wondering if anyone has experience with building a small homebrew number-crunching machine on a budget. The best idea I have so far is getting a few used server machines with decent GPUs.
======
anujsharmax
There is no single thing called "HPC". It is a lot of ordinary computer
components joined together to form one big "HPC" system. First, identify which
kind of scaling you need: are you limited by CPU or by memory when you run the
code on your PC?

On a side note, please don't think about buying HPC hardware before actually
writing the code. You can build the code to solve real problems on a normal PC
with a multi-core CPU and a GPU. Then you can benchmark your parallel code to
determine what you actually need.
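
As a hypothetical illustration of that benchmarking step (the kernel and task
sizes below are made-up placeholders, not a recommended workload), you could
time a CPU-bound function serially and then with Python's multiprocessing to
see whether your code actually scales across cores:

```python
import time
from multiprocessing import Pool

def work(n):
    # Made-up CPU-bound kernel standing in for your real computation.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    tasks = [500_000] * 8  # placeholder workload

    t0 = time.perf_counter()
    serial = [work(n) for n in tasks]
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool() as pool:  # one worker process per core by default
        parallel = pool.map(work, tasks)
    t_parallel = time.perf_counter() - t0

    assert serial == parallel
    print(f"serial {t_serial:.2f}s, parallel {t_parallel:.2f}s, "
          f"speedup {t_serial / t_parallel:.1f}x")
```

If the speedup approaches your core count you are CPU-bound and more cores
will help; if it flattens out early, memory bandwidth or the algorithm itself
is the limit, and different hardware is the wrong first fix.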

I could help you out if you can give me more details of what you actually want
to achieve.

------
stuxnet79
The one factor that distinguishes a typical HPC installation from what we
would usually call a 'cloud' system is throughput. HPC installations are
designed to provide maximum throughput, and this requires very specialized
hardware and networking infrastructure (e.g. InfiniBand). Cloud installations,
in contrast, make use of widely available commodity hardware (even Raspberry
Pis). Unless the tasks you want to run will require an enormous amount of
throughput for inter-node messaging within your cluster, I'd suggest orienting
your plans towards the 'cloud' model, i.e. using commodity hardware and taking
advantage of an open-source cloud platform like OpenStack.

------
closeparen
If you don’t need to leave it running between computations, this seems like a
reasonable EC2/GCP use case.

University scientific computing labs cost millions of dollars; anything you’re
going to build and run at home will be a small-scale replica.

~~~
Const-me
> anything you’re going to build and run at home will be a small-scale
> replica.

If you’re OK with the performance of a top-ranking 2010 supercomputer, and the
task is a good fit for GPUs (i.e. the numbers being crunched are
single-precision floats, not too much bandwidth is required, and the “hot” set
of the data is within a few GB, so it fits in VRAM), you can build one at home
for a very reasonable amount of cash.

Here’s the TOP500 list for June 2010:
[https://www.top500.org/lists/2010/06/](https://www.top500.org/lists/2010/06/)
As you can see, the system at #10 delivers 433 TFlops. That’s just 40 modern
GeForce 1080 Ti GPUs. What once cost many millions of dollars and consumed 4
megawatts of electricity now costs about $40k and consumes a little over 10 kW.
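
A quick sanity check of that arithmetic (assuming roughly 11 TFlops of
single-precision peak per 1080 Ti, a commonly quoted ballpark, and placeholder
prices):

```python
TARGET_TFLOPS = 433   # the #10 system on the June 2010 TOP500 list
GPU_TFLOPS = 11       # approximate FP32 peak of one GeForce 1080 Ti
GPU_PRICE_USD = 1000  # rough street price per card, placeholder
GPU_POWER_W = 250     # rated board power per card

gpus = -(-TARGET_TFLOPS // GPU_TFLOPS)  # ceiling division
print(gpus, "GPUs,", f"${gpus * GPU_PRICE_USD:,},",
      gpus * GPU_POWER_W / 1000, "kW")
# 40 GPUs, $40,000, 10.0 kW
```

Note this counts raw FP32 peak only; real sustained performance depends
heavily on how well the problem maps to the GPUs.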

And because of the cryptocurrency craze, many-GPU motherboards, cables, PSUs
and other components are readily available on the market and aren’t too
expensive. For example, Asus offers a B250 mining motherboard with 19 PCIe
slots for $140.

------
Const-me
Depends on what exactly you're going to crunch.

If you're OK running on GPUs, you essentially need a cryptocurrency mining
rig; there are lots of articles about those.

If you need CPU performance, look for used servers on eBay, either complete
ones or separate components.

------
itamarst
You can get very far with just a normal desktop computer with a bunch of CPU
cores.
E.g. [http://veekaybee.github.io/2017/03/20/hadoop-or-
laptop/](http://veekaybee.github.io/2017/03/20/hadoop-or-laptop/)

------
morphle
A low-cost HPC number cruncher implies you seek the highest performance at the
lowest price. The energy used over its lifetime will be the main factor; the
capital cost of the processor will probably matter less than the price of the
bulk memory.

First you should identify the price/performance per watt of the candidate
systems (processor + memory + network) over their lifetime in your use case.
This turns out to be very difficult. The few benchmarks you can find online
cannot easily be compared, and the interesting choices are usually just
released and have no benchmarks yet. More problematic, benchmarks seldom list
the energy use and cost of a system.

Every year I try to design the lowest price/performance/watt system I can
find, and I spend a few weeks on this. Even with that much effort I have not
been able to establish whether a cluster of Raspberry Pis is faster and
cheaper than an AMD EPYC system with several CPUs. Establishing the cost is
even harder if you want to manufacture enough systems to benefit from scale
(not a problem in a homebrew system).
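
To make the comparison concrete, here is a sketch of the lifetime-cost
arithmetic being described; every number below is an invented placeholder, not
a real benchmark result:

```python
KWH_PRICE_USD = 0.20        # electricity price, placeholder
LIFETIME_H = 5 * 365 * 24   # five years of 24/7 operation

# name: (capex in USD, average draw in W, benchmark score in GFLOPS)
# All figures are invented for illustration only.
systems = {
    "pi-cluster-16": (1200, 160, 100),
    "epyc-1-socket": (4500, 280, 1500),
}

for name, (capex, watts, gflops) in systems.items():
    energy = watts / 1000 * LIFETIME_H * KWH_PRICE_USD
    total = capex + energy
    print(f"{name}: lifetime cost ${total:,.0f} "
          f"(${total / gflops:.2f} per GFLOPS)")
```

Run this with real capex, measured power draw and benchmark numbers for your
own workload; the ranking can flip once the energy term dominates the capital
cost.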

Currently I guesstimate that the best price/performance/watt system is some
mass-produced $1-$3 ARM (soon RISC-V) chip with two DRAM chips and a fast
network connection, like an FPGA with a high-speed SERDES link fabric switch.
The price per core plus memory must be way below $10. Raspberry Pis are not
contenders: they only have 200-300 Mbps of network bandwidth for 4 cores to
share, and that should be several gigabits/s to be competitive.

A custom-built AMD EPYC system with several GPUs networked together can turn
out to be faster and cheaper. The retail price needs to come down, though, and
you need to build a similar network fabric switch.

You can get better performance if you tailor the processor design to the
task, so the system with the best price/performance/watt will be different for
different software.

Even cheaper will be systems where you can rebalance part of the compute
resources between different programs. Such a system will be less efficient
overall, because hardware reconfigurability has a high overhead, but that is
offset by being more efficient for each individual program. You could, for
example, balance transistors between integer units, floating-point units,
cache, or the network-on-chip (NoC).

We are currently making prototypes of our own design for a reconfigurable
manycore processor with a NoC fabric, at around $9 per core in an FPGA. The
entry systems cost between $80 and $500.

We plan to build this as an ASIC; the price will then drop to $1 per core,
including DRAM.

Even cheaper would be to not slice the wafer into 22,000 chips but leave it
whole. You get over 100,000 cores (with little memory) and a petabits/s
network for less than $6,000. Half the wafer is reconfigurable logic that can
be reprogrammed at runtime as a GPU, TPU, CPU or any other custom
optimisation. The energy cost of the wafer can be zero if you use the wafer as
a water-heating element, and lower if you only run it on solar PV during the
day. (If you share two wafers with a user on the night-time side of the earth,
you can both have 24-hour computing at $0.02 per kWh of solar PV.) We could
make a 180nm $500 version and a $1 version, but they would not be as good as
the 28nm $6,000 version. A future 7nm wafer-scale version might never be
cheaper than the 28nm one; we will have to wait and see.

A silicon optical network on the wafer (also $6,000) would allow two or more
wafers to be networked at several terabits/s. This overcomes the small-memory
problem.

Think of it: you want to get rid of any overhead in the system, like PCBs,
chip packages, connectors and cables. You want to put everything on a single
large chip, the whole wafer. Because DRAM and processors need different CMOS
technologies, we wind up needing more than one wafer.

We are confident that in 30 years we can grow the wafer (or 3D block) and the
solar panel from CO2.

~~~
starlingforge
Can you elaborate on this? I am a programmer by day and an amateur scientist
by night, and I am interested in this exact process. Where can I read more
about it? Do you have a corporate landing page I could start at?

~~~
morphle
You can find my email in my profile.

Which exact process would you like me to elaborate on? I can gather some
papers so you can read more on this.
[https://scholar.google.nl/scholar?q=siliconsqueak](https://scholar.google.nl/scholar?q=siliconsqueak)

Our corporate landing page is a mess this month, with broken links
[http://morphle.com](http://morphle.com)

