
Banana Pi to Launch 24-Core Arm Server - ingve
https://www.cnx-software.com/2018/12/26/banana-pi-24-core-arm-server/
======
mntmn
The sad thing about this chip is that you won’t even get a datasheet for it,
nor a reference manual. At least Socionext didn’t want to share it with me.
That’s why I prefer NXP or anything else that has NDA-free datasheets.

~~~
aswanson
Seriously, why the f--k do vendors do that? Broadcom used to be notorious for
that nonsense. They might as well keep their Manhattan Project chips to
themselves.

~~~
ohazi
> used to be

Sorry to be the one to wake you from your apparently very pleasant daydream,
but I just picked three random chips from their website and the only links
available are "request info" and "contact sales." I don't see any evidence to
support a claim that they've changed at all. They're still the worst.

~~~
Stratoscope
> Sorry to be the one to wake you from your apparently very pleasant daydream

This seems a bit uncharitable. Here's another way to look at it: perhaps the
commenter you replied to was familiar with Broadcom's policies from some years
ago, but had not kept up to date with them and didn't want to say one way or
the other about Broadcom's current behavior.

------
IntelMiner
Super niche use case. But I would absolutely love one of these for cross-
compiled development work

Right now, if I want to cross-compile Gentoo for ARM and ARM64, I can build
"most" packages with the included emerge-wrapper, which is great as it runs at
basically 1:1 speed with native x86_64 compile jobs.

However, a ton of packages still fail, and many of their upstream developers
refuse to incorporate changes that might fix that:
[https://dev.gnupg.org/T2370](https://dev.gnupg.org/T2370)

For those packages, I have to use a QEMU usermode chroot, which on my i7-4790
build host is slower than native compilation on the Raspberry Pi itself.

I'd love to be able to do direct, native builds to sidestep these flaws. But
every "consumer" ARM board (Raspberry Pi, ODROID, ROCK64, etc.) is flawed in
some way that makes it unusable for development:

\- Raspberry Pi lacks enough RAM and will hang even when doing single-threaded
builds (heavy packages like GCC and Rust usually hit this limit)

\- The ODROID-C2 hits a similar limitation due to only having 2GB of memory
itself

\- The ROCK64 "can" complete full self-hosting builds (slowly) but has a
staggering number of kernel bugs relating to its Ethernet and USB 3, which
frequently cause the system to hang
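
The QEMU usermode chroot fallback mentioned above looks roughly like this on an
x86_64 host. This is only a sketch: the stage3 tarball name, mount path, and
the final `emerge` target are illustrative placeholders, and it assumes QEMU
was built with static user-mode targets and that binfmt_misc is already
registered for aarch64 (e.g. via qemu-binfmt on Gentoo or qemu-user-static on
Debian):

```shell
# Sketch: ARM64 Gentoo chroot on an x86_64 host via QEMU user-mode emulation.
ROOT=/mnt/gentoo-arm64                        # illustrative path
mkdir -p "$ROOT"
tar xpf stage3-arm64-*.tar.xz -C "$ROOT"      # unpack an arm64 stage3

# The static emulator must exist inside the chroot at the path binfmt expects
cp /usr/bin/qemu-aarch64 "$ROOT/usr/bin/"

# Bind-mount pseudo-filesystems, then chroot; every arm64 binary inside
# now runs under qemu-aarch64 (functionally native, but much slower)
for fs in proc sys dev; do mount --rbind "/$fs" "$ROOT/$fs"; done
chroot "$ROOT" emerge --oneshot app-misc/some-failing-package  # placeholder
```

This is exactly the setup where emulation bugs like the npth test failures in
the linked gnupg ticket show up.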

~~~
jancsika
> \- Raspberry Pi lacks enough RAM and will hang even when doing single-
> threaded builds (heavy packages like GCC and Rust usually hit this limit)

Can someone ELI5 what exactly the compiler is doing when it needs far more
than the 2 gigs of an RPi's RAM to compile itself/Rust?

Also, how is it a "flaw" of the RPI that GCC must exceed 2 Gigs of RAM to
compile itself/Rust?

Warning-- I may use the explanation to cudgel future Electron-so-fat
threads...

~~~
justincormack
Generally it is the linker that seems to be the issue, AFAIK. 32-bit platforms
can no longer link large programs like Firefox at all. Link-time optimization
will probably make this worse. The gold linker (which Android uses) uses less
memory than GNU ld; possibly the LLVM linker is better too.

~~~
jancsika
Are there any articles about the algorithm GNU ld uses and why it can't work
efficiently with less memory?

------
amelius
> The video below shows the server’s 24 cores fully utilized while building
> the Linux kernel, and as the title implies it’s running the recent Linux
> 4.19.

And how long does it take?

~~~
Alupis
Even if it takes a while, it's much more convenient to compile natively than
cross-compile for Linux distro maintainers and those who are compiling ARM
packages.

Cross-compilation gives you performance but a lot of other headaches... and
currently, building on a quad- or hex-core ARM takes quite a long time... so
this would give a significant performance boost, especially with the 32GB of
RAM it seems to have.

~~~
emidln
Speak for yourself. I'll take faster builds in exchange for managing a cross-
compilation toolchain any day of the week.

~~~
Alupis
Perhaps for building a webapp or something...

For Linux distros, they download the source, apply some patches, then let
their build server/farm have at it.

Managing a cross-compilation toolchain, in addition to cross-compilation flags
and what-not for joe-random-package with quirks, quickly becomes a nightmare.
Version upgrades will break everything, or pull in build-system-linked
libraries when they're not supposed to, etc.

This sort of high-core-count ARM server alleviates a lot of headaches - it
gives you faster build times as well as native builds. It also does away with
a lot of complexity in your build pipeline, since you don't have to carefully
isolate the build environment to prevent rogue linking of x86 binaries.

Most Linux distros that offer ARM versions already build natively... so this
just gives them better build times, or more concurrent builds. That's a win.

------
syntaxing
I wonder what the price point will be. I always wanted a low-cost DIY NAS
server with Plex server capabilities. I tried with a RasPi 3 but had some
problems with it hanging up all the time when I ran a Samba server
concurrently.

~~~
ganeshkrishnan
Is it a problem with all RasPis? I have two of them and they all conk out
after a few days due to voltage issues, and I have to physically restart them.

~~~
kryptk
What are you using to power them? A proper 5V 2A supply should have no issues
at load.

~~~
trashcan
Yeah, that is a very likely cause. Make sure to get a quality power supply and
a decent MicroSD card (as in, not the cheapest one on Amazon) so it doesn't
fail after a few months. I have Raspberry Pis that have had uptimes of years+
with no issues.

~~~
05
And set up a read-only rootfs. Or use a Banana Pi with onboard NAND to store
the rootfs. Or do both, just to be sure.
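
The core of a read-only-rootfs setup is just fstab entries plus tmpfs mounts
for the few places that must stay writable. A minimal sketch (device names and
tmpfs sizes are illustrative and distro-dependent):

```
# /etc/fstab — root and boot mounted read-only, volatile dirs on tmpfs
/dev/mmcblk0p1  /boot     vfat   ro                  0 2
/dev/mmcblk0p2  /         ext4   ro,noatime          0 1
tmpfs           /tmp      tmpfs  defaults,size=64m   0 0
tmpfs           /var/log  tmpfs  defaults,size=32m   0 0
```

With nothing writing to the SD card, a power cut or brownout can't corrupt the
filesystem mid-write, which is the usual failure mode these comments describe.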

------
aritmo
What are the use-cases for such a server? It does not look good for desktop
use.

~~~
Twirrim
ARM has some interesting power and heat advantages over Intel/AMD at the
moment. Power and cooling make up significant costs for cloud providers (and
for many companies, for that matter), especially on the cloud storage side.
For a fair number of services, particularly ones with very bursty workloads,
that lower idle power draw and reduced heat output is _very_ interesting. It
opens up a number of possibilities where "good enough" performance is actually
good enough, and the extra rack density provides some strong advantages,
_particularly_ in easily parallelisable workloads (or completely independent
workloads happening in parallel, for example functions as a service).

Amazon isn't just providing ARM CPU servers because of customer demand, and
nor is it just to get leverage with Intel. That's not the way they tend to
work. They're providing them so that they can leverage them in-house as well.
Amazon just likes to take the position that what is good for the goose is good
for the gander. If it's good for them, there's bound to be customers that will
find it good for them also, and as you sell it to customers, you get to those
advantages of scale much quicker.

~~~
deepnotderp
ARM has no intrinsic advantage over x86 ISA-wise, for the most part. High-
performance ARM cores consume similar energy to x86 ones.

~~~
floatboth
Not having to decode variable-length instructions is one advantage.

------
equalunique
24 cores & 32GB of RAM? This might just be enough for me to run Chromium with
all my extensions enabled!

------
Quequau
I had this idea about ARM servers that they were going to be significantly
smaller, demand less power, and generate less heat than the existing x86
units. Where I work all of our servers fit in something like 22U and I expect
a solid minority of them could be swapped out for something smaller.

Instead what we got are big chips with the ARM ISA that are essentially scaled
up to roughly Xeon sizes... which in hindsight I suppose is obvious, because
that's probably where the money is.

Nevertheless I still think there is a niche for smaller servers with finer-
grained quanta of compute capacity.

~~~
tjoff
Not sure how small you want it, but there are smaller and less power-hungry
x86 alternatives too. The more server-oriented Atom chips (2-16 cores) are
pretty decent, and both they and the much more expensive but more powerful
Xeon-D (4-16 cores) can be found on mini-ITX motherboards in pretty low-power
configurations.

I haven't kept up to date on whether AMD offers decent alternatives in this
segment.

Though my hope would be for ARM to lower the price point for these kinds of
servers.

------
microcolonel
I think it can't be an SC2A11, because the SC2A11 maxes out at 16GiB of
memory, AFAIK.

I wonder what the TDP is on the SoC; the SC2A11 is only 5W (though they get a
lot out of that!). I'd like to see what they could accomplish with a higher
TDP target.

~~~
rjsw
There is a link in the article to another one about a Linaro development
platform; it states that the chip can use 64GB.

~~~
microcolonel
Hmm, maybe they're configured with different memory controllers? WikiChip and
a couple others seem to say it's 16GiB, maybe that's 16GiB per DDR4 PHY.

------
dmitrygr
A53 is a rather slow in-order core. I am not sure what use cases they imagine
for this.

~~~
yjftsjthsd-h
Acceptable server performance without worrying about Spectre?

------
wyldfire
It would be interesting if consumer-grade parts could hit a low enough price
point for server boards without easily serviced parts [integrated CPU, memory,
basic storage]. Maybe you could have cheap boards that deliver high
reliability in aggregate.

------
contingencies
They are not the only ones in Guangdong with hardware in this space. I saw an
RK3399 ARM64 cluster board at Firefly a few months ago.
[http://en.t-firefly.com/](http://en.t-firefly.com/)

~~~
subway
The RK3399 doesn't come anywhere close to this. The BPi board looks to be 24
cores on a single SoC (or at least running a single system image). The closest
you could get is to hang 10-gig Ethernet off the PCIe bus on 4+ RK3399s. Then
you also have to start thinking about fun things like cluster scheduling, and
the overhead of running an OS on each device.

~~~
contingencies
Yes, these were carrier boards with numerous (8-16?) sub-boards each of which
had an RK3399 CPU, each of which has 6 CPU cores (2xCortex-A72 cores,
4xCortex-A53 cores) plus a NEON coprocessor and Mali T860 MP4 GPU. The article
discusses 24 cores. The Firefly hardware I saw would therefore have at least
6x8 = 48 CPU cores, or double the ARM64 CPU cores discussed in the article,
before even counting the coprocessor or the 8-16 GPUs also included.

~~~
subway
Depending on workload, pure core count can be irrelevant when your
interconnect is trash and you have no shared memory. You _might_ see OK
performance if the interconnect somehow supports RDMA, but that seems
exceedingly unlikely. Almost all these cluster boards use a gigabit Ethernet
interconnect. Not to mention you spend a core or more per SoC handling your OS
and interconnect/network interrupts.

NEON is implemented on a per-core basis, not as a separate coprocessor. Every
A53 (and your handful of A72s) supports NEON.

I'll give you the GPUs... if you really want to put in all the effort for that
sweet sweet OpenCL 1.2 action on an out of date vendor kernel.

An RK3399 cluster and a single many-core system is a nonsensical apples-to-
oranges comparison.

------
floatboth
That Socionext A53 chip again… BUT if they manage to make it way cheaper than
the Developerbox, that might actually be interesting. Otherwise, I'd rather
have four A72 cores (see MACCHIATObin) than twenty-four A53s.

------
awwyis
What about power consumption at full throttle? In watts, say, compiling the
Linux kernel with -j24/-j48.

------
stevefan1999
I'd love to see a Banana Pi with x86 on it. Or literally any SBC with good
and affordable x86.

~~~
W-Stool
Odroid H2 - dual gigabit NICs, SBC (but a bigger one), x86_64, SATA and USB 3.
$111.

------
KiDD
I do want one....

------
etaioinshrdlu
If one can make a board with enough cheap high-performance CPUs, it may be
good for neural nets. The main reason we use NVIDIA hardware is that it's
cheaper per FLOP.

~~~
Alupis
> The main reason we use NVIDIA hardware is because it's cheaper per FLOP

... well... and largely because most neural network software only supports
CUDA and/or has rudimentary OpenCL support.

Nvidia hardware isn't really cheaper per FLOP in all cases; AMD and Nvidia
still leapfrog each other back and forth.

~~~
etaioinshrdlu
Sure, but every framework has full support for CPUs.

~~~
Alupis
That's true to a point... the latest versions of TensorFlow, for example, have
caused a few issues by requiring instruction sets that not all CPUs (even ones
produced today) have.

~~~
walterbell
Is that because of CPUs that are only available at major cloud providers?

~~~
Alupis
No, not entirely.

The AVX instruction set, for instance, seems to only be present on select
Intel and AMD CPUs, even recently manufactured ones. TensorFlow has required
AVX since version 1.4 (Nov 2017), which has caused grief for many users...
even those with recent/performant systems.
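
A quick way to check whether a given Linux box will run the prebuilt wheels is
to look for the avx flag (a sketch; this reads the kernel's reported CPU
flags, so it's Linux-specific):

```shell
# Print whether the CPU advertises AVX (what prebuilt TensorFlow wheels need)
if grep -qw avx /proc/cpuinfo; then
  echo "AVX present: prebuilt TensorFlow wheels should run"
else
  echo "no AVX: build from source or find a non-AVX wheel"
fi
```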

