
Introducing Zaius, Google and Rackspace’s Open Server Running IBM Power9 - kungfudoi
https://cloudplatform.googleblog.com/2016/10/introducing-Zaius-Google-and-Rackspaces-open-server-running-IBM-POWER9.html
======
jacques_chester
IBM have done lots of work to port stuff to ppc64le (edit: wrong, see reply),
which I _think_ is the particular arch for P9. One of the reasons for
introducing the -le variant was to make porting easier.

They ported Go; in my day job I've seen them porting various parts of Cloud
Foundry and supporting infrastructure.

Basically I'm interested to see if Google open these up on GCP. As points of
differentiation go, it would be a doozy.

~~~
4ad
> They ported Go;

No, they didn't. Minux Ma, a major Go contributor unaffiliated with IBM,
ported Go to POWER.

~~~
zdw
It would be nice if they would continue to maintain the big endian PowerPC
branch of Go, as that appears to have atrophied in recent years.

From a software quality standpoint, older Apple PowerPC gear is probably the
cheapest (PowerMac and Xserve G5s are in the $100-300 range on the used
market) and highest-performance big endian hardware available for testing,
and it is still natively supported by a lot of mainline distros, Ubuntu in
particular.

The only other BE systems out there of comparable performance are SPARC
systems, which are either very expensive or have low single-threaded
performance (T1 and T2 based).

~~~
4ad
> It would be nice if they would continue to maintain the big endian PowerPC
> branch of Go

Who is "they"?

If you are referring to the Go project, big endian power is supported.

> as that appears to have atrophied in recent years.

What exactly has atrophied?

> Apple PowerPC [...] highest performance big endian systems available

???

POWER8 (and soon POWER9) systems are available today, in big endian mode. You
don't need obsolete hardware.

Modern MIPS64 is also faster than old Apple G5s, although it can be tricky to
get one.

ARM64 is available in server grade hardware, with a current level of
performance. Unfortunately, even though ARM64 supports big-endian, nobody
deploys in big-endian mode. When I did the ARM64 Go port, I did run my ARM64
hardware in big-endian mode for a while, but as that required creating my own
distribution, I never committed big-endian ARM64 support in the Go port. If a
big-endian ARM64 distribution ever appears, we'll definitely add ARM64
big-endian support to Go.

> The only other BE systems out there of comparable performance are SPARC
> systems

FYI, I am working on a SPARC64 Go port. It's in a very advanced stage. I hope
it will be ready for Go 1.8. I am using a S7-2 system, 4.13GHz, 128 threads,
256GB RAM, very good single threaded performance. I can assure you it's very
performant, but yes, it's more expensive than POWER.

In any case, thanks for your support! We definitely need more awareness of
non-x86 architectures.

------
cm3
If Rackspace, given their major support for various opensource projects, were
to provide POWER9 runners for, say, Gitlab CI, this could be a major help in
porting software. Or they could, like IBM, provide SSH access to interested
projects. But the CI part is important to ensure there's no regression, and
given the scarce availability of POWER9 (or even POWER8) hardware to the
general public, let alone opensource developers, Gitlab CI integration sounds
like the more practical service.

~~~
jacques_chester
IBM sponsor a fleet of POWER-based systems at the OSU Open Source Lab[1].

Edit: You already said this ("Or they could, like IBM, provide SSH access to
interested projects."), and if you'll excuse me, I'm going to go hide in
shame.

In my day job we've interacted with an IBM team who are porting our entire
buildpacks pipeline[2][3] (which uses Concourse) to run on ppc64le. We fall
under the Cloud Foundry heading in the list of projects.

The eventual goal is that we will be able to run x86 workers (on a regular
commercial cloud) and some POWER workers at OSU-OSL or SoftLayer, and build
both kinds of binaries from the same pipeline.

I believe the _eventual_ eventual goal is that all Cloud Foundry pipelines and
products will be fully available across both x86 and ppc64le, including
first-class integration with any pipeline producing binaries. Given that
buildpacks represents the bulk of the binary volume, it makes sense to ensure
our entire pipeline works on ppc64le.

Disclosure: I work for Pivotal, not IBM, and I'm not able to commit either
company to anything.

[1] [http://osuosl.org/services/powerdev](http://osuosl.org/services/powerdev)

[2] [https://buildpacks.ci.cf-app.com/](https://buildpacks.ci.cf-app.com/)

[3] [https://github.com/cloudfoundry/buildpacks-ci](https://github.com/cloudfoundry/buildpacks-ci)

~~~
cm3
What would be the course of action for an opensource project to set up a CI
worker there (ideally per-commit on X branches, not periodic) such that it
could be integrated in a pre-merge check? I'm not bound to Gitlab CI runners,
but it's the first thing that came to mind given the popularity of github and
gitlab.

~~~
jacques_chester
I honestly have no idea. I imagine access is mediated by OSU, not IBM. The
contact page ([http://osuosl.org/contact](http://osuosl.org/contact)) seems
like the place to start.

One of the tricky parts about running PRs is that you're running arbitrary
code, for which the main threat is the exfiltration of secrets. You need to
lock down the workers fairly tightly to avoid unintended consequences. I'd be
interested in reading more about how Travis, CircleCI, Gitlab et al do it --
some light googling didn't turn up any specifics.

Edit: looks like CircleCI call this out explicitly and state their defences --
[https://circleci.com/docs/fork-pr-builds/#security-implicati...](https://circleci.com/docs/fork-pr-builds/#security-implications-of-running-builds-for-pull-requests-from-forks)

~~~
cm3
That CircleCI post seems to talk about different issues than the ones I see
when I think about the safety of random CI jobs.

It's a non-trivial problem to solve, especially with caching of artifacts
involved. You'd probably want to run a sandbox inside a VM and secure the VM
itself first, while having only ephemeral storage attached. Barring a
container escape via just the read/write/execve allowed inside the sandbox,
which could probably also be used to escape the surrounding VM, there isn't
much you can do if you support running random stuff in a CI job.

Actually, maybe CI needs to be limited to tools that can run on something like
ZeroVM.

Limiting persistent state and spinning up machines (vm or bare metal) for each
job, while having no permanently active job runners, sounds like another
defense to consider.

That said, I very much doubt any of the CI services goes to such great
lengths, given the limitations involved.

~~~
jacques_chester
> _Limiting persistent state and spinning up machines (vm or bare metal) for
> each job, while having no permanently active job runners, sounds like
> another defense to consider._

I can imagine how I'd do this with Concourse, but it'd be confusingly meta in
approach -- a pipeline which builds a new pipeline with a new worker for each
PR.

I still think the exfiltration threat is the worst. _Any_ secret injected into
the environment of _any_ tested codebase is vulnerable -- especially if your
logs are public.

~~~
cm3
> I still think the exfiltration threat is the worst. Any secret injected into
> the environment of any tested codebase is vulnerable -- especially if your
> logs are public.

Fair point, though instead of worrying about that, I think the real solution
is to have test-only keys and also make sure logs can be shared without fear
of leaking data.

~~~
jacques_chester
We (buildpacks team) get some of the way by ensuring that all secrets in our
logs are redacted -- we actually wrote a rough-and-ready tool
(concourse-filter[0]) for this purpose. It works on a whitelist principle. Any
environment variable emitted to stdout or stderr is redacted unless it appears
on a whitelist[1].
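
The whitelist principle can be sketched roughly as follows. This is not concourse-filter's actual code, just a minimal Go illustration under stated assumptions: `redactLine`, the `[redacted]` placeholder, and the variable names in `main` are all hypothetical, and a real filter would read `os.Environ()` and stream stdout/stderr rather than use a fixed map.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// placeholder is an assumed marker; the real tool may print something else.
const placeholder = "[redacted]"

// redactLine (hypothetical) replaces the value of every environment variable
// that is NOT on the whitelist with a placeholder wherever it appears in line.
func redactLine(line string, env map[string]string, whitelist map[string]bool) string {
	var secrets []string
	for name, value := range env {
		// Whitelisted variables are allowed through; empty values would
		// otherwise match everywhere.
		if whitelist[name] || value == "" {
			continue
		}
		secrets = append(secrets, value)
	}
	// Replace longer values first so a secret that contains another secret
	// is hidden in full.
	sort.Slice(secrets, func(i, j int) bool { return len(secrets[i]) > len(secrets[j]) })
	for _, s := range secrets {
		line = strings.ReplaceAll(line, s, placeholder)
	}
	return line
}

func main() {
	// Fixed example environment (assumed names and values).
	env := map[string]string{
		"HOME":      "/home/ci",
		"API_TOKEN": "s3cr3t-token",
	}
	whitelist := map[string]bool{"HOME": true}

	fmt.Println(redactLine("deploying from /home/ci with token s3cr3t-token", env, whitelist))
	// prints: deploying from /home/ci with token [redacted]
}
```

Defaulting to redaction means a newly added secret is hidden without anyone having to remember to register it; the cost is that very short or common values can cause over-redaction.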

You're right that in the longer run, providing per-test keys will be the
safest option. It's on our radar as part of the overall "3 Rs" effort[2].

[0] [https://github.com/pivotal-cf-experimental/concourse-filter](https://github.com/pivotal-cf-experimental/concourse-filter)

[1] [https://github.com/cloudfoundry/buildpacks-ci/blob/1c345c30e...](https://github.com/cloudfoundry/buildpacks-ci/blob/1c345c30e1f9bcabf7d56cfe78ab70d0104cd0c4/build/filter.sh)

[2] [https://medium.com/built-to-adapt/the-three-r-s-of-enterpris...](https://medium.com/built-to-adapt/the-three-r-s-of-enterprise-security-rotate-repave-and-repair-f64f6d6ba29d#.tefbtzegc)

~~~
cm3
Right. Unfortunately, Rotate and Repave are not common practice, just like
periodically restoring backups isn't.

~~~
jacques_chester
We're working on it. One day I expect it'll be considered normal.

------
cm3
Since Zaius also seems to have fully open firmware for most (all, including
USB controller?) pieces, it would be nice to get something like a sub-$3k
workstation. I would say Raptor Talos, but they seem to have their hands full
with the POWER8 workstation and I wouldn't want them to divert resources to a
P8 -> P9 move and further delay their project.

------
gok
Dumb question: with lots of NVLink/OpenCAPI bandwidth for GPUs/FPGAs, what's
all the PCIe bandwidth for? I count something like 150 GB/sec of it in that
system diagram. Terabit ethernet?

~~~
kps
Based on what's not on the board, I'd guess network and/or local storage.

~~~
gok
Storage! Of course

------
pantalaimon
Will I be able to buy this machine?

~~~
0xcde4c3db
I wouldn't count on it. OCP designs seem to be targeted at companies that buy
servers and switches by the truckload.

------
rbanffy
Odd...
[https://news.ycombinator.com/item?id=12709995](https://news.ycombinator.com/item?id=12709995)

~~~
Rexxar
You don't seem new here (>3000 days). Is this really the first time you've
seen that the duplicate detection algorithm doesn't work reliably on HN?

This bug is not even really a problem: it gives a second chance to unlucky
good submissions.

~~~
sctb
It's not a bug; resubmissions are explicitly allowed by the software for
stories that haven't gotten significant attention.

