Helios: A distribution of Illumos powering the Oxide Rack

milon · 2024-01-29T19:11:08

I'm glad this is out, i'm going to deploy this locally and learn as much about it as possible. Oxide is pretty much the company I dream to work at, both for the tech stack, plus the people working there. Thank you Oxide team!

refulgentis · 2024-01-29T19:17:11

Can you get me excited? I spent 20 seconds browsing the homepage and walked away with "so the idea is vertical integration for on-premise server purchases? On custom OS? Why? Why would people pay a premium?"

But immediately got myself to "what does a server OS do anyway, doesn't it just launch VMs? You don't need Linux, just the ability to launch Linux VMs"

Tell me more? :)

mustache_kimono · 2024-01-29T19:37:21

> so the idea is vertical integration for on-premise server purchases? On custom OS? Why? Why would people pay a premium?

As I understand it, re: vertical integration, the term is actually "hyperconverged". Here, that means it's designed at the level of the rack. Like -- there aren't per compute unit redundant power supplies. There is one DC bus bar conversion for the rack. There is an integrated switch designed by Oxide. There is one company to blame when anything inside the box isn't working.

In addition, the pitch is they're using open source Rust-based firmware for many of the core components (the base board management controller/service processor, and root of trust), and the box presents a cloud like API to provision.

If the problem is: I'm running lots of VMs in the cloud. I'm used to the cloud. I like the way the cloud works, but I need an on-prem cloud, this makes that much easier than other DIY ways to achieve (OMG we need a team of people to build us a cloud...).

steveklabnik · 2024-01-29T19:59:22

The terminology in this space is confusing, but "hyperconverged" isn't really what we're doing. I wrote about the differences here: https://news.ycombinator.com/item?id=30688865

(That said I think other than saying "hyperconverged" your broad points are correct.)

mustache_kimono · 2024-01-31T14:06:01

> but "hyperconverged" isn't really what we're doing.

This is fair, especially as it follows a discussion of how you're building at the rack level, which is more like a mainframe than current "hyperconverged" offerings. Recommend others read the linked post.

SteveNuts · 2024-01-29T19:31:48

It seems like the folks on HN tend to think the world runs on AWS (I'm not trying to say they don't have a huge market share), but many huge enterprises still run their own datacenters and buy ungodly amounts of hardware.

The products that are on the market for an AWS-like experience on-prem are still fairly horrible. A lot of times the solutions are collaborations between vendors, which makes support a huge pain (finger pointing between companies).

Or, a particular vendor might only have compute and storage, but no offering for SDN and vice-versa. This sucks because then you have two bespoke things to manage and hope they work together correct.

These companies want a full AWS experience in their datacenter, and so far this looks to be the most promising without dedicating huge amounts of resources to something like Openstack.

refulgentis · 2024-01-29T21:17:59

The "(finger pointing between companies)" took me from confusion to 100% understanding, was at Google until recently. It was astonishing to me that it was universally acceptable to fingerpoint if it was outside your immediate group of ~80 people.*

Took me from "why would people go with this over Dell?" to "holy shit, I'm expecting Dell to do software and make nvidia/red hat/etc/etc etc/etc etc etc help out. lol!"

* also, how destructive it is. never, ever, ever let ppl talk shit about other ppl. There's a difference between "ugh, honestly, it seems like they're focused on release 11.0 this year" and "ughh they're usless idk what they're thinking??? stupid product anyway" and for whatever reason, B made you normal, A made you a tryhard pedant

lijok · 2024-01-29T19:42:12

Wouldn't a "full AWS experience in their datacenter" be AWS Outpost?

mustache_kimono · 2024-01-29T19:44:57

> "full AWS experience in their datacenter"

... Including the bill!

lijok · 2024-01-31T15:32:59

Part of AWS managed services offering. AWS Managed Wallet

mardifoufs · 2024-01-29T19:49:11

Is AWS outpost truly a full AWS stack/experience? I thought it wasn't actually meant to be a "data center in a box" experience, but more so a way to run some workloads locally when you are already using AWS for everything else.

PeterCorless · 2024-01-29T23:49:12

Some data products will run successfully in AWS Outposts. Others will not. For example, AWS itself can't run DynamoDB in an AWS Outpost. It recommends users to run ScyllaDB in DynamoDB compatible mode.

e.g., https://www.scylladb.com/2020/09/15/scylla-cloud-on-aws-outp...

Disclosure: I worked at ScyllaDB.

adfm · 2024-01-29T19:46:51

With DHH and others promoting a post-SaaS approach (once.com, etc.) we might see hardware refresh as cost-cutting. Astronomical compute bills and lack of granularity bring all things cloudy into sharp focus.

threeseed · 2024-01-30T00:42:47

What they are doing is SaaS by stealth.

You buy their product once, but it only has bug and security fixes for 3 years.

Which means every business is going to need to upgrade on a cycle anyway.

mlindner · 2024-01-30T01:12:02

People don't usually throw out their server hardware after 3 years. After 3 years is up they'll probably sell service plans. And with the code being all open source some owners may go the self-supported route, though probably most will buy service plans.

threeseed · 2024-01-30T01:52:11

Was actually referring to DHH and the 37signals products.

Whilst I think we will see a trend back towards more on-premise hardware I don't think SaaS is going away anytime soon. And in fact it's arguably better for everyone because the software is being continually maintained.

_zoltan_ · 2024-01-29T19:51:43

OpenStack is pretty smooth sailing these days and I bet you it would be much cheaper to just get 3 FTEs for your OpenStack install than an Oxide rack

linksnapzz · 2024-01-29T20:41:10

Where, exactly, are you getting these 3FTEs qualified to touch production OpenStack infra, for more than a year, where their aggregate cost is less than a rack of equipment?

_zoltan_ · 2024-01-29T22:13:40

if you need OpenStack you're not running one rack, but a couple dozen.

KAMSPioneer · 2024-01-30T07:52:30

That...sounds like a market segment you've just discovered for Oxide.

gtirloni · 2024-01-29T20:54:16

The rack doesn't require FTEs?

linksnapzz · 2024-01-29T21:44:00

Not three of them; it ought to be about as difficult to administer as a single rack of hw, +Vsphere, if that.

milon · 2024-01-29T19:26:57

Having a solid on-prem rack product to me is a great thing. I like IaaS services a lot, don't get me wrong, and I think they're the right pick for a bunch of cases, but on-prem servers also have their "place in the sun", so to speak :) I could present any number of justifications that I don't think I'm qualified enough to defend, but the gist is that at the bare minimum, I'm glad the option exists.

As to why I'm personally excited: I enjoy the amount of control having such an on-prem rack would afford me, and there surely could be a great amount of cost-savings and energy-savings in many scenarios. Sometimes, you just need a rack to deploy services for your local business. I like the prospect of decentralizing infrastructure, applying all the things we've learned with IaaSes.

bionsystem · 2024-01-30T06:49:57

In the last 10 years and 6 different clients/employers I worked there is pretty much no way to run production on the cloud. Only 1 of them had some stuff running in the (GCP) cloud at all.

Of all of the 6 infrastructures I've seen, only 1 of them is half decent, with 6 dedicated teams around the datacenter working closely together (by dedicated I mean, nothing is required of them concerning the core software product that the company develops). Network, Unix/Virtu, Windows, Storage, PC, and datacenter. That's 30+ people just to run a couple big datacenters and a few more server rooms. The service was actually quite good with VMs/zones delivered under an hour and most tech issues solved in half a day. The other infrastructures were either bigger or smaller, with more or less people, and were all terrible, sometimes needing weeks of email exchanges with excel attached to get a single VM.

AWS was the dream everywhere I went for everybody. Oxide may be coming out with a product that will solve a LOT of issues. SmartOS/IllumOS has all the tech to be self-sufficient (virtualization, storage, SDN...), add support for networking and storage and you get a complete product that a handful of people can run (well, you still need a windows team in most cases but fine).

lijok · 2024-01-29T19:40:31

> Why would people pay a premium?

I would pay a premium just to not have to deal with HPE, DELL, etc

_zoltan_ · 2024-01-29T19:52:12

Dell's been nothing but fantastic for us (compute, not storage.)

sarlalian · 2024-01-29T22:22:11

Dell is a mixed bag depending on how well the individual region you are dealing with is doing overall. Things were great for us, but something changed and now getting good support for hardware failures has been a nightmare of jumping through hoops, time zone handoffs to other teams, and forced on-site techs to replace a stick of ram.

adamnemecek · 2024-01-29T19:56:52

One company making both HW and SW generally leads to really good, integrated experiences. See e.g. Apple.

0cf8612b2e1e · 2024-01-29T20:16:33

I am really hoping the broader industry takes note. By owning the platform, the Oxide team was able to dump legacy stuff that no longer makes sense.

throwup238 · 2024-01-29T19:21:17

The best elevator pitch I've heard is "AWS APIs for on-prem datacenters". They make turn-key managed racks that behave just like a commercial cloud would with all the APIs for VM, storage, and network provisioning and integration you'd expect from AWS, except made to deploy in your company's datacenter under your control.

magnawave · 2024-01-29T20:48:40

I guess the wildcard is price.

AWS's pricing model works kinda at their OMG eyewatering scale - aka all the custom hardware they design is highly cost optimized, but just doing custom hardware has a notable cost. This is easily covered by their scale, to make for their famous margins. [during their low scale times, they did use a good bit of HP/Dell, etc]

Oxide seems to be no different (super custom hardware) only major difference being the "in your datacenter" part. Since you own the cost of your datacenter, Oxide has to come in a lot cheaper to even compete with AWS, but how do you do that with low volume [and from the look of it not-cost optimized, but instead fairly tank-like] bespoke hardware? Feels like the pricing / customer fundamentals are going to be pretty rough here outside perhaps a few verticals.

kaliszad · 2024-01-29T23:20:00

Oxide seems to be a lot more efficient than a rack full of 1U servers with each having 2 PSUs + 2 ToR switches + 1 management switch somewhere for all the OOBMs. All those little fans and power conversions eat a lot of power, the fans and the PSUs all cost something too. Also, have fun managing all of that in a secure manner or debugging anything at all. Once you add the VMware licensing you might end up with more or less the same cost up front and quite likely higher overall cost. And I am not even beginning to talk about racking/ stacking of the whole rack. I haven't seen much support even when Dell/EMC owned VMware and together produced the VXRail lineup and the company I used to work for was presented as the reference project in Saxony, Germany at that time. All of the boxes would add up to about 2 standard racks but it was representative of the other bigish customers in that area and time.

I imagine, some of the customers will order 1-2 racks half full and over a few years possibly add a few sleds, these will probably demand great GUI/ manual experience and possibly competitive Oracle/ SAP/ MSSQL benchmarks and I can imagine Veeam integreation. Other customers such as the DoE or some big enterprise customers will order whole rows of racks and demand perfect automation options. That is just a guess.

sarlalian · 2024-01-29T22:27:24

Datacenter costs are weird. The first big cost is having a datacenter. However once you have the space, power, cooling and that part makes sense, then the actual hardware going into it can have a pretty decent premium and still be highly competitive with AWS. It will also depend heavily on what you are doing and producing, if the answer to that is a large amount of data, and it needs to transit out of AWS, suddenly the cost of a pretty large datacenter is really cheap in comparison. AWS egress fees have a markup that will make your accountants panic. From a hardware standpoint, once you need GPU compute or large amounts of RAM, the prices get pretty dumb as well.

jjav · 2024-02-01T08:29:22

> Oxide has to come in a lot cheaper to even compete with AWS

Which should be pretty easy. I don't know the exact costs but in a previous Oxide discussion the number 1M was thrown around. If that's roughly correct, that is comfortably less than a single year of AWS bills at most startups I've been in (except the very tiny ones < 15 people).

Haven't seen any performance numbers either so admittely estimating here, but from what I know about building racks of 1U servers and knowing that Oxide is more efficient, I can believe an Oxide rack should handily outperform the AWS VMs we (the startups) were paying >>>100K/mo for.

If these numbers are anywhere in the ballpark, an Oxide rack should easily be saving quite a bit of money already by year two.

kortilla · 2024-01-29T19:26:03

That’s the elevator pitch for open stack

steveklabnik · 2024-01-29T21:08:04

You are not wrong that OpenStack is sort of similar in a sense, but the difference is that Oxide is a hardware + software product, and OpenStack is purely software.

capitol_ · 2024-01-29T19:29:23

That just sounds like a bunch of api's on top of linux.

throwup238 · 2024-01-29T19:30:45

Just like Dropbox is a bunch of APIs on top of FTP.

throwawaaarrgh · 2024-01-29T19:36:34

It's a mainframe. If you can't get excited for mainframes it'll be hard to be excited about this.

IllumOS is the OS/360 to Oxide's System/360. (It won't get that popular but it's a fair enough comparison for illustrative purposes)

panick21_ · 2024-01-29T20:06:43

Except that it use the same standard CPU as commodity machine. Doesn't have much of the extra reliability stuff. It can go from vertical to horizontal scaling. The OS is open source Unix. And yeah its not like a mainframe at all really.

linksnapzz · 2024-01-29T20:42:28

It's a mainframe, for people who do not, actually, know what a mainframe is or does.

kaliszad · 2024-01-29T23:23:54

I bet it costs a fraction of what a similarly powerful mainframe would cost. However I don't think the customers for each overlap that much. If you need a mainframe, you need one and there is no discussion about possible alternatives because there are none.

EvanAnderson · 2024-01-29T19:30:29

I'm excited to see how this compares to SmartOS. I'm pretty heavily invested in SmartOS in my personal infrastructure but its future, post-Joyent acquisition, has been worrying me.

I really wish I did work for an org big enough to use Oxide's gear. Not having to futz around with bogus IBM PC AT-type compatibility edifice, janky BMCs and iDRACs, hardware RAID controllers, etc, would be so unbelievably nice.

nwilkens · 2024-01-29T20:51:27

SmartOS is being actively developed since the aquisition from Joyent[1] in April 2022.

We've released a new version every two weeks post acquisition, and are continuing to develop and invest.

We also hold office hours events roughly every two weeks on Discord[2], and would love for you to stop by and ask any questions, or just listen along!

[1]: https://www.tritondatacenter.com/blog/a-new-chapter-begins-f... [2]: https://discord.gg/v4NwA3Hqay

rjzzleep · 2024-01-29T23:08:34

IllumOS needs to attract new developers. To do that, the platform build needs to become a lot more straightforward. It's a pretty huge endeavour in my opinion. I'd be happy to help out on that regard, but in the past Joyent has not been very open to outside support.

jclulow · 2024-01-30T09:26:08

I've noticed you make this comment repeatedly when illumos is mentioned on HN. I think you're underestimating the irreducible complexity of the build process for what is essentially a whole UNIX operating system, save for a few external dependencies. It's not just a kernel, but an extensive set of user mode libraries and executables. The build is complex in part because it's a complex body of software.

I also think you're overestimating the extent to which make(1S) is the reason we're not more popular than Linux. There are any number of more relevant factors that make someone choose one operating system or another. Also, certainly for me personally my goal is not world domination, merely the sustainable maintenance of a body of software that helps me solve the problems that I work on, and which I enjoy using and developing as a result.

I agree we need (as do all projects!) new developers, both now, and over the long term. We work as we can to make improvements to the build process, and the documentation. We are a relatively niche project, but we do attract new developers from time to time, and we're making changes at least as rapidly as we ever have in the past. There are a number of actively maintained illumos distributions (OmniOS, SmartOS, Tribblix, OpenIndiana, and now Helios) and there are a variety of commercial interests that ship more proprietary appliances on top of an illumos base. For our part at Oxide we continue to encourage our staff to get involved with illumos development as it makes sense for them, and we try to offer resources and assistance to the broader community as well.

If you would like to contribute, we have a guide to getting started: https://illumos.org/docs/contributing/

Please, though, it's "illumos", not "IllumOS"!

rjzzleep · 2024-01-30T09:54:55

I do, yes, but your comment makes it clear that this is a problem that you either don't really think of as a problem, or that you don't know how to address. Building open source communities is hard work. Telling everyone how amazing your product is(even if it is), is only a small part of it. The lesson to take away from your time at Joyent should be that, that way of community building didn't work, and there needs to be some change.

Even in the early 2000s linux had a make menuconfig or make xconfig setting to build linux. And yes this is different, it's a posix distribution. Yocto was a relatively niche project as well and it also addresses the issue of building a collecting of posix applications into a big project, so does gentoo's stage.

I'm sure that at the time of it's creation OpenSolaris was ahead of its curve, but that's how many years ago? You know as well as I do that sprinkling LD_LIBRARY_PATHs here and there and then removing undocumented dot files here and there isn't really a sane way to handle such a build process for a curious third party. Most will probably drop it before it gets to that point.

There have been many many projects that have reworked their entire build architecture, some of which took years to flesh out fully.

What needs to happen for illumos to get a boost of development in the long term is:

1. first for you to acknowledge on a political level that there is an issue that needs to be addressed here, and

2. to then work with the community, and it doesn't have to be across the board, but you need to be willing to invest in some experts and some people interested in solving this, so they can grind out something that is more sane in this current world.

"Read our getting started guide" isn't really all that useful, when most of the complex issues happen after that and are often met with "this isn't how we do things".

jclulow · 2024-01-30T10:19:23

> sprinkling LD_LIBRARY_PATHs here and there and then removing undocumented dot files here and there

I obviously don't have any context about the issues you were facing at the time, and I can't really figure it out based on the advice you ostensibly received. I'm definitely sorry if we have lead you astray in the past, but those are not workarounds I would encourage people to use today. If there's some aspect of the build process that requires workarounds like you're describing, it's definitely a bug and we'll fix it when we're made aware as best we can.

As for the rest of it, I think you're putting the cart before the horse on some level. An operating system is a large and complex thing to work on, regardless of whether it's built with make or ninja or bazel or whatever other build tool.

The Rust toolchain is another similarly complex body of software, which also has a large and at times inscrutable build process. I know because I have personally contributed to it, and had to figure out how to get it to work. Rust obviously has more active contributors than illumos, but it also has vastly more active _users_ -- it is a body of software that has broad applicability to many people and the work they do.

For illumos to continue to succeed as an actively maintained project, what we need to do is continue to inspire _users_ to want to use it. Nobody wants to work on an operating system they don't personally need to use at all. We draw contributions today from a mixture of community driven distributions making fixes or adding features, and by people employed by companies like Oxide who have a vested economic interest in the deployment of the software.

None of this is to say that we're perfect, or that we're not trying to improve things. Just that we're trying to put build system improvements in the proper context amongst all the other work there is to do with our limited resources. It's probably more important that we have support for new Intel client NICs like you would find in a modern desktop system, for example, than it is that we replace make. It's important that we continue to add system calls and libc facilities that other platforms have adopted in order to ease software porting. It's important that we continue to maintain modern JDKs and Python and Go and Rust and C/C++ compilers. It's important that we keep up with security issues and the endless stream of mitigations imposed by the sieve-like nature of speculative CPUs.

There's actually quite a lot of stuff going on for us all the time, and we do still find time to improve the build system. If you have more specifics in mind, that's fantastic and we'd love to here about them concretely! I would encourage you to channel your enthusiasm into writing an illumos project discussion (IPD) describing the issues you see and the work you'd propose to sort them out! You can see some examples of existing IPDs at https://github.com/illumos/ipd

And as ever, if you hit issues in the build as it stands, please file bugs! We can't fix things we haven't heard about.

Cheers.

_rs · 2024-01-29T20:07:02

I had been using SmartOS for a long time but finally had to bite the bullet and give up. I ended up deciding on Proxmox on a ZFS root and am quite happy with it.

icybox · 2024-01-29T21:58:36

I've been running smartos at least since 2015 where I co-located my server. There have been times where I felt like giving up, but people like danmcd, jperkin and others always stepped in and fixed what needed to be fixed for LX to be usable and working. (Keeping java updated and running is hard, uphill battle. Thanks!) I always ran a mixture of OS and LX zones and bcantrill's t-shirt with "Save the whales, kill your VM" made sense. I've used zones in Solaris 10 even before and they just click with me. FreeBSD's jails are nice, but far from it. And linux's cgroups are a joke. And using KVM/VMs for security containerization is just insane. At dayjob, I've implemented multiple proxmox clusters, because we're linux shop and there's no way to "sell" smartos or tritonDC to die-hard debian colleagues, but I've managed to sell them ZFS. With personal stuff, I like my systems to take care of themselves without constant babysitting and SmartOS or OpenBSD provide just that. I don't dislike windows, I love UNIX. You could really feel those extra 20y UNIX had compared to linux. I migrated all my stuff to proxmox for like 2 months. And then went back to SmartOS, because there was something missing ... probably elegance, sanity, simplicity or even something you'd call "hack value".

unethical_ban · 2024-01-30T05:59:19

And here I am, having compared the SmartOS documentation and ease of installation to Proxmox... and with very few complaints am using Proxmox to host a file server on bare metal/Samba container and a OPNSense on VM.

I remember buying the OpenSolaris Bible in 2008, getting really excited to dig into my second Unix (after FreeBSD). And then, the Sun went down on me... and I stuck with Ubuntu 10 years.

rbanffy · 2024-01-30T13:43:11

> and I stuck with Ubuntu 10 years.

For a while Nexenta had an Ubuntu running on the OpenSolaris kernel.

sswezey · 2024-01-31T16:25:06

Do you have a write-up of your SmartOS and/or OpenBSD set up(s)?

geek_at · 2024-01-29T20:28:42

the nice thing about the Proxmox + ZFS setup is that it works and is even recommended without using hardware raid controllers. Less headaches either way.

I recently wrote a guide [1] how to use proxmox with ZFS over iSCSI so you can use the snapshot features from a SAN

[1] https://blog.haschek.at/2023/zfs-over-iscsi-in-proxmox.html

rjzzleep · 2024-01-29T21:18:31

I feel the same. I used a SmartOS distro called Danube Cloud for a long time and am looking to move and looked at Harvester[1] and OpenNebula, but with everything I know about Kubernetes(and LongHorn) I'm reluctant to use something so heavily based on Kubernetes.

At its peak I reached out multiple times to Joyent to fix their EFI support for virtualization. The Danube team had similar experiences with them, working on live migrations for VMs, and a few months back I did a rebase of the platform image to a more recent illumos stack.

Two of the fundamental issues with Illumos is that they don't seem to understand that they need to fix the horrendous platform build to get community support to keep up with the pace of development of other OS's. The platform build is a huge nasty mess of custom shell scripts, file based status snapshots, which includes the entire userspace in the kernel build. Basically if your openssl version is out of wack the entire thing will fail. Not because it has to, but because it was never adapted to modern needs of someone just wanting to hack on a kernel. It's fixable, but I don't see any desire to fix it, and even if that desire eventually shows up it might just be too little, too late.

[1] https://harvesterhci.io/

jjav · 2024-01-31T09:27:06

> Oxide is pretty much the company I dream to work at, both for the tech stack, plus the people working there.

Same for me. Oxide is the only company I know that I'd really love to work for. Similar (I think, observing from the outside) to Sun. That's what I dream about.

Unfortunately their pay structure is such that I can't afford it, with a family to support. Maybe when the kid is out of university, if I don't need much income anymore, I can fulfill the dream.

DominoTree · 2024-01-30T03:18:18

>> Oxide is pretty much the company I dream to work at, both for the tech stack, plus the people working there.

Thought I was the only one :P

yjftsjthsd-h · 2024-01-30T03:24:42

I mean, I phrase it as "my dream job is systems integration at Sun" (but Oxide is the living equivalent)

rbanffy · 2024-01-30T13:44:21

Sun had some lovely desktop hardware. They also had SPARC.

I really miss Sun.

jjav · 2024-01-31T09:23:06

> I really miss Sun.

There's nothing in the world I miss so much as Sun.

I've done many startups post-Sun and there's been good highs and many lows but, there's nothing like a true hard-core tech company like Sun.

There basically are no tech companies anymore, other than Apple, but they are mass-market consumer oriented which is not interesting.

rbanffy · 2024-02-01T12:20:42

> but they are mass-market consumer oriented which is not interesting.

Kind of. Apple focuses their high end gear on creative professionals. Us Unix geeks have much more modest needs, which are often satisfied by the average uninspired Dell design. Still, Apple has a decent Unix underneath all that glitter. At the same time, there are almost no desktop-friendly Unixes besides the free crowd. HP and IBM have given up on the Unix workstation market eons ago. IBM’s POWER gear can crush the best Xeons and Epycs, but they have nothing to compete with the “good enough” low end.

It’s a shame Oracle doesn’t offer Solaris on their cloud the same way IBM offers AIX (and Z, which, surprisingly, is a certified UNIX as well) on theirs.

jjav · 2024-02-04T21:10:42

> Still, Apple has a decent Unix underneath all that glitter.

Had. That's what pulled me in to Apple laptops after spending the 90s convinced I'd never use a Mac.

With OS X, suddenly it was BSD, but with a Mac GUI! Cool. When OS X (10.0) came out I quickly bought a mac first time ever.

But Apple has spent the last 20+ years making OSX less and less BSD, locking out more and more core functionality into obscure nonstandard behavior.

rbanffy · 2024-02-04T21:19:44

Well... For me and a lot of people, it's still good enough - it runs MacPorts, Python, I can compile things and so on, but, TBH, I also have a couple Linux boxes lying around on my desk and on the network one hop away from my desk, so the deep hackability side is solved outside the Mac in my case.

pjmlp · 2024-01-31T16:04:54

I can share the feeling, despite my UNIX related rants, there were quite a few I actually enjoyed working on, Solaris was one of them.

Still own several Sun CDs they used to give away to developers.

rbanffy · 2024-02-01T12:11:00

You can always install OpenIndiana and use it. It’s been a while I don’t use it, but I gather it’s still a worthy daily driver (plus, Teams, Slack, Outlook, and others simply don’t have installable apps for it.

pjmlp · 2024-02-01T12:22:00

It isn't the same thing as running on Sun hardware, specially if one cares about graphics programing.

rbanffy · 2024-02-01T12:25:10

I never used Suns that way, but I assume any modern gaming desktop can run rings around their fanciest Ultra.

What they can’t do as well as a deskside Ultra Enterprise is to look impressive.

codethief · 2024-01-29T20:28:30

Can anyone ELI5 what Oxide's offer is? I've looked at their website and still got no clue. Is it hardware + software I can purchase and use on-premise? Is it a PaaS / yet another cloud provider?

steveklabnik · 2024-01-29T20:54:14

I believe you're being downvoted because there is already a big thread about this here, though I think that's a bit unfair to you. I haven't posted in that thread yet because I wanted to let others say what is meaningful about the product to them, but this seems like a good place to put my reply. Regardless of all that: it is hardware + software you can purchase and use on-premise, that's correct.

The differentiator from virtually all existing on-prem cloud products is that we are a single vendor who has designed the hardware and software (which is as open source as we can possibly make it, by the way, hence announcements like this) to work well together. Most products combine various other products from various vendors, and are effectively selling you integration. We believe that that leads to all kinds of problems that our product solves.

Another factor here is that we only have two SKUs: a half rack and a full rack. You don't buy Oxide 1U at a time, you buy it a rack at a time. By designing the entire rack as a cohesive unit, we can do a lot of things that you simply cannot do in the 1U form factor. There is a running joke that we talk about our fans all the time, and it's true. Because our sleds have a larger form factor than a traditional 1U, we can use larger fans. This means we can run them at a lower RPM, which means power savings. That's the deliberate design choice. But we also have gained accidental benefits from doing things like this: lower RPM also means that our servers are way quieter than others. That's pretty neat. Some early prospective customers literally asked if the thing is on when it was demo'd to them, because it's so quiet. Is that a reason to buy a server? Not necessarily, but it's just a fun example of some of the things that end up happening when you re-think a product as a whole, rather than as an integration exercise.

codethief · 2024-01-30T03:25:27

Thanks so much for elaborating!

chrishare · 2024-01-29T20:47:49

On-prem, fully-integrated compute and storage solution with cloud-like APIs to provision resources, all with a commitment to open source.

haolez · 2024-01-30T00:05:18

Do you know if they support GPUs or whatever is needed to host LLM models?

steveklabnik · 2024-01-30T00:26:33

The current product does not have any GPUs in it. https://news.ycombinator.com/item?id=39183072

mkoubaa · 2024-01-29T21:16:18

Mainframe 2.0

danpalmer · 2024-01-30T00:29:15

This is really not accurate in any way that matters I don't think. It's a mainframe in as much as you buy a rack and spec it out. It's not a mainframe in that the performance is typical server performance rather than the mainframe profile which is very different and requires different considerations, the compute model is typical server compute rather than the mainframe compute model which (aside from compatibility layers) is a radically different environment to build software for.

sneak · 2024-01-29T18:40:32

I know they’re ex-Sun, but is there any real technical benefit for choosing not-Linux (for their business value prop)?

I know of the technical benefits of illumos over linux, but does that actually matter to the customers who are buying these? Aren’t they opening a whole can of worms for ideology/tradition that won’t sell any more computers?

As someone who runs Linux container workloads, the fact that this is fundamentally not-Linux (yes I know it runs Linux binaries unmodified) would be a reason against buying it, not for.

steveklabnik · 2024-01-29T18:57:18

> does that actually matter to the customers who are buying these?

It's not like we specifically say "oh btw there's illumos inside and that's why you should buy the rack." It's not a customer-facing detail of the product. I'm sure most will never even know that this is the case.

What customers do care about is that the rack is efficient, reliable, suits their needs, etc. Choosing illumos instead of Linux here is a choice made to help effectively deliver on that value. This does not mean that you couldn't build a similar product on top of Linux inherently, by the way, just that we decided illumos was more fit for purpose.

This decision was made with the team, in the form of an RFD[1]. It's #26, though it is not currently public. The two choices that were seriously considered were KVM on Linux, and bhyve on illumos. It is pretty long. In the end, a path must be chosen, and we chose our path. I do not work on this part of the product, but I haven't seen any reason to believe it has been a hindrance, and probably is actually the right call.

> the fact that this is fundamentally not-Linux (yes I know it runs Linux binaries unmodified) would be a reason against buying it, not for.

I am curious why, if you feel like elaborating. EDIT: oh just saw your comment down here: https://news.ycombinator.com/item?id=39180814

1: https://rfd.shared.oxide.computer/

wmf · 2024-01-29T19:07:39

The Linux vs. Illumos decision seems to be downstream of a more fundamental decision to make VMs the narrow waist of the Oxide system. That's what I'm curious about.

amluto · 2024-01-29T19:24:28

Especially since Oxide has a big fancy firmware stack. I would expect this stack to be able to do an excellent job of securely allocating bare-metal (i.e. VMX root on x86 or EL2 if Oxide ever goes ARM) resources.

This would allow workloads on Oxide to run their own VMs, to safely use PCIe devices without dealing with interrupt redirection, etc.

wmf · 2024-01-29T19:27:58

I'm not affiliated with Oxide but I don't think you can put Crucible and VPC/OPTE in firmware. Without a DPU those components have to run in the hypervisor.

amluto · 2024-01-29T20:00:04

Possibly not.

But I do wonder why cloud and cloud-like systems aren’t more aggressive about splitting the infrastructure and tenant portions of each server into different pieces of hardware, e.g. DPU. A DPU could look DPU could look like a PCIe target exposing NVMe and a NIC, for example.

Obviously this would be an even more custom design than Oxide currently has, but Oxide doesn’t seem particularly shy about such things.

throwawaaarrgh · 2024-01-29T19:43:23

A team should always pick the tools they are most familiar with. They will always have better results with that, than trying to use something they understand less. With this in mind, using their own stack is a perfectly adequate choice. Factors outside their team will determine if that works out in the long term.

wmf · 2024-01-29T19:50:31

A handful of the team are more familiar with Illumos and the next hundred people they hire after that will be more familiar with Linux.

steveklabnik · 2024-01-29T19:56:23

To be clear, we at the time had already hired people with deep familiarity with Linux at the time this decision was made. In particular, Laura Abbott, as one example.

It is true that the number of developers that know Linux is larger than the ones that know illumos. But this is also true of the number of developers who know C as the ones who know Rust. Just like some folks need to be onboarded to Rust, some will need to be onboarded to illumos. That is of course part of the tradeoff.

Jtsummers · 2024-01-29T20:07:32

If your hiring decisions are always based on what people are currently familiar with, you'll always be stuck in the past. You may not even be able to use present day tooling and systems because they could be too new to hire people for.

You're much better off hiring people who are capable of learning, and then giving them the opportunities to learn and advance their knowledge and skills.

throwawaaarrgh · 2024-01-29T23:08:58

Everyone is capable of learning. I can hire someone who is capable of learning Japanese. They can then try to teach the rest of the team Japanese. Does that mean it's a good idea to switch all our internal docs to Japanese? Maybe if I was building a startup in Japan. Similarly, writing internal docs in English for a startup in Japan would be of equal difficulty and value. Hooray, we're learning! And struggling more than needed to build a product.

You're better off hiring experienced people who are highly productive. If they're highly productive with one stack, it makes no sense to change their stack so they're no longer productive, or hiring people who aren't familiar with it and waiting for them to become productive.

There's nothing wrong with using old, well established things. They're quite often better than new things. As long as they're still supported, just use whatever builds a working product. It's the end product that matters.

Jtsummers · 2024-01-30T00:08:25

> Everyone is capable of learning. I can hire someone who is capable of learning Japanese. They can then try to teach the rest of the team Japanese. Does that mean it's a good idea to switch all our internal docs to Japanese?

The difference between Japanese and English is much, much bigger than the difference between one Unix OS and one Unix-like OS. This is a remarkably disingenuous argument. If you really don't understand the difference in scope, there's no point in discussing anything with you because you've managed to disprove your opening sentence with yourself as the counterexample.

throwawaaarrgh · 2024-01-30T13:55:22

You're welcome to see it that way if you want. But if you think you can get to know a completely new kernel, OS, etc in a short amount of time, backwards and forwards, you're equally as disingenuous. You can get by editing a few lines in a pinch, but you could equally just learn a few Japanese phrases. Proper understanding requires a deep knowledge that comes from practice and experience with subtle complexity and context.

(Japanese isn't so radically different from English, it mostly just has more words for more contexts. In many ways it's simpler than English. It would be harder to go from Java to Haskell, with their many different language paradigms)

throwawaaarrgh · 2024-01-29T19:56:04

A lot of people out there claim to know Linux, yet few can prove it. OTOH, if they gain a cult following with lots of people using their stack, those people might become more familiar with their stack than most Linux people are with theirs. They could grow a captive base of prospective hires.

That's not the big concern though. The big concern is whether vendor integration and certification becomes a stumbling block. You can hire any monkey to write good-enough code, but that doesn't give you millions in return. Partnerships with vendors and compliance certifications can give you hundreds of millions. The harder that is, the farther the money is. A totally custom, foreign stack can make it harder, or not; it depends how they allocate their human capital and business strategy, whether they can convince vendors to partner, and clients to buy in. Anything very different is a risk that's hard to ignore.

pjmlp · 2024-01-29T21:35:36

As someone that knows UNIX since 1993, starting with Xenix, many that are familiar with Linux, are actually familiar with a specific Linux distribution, as the Linux wars took over UNIX wars.

That being the case, knowing yet another UNIX cousin isn't that big deal.

steveklabnik · 2024-01-29T19:51:10

I do not personally agree with this. I do think that familiarity is a factor to consider, but would not give it this degree of importance.

It also was not discussed as a factor in the RFD.

apache8080 · 2024-01-31T04:53:40

Is there any chance Oxide is going to make more of these RFFs public? I think it would be a useful artifact to see why a company would choose to run a non-Linux OS. I also think there are other Oxide RFDs that would have a similar benefit e.g why Oxide decided to build dropshot instead of using one of the existing rust REST server crates?

steveklabnik · 2024-01-31T18:03:29

Yes, for sure. We agree there's tons of value there, there's just so much to do it's easy to let things fall through the cracks. This thread has given some renewed energy to work on releasing some of them, so no promises but we'll see :)

networked · 2024-02-02T18:41:26

If you publish the RFD, please submit it to HN. As a former (mostly former) FreeBSD user interested in bhyve, I would like to read the case for bhyve on illumos.

binjip · 2024-01-30T06:03:34

It would be great if that RFD will become public someday, if it of course possible, especially if it's a long read.

bcantrill · 2024-01-29T18:53:19

Keep in mind that Helios is really just an implementation detail of the rack; like Hubris[0], it's not something visible to the user or to applications. (The user of the rack provisions VMs.)

As for why an illumos derivative and not something else, we expanded on this a bit in our Q&A when we shipped our first rack[1] -- and we will expand on it again in the (recorded) discussion that we will have later today.[2]

[0] https://hubris.oxide.computer/

[1] https://www.youtube.com/watch?v=5P5Mk_IggE0&t=2556s

[2] https://mastodon.social/@bcantrill/111840269356297809

kaliszad · 2024-01-29T20:00:34

Perhaps you could talk a bit about the distributed storage based on Crucible with ZFS as the backing storage tonight. I would really love to hear some of the details and challenges there.

bcantrill · 2024-01-29T20:06:53

Yes! Crucible[0] is on our list of upcoming episodes. We can touch on it tonight, but it's really deserving of its own deep dive!

[0] https://github.com/oxidecomputer/crucible

panick21_ · 2024-01-29T20:59:11

The timing of your podcast is the least convenient thing ever for us poor Europeans. And then the brutal wait the next day until its uploaded.

The only thing I miss about Twitter Spaces is that you could listen the morning after.

kaliszad · 2024-01-29T22:50:43

Yes (hello from Czechia), however there will always be somebody who this is inconvenient for. Also, I have to confess I was at times immersed in other work that I made a few Oxide and Friends live. I might stay up tonight.

I am looking forward to the crucible episode. It sounds like it could be a startup on its own, it wouldn't be the first distributed file/ storage system company.

StillBored · 2024-01-29T19:12:27

Linux is a nightmare in the embedded/appliance space because one ends up just having platform engineers who spend their day fixing problems with the latest kernels, drivers, core libraries, etc, that the actual application depends on.

Or one goes the route of 99% of the IoT/etc vendors, and never update the base OS and pray that there aren't any active exploits targeting it.

This is why a lot of medium-sized companies cried about Centos, which allowed them to largely stick to a fairly stable platform that was getting security updates without having to actually pay/run a full blown RHEL/etc install. Every ten years or so they had to revisit all the dependencies, but that is a far easier problem than dealing with a year or two update cycle, which is too short when the qualification timeframe for some of these systems is 6+ months long.

So, this is almost exclusively a Linux problem; any of the *BSD/etc. alternatives give you almost all of what Linux provides without this constant breakage.

bcantrill · 2024-01-29T19:21:59

This is a really, really good point -- and is a result of the model of Linux being only a kernel (and not system libraries, commands, etc.). It means that any real use of Linux is not merely signing up for kernel maintenance (which itself can be arduous) but also must make decisions around every other aspect of the system (each with its own communities, release management, etc.). This act is the act of creating a distribution -- and it's a huge burden to take on. Both illumos and the BSD derivatives make this significantly easier by simply including much more of the system within their scope: they are not merely kernels, but also system libraries and commands.

This weighed heavily in our own calculus, so I'm glad you brought it up!

trhway · 2024-01-29T19:35:13

>including much more of the system within their scope: they are not merely kernels, but also system libraries and commands.

giving limited resources of the dev team it may lead to limited support of the system outside of the narrow set of officially supported/certified hardware with that support falling behind on modern hardware, as it happened with Sun, and vendor lock-in as a result into overpriced and low performing hardware.

There is a reason that back then at Solaris dev there was a joke about embedding Linux kernel as a universal driver for Solaris kernel in order to get reasonable support for the hardware around.

cross · 2024-01-29T19:44:33

This is less of an issue for us at Oxide, since we control the hardware (and it is all modern hardware; just a relatively small subset of what exists out there). Part of Sun's issue was that it was tied not just to a software ecosystem, but also to an all-but-proprietary hardware architecture and surrounding platform. Sun eventually tried to move beyond SPARC and SBus/MBus, but they really only succeeded in the latter, not the former.

mardifoufs · 2024-01-29T19:41:28

Well they aren't burdened by having to make their own processors, like Sun had to do, or their own full custom chips in general. They just have to support the selection of hardware they pick, and they have complete oversight of what hardware runs on their racks. So I'm not sure if the sun comparison is relevant here, since they can still pick top of the line hardware. Just not any hardware

trhway · 2024-01-29T22:13:42

Any issues with funding or whatever, and their customers would get locked in on the yesterday's "top of the line hardware" (reminds how Oracle used lawyers to force HP to continue support Itanic). Sun was 50K persons company, and they struggled to support even reasonably wide set of hardware. Vendor lock in is like a Newton law in this industry.

linksnapzz · 2024-01-29T20:45:30

>that support falling behind on modern hardware, as it happened with Sun, and vendor lock-in as a result into overpriced and low performing hardware.

The Oxide hw is using available AMD SKUs for CPU.

pjmlp · 2024-01-29T21:41:07

Interesting that you bring up embedded/appliance space, as I have noticed there are plenty of FOSS alternatives coming up, key features not being Linux based, and not using GPL derived licenses.

FreeRTOS, Nuttx, Zephyr, mbed, Azure RTOS,...

GrumpySloth · 2024-01-29T19:31:26

CentOS wasn’t used in embedded systems.

dralley · 2024-01-29T20:39:01

Sure it was. So is RHEL.

Embedded isn't limited to devices equal or less powerful / expensive than the Raspberry Pi.

sarlalian · 2024-01-29T23:08:05

Arista EOS is definitely CentOS Linux release 7.9.2009 (AltArch) based.

mlindner · 2024-01-30T00:35:06

Even Windows was and is used substantially in embedded systems.

GrumpySloth · 2024-01-30T02:12:13

I know about that. This is a special edition for embedded though. But CentOS is news to me. CentOS was targeted for servers.

skullone · 2024-01-29T18:49:00

It seems healthy to have options, almost like the universe is healing a bit after oracle bought Sun. I can't imagine better hands bringing the oxide system together than that team. As an engineer who works entirely with Linux anymore, I pine for the days of another strong Unix in the mix to run high value workloads on. Comparing openvswitch on Linux, to say, the crossbow SDN facility on Solaris, I'd take crossbow any day. Nothing "wrong" with Linux, but it is sorely lacking in "master plan" levels of cohesion with all the tooling taking their own path, often bringing complexity that requires even for abstraction with yet more complicated tooling on top.

pjmlp · 2024-01-29T18:47:32

Their customers run virtualised OS on top of this.

This is no different from Azure Host OS, Bottlerocket, Flatcar or whatever.

This maters to them, as knowing the whole stack, some of the kernel code is still theirs from Sun days, and making it available matters to the customers that want source code access for security assement reasons.

greggyb · 2024-01-29T19:10:47

If you're running in one of the big 3 cloud providers, the bottom-level hypervisors are not-linux. This is equivalent. Are you anti-AWS or anti-Azure for the same reason?

This is the substrate upon which you will run any virtualized infrastructure.

qmarchi · 2024-01-29T19:13:56

Small note, that's not true for Google Cloud, which runs on top of Linux, though modified.

Disclaimer: Former Googler, Cloud Support

refulgentis · 2024-01-29T19:19:50

Another Xoogler here: any idea what they mean by it's not Linux at the bottom for other providers? Like, surely it's _some_ common OS? Either my binaries wouldn't run or AWS is reimplementing Linux so they can, which seems odd.

Or are they just saying that the VM my binary runs on might be some predictable Linux version, but the underlying thing launching the VM could be anything?

p_l · 2024-01-29T19:27:51

Old AWS used to be Xen, Nitro afaik uses customised VMM and I don't recall if it's not a custom OS or hosted on top of something.

Azure is Hyper-V underneath IIRC, a custom variant at least (remember Windows Server Nano? IIRC it was the closest you could get to running it), with sometimes weird things like network cards running Linux and integrating with Windows' built-in SDN facility.

Rest of the bigger ones is mainly Linux with occasional Xen and such, but sometimes you can encounter non-trivial VMware deployments.

zokier · 2024-01-29T22:18:54

Nitro is supposed to be this super customized version of KVM.

bewaretheirs · 2024-01-29T22:55:10

When your programs are running on a VM, the linux that loads and runs your binaries is not at the bottom; that linux image runs inside a virtual machine which is constructed and supervised by a hypervisor which sits underneath it all. That hypervisor may run on the bare machine (or what passes for a bare machine what with all the sub-ring-zero crud out there), or may run on top of another OS which could be linux or something else. And even if there is linux in the middle and linux at the bottom they could be completely different versions of linux from releases made years apart.

antod · 2024-01-29T22:57:30

> Or are they just saying that the VM my binary runs on might be some predictable Linux version, but the underlying thing launching the VM could be anything?

Yup. eg with Xen the hypervisor wasn't Linux, even if the privileged management VM (dom0) was Linux (or optionally NetBSD in the early days). The very small Xen hypervisor running on the bare metal was not a general purpose OS, and didn't expose any interface itself - it was well hidden and relied on dom0 for administration.

bpye · 2024-01-29T19:34:20

Azure runs a version of Windows, see:

https://techcommunity.microsoft.com/t5/windows-os-platform-b...

qmarchi · 2024-01-29T19:23:44

Correct, that the Hypervisor isn't running Linux.

I think the only provider where that would make sense would be Microsoft, where they have their own OS.

bewaretheirs · 2024-01-29T22:34:28

As I understand it, there's linux running on the Google Cloud hardware but the virtualized networking and storage stacks in Google Cloud are google proprietary and largely bypass linux -- in the case of networking see the "Snap: a Microkernel Approach to Host Networking" paper.

In contrast, it appears that Oxide is committing to open-source the equivalent pieces of their virtualization platform.

tptacek · 2024-01-29T19:35:35

I don't about EC2 but Lambda and Fargate are presumably Firecracker, which is Linux KVM.

zokier · 2024-01-29T22:10:22

AWS "Nitro" hypervisor which powers EC2 is their (very customized) KVM.

https://docs.aws.amazon.com/whitepapers/latest/security-desi...

wmf · 2024-01-29T19:16:10

I suspect a lot of people would (irrationally) freak out if they saw how the public cloud works because it's so different from "best practices". Oxide would probably trigger people less if they never mentioned Illumos but that's not really an option when it's open source.

shusaaafuejdn · 2024-01-29T18:51:04

As far as performance and feature set, probably not anymore (I would have answered differently 10 years ago, and if I am wrong today would love to be educated about it).

However, if we are considering code quality, which I consider important if you are actually going to be maintaining it yourself as oxide will have to do since they need customizations, then most of the proprietary Unix sources are just superior imo. That is, they have better organization, more consistency in standards, etc. The BSDs are slightly better in this regard as well, it really isn't a proprietary vs open source issue, it's more about the insane size of the Linux kernel project making strict standards enforcement difficult if not impossible the further you get from the very core system components.

Irregardless of them being ex-Sun (and I am not ex Sun), if I needed a custom OS for a product I was working on, Linux would be close to the last Unix based OS source tree I would try to do it with, only after all other options failed for whatever reason. And that's not even taking into account the licensing, which is a whole other can of worms.

spamizbad · 2024-01-29T19:04:47

Seems strange to me too but it sounds like the end-users basically never interact with this - it's just firmware humming along in the background. As long as its open-source and reasonably well documented its already lightyears ahead of what else is out there.

NexRebular · 2024-01-29T19:18:34

Not everything needs to be linux. Besides, if monocultures are supposed to be harmful, why is linux being thrown to everything nowadays? Very dangerous to have a single point of failure in (critical) applications.

sarlalian · 2024-01-29T23:33:35

As a customer, I expect most of the technical advantages will be basically being a down stream consumer of ZFS. From a developer / maintainer of an OS, Dtrace and ZFS are large technical wins. Part of the overall value proposition of Oxide, is "correctness". You get an OS/Hardware stack that are designed to work together. You get 20 years of cruft thrown out. You get a lot of tooling, API's, etc written in a performant memory safe language (rust). Also you get a really fantastic podcast about the whole process. And as a customer you get a company that understands their stack from driver to VM and has a ton of internal expertise debugging production problems.

moondev · 2024-01-29T19:52:47

The main drawbacks to me are

1. No support for nested virtualization, so running a vm inside your vm is not available. This prevents use of projects such as kubevirt or firecracker on a Linux guest, and WSL2 on a Windows guest.

2. No GPU support

If the base hypervisor was Linux, it would be way more capable for users it seems. I also wonder if internally Linux is used for development of the platform itself so they can create "virtual" racks to dogfood the product without full blown physical racks.

With all that said, I do not know the roadmap and admittedly there are already quite a few existing platforms built on kvm, so as their hypervisor improves and becomes more capable it could potentially become strategic advantage.

steveklabnik · 2024-01-29T20:06:11

> I also wonder if internally Linux is used for development of the platform itself

Developers at Oxide work on whatever platform they'd like, as long as they can do their work. I will say I am in the minority as a Windows user though, most are on some form of Unix.

> so they can create "virtual" racks to dogfood the product without full blown physical racks.

So one of the reasons why Rust is such an advantage for us is its strong cross-platform support: you can run a simulated version of the control plane on Mac, Linux, and Illumos, without a physical rack. The non-simulated version must run on Helios. [1]

That said we do have a rack in the office (literally named dogfood) that employees can use for various things if they wish.

1: https://github.com/oxidecomputer/omicron?tab=readme-ov-file#...

moondev · 2024-01-29T20:33:10

Interesting thanks for the insight.

> I will say I am in the minority as a Windows user though, most are on some form of Unix.

Now i'm imagining Helios inside WSI - Windows Subsystem for illumos

steveklabnik · 2024-01-29T20:40:19

You're welcome. I will give you one more fun anecdote here: when I came to Oxide, nobody in my corner of the company was using Windows. And hubris and humility almost Just Worked: we had one build system issue that was using strings instead of the path APIs, but as soon as I fixed those, it all worked. bcantrill remarked that if you had gone back in time and told him long ago that some of his code would Just Work on Windows, he would have called you a liar, and it's one of the things that validates our decisions to go with Rust over C as the default language for development inside Oxide.

> Now i'm imagining Helios inside WSI - Windows Subsystem for illumos

That would be pretty funny, ha! IIRC something about simulated omicron doesn't work inside WSL, but since I don't work on it actively, I haven't bothered to try and patch that up. I think I tried one time, I don't remember specifically what the issue was, as I don't generally use WSL for development, so it's a bit foreign to me as well.

panick21_ · 2024-01-29T21:01:43

> that was using strings instead of the path API

Man you can't let Brain live that one down can you?

:)

steveklabnik · 2024-01-29T21:11:25

I didn't bother to git blame the code, I myself do this from time to time :)

yjftsjthsd-h · 2024-01-30T02:40:22

> Now i'm imagining Helios inside WSI - Windows Subsystem for illumos

I mean... WSL2 is just hyperv with some integration glue, and illumos isn't Linux but unix is unix; that might well be doable.

fragmede · 2024-01-29T21:11:09

How is Oxide for GPU-heavy workloads?

steveklabnik · 2024-01-29T21:40:46

There are no GPUs in the rack, so pretty bad, haha.

We certainly understand that there's space in the market for a GPU-focused product, but that's a different one than the one we're starting the company off with. There's additional challenge with how we as a company desire openness, and GPUs are incredibly proprietary. We'll see what the future brings. Luckily for us many people still desire good old classic CPU compute.

NexRebular · 2024-01-29T23:51:35

Would pass-through to VM work?

At $work I'm running SmartOS servers with GPU passing to a ubuntu bhyve for the occasional CUDA compute and it works wonderfully. Wonder if similar could be possible with Helios?

steveklabnik · 2024-01-30T01:00:39

The software interface isn't the problem: the problem is that there are no physical GPUs in the product. There's nothing to pass through.

mardifoufs · 2024-01-29T19:46:18

I think it's a good idea to have more choice, especially in OSS. A Linux mono culture isn't any better than a chromium mono culture. They might be able to do stuff that just isn't practical if they stuck with Linux. They are also probably more familiar with illumos, or at least familiar enough to know that they can use it to do more than with linux

thinkingkong · 2024-01-29T18:57:45

This has been / will be the market education challenge; Its the same one Joyent had with SmartOS. Theyre correctly pointing out that the end user or operator will basically never interact with this layer, but it does cause some knee-jerk reactions. All that said, there are some pretty great technical benefits to using illumos derived systems the least of which is the teams familiarity and ability to do real diagnosis on production issues. I wont put words in anyones mouth but I suspect thats going to be critical for them as they support customer deployments w/o direct physical access.

kardianos · 2024-01-29T18:42:36

In one podcast, the reason given was staff familiarity and owning the full stack, not just the kernel I believe.

quux · 2024-01-29T18:54:01

Aren't they also ex-Joyent? Joyent ran customer VMs in prod on Illumos for many years so there's a lot of experience there.

steveklabnik · 2024-01-29T18:59:11

Many people, including part of the founding team, are ex-Joyent, yes. Some also worked at Sun, on the operating systems that illumos is ultimately derived from.

littlestymaar · 2024-01-29T18:59:28

bcantrill used to work at Sun then became CTO at Joyent, so the reason why Joyent ran Illumos is probably the same reason as why Oxide is, because Cantrill likes it and judges that it's a good fit for what they are doing.

steveklabnik · 2024-01-29T19:01:40

As I elaborated above, bcantrill did not decree that we must use illumos. Technical decisions are not handed down from above at Oxide.

littlestymaar · 2024-01-29T19:11:59

I saw your comment[1] after I wrote mine, but I'm not saying that he's forcing you guys to use it (that would not a good way of being a CTO at a start-up…), but that doesn't prevent him from advocating for solutions he believes in.

Would you say that Oxide would have chosen Illumos if he wasn't part of the company?

[1]: https://news.ycombinator.com/item?id=39180706

steveklabnik · 2024-01-29T19:43:04

> Would you say that Oxide would have chosen Illumos if he wasn't part of the company?

I don't know how to respond to this question, because to me it reads like "if things were completely different, what would they be like?" I have no idea if you could even argue that a company could be the same company with different founders.

What I can say is that this line of questioning still makes me feel like you're implying that this choice was made simply based on preference. It was not. I am employee #17 at Oxide, and the decision still wasn't made by the time I joined. But again, the choice was made based on a number of technical factors. The RFD wasn't even authored by Bryan, but instead by four other folks at Oxide. We all (well, everyone who wanted to, I say "we" because I in fact did) wrote out the pros and cons of both, and we weighed it like we would weigh any technical decision: that is, not as a battle of sports teams, but as a "hey we need to drive some screws: should we use a screwdriver, a hammer, or something else?" sort of nuts-and-bolts engineering decision.

littlestymaar · 2024-01-29T19:57:03

> we weighed it like we would weigh any technical decision: that is, not as a battle of sports teams, but as a "hey we need to drive some screws: should we use a screwdriver, a hammer, or something else?" sort of nuts-and-bolts engineering decision.

I'm not saying otherwise.

In fact, when I wrote my original comment, I actually rewrote it multiple time to be sure it wouldn't suggest I was thinking it was some sort of irrational decision (that's why I added the “it's a good fit for what they are doing”), but given your reaction it looks like I failed. Written language is hard, especially in a foreign language, sorry about that.

steveklabnik · 2024-01-29T20:18:29

It's all good! I re-wrote what I wrote multiple times as well. Communication is hard. I appreciate you taking the effort, sorry to have misunderstood.

Heck, there's a great little mistake of communication in the title: this isn't just "intended" to power the rack, it does power the rack! But they said that because we said that in the README, because that line in the README was written before it ended up happening. Oops!

sunshowers · 2024-01-29T19:34:22

(I work at Oxide.)

Bryan is just one out of several illumos experts here. If none of those were around, sure, maybe we wouldn't have picked illumos -- but then we'd be unrecognizably different.

I came into Oxide with a Linux background and zero knowledge of illumos. Learning about DTrace especially has been great.

bbkane · 2024-01-30T06:12:05

I'd like to learn DTrace (especially after the recent 20yr podcast episode), but I worry it'll never make into mainstream Linux debugging, and hence only useful for more niche jobs.

roland-s · 2024-01-31T08:39:48

Linux now has eBPF, which is essentially a VM running inside the Linux kernel. You can run your own programs on this little VM and extend the kernel to a staggering degree. Some clever folks have used this to build open source tracing tools that rival and even surpass Dtrace in some ways. Brendan Gregg, a Dtrace wizard and old colleague of some of the Oxide team back at Sun and Joyent, has some great resources on the subject

https://www.brendangregg.com/ebpf.html

https://www.brendangregg.com/blog/2018-10-08/dtrace-for-linu...

sunshowers · 2024-01-30T19:15:57

Your concern is completely reasonable -- a thing I'd add though is that both Windows and macOS have DTrace support.

4ad · 2024-02-03T13:56:59

DTrace in macOS was always a third class citizen, it never worked very well, and in fact, on arm64 machines it doesn't work at all (panics the kernel).

sunshowers · 2024-02-04T02:37:06

That's really unfortunate :(

bbkane · 2024-01-30T19:49:55

I was excited, but it looks like both MacOS and Windows require special admin permissions for my laptop that I doubt my work would approve (completely reasonable to require this, it just makes it unusable for me).

4ad · 2024-02-03T13:58:48

It's actually not reasonable to require this if you do it right. On Solaris and illumos DTrace works just fine for a normal user. You simply have access to fewer probes if you are less privileged.

Not the case on FreeBSD/Windows/macOS, where it's all or nothing. Still, on FreeBSD you just need root, on macOS you need to boot into a special kernel mode which doesn't even work on modern Apple Silicon machines.

vvern · 2024-01-29T18:45:34

> yes I know it runs Linux binaries unmodified

Is it that it runs Linux binaries unmodified or that it runs vms and manages VMs which run Linux, and as an end-user, that's what you run your software in?

bcantrill · 2024-01-29T18:55:46

It runs VMs -- so it doesn't just run Linux binaries unmodified, it runs Linux kernels unmodified (and, for that matter, Windows, FreeBSD, OpenBSD, etc.).

tonyarkles · 2024-01-29T18:55:30

As far as I recall it's not a VM. They run in "LX Branded Zones" which does require a Linux userland so that the binaries can find their libraries etc but Zones are more like "better cgroups than cgroups, a decade earlier" than VMs.

bcantrill · 2024-01-29T18:58:09

No, it's a VM, running a bhyve-based hypervisor, Propolis.[0] LX branded zones were/are great -- but for absolute fidelity one really needs VMs.

[0] https://github.com/oxidecomputer/propolis

bpye · 2024-01-29T19:36:03

Do you have a solution for running containers (Kubernetes, etc)? Are you spinning up a Linux VM to run the containers in there, doing VM per container, or something else?

panick21_ · 2024-01-29T21:04:19

Costumers can decide I would assume. Most likely you install you install some Kubernetes and then just have multible VMs distributed across the rack. And then run multible Pods in each node.

VM per container seems like a waist unless you need that extra isolation.

bpye · 2024-01-29T21:17:12

I wondered if there was any support for running containers built in - something like EKS/AKS/GKE/Cloud Run/etc - but looking at the docs it appears not.

I agree that VM per container can be wasteful - though something like Firecracker at least helps with start time.

panick21_ · 2024-01-30T05:37:28

From the podcast it seems that they want to deliver a minimal viable product. Their primary costumers already have a lot of their own higher level stack.

They might get into adding more higher level software eventually depending on what costumers want.

Extigy · 2024-01-29T18:49:36

Perhaps Illumos is particularly well suited for a Hypervisor/Cloud platform due to work upstreamed by Joyent originally for SmartOS?

jeffbee · 2024-01-29T18:56:49

Do you have the same gut reaction to ESXi?

stonogo · 2024-01-29T19:03:59

I sure do. We've finally got to a place where we don't need weird hardware tricks to containerize workloads -- this is why a lot of shops pursue docker-like ops for production. When I buy hardware, long-term maintenance is a factor, and when my whole operations fleet relies on ESX, or in this case a Solaris fork, I'm now beholden to one company for support at that layer. Buying a rack of Supermicro gear and running RHEL or SLES with containerized orchestration on top means I can, in a pinch, hire experts anywhere to work on my systems.

I have no reason to believe Oxide would be anything but responsive and effective in supporting their systems, but introducing bespoke software this deep in the stack severely curtails my options if things get bad.

jeffbee · 2024-01-29T20:03:50

I can somewhat see your point, but in my experience you can't rely on RHEL or whatever vendor Linux to correctly bring up random OEM hardware. You will slowly discover all of the quirks, like it didn't initialize the platform EDAC the way you expected, or it didn't resolve some weird IRQ issue, etc. Nothing about my experience leads me to believe Linux will JFW on a given box, so I don't feel like Linux has an advantage in this regard, or that niche operating systems have a disadvantage. Certainly I feel like a first-party OS from the hardware vendor is going to have a lot of advantages.

apendleton · 2024-01-29T19:29:03

I think the value proposition they're offering is a carefully integrated system where everything has been thoroughly engineered/tested to work with everything else, down to writing custom firmware to guarantee that it's all ship-shape, so that customers don't have to touch any of the innards, and will probably just treat them as a black box. It seems like it's chock-full of stuff that they custom-built and that nobody else would be familiar with, by design. If that's not what you want, this probably isn't the product for you.

criddell · 2024-01-29T20:57:46

I’m unfamiliar with illumos so I went to their webpage and the very first thing it says is:

> illumos is a Unix operating system

Is illumos an actual Unix (like macOS) or a Unix-like OS (like GNU/Linux)?

steveklabnik · 2024-01-29T21:09:38

Actual Unix. Wikipedia is pretty good: https://en.wikipedia.org/wiki/Illumos

> It is based on OpenSolaris, which was based on System V Release 4 (SVR4) and the Berkeley Software Distribution (BSD). Illumos comprises a kernel, device drivers, system libraries, and utility software for system administration. This core is now the base for many different open-sourced Illumos distributions, in a similar way in which the Linux kernel is used in different Linux distributions.

Thoreandan · 2024-01-29T21:35:14

Nobody's paid to have it pass Open Group Unix Branding certification tests

https://www.opengroup.org/openbrand/register/

so it can't use the UNIX™ trade mark.

But it's got the AT&T Unix kernel & userland sources contained in it.

PDP-11 Unix System III: https://www.tuhs.org/cgi-bin/utree.pl?file=SysIII/usr/src/ut...

IllumOS: https://github.com/illumos/illumos-gate/blob/b8169dedfa435c0...

msla · 2024-01-30T03:23:01

Legally, NetBSD isn't actually Unix. The brand doesn't mean what people seem to think it means.

yjftsjthsd-h · 2024-01-30T03:40:36

Right, "unix" roughly means

- Derived from Bell Labs unix source

- Legally allowed to use the UNIX trademark (AKA certified Unix)

- A unix-shaped OS (similar but not 100% the same as POSIX complacence)

and those things are basically independent. Most GNU/Linux are unix-likes but not derived from original unix code or certified, but there's been 1-2 that did get certified. The BSDs are (now quite distantly ) derived from unix source but not certified (although ex. UnixWare is IIRC). Solaris was all 3 but OpenSolaris and now illumos are obviously unix-like and still based on the original code but not certified UNIX™.

(Take all this with a grain of salt; I'm typing this all from memory and IANAL)

GrilledChips · 2024-01-30T05:47:40

This isn't NetBSD. NetBSD broke off loooooooong after the release of BSD that Sun used to build this OS.

BirAdam · 2024-01-30T01:07:05

It was an open source branch of Solaris that Ian Murdock worked on while he was at Sun under the name Project Indiana. It descends from UNIX SVR4.

chucky_z · 2024-01-29T20:59:30

Actual Unix. I believe it is in the Solaris family.

busterarm · 2024-01-29T19:30:56

Not that I'm not rooting for Oxide, but their product is still so niche and early stage that I can't imagine any actual businesses buying their stuff for a long time. They only just shipped their first rack to their first customer at the end of last summer and it's Idaho National Laboratory. State research institutions are basically the only entities positioned to gamble on this right now.

steveklabnik · 2024-01-29T19:48:13

Just a small note, but from when we announced this back in October, two customers were mentioned: https://oxide.computer/blog/oxide-unveils-the-worlds-first-c...

> Oxide customers include the Idaho National Laboratory as well as a global financial services organization. Additional installments at Fortune 1000 enterprises will be completed in the coming months.

lijok · 2024-01-29T19:45:12

This describes every single product in its early days in existence. If you're planning to launch any other way, you've doomed the company before you even launched. Lucky few survive, in spite of, and that's what contributes to the 9/10 startups statistic.

Lazer focus on the first set of customers that will help you cross the chasm. Only then mass market.

newsclues · 2024-01-29T21:25:45

I hope they sooner or later release a smaller, cheaper, homelab product for people to learn or for startups that will lead to future rack sales or workers.

steveklabnik · 2024-01-29T21:38:21

This is a common request and we absolutely understand the desire, but I suspect such a thing, if ever, will be a long time off. Given that the product is designed as an entire rack, doing something like this would effectively be a different product for a different vertical, and we have to focus on our current business. Honestly it's kind of frustrating not being able to reciprocate the enthusiasm back in more than just words, but it is what it is.

amluto · 2024-01-29T23:26:30

For what it’s worth, there’s a somewhat common view at least in the Linux community that it’s important for hardware vendors to make their tech stack targetable from the office or home. This isn’t to be polite or to make money — it’s to foster adoption among developers, which drives sales.

Some examples:

x86 owned the desktop, workstation and laptop world for a long time. So everyone targeted x86, which made x86 the default in the datacenter. It was hard for ARM to break in and it mostly happened when AWS did it by fiat. If ARM had made some loss-leader actually useful laptops and workstations available, it might have happened sooner.

But x86 largely didn’t deploy AVX-512 in client machines, so people who wrote libraries only used it for fun or benchmarking, so it wasn’t widely used, and most users flubbed it anyway. (And might have gotten it right if they had the hardware on their desk.)

People target Nvidia datacenter GPUs. But people have targeted them for a long time, because they have them in their gaming machines too.

Xilinx used to push free academic gear quite hard, because that was a big lead into people learning how to use their gear.

So, if I were giving Oxide straightforward sales advice, absolutely don’t get distracted with small systems. But maybe, if Oxide thought of it as lead generation, Oxide should do it anyway. If I could buy something small enough to be affordable but big enough to be useful [0], I might get one. And I’d target it with my own stuff, and fix bugs, and evangelize it at little cost to Oxide.

[0] For me, maybe 100-150TB of spinning rust (or cheap NVMe or the ability to attach a JBOD), plus anywhere from 4-64 cores, in a format that works on 120V and fits in, say, 16U or less, at a credible price point, would be quite likely to net Oxide a sale. (Just one sale but still!) It could be sold as a developer thing, and there would be absolutely no expectation that it would perform like the real thing. If I found it awesome, I might buy a couple more. But I would also use it and make things work on it and talk about it, and if a whole bunch of people did this, Oxide might get a bunch of real sales.

(Also, I get the idea behind two SKUs, but can buyers at least configure storage and compute separately? Different workloads need radically different ratios.)

steveklabnik · 2024-01-29T23:45:40

I certainly understand that strategy generally, see also Adobe giving Photoshop licenses to students back in the day so that they'd be familiar professionally. It's just that doing so amounts to building an entirely new product, and as a relatively young startup, focus is more important. We're going deep, not wide. Someday :)

> (Also, I get the idea behind two SKUs, but can buyers at least configure storage and compute separately? Different workloads need radically different ratios.)

Right now, this early: no. Sleds have compute and storage located together, so the unit of customization is currently "number of sleds in the rack" which according to https://oxide.computer/product/specifications apparently is currently three, not two, at the moment: 16, 24 or 32 sleds.

You are right that these need to be different for certain customers and workloads, we just aren't ready to support those just yet. We'll get there. Same issue, different aspect.

amluto · 2024-01-30T03:03:37

On reading the specs, you’re using 2.91TiB U.2 drives. On the list of “oh my gosh too much engineering and too many stock keeping units,” allowing them to be swapped for the much larger U.2 devices one can buy now seems fairly easy. In case there’s a potential thermal issue, most NVMe drives I’ve checked have active power states that are a bit slower but have reduced power consumption.

But Oxide is small and shouldn’t listen to me unless a customer asks for this.

mattclarkdotnet · 2024-01-30T04:23:34

See also Cloud Foundry, where the late arrival of something devs could use on a laptop was probably key to its failure to capture the market for PaaS.

newsclues · 2024-01-30T00:32:34

Until then maybe sell a few to schools to provision for student access?

A future of developers and CTOs who grew up with Oxide!

steveklabnik · 2024-01-30T00:56:10

My friends and I had a small server set up in college, and those were some of the best times of those years. :D

mbreese · 2024-01-30T01:52:12

> This isn’t to be polite or to make money — it’s to foster adoption among developers, which drives sales.

I get that in theory… and it makes sense most of the time. I’m not sure it does this time though. You don’t exactly “target” Oxide as an OS or platform. Rather, you use it to run VMs on. Those VMs are whatever you want. Other than that, I’m not sure what else having a home-lab version of Oxide would look like.

A different competitor to Proxmox?

amluto · 2024-01-30T03:05:34

People target AWS and GCP and Azure, and they write actual code that interacts with them, do test deployments there, and do real deployments there.

mbreese · 2024-01-30T13:53:13

Right, but you need to be at a particularly high scale when you need to write code that directly interacts with a cloud provider. I can think of no use case where you are at that level that can also scale down to a home lab setup.

Yes, you can do slimmed down cloud deployments, but you're still not running the (actual) S3 or EC2 backends at home.

At most, you’re talking a K8s based deployment (for a workload that could scale up or down). But that’s also not at the level of working with Oxide directly. And I doubt Oxide wants to get into the business of selling access to their own public cloud.

newsclues · 2024-01-29T22:02:00

I appreciate the response, I totally understand and don’t expect it to materialize soon, but am still hopeful that someday it will be a possibility.

whalesalad · 2024-01-30T01:39:28

can't wait to find liquidated oxide gear on ebay in 2035. all my current homelab gear is "ancient" enterprise gear like R720's etc

faitswulff · 2024-01-30T04:24:17

We'll have to wait for it to hit Groupon.