IBM introduces z15 mainframe (wikichip.org)
232 points by rbanffy 38 days ago | 293 comments



In reading the comments here it is clear that there are some who do not understand the true difference between running a workload on a rack of servers vs a mainframe. In the case of something like an x86 cluster you need something you can chuck. That is not necessarily true with a mainframe. You run a "job" and it sorts it out for you. You can even walk up to the mainframe and pull out the cpu the job is running on and nothing happens. It keeps working.

Before anyone starts a huge debate saying everything I said is wrong, please understand I know it is not exact. This is just a basic level-set on one of the main structural differences in the HW/OS that everything is derived from.


> You can even walk up to the mainframe and pull out the cpu the job is running on and nothing happens.

More precisely:

You can even walk up to the mainframe and pull out the cpu the job is running on and you still get your paycheck.


Correction - s/chuck/chunk

I mean that on x86 you design so that the workload has parts (app server, db, web server, etc.) in order to spread the workload around. This is not really required on a mainframe, as you can just run it all in the same instance (you do not have to split it up, nor do you really want to).

This is a very SIMPLE view. I am trying to point out that the HW/OS on a mainframe is designed for a very different thing than how people today think of "cloud" design.


Thanks for clarifying.

>"I am trying to point out that the HW/OS on a mainframe is design for a very different thing then how people today think of "cloud" design."

Might you or anyone else have any resources you could share on this design paradigm? Thanks.


Not really, because it's the same paradigm that cluster software is designed for.


One thing you missed is that most people use better software on clusters than what you described... software that runs "jobs" and fixes up failures.


>"In the case of something like a x86 cluster you need something you can chuck. That is not necessarily true in with a mainframe."

What is meant by "chuck" here? To throw? If so, what is the thing that needs to be chucked exactly in the case of x86 vs mainframe? Thanks.


not a typo; when you design for the typical x86 cluster it's a cluster - you expect components to die and you can throw away whole machines (i.e. expect failure). mainframes on the other hand keep chugging along no matter what.

the usual comparison is a cluster of cheap nodes vs a mainframe
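
(A toy sketch, not the parent's words, of what that "expect failure, chuck the node" style looks like in code. Node and the job type are made-up placeholders rather than a real cluster API; the point is only that the caller resubmits the job elsewhere when a machine dies.)

    import java.util.List;

    class RetryOnAnotherNode {
        interface Node { String run(String job) throws Exception; } // placeholder for a cluster node

        static String runJob(String job, List<Node> nodes) throws Exception {
            Exception last = null;
            for (Node node : nodes) {           // walk the cluster until someone succeeds
                try {
                    return node.run(job);
                } catch (Exception failed) {    // node crashed, got unplugged, etc.
                    last = failed;              // throw the node away, try the next one
                }
            }
            if (last != null) throw last;
            throw new IllegalStateException("no nodes available");
        }
    }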


Likely a typo. Probably meant chunk as in batch processing.


Maybe not. Mainframes have hot-swappable motherboards.


Can't you do the same with a Kubernetes cluster?


In terms of component failure handling, there's very little difference nowadays between a cluster of x86 servers running well-written software, connected with a high-performance network and a cluster management system, vs a mainframe - why should there be, since there's no "magical" advantage mainframes have?

Almost all the value of a mainframe lies in software compatibility. There's no theoretical or practical advantage they have over clusters of x86 servers.


Well, besides not requiring you to solve for HA, sync between nodes, etc. Also the fact that the IO on a mainframe is MASSIVE and they have their own IO controllers to sort out IO scheduling.

I have worked on, designed and built both types of systems. Cloud-style distributed systems are hard and require the software to sort out everything. For the most part the HW/OS on the mainframe just handles it. It is just a different way to build.


Earnest question: when is it cheaper for a business to use mainframes rather than x86 clusters? If they are easier to develop on (no need to solve distributed-systems problems), then even if mainframes have a 1000x upfront cost, wouldn't they still likely be cheaper over time due to smaller development budgets? Where am I misunderstanding?


The OS of the mainframe is software. The software running the distributed system is software. For the most part, the software running the distributed system just handles it. It's slightly different.


Do you think there are any mainframe lessons that don't exist in today's cloud architecture that could actually be applied (obviously software, not hardware)?


there's still quite a bit of difference on the interconnect side + replaceability side


FWIW IBM hosts a coding competition called Master the Mainframe to help students learn mainframe programming on the latest systems. (Note: I'm not affiliated with IBM)

https://www.ibm.com/it-infrastructure/z/education/master-the...


That looks awesome. Passed it along to some folks at NYU. I wish _I_ could join. Surely they don't have so much demand they'd need to restrict this to high school and university students... In general if they're willing to do these programs why not provide training/re-training for adults in need of work?


To take part in the competition parts you need to be a student, but everyone can use the "learning system", and from what I've heard that includes all the same exercises and tasks.


And it's a lot of fun to learn a technology that shares almost no common ancestry with the Unix boxes most of us use today.


Oh good call, thank you. Signed up.


I placed highly in this competition two years running while in college myself. It was highly informative and I would recommend it to anyone else in a similar position.


We had a mainframe class at university taught by two older IBM employees. It was very interesting.


What does one use a mainframe for these days? I'm assuming projects that require massive parallelism, like research projects or engineering modeling and computation?

For those of us with only exposure to PCs and commodity servers, I'd be interested to know who has encountered one in their day to day, and how that experience differed from the norm.


A mainframe is not a supercomputer. Science is done on clusters of x86 chips and Nvidia gpus with custom interconnect and Linux.

A system that runs a bank or an insurance company and wants good availability can be built in one of two ways: you either spend money on software that deals with the hardware being unreliable and save money on software (pioneered by Google) or you spend money on hardware that promises to be highly reliable and save on software.

No new player believes mainframe (extremely expensive hardware) is cost effective, they all use commodity hardware.

A bank that needs to run binaries from the 70s for which they don't have the source code can keep paying IBM and not investing in reverse engineering the binary and implementing it in Java.

A bank that has a billion lines of code in Cobol can compile it to run on jvm on commodity hardware and run it in parallel with the mainframe for a year to validate and then switch over to the new system and stop overpaying for hardware, but that sounds risky, so they keep paying a million dollars for a system with the same performance as a 50 thousand dollar server.
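
(Not from the comment above, just a minimal sketch of what that "run in parallel for a year to validate" idea can look like: every request goes to both the legacy path and the candidate port, divergences get logged, and the legacy answer stays authoritative. All names here are illustrative, not a real migration framework.)

    import java.util.function.Function;

    class ParallelRunValidator<Req, Res> {
        private final Function<Req, Res> legacy;    // e.g. a call into the mainframe
        private final Function<Req, Res> candidate; // e.g. the COBOL-on-JVM port

        ParallelRunValidator(Function<Req, Res> legacy, Function<Req, Res> candidate) {
            this.legacy = legacy;
            this.candidate = candidate;
        }

        Res handle(Req request) {
            Res authoritative = legacy.apply(request);
            try {
                Res shadow = candidate.apply(request);
                if (!authoritative.equals(shadow)) {
                    System.err.println("divergence for request: " + request);
                }
            } catch (Exception e) {
                System.err.println("candidate failed: " + e);
            }
            return authoritative; // the legacy system stays the system of record during validation
        }
    }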


> you either spend money on software that deals with the hardware being unreliable and save money on software (pioneered by Google) or you spend money on hardware that promises to be highly reliable and save on software.

Seems like the first "save money on software" shouldn't be there.


sounds like it should be "spend money on software", as the software becomes much more complex when the hardware is unreliable (and distributed rather than one big box)


Yeah, I think maybe they meant to say "save money on hardware" since they already said "spend money on software" earlier in the sentence.


Should have been "... and save money on hardware (pioneered by Google) ..." but too late to edit now.


Google pioneered something done before Google existed?


Change it to "made famous by Google". Do you have an example of a famous large system built out of many cheap computers (cheaper than normal servers) that predates Google?


> Do you have an example of a famous large system built out of many cheap computers (cheaper than normal servers) that predates Google?

That approach is, in scientific computing, at least as old as Beowulf (1994, four years before Google was founded), the prototypical system from which we get the term “beowulf cluster”.


Oh, as scary as it sounds, it's entirely possible to save current-day unicorn-runway money on metered processing-duty-cost software compared with the Oracle installations that you find at this level. Especially when you are pre-processing using the accelerator cards that IBM doesn't include in your cup budget. I keep telling myself that my retirement plan is to write the missing "DB2 On Z is Not Any DB2 From Any Brochure You Read" handbook, plus the supplementary "The Difference Between Larry Ellison And God Is God Runs a z/VM Sysplex." *0 Not being funny, but do you honestly like the fact your hardware can't return correct values for your financial accounts? I am drooling over FPGAs on Xeon MCMs for selling metered coprocessor logic.

Edit: accidentally included runaway thought process...


You should try SAP


Is it really common for banks to run legacy applications without sources? How would they know or trust the behavior?


I can't speak for banks specifically, but for sure lots of old stodgy Fortune 500 companies have critical software running for which they have lost the source code. Similarly, critical closed source software for which the vendor no longer exists. And things like servers running that they are afraid to shut down...they have no idea if the output is used by anyone. It's a mess out there :)


Can confirm. The last two clients I worked with (both Fortune 500) used mainframes for business critical operations. The most recent one didn't actually have source for some components of their business ops software stack - they've just been working as-is for about 30 years and no one will touch them.

I was told they did an assessment a year or two ago to price out what it would take to move the business completely off mainframe and onto an x86 stack. It was in the hundreds of millions of dollars to do so because so much other software has been built to interact with and rely on the mainframe over the years that switching off it would be a multi-year effort across every department in the company. So of course that ROI calculation was pretty damn easy and the mainframe isn't going anywhere.


> so much other software has been built to interact with and rely on the mainframe over the years

So just... have the new server expose itself over the TN3270/TN5250 protocols, such that these interoperating systems see what they expect to see? (You could even build the new system as a regular REST API or something at its core, and then build the TN3270/TN5250 exposure as a separate gateway service on top, such that it'd be easy to shut it off later if everything finally moves off it one day.)


That's one of many possible integration points. LU6.2, SNA, batch file transfers, remote DB2 connections, MQ queues, etc.


Just?


If you were dependent, to the tune of 100s of millions of $ on certain hardware, wouldn't it make sense to start the process anyway?

You don't necessarily need to move the $new project off the legacy hardware, just write it in such a way that you can do so easily later.

I'm eliding many details here, but the principle stands.


Choose one:

- Spend $300,000,000 over 10 years moving to a new system, and hope that by the time you've done that it's not obsolete.

- Spend $1,000,000 a year on a system that still works.

You could run the system for 300 years for what it would cost to replace the system.

In spite of the collective wisdom on HN, there aren't a lot of companies in the world with hundreds of millions of dollars sitting around doing nothing. Even ones that work on mainframes.


I would characterise it as option 3

- Spend $1.5 million maintaining the current system, but as bits get updated keep in mind the system that you'd like to have in 10 to 15 years' time.

You're going to have to replace the system within 300 years anyway, so you aren't saving that money; every feature you add that is reliant on the old system is literally technical debt, because eventually you'll have to rewrite it for the new system.

I'm not even saying you need to have a new system in mind, just keep in mind that you will be moving to a new system, so code appropriately.


Since a mainframe can usually be updated to the newest model (and you often run more than one for redundancy, as some people may need more than 5 nines), in 300 years the company will have a z150 or so, with millions of quantum entangled cores made of folded spacetime mesh around a pair of rotating singularities running at a couple terahertz.

And it will still be able to run all your programs that were written and tested since the late 20th century, by the kind of organic entity we used to call "human".


Lighthearted retort: And web browsers will still be slow.

Slightly more serious retort: I'm not aware of any brands from 300 years ago; I'd be surprised if any of IBM's customers survive, let alone enough to keep IBM as a going concern.

Btw when you start folding the space time mesh, terahertz figures just become marketing numbers, what you really want to know is how many parsecs it can do the Kessel run in.


> I'm not aware of any brands from 300 years ago

You might be, you just don't realize that they're hundreds of years old.

For example, the insurance company Lloyd's of London is fairly well-known around the globe. It's 333 years old.


I suppose as a brand, yes, but it's not really a company. If you want to go down that route, the Church of England, and indeed England itself, are 'brands'.


What’s the meaning of those numbers? Where did they come from? How did you lose the source to a 300 million dollar software project?

If anything, this is a sign that mainframe users have way too much profit for what should probably be commodity software.


Not as long as IBM continues to make mainframes and components, which they do. If you were unable to source replacement hardware to keep it running indefinitely, then yes, but until then it's better to keep paying a couple million a year to keep your business running as-is than to spend hundreds of millions to effectively rebuild the entire infrastructure from the ground up in parallel to the running infrastructure.

Remember that these are large, public businesses. Explaining to shareholders that profits are going to take a noticeable hit for years because of IT investment that isn't strictly necessary is effectively a non-starter.


"rebuild the entire infrastructure from the ground up in parallel"

I wasn't thinking that. I was thinking more like when you add a new feature, fit it into an API that is portable. Or add a translation layer so the feature can be written how you would like the system to be in the future, but it works on your hardware today.


Everybody attempts this. But the problem is that the ground keeps moving under your feet. Your portable interfaces become irrelevant as software systems evolve. XML gives way to JSON, etc. Eventually it costs just as much in time and effort to interoperate with your 10-20 year old "portable" interface as it would to interface with the mainframe directly, which was already established to have been too costly to bother. So now you have two problems, not just one.


Possibly. First I'd say that xml doesn't become 'wrong' when json came about. If that 1950s data format works for you, and you don't see advantages to moving, then fine stay with it.

Second, I'd say there's a half-life of best practice. Over a 10 to 20 year time frame, I'd expect some of what you were doing to become outdated, yes, in the same way some of your knowledge over your career will become outdated, or the phone in your pocket will; you wouldn't use that as an argument against education, or against buying that phone.

Btw, if it's as costly to deal with the interface as with the mainframe directly, that's a win. The interface can move off the mainframe to somewhere cheaper/better.


> First I'd say that xml doesn't become 'wrong' when json came about.

It doesn't become wrong, it just evolves into another dead-end. The 3270 is an interface, too, but the reason everybody suggests replacing it or augmenting it is precisely because the spartan tooling and mindshare make it expensive. As XML recedes into history it is likewise becoming more expensive as an interface.

I chose XML vs JSON because I figured it was a transition everybody was somewhat familiar with. And younger programmers have an almost visceral dislike of XML, which I thought might help get the point across--that an interface someone once thought (and probably still thinks) would help ease future interoperability becomes a reason or excuse for future programmers to avoid that integration.

> Btw, if it's as costly to deal with the interface as with the mainframe directly, that's a win.

I think the problem is that you don't really know if it's as costly. The error bars on that sort of risk assessment are huge because our industry sucks at accurately predicting migration costs. And it sucks because complex software systems are intrinsically unique. Commercial solutions that claim to be able to capture and control all those dimensions of complexity tend to be sold by vendors with names like IBM and Oracle. Such vendors also pioneered interfaces like SQL, which is both a soaring achievement in terms of capturing complexity behind a beautiful interface while also falling epically short of what's needed to actually reduce long-term integration costs.


You seem to have made an assumption that the mainframe is “legacy hardware”, on a thread about the latest generation of mainframe hardware... As long as IBM keeps building mainframes the business case for anything else will remain unattractive. Unless you have some information to the contrary, it doesn’t make a lot of sense to invest (financially or technically) in switching.


It isn't about legacy, it's about being stuck on one supplier's hardware. That isn't a great place to be.


This is awesome in a terrifying way. The code itself has outlived every employee who knew what it does.


Hi, pedant here. That's the original meaning of the word: to inspire awe, and there is always a bit of terror in awe. A hurricane is awesome, as is a $deity that sends one.

But alas, language has changed and these kids won't get off my lawn.


Speaking of words that changed: sublime used to mean this. Something powerful and dangerous that inspired terror and a sort of admiration was said to be sublime. Here's Edmund Burke writing in 1757:

‘Whatever is fitted in any sort to excite the ideas of pain, and danger, that is to say, whatever is in any sort terrible, or is conversant about terrible objects, or operates in a manner analogous to terror, is a source of the sublime; that is, it is productive of the strongest emotion which the mind is capable of feeling.’


This is the kind of pedantry I come to HN for, thanks.


This is the kind of pedantry that makes me thankful for the [-] box


And this was particularly exciting in the runup to Y2K


Even having access to the source code might not be enough: I've seen cases of weirdly forked repos with custom additions and half-baked backports that relied on hacked system libraries and a weird mix of specific legacy packages, in some symbiotic way such that just one specific laptop of some long-gone developer was able to compile it, and a virtual machine of that developer's Windows XP was passed around.


As someone who works for a bank... no, not in my experience.

What is a problem is that the last time a lot of code was touched may be 5-10 years ago. That code may have started being written 20-40 years ago. It may well have been maintained by people who think "if it was hard to write, it should be hard to read" or "documentation is for the weak". It definitely will have been written when the cost of a gig of memory and storage was many orders of magnitude higher than today (indeed, last time I priced memory for a mainframe, a Z10, it ran to $10,000 a gig); hence terseness in everything from table and column names through stored data and everything else was prized. Dropping from COBOL into assembler is not uncommon for critical path performance.

Making any changes will be a week of coding and three months of working out the what and why of the code, because the last person who worked on it retired a couple of years ago.


With few exceptions, all non-open-source software is run without sources. Not every legacy-but-critical app was in-house custom software.


A lot of vendor packages included source code licenses actually. I'm not sure distributing binaries was even practical in all cases since the code would have to be built with and linked to specific versions of systems libraries and keeping track of which customer is on which system would be a nightmare. I'm familiar with several loan accounting systems and financial authorization systems that are either nowadays totally maintained by the bank or co-developed with the vendor in a shared source model.


> How would they know or trust the behavior?

It has worked for the past 30 years and hasn't been touched in 20.


Not just banks. Any long standing company has this problem.


I have worked with a dot-com that linked to commercial C libraries for which they never had the source. They did finally reimplement them, but not till 2016 or something.


Because it's been running a certain way for decades?


Yep at a known fixed cost. Corps like knowns and hate unknowns, especially stalwarts like the various big banks. It's kind of like buying stocks rather than shorting stocks. Can you handle the unknown of potentially unlimited losses many times your original "investment" in the venture?


Do you know any big bank not using mainframes for their core operations? 90% of big banks use mainframes, so there must be some big banks that don't use them. Interested to know who they are. Chinese?

It's OK to do most things with a bunch of normal servers, but when you need to handle a very large number of transactions with fast commits, and stuff like eventual consistency is not allowed, it becomes expensive to handle them no matter what.


Commonwealth Bank of Australia -- they were on mainframes, but spent a few hundred million dollars to get off.

One of the other big Australian banks (Westpac) has an "exit IBM" project as well, but it isn't complete yet.


I googled it. Apparently they moved their home loan systems etc. from a "windows-based mainframe" to UNIX. What is a windows-based mainframe? (HP Superdome??) They also moved to SAP analytics.

There is no information anywhere saying that they moved their critical banking systems and transaction processing.


Unisys.

Unisys always had a smaller mainframe share than IBM, and they stopped making new chips sometime in the 2000s.

For at least ten years the fastest Unisys mainframes have been very big (proprietary) x86 machines running Windows with a Unisys mainframe emulator.


I worked for Unisys in 2002 or so and I always thought it was funny to see a mainframe able to run Pinball...

For some time I defined a "serious computer" as something that didn't have ports for keyboard and monitor.

Having said that, the pre-x86 A series were pretty cool. And ran the most user hostile OS ever created, to the point its very name was used for Tron's villain.


Going from IBM to SAP seems like no less lock in?


But only on the software side. Dozens of companies will compete to sell you hardware solutions to run your SAP stuff on.


SuSE Linux on Lenovo, Dell, HP, whatever


> Do you know any big bank not using mainframes for their core operations? 90% of big banks use mainframes, so there must be some big banks that don't use them. Interested to know who they are. Chinese?

Monzo? They're getting bigger now, circa 3M customers. More than First Direct, Starling, Metro Bank.


Seems to me like the global scale consistent database systems could really change this calculus (google cloud spanner, faunadb). To me these products represent a significant capability that never existed before — not entirely sure it would ever make sense for a bank to become dependent on hosted transactional storage but maybe a “bring your own cloud” model for one of these systems (I think faunadb offers this — cloud spanner can’t because of the custom network requirements for clock synchronization) ...


National Australia Bank moved their core banking to Oracle software running on commodity x86 servers.


Having worked there only a few years back, I can guarantee there was still a very large amount of mainframe activity - all the change management at the time was still done on an IBM mainframe.

I personally worked on a project there that involved building new virtualization platforms for not only x86/x86_64 Intel, but also AIX (Power) and Solaris (SPARC).


I do, but they still use enterprise grade servers that each cost almost as much as my house.


Re: binaries from the 70s, if the re-engineering costs are the only reason to stay on mainframe (rather than e.g. microcomputers not being up to the task IO-wise), then why not just run the workload in a z/OS emulator on commodity hardware? I mean, the modern z/OS machines are just running 70s-era workloads on an "older z/OS uarch" emulator anyways, so...


Technically that would mostly work, but IT leaders at banks and insurance companies would rather pay high premiums to IBM than stick their neck out endorsing anything risky. Renegotiating with IBM is the safest career move at such places. Cloud is steadily edging out mainframes at banks, but a lot of old-guard IT leads are circling their wagons around old blue because learning how to do high resiliency on cloud is new and scary, but old blue is safe and well understood.


Using safe and well understood tech to handle my money sounds like a good thing.


It's hard to overstate the value of safe and well understood for core systems. After all, it's not like a massive migration to a new technology stack has ever blown up in anyone's face, right? (</s> if not obvious)


>then why not just run the workload in a z/OS emulator on commodity hardware?

Who are the vendors that provide a migration path from mainframe to a set of commodity hardware running an emulator, and subsequently provide continual maintenance and support?


The big cloud vendors have partners who specialize in such work. Random one from the first page of Google results: https://aws.amazon.com/blogs/apn/migrating-a-mainframe-to-aw...


part of this is the 'partnership' arrangement.

Global 100 company gets IT from other global 100 company

vs

Global 100 company gets IT from small mainframe support shop

there's a question of shareholder liability, being able to adequately sue them for M's of $, expectation they will be around in 20 years, etc.


Total non-starter. IBM won’t license the software for use on an emulator.


My guess is a mainframe would be much better at handling a high volume of transactions than a $50k server? I would also think it would better handle conditions where "eventually consistent" doesn't work? I'm sure most companies are running batch processes and data-intensive calculations that could be easily shifted to cloud workloads in practice, but it does seem like there are some use cases where a mainframe could be better?


No. If you need reliability and have money, separate commodity servers running a consistent (Paxos or Raft, not eventually consistent) in-memory database (the memory is a few terabytes) with logs to flash and to a disaster recovery site will outperform a mainframe.

A single computer with 4 sockets of xeons will outperform a mainframe but will have more downtime.

The mainframe has best possible single thread performance and as much cache as possible, redundancy and parts can be replaced while it's online, but not that many cores.

The cost is very high - when I looked, 1 million per year is the baby version with only 1 CPU enabled and no license to run the cryptographic accelerator and limits on software, etc.

Commodity hardware you buy and use for 5 years, so the amount of good hardware you can buy for the price of owning a mainframe for 5 years is a lot.

For the money, you can buy a lot more CPU, ram, network, storage, etc and hire Kyle Kingsbury to audit your distributed database.
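
(A minimal sketch, under the assumptions above, of the "consistent replicated in-memory database" idea: the leader only acknowledges a write once a majority of replicas have persisted it. Replica is a stub, and a real Raft/Paxos implementation also deals with leader election, terms and log repair; this only shows the quorum-ack part.)

    import java.util.List;
    import java.util.concurrent.*;

    class QuorumCommit {
        interface Replica { boolean append(byte[] entry); } // blocking "persist this log entry" call

        private final List<Replica> replicas;
        private final ExecutorService pool = Executors.newCachedThreadPool();

        QuorumCommit(List<Replica> replicas) { this.replicas = replicas; }

        boolean commit(byte[] entry) throws InterruptedException {
            int needed = replicas.size() / 2 + 1;            // majority quorum
            CountDownLatch acks = new CountDownLatch(needed);
            for (Replica r : replicas) {
                pool.submit(() -> { if (r.append(entry)) acks.countDown(); });
            }
            // Only tell the client "committed" once a majority has the entry on durable media.
            return acks.await(500, TimeUnit.MILLISECONDS);
        }
    }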


>The cost is very high - when I looked, 1 million per year is the baby version with only 1 CPU enabled and no license to run the cryptographic accelerator and limits on software, etc.

At one point, IBM was selling base-level mainframes for $75,000 (see https://arstechnica.com/information-technology/2013/07/ibm-u...)

It is true though that a 'realistic' configuration is likely to cost north of $1 million, and that none of these numbers include the price of the software.


A standard IBM "gotcha" is to offer deep discounts and charge the 20% p.a. maint cost on the list price - e.g. "We'll give you this hardware for $1." then a year later "So sorry the maint is 3 x $250k x 0.2 = $150k p.a.".

Another is that the software and hardware designed to turn a baby system into anything useful is astonishingly expensive to people not used to dealing with this end of the market. Enjoy finding that your hypervisor is licensed at 5 figures per core, and that your cores are six figures a pop.


It’s the support contract that’s the real cost.


You have just described VoltDB. They worked with Jepsen years ago and fixed all the problems Kyle found.


Interesting. Thanks for the details. I didn’t realize how few cores mainframes had.


And it is especially quite a bit better than many $50k servers at handling a high volume of transactions while one of the CPUs and half of the RAM in the system are being physically replaced.


You are paying about 100x on hardware because your software is limited to a single address space.

Even if the mainframe never goes down, the entire site will go down (eventually the rack power supply, HVAC, fiber, natural disaster, backhoe, etc. will get you even if your CPUs and RAM are redundant and replaced before they fail), and then either your entire business stops or at least processing for that region stops, or your system is resilient to site failure because you built a distributed system anyway.

If you could rewrite your software to be distributed and handle a node/site going down, you could run a single site on 5 servers that together outperform the mainframe (by a lot) and can be serviced on a whole server basis (though of course, expensive x86 servers also have reliability features), or use really cheap hardware without even redundant power supplies, but have enough of them to not care.

The modern solutions are better than the mainframe, and the only reason to keep using a mainframe is management risk aversion and unwillingness to learn new things.


A single mainframe can easily be located in multiple datacenters. The Hungarian state-owned electricity distributor has one: one half is in Budapest, the other half (if memory serves) is in Miskolc, a bit more than a hundred miles away.


Is that a single address space system or merely two systems with db2 databases and disk volumes on a SAN in replication? I think it's the latter.

100 miles will add 5ms (round trip) to your disk flush on commit. So a system like this has the sequential and random IO latencies of a RAID of SSDs but the flush (database commit) times of a 15K RPM spinning rust disk. People lived with mechanical disks, it's ok.
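
(Back-of-envelope sketch of where a figure like that comes from: light in fiber travels at roughly 2/3 of c, so ~160 km one way is under a millisecond of propagation, and the rest would be switching plus however many round trips the commit protocol needs. The numbers below are assumptions, not measurements.)

    public class FiberLatency {
        public static void main(String[] args) {
            double km = 160.0;                          // ~100 miles
            double cKmPerMs = 300.0;                    // speed of light: ~300 km per millisecond
            double fiberKmPerMs = cKmPerMs * 2.0 / 3.0; // ~200 km/ms in fiber
            double oneWayMs = km / fiberKmPerMs;        // ~0.8 ms
            System.out.printf("one way: %.2f ms, round trip: %.2f ms%n",
                    oneWayMs, 2 * oneWayMs);            // before any protocol overhead
        }
    }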

Sync disk replication (in one direction) over a fiber line is not an exclusive feature. Having both sides be active, instead of active and hot standby requires some smarts from the software, but modern distributed databases do that, and if you're careful you can get far with batch sync jobs.


It might be just storage as it was brought up as an example of how subsystems of mainframes are essentially their own world and a single disk or distributed volumes over a long distance are presented the same to the rest of the system. In the Linux world DRBD does something similar (just much simpler). The point, however, is that the software knows nothing about this being distributed.


Yes, if you have a huge volume of transactions and a huge number of constraints that cannot be partitioned. But real-world constraints can almost always be partitioned.

For read-only batch computations you can always add some extra redundancy and partition the problem. So, I don't think it is likely that a mainframe would be useful here.


The newest trend in banks is to migrate to AWS. See recent Capital One hack. They view it as a combination of commodity hardware with cheap software.


Yes and no. I work for a bulge bracket bank and we're moving a significant number of applications to the cloud, including AWS, GCP, and our internal cloud offering.

But we have literally thousands of internally developed applications. We can move thousands of apps to the cloud and still have a need to keep thousands on virtual/physical machines. My own apps are stuck on commodity physical hardware for at least the next few years.

The type of applications that have historically been run on mainframes is not really moving to AWS/cloud. Most of what's going to the cloud is what I would consider to be "supporting" applications, not core applications.

My own experience, that of others may differ.


What you're saying is we've been 737-maxing it for the last few decades?


Banks work because of processes performed by humans that were developed before digital computers were invented, not because of technology. They have logs of all transactions and are willing to reconcile them manually, and are insured/lawyered against loss when they can't reverse an erroneous transfer. Money can take two business days to move instead of milliseconds, and their customers are mostly fine with this.

Programmers are taught that database transactions exist so that when you move money from one account to another and crash in the middle, no money is ever created or destroyed. Well, cat picture websites might do that, but banks don't. They reconcile logs at end of day.
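
(A hedged sketch of what end-of-day reconciliation can look like in principle; this is illustrative only, not how any particular bank does it. Amounts are in cents, and the two maps stand in for "our log" and "the counterparty's log".)

    import java.util.HashMap;
    import java.util.Map;

    class Reconciliation {
        // account -> the day's transaction amounts in cents
        static Map<String, Long> total(Map<String, long[]> log) {
            Map<String, Long> sums = new HashMap<>();
            log.forEach((acct, amounts) -> {
                long s = 0;
                for (long a : amounts) s += a;
                sums.put(acct, s);
            });
            return sums;
        }

        static void reconcile(Map<String, long[]> ourLog, Map<String, long[]> theirLog) {
            Map<String, Long> ours = total(ourLog), theirs = total(theirLog);
            ours.forEach((acct, sum) -> {
                long other = theirs.getOrDefault(acct, 0L);
                if (sum != other) {   // flag for manual follow-up instead of failing a transaction
                    System.err.printf("mismatch on %s: ours=%d theirs=%d%n", acct, sum, other);
                }
            });
        }
    }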


I was commenting on this:

> you either spend money on software that deals with the hardware being unreliable...or you spend money on hardware that promises to be highly reliable and save on software.


Then I don't understand. Which strategy is like the 737-max ? Banks generally don't build systems that crash and lose your money.


Mainframe hardware isn't reliable though. It may be more reliable, but no hardware is reliable. What you call reliable is implemented by an architecture that lets the system route around hardware damage. The difference is that a mainframe is a mostly closed box where you preselect what hardware components are wired together, while a cloud computing type system is an open system where you can dynamically rearchitect it.


I had access to AS/400, nowadays known as IBM i, back in the early 90's.

Had to perform regular backups on the system as part of my internship.

Nowadays kids are all up with WebAssembly and WASI; well, that is just how OS/400 has worked for the last 30+ years.

Originally designed in a mix of PL/S and Assembly, everything else (RPG, Cobol, PL/I, C, C++) compiled into ILE (Integrated Language Environment) and cross-language calls are relatively easy to do.

ILE applications are AOT compiled either at installation time, or anytime some critical hardware has changed or the application themselves have been updated.

Nowadays there is also Metal C (real native, not ILE), Java via IBM's own JVM (which in early versions converted JVM bytecodes into ILE ones), and the C and C++ compilers are also able to target actual native code besides ILE.

The database-backed filesystem took some time getting used to, coming from Amiga/MS-DOS/Windows 3.x/Xenix experience, and the command line felt more cryptic than those OSes, given the use of special characters as part of the name.

The company where I did my internship was using them for their accounting; everything else was MS-DOS/Windows computers connected via Novell NetWare, zero UNIX flavour in sight.

IBM z and Unisys ClearPath are two other mainframe models that also follow similar bytecode based deployment formats.

So in a sense you can say that on every Android phone, watchOS or Windows PC lives a little mainframe.


AS/400 isn't a mainframe, it's IBM's minicomputer arch.


Naming details, which I always get wrong, because nowadays PC servers have taken that role.


It's just weird to bring up AS/400 in a conversation about mainframes.


It is as far away from an x86 server as a S/390 was from an AS/400. It was a unique architecture that came from, IIRC, the System/38. Also, its native encoding is EBCDIC. It shares a couple of concepts and names, such as LPARs.


It has stuff like LPARs because it shares a processor arch with IBM's COTS servers running POWER (yes it has an extension to have tagged pointers, but they demoed it originally on unmodified POWER).

Ie. it shares way more with regular server boxes than mainframes.


High-IO jobs, where a single system image (read: "OS") is important. Most banking and similar problems.

The main three benefits of the mainframes (as a non-mainframer):

- crazy amounts of caches

- crazy amounts of pcie-slots (and sufficient internal io and processing power to make it balanced)

- production environment mentality for everyone involved, incl. extremely engineered (and redundant) hardware

I think new customers are likely to run Linux on such a box in the future... The main cost-problem for mainframe customers is software cost (esp. z/OS, cobol etc..).

I have heard stories (from actual mainframers) about the sql-performance and key-value performance (read like mongodb) on these boxes that are eye-watering..


Yeah people don’t realize the IO bandwidth (off the central CPUs mostly) is the differentiator. And writing SQL for DB2 is easier than writing efficient spark jobs (with minimal shuffles across those cheap small machines).


People often forget how much computer science and engineering is in those machines that allow them to fully saturate two and a half full height racks of high performance IO while running COBOL programs that are blissfully unaware of all that goes behind their ABI and just see a ridiculously fast computer.


How is that level of IO bandwidth achieved exactly? Or how is it different from the x86 architecture?


There is a ton of technical info out there, but in a nutshell, the hardware is extremely optimized for I/O bandwidth, and caches are able to be accessed by more cores at a time (I think the L3 cache is 32-way accessible, for example). The goal is to have nearly everything in at least RAM (up to 40TB of it), or better yet in a cache closer to a CPU, so there is no latency.

And when it comes to reaching things outside of the mainframe itself, generally anything you can do to make access to the data wider is better... Why use a single 16GB/sec HBA when you can interleave 8 or more FCAL paths to the same disk? Why use FCAL when you can use InfiniBand, etc. etc. They have a LOT of PCIe lanes available, and enough chips to keep most of them busy most of the time.

TL;DR if it's expensive and offers more paths between things, it's probably an option for a mainframe.
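
(A toy illustration of the "many interleaved paths to the same device" idea above: requests are spread round-robin over several channels so no single path is the bottleneck. Channel is a made-up placeholder, not a real I/O API, and a real channel subsystem also tracks queue depth and path failures.)

    import java.util.List;
    import java.util.concurrent.atomic.AtomicLong;

    class MultiPathIo {
        interface Channel { byte[] read(long block); } // placeholder for one path to the device

        private final List<Channel> paths;
        private final AtomicLong next = new AtomicLong();

        MultiPathIo(List<Channel> paths) { this.paths = paths; }

        byte[] read(long block) {
            // pick the next path round-robin so requests interleave across all of them
            Channel path = paths.get((int) (next.getAndIncrement() % paths.size()));
            return path.read(block);
        }
    }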


> I have heard stories (from actual mainframers) about the sql-performance and key-value performance (read like mongodb) on these boxes that are eye-watering

Does that mean great or awful?


I assume he means that DB perf on mainframes is insane. That would make sense, given their low-latency architecture and I/O throughput capabilities.

I would love to see an article that breaks down clustering v. supercomputers v. mainframes in tangible use-cases. It is all a bit opaque to me where one officially starts and the other one(s) begin. As well as where (and to what degree) the value prop is of one versus the others.

I've heard folks say mainframes are just legacy stuff carried forward - I've heard others say they have specific use-cases that are not well solved by other current technologies (usually a cost of refactoring). That said, are there any current use-cases, where if I were to write code from ground zero, that are best solved on mainframe, hands down? That is my root question.


Great... order of magnitude-ish higher than a really high end x86 box with pure flash storage...


It was so beautiful it made him cry. Although eye-watering usually means something negative or stinky (like onions or a particularly putrid fart).


I work for a bank. We use mainframes, but not for building new systems - just for keeping the legacy core running. “Just”. Eventually that core will move to other platforms as product cycles come and go, but there’s no active migrate-off-mainframe project. It will probably be 20 years before we’ve fully replaced them.


Common in large transactional systems - stock exchanges & banks, airlines, shipping/logistics and academic/research - requiring tens of terabytes of data in-memory accessible from all cores without any networking latency or bottlenecks.

There are a lot of systems that are simpler (cheaper and less risky) to scale up than re-engineer to be distributed.


Yeah, and in my experience especially for anything that is extremely time-critical + has the potential to generate huge in/direct losses.

For example at the end of each day a bank has to generate the final balances of the accounts that it has with other banks and then it has to check for irregularities and then finally set those balances just right (e.g. enough to cover payments made by its customers but without leaving there millions) => if it doesn't then it might not be able to use the money that is "parked" there by mistake, or it might have a huge position with a risky bank, or if the balance is too low other banks might decide not to perform the final credit into the target accounts involved in the customers' payments (which would then generate a lot of complaints + lower the bank's reputation with its long-term consequences), etc... . (the opposite happens as well - https://www.investopedia.com/ask/answers/051815/what-differe... )

Nowadays even accounting (coupled with risk management) has become time-critical because of the many regulations - a 1-day delay filing the numbers with regulators and/or central banks can result in huge fines and loss of reputation (if the news about the problem becomes public it could happen that the news services try to highlight it, increasing the focus on that negative publicity).

In our case the detailed/low-level data (e.g. client X bought N shares of some company, client Y retrieved N$ from the ATM, etc) is all processed by the mainframe, which works fine 99.99999% of the time => then anything that comes afterwards which involves analysis/aggregation/reporting/etc of that data happens in other non-mainframe apps (they're all very different - from tiny to huge apps with distributed databases) and in general any such app usually has some kind of problem that results in a delay at least once every quarter due to the SW itself or even the HW => it's usually not a big problem (usually some hours are lost but there is a buffer) but if the central SW (running on the mainframe) had the same problem then there would be a chain-reaction on all the other apps that need that data (and then their specific problems would be on top of that, and then there would be a hotspot of all of them needing 100% of the CPU/RAM/network resources as they would all run at the same time as fast as possible, which could cause further delays, and so on).


I'd guess they are only used to run legacy mainframe software. The accounting systems (loan accounting, deposits, etc) of large banks for example are all based on very old COBOL code bases and have lots of assumptions about their OS platform built into them. These code bases are not only very stable and well-tested but they are deeply integrated into hundreds of other systems in the bank. Porting them from Z/OS or replacing them is probably out of the question, but there are more modern tools for integrating them into a services-oriented architecture, for example.


Surprisingly, there are reasons why running financial-type systems in COBOL makes a lot of sense. Evidently COBOL really excels at foxed point decimal operations more so than other “modern” languages like Python and Java. Porting them off the mainframe isn’t a trivial exercise, it seems.

https://medium.com/@bellmar/is-cobol-holding-you-hostage-wit...


Java has had a fixed point type - BigDecimal - for a couple of decades now. Yes, there are people who don’t know what they are doing using floats for currency.
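
(For anyone who hasn't seen the failure mode: a quick illustration in plain standard-library Java, nothing mainframe-specific, of why binary floats are the wrong tool for currency and BigDecimal isn't.)

    import java.math.BigDecimal;

    public class CurrencyMath {
        public static void main(String[] args) {
            // Binary floating point cannot represent 0.10 exactly:
            double d = 0.0;
            for (int i = 0; i < 10; i++) d += 0.10;
            System.out.println(d);                          // 0.9999999999999999

            // Decimal fixed point keeps the cents exact:
            BigDecimal b = BigDecimal.ZERO;
            for (int i = 0; i < 10; i++) b = b.add(new BigDecimal("0.10"));
            System.out.println(b);                          // 1.00
        }
    }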


It’s far more than just having a single data type.

COBOL was in effect designed for business use so fixed point arithmetic has a lot of support in the language, libraries, and tooling.


One useful way of thinking of COBOL is as a DSL targeted at business operations.

Once you do that, it becomes clear Java is not exactly a competitor.


>Java has had a fixed point type - BigDecimal - for a couple of decades now.

Which is neither here nor there. COBOL has end to end support for financial/commercial calculations, and a whole lot more.


If you clicked the link from the person I was responding to, and read that article, you would know why my comment is relevant.


I've read that article a couple of weeks before. It still doesn't posit a simplistic "Java has bigdecimal, just ditch COBOL" as the solution. From the article:

"When you are a major financial institution processing millions of transactions per second requiring decimal precision, it could actually be cheaper to train engineers in COBOL than pay extra in resources and performance to migrate to a more popular language. After all, popularity shifts over time."


The entire premise of that article was based on the example of "engineers" trying to port COBOL code to Java and using floats for maths of financial transactions. Yes, they can't say "just use BigDecimal" and still have enough content to be ranked well by Google, but that in fact is the solution to that particular problem.


That is like saying that in Java you can have nullable types.

But because it is not built in, any advantage of using it gets lost in the noise.


This is true. The mainframe has hardware support for fixed point instructions (I don't know the details). IIRC, the POWER line (p and i) also have hardware support.


> COBOL really excels at foxed point decimal operations

That's one of the best typos I've ever seen.


I know most airlines run on big-iron mainframes. Mainframes are undoubtedly kings for things like high-throughput, highly atomic databases.

(Although I’m sure the precise problem set could be replicated with commodity hardware too).


"Although I’m sure the precise problem set could be replicated with commodity hardware too"

Only pretty recently. You would be surprised at the number of crazy smart people that failed to make a commercially viable TPF replacement.

Including ITA, whom Google bought for ~$700M. Lots of top talent, lots of funding. They did build a reservation system, but nobody of note would use it.

Amadeus only got rid of their TPF mainframes in the last year or so. As far as I know, Sabre hasn't finished.

I don't know what progress VISA has made (another heavy TPF user). They are still hiring TPF programmers: https://usa.visa.com/careers/job-details.jobid.7439996782301...


Even if these IBM mainframes are $10 million each, it’s such a rounding error in terms of cost that I can’t think of a good reason to migrate off of them. What does commodity hardware add other than complexity?


Mostly a platform you can find programmers for. I believe that's really the main driver. If you have business logic in TPF, you need TPF literate 360 Assembler developers. Technically, you can program in C, but legacy integration and performance issues make that of limited use.

People moving away usually tried to extract the business logic such that the TPF layer was mostly a distributed NoSQL store. That bought a lot of time, but it seems like the skill shortage is now hitting that layer.


The migration itself may be valuable, if treated properly. Many airlines have had very public issues when their mainframe had problems running (often power issues, but sometimes other things); a migration to x86 includes accepting and handling more failures, which may include having a warm second site available.

You can certainly have a warm second mainframe too, and I think most large banks are run that way, but it wouldn't be very fashionable to add that to your airline now, if it wasn't already there.


TPF is inherently hot/hot, distributed across many mainframes. It's very different from z/OS.


I would have asked the opposite question: what does mainframe add other than complexity?

It's a lot easier to find good engineers to write and maintain software for Linux on x86-64 than it is to find mainframe engineers, which makes things simpler, arguably.


A mainframe adds reliability and backward compatibility, and the knowledge that you don't have to rewrite your systems from the ground up.

How long is Linux on x86-64 going to stay a thing? What are you going to do about staging Linux updates on business-critical facilities?

Perhaps not so simple after all.

Which suggests an interesting contrarian career path - train on COBOL + mainframe and you'll never lack well-paid work. You'll also avoid the usual fad tracking.


Your idea is logical, but it's not exactly that way in practice. Many (most?) of the mainframe jobs are not that well-paid (for software industry). The other thing is that it nearly limits you to working in banking, airlines, government, insurance.


> A mainframe adds reliability and backward compatibility

Citation needed, especially on that implication that x86-64 lacks reliability.

I’ve run reliable services on commodity hardware that literally processed over 5 billion transactions per day.

> How long is Linux on x86-64 going to stay a thing?

If something better comes along, why wouldn’t you want to move to it? But “better” in this context would imply popular support and plentiful access to developers, so you would have years of warning. It wouldn’t come as a surprise. As of now, it has been going for at least a decade, and shows no signs of stopping.

> What are you going to do about staging Linux updates on business-critical facilities?

This is not some huge, unsolved problem. It has been solved many times, in my opinion, so just learn how others do it and follow suit. I’m not here to teach sysadmin “tips and tricks.”

I would be completely clueless on how to stage updates for a mainframe. Turn it off and back on? And I’m betting all of the good training material there is locked behind expensive paywalls.

> Perhaps not so simple after all.

Disagree, especially since you can hire people to solve these problems for a lot less money than you can get just the hardware for a mainframe, let alone hire the extremely rare (read: expensive) personnel needed to maintain and develop applications for mainframe.

If IBM would focus on bringing the entry-level cost of mainframe down, they would probably be able to get more adoption, and more people would be able and motivated to learn their systems.


"Citation needed, especially on that implication that x86-64 lacks reliability."

Basically, the hardware and software on a mainframe does it for you. Versus having to implement redundancy and resilience into your app. You can pull a CPU on a running mainframe, and it keeps chugging. Batch failures, app failures, etc, have a very well defined ecosystem for recovery that's consistent across apps.

As you imply, though, provided you pick the right software, there's not much difference in reliability these days. The variety of software choices is what kills reliability on x86/Linux. Too little established experience because there are too many choices.


I think it’s become an incredibly small and important niche, and that’s ok. It’s not broken, doesn’t need to be fixed, and compared to other costs spending $100mm/yr on COBOL programmers and mainframes isn’t going to destroy their business. It’s highly specialized stuff. It has a proven track record of working very well. Switching off of it will cause a lot more headaches than it will fix.


Virtually every sentence of your comment demonstrates snobby ignorance. You treat your erroneous assumptions as fact and build a house of cards on top.


> what does mainframe add other than complexity?

Compared to what? A cluster of x86 computers where all the security, cryptography, observability, reliability, availability has to be written by the owner on top of something like Kubernetes? And that can achieve the kind of throughput a mainframe has?

I'd say a mainframe is much simpler than that. It's already done. All you need to do is to sign the check and read the (hundreds of) manuals.


It adds one throat to choke. You won't have separate vendors for the OS and the hardware to point fingers at each other. In the case of the mainframe, there's usually a whole raft of other software that's all from IBM to add to the one throat to choke: TPF, z/OS, z/VM, CICS, DB2, etc.


> mainframe add

if it's already in place, it's not 'adding' anything, it simply 'is'.


According to Wikipedia it seems like a lot of people use or used ITA: “This system has been and is used by travel companies such as Bing Travel, CheapTickets, Kayak.com, and Orbitz, and by airlines such as Alitalia, American, ANA, Cape Air, Delta Air Lines, United Airlines, US Airways, and Virgin Atlantic.”


That's the shopping engine, which was successful, but only a small subset of what a TPF reservation system does...albeit one of the more complex pieces.

ITA did build the full thing, but only Cape Air used it. The "full thing" being shopping, schedules, inventory, booking, cancel, change, check-in, etc.


Oh interesting so ITA is really just a retail front-end for TPF then? I guess I was misinformed about it as well.


The customers shop using ITA. The ITA shopping engine gets schedules, inventory and updates inventory from/to the backend (often TPF).

The shopping engine itself isn't trivial. ITA has the best shopping engine available.


I was under the impression that ITA used exclusively Common Lisp to implement their software. I could be wrong, though.


Other parts were written in other languages. ITA also used languages like C++ (reported to be roughly 50% of the line count for QPX) and Java (for RES).


Is TPF running on Linux on commodity X86 now then? Is that also the eventual plan for Sabre?


They slowly moved the business logic out of TPF ASM to x86 app servers, which left the TPF machines as mostly distributed NoSQL servers. Then they replaced that layer with Couchbase or similar. Took 15 years or so.


TPF itself, and the programs written in it are mainly in s/360 asm, so I doubt very much it's running on x86 now. It's so s/360 it doesn't have traditional stacks.


Is there a way to reconcile these two above responses? They seem to be very different answers. Are these both options that a customer can choose from?


Yeah, it sounds like they rewrote it piece by piece to run on different architectures.

Really similar to having a big monolith, and taking pieces out chunk by chunk, rewritten to run as microservices, until you're left with a new codebase with no sign of the original monolith or any of its code.


Vertical scaling. It's common with startups to move to a sharded cloud configuration. In cases like a transaction that can cross any imaginable shard boundary, scaling vertically makes more sense. Mainframes allow much greater vertical scaling as mentioned in other comments regarding I/O, caching, etc. Also being on-premise, fewer and more reliable machines require fewer people to maintain it.
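
(To make the "transaction that can cross any shard boundary" pain concrete: on a sharded system you end up running something like two-phase commit yourself, whereas a single big box just runs a local transaction. A minimal coordinator sketch, with Shard as a made-up placeholder; a real coordinator also has to log its decision and survive crashes between the phases.)

    import java.util.List;

    class TwoPhaseCommit {
        interface Shard {
            boolean prepare(String txId);  // persist intent, vote yes/no
            void commit(String txId);
            void abort(String txId);
        }

        static boolean run(String txId, List<Shard> participants) {
            // phase 1: every shard must vote yes
            for (Shard s : participants) {
                if (!s.prepare(txId)) {
                    for (Shard p : participants) p.abort(txId);
                    return false;
                }
            }
            // phase 2: commit everywhere
            for (Shard s : participants) s.commit(txId);
            return true;
        }
    }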


In the CAP theorem, mainframe is the king of CA. You use them mostly for DB2 databases and Batch processing on that data to remain highly consistent and available.


This is exactly the secret behind mainframe's ubiquitous persistence across mission-critical businesses. Consistency and availability are far more important than partition tolerance. You build your mainframe with >N+2 redundancy on any given component and then you build the datacenter around the mainframe. This is how you beat the CAP theorem. With brute-force engineering. Sure, you aren't truly defeating it in principle, but in practice it still seems to be the best option we've got.

Batch is the other aspect that takes mainframes to unbeatable status. When you run a batch job, you basically say "ok all daily OLTP is halted, we are going into a completely new architectural mode". Obtaining an exclusive lock on a table or an entire database for a single series of processes to operate on can yield insane amounts of throughput when you are finalizing the data from the day's operations.
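
As a rough sketch of that "batch window" idea (not how an actual z/OS batch job is written; the table and connection helper are hypothetical, though LOCK TABLE ... IN EXCLUSIVE MODE is standard DB2 SQL):

    # Minimal end-of-day batch sketch over a DB-API connection to a DB2-style
    # database. DAILY_TXN and get_connection() are invented for illustration.
    def run_end_of_day_batch(conn):
        cur = conn.cursor()
        # Take the whole table away from OLTP for this unit of work; other
        # readers/writers queue up behind the batch job.
        cur.execute("LOCK TABLE DAILY_TXN IN EXCLUSIVE MODE")
        # With no per-row lock contention, the job can stream through the
        # day's records at full speed.
        cur.execute("UPDATE DAILY_TXN SET SETTLED = 'Y' WHERE SETTLED = 'N'")
        conn.commit()  # committing ends the unit of work and releases the lock

    # run_end_of_day_batch(get_connection())  # get_connection() is assumed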

A few years ago as a more junior developer, I will admit to being strongly "anti-mainframe" based on the principles I held at that time. Why can't we just put it all in the cloud, throw some MongoDB out there and pray it all works? After witnessing actual business cases for mainframe/batch unfold, I quickly started to change my tune. Mainframe is not for every business, but it does seem to be a tool you can reach for when someone says "everything just has to work always or someone dies", and "we have infinite money".


Is this built into the hardware then? Might you or someone else elaborate on how mainframe architecture provides for this level of CA?


You use it in an environment where you need an extreme level of data integrity and availability. Think banks, airports, government, etc.

Due to I/O offloading to co-processors and a whole range of supporting CPU types, you can set up a system which can handle thousands of transactions per second. Most of the applications used on mainframes are databases or message queues.

Recently there has been increased interest in running Linux on mainframes, something which has been possible since the early 2000s. You get the benefits of highly available and secure hardware and the relative ease of management of Linux. Another benefit is that you don't really need to train personnel in more exotic operating systems like z/OS.


Nope, computational science and engineering tends to use commodity server hardware (Xeon or similar) with or without GPUs. There may be many nodes or even many racks, but it's quite different from Z-series, which sits at a very poor price point.


For processing large amounts of structured data using decimal arithmetic to do exact monetary calculations.
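
For anyone who hasn't been bitten by this: binary floating point can't represent most decimal fractions exactly, which is why monetary code wants decimal arithmetic (which mainframes happen to support in hardware). A quick Python illustration of the difference:

    from decimal import Decimal

    # Binary floats pick up representation error even on simple cent values.
    print(0.10 + 0.20)                        # 0.30000000000000004
    print(sum([0.10] * 10) == 1.00)           # False

    # Decimal arithmetic keeps the monetary values exact.
    print(Decimal("0.10") + Decimal("0.20"))  # 0.30
    print(sum([Decimal("0.10")] * 10) == Decimal("1.00"))  # True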


I had the same question recently, and although high availability is certainly one reason, an equally and perhaps more important reason is "cost per transaction."

Think of the number of transactions entities such as banks and airlines perform each day. This article talks about this in the section "what is a mainframe today":

https://www.nanalyze.com/2017/10/ibm-mainframe-computers-tod...


There used to exist many types of high-end servers, like Solaris machines and mainframes. But consumer-grade hardware became so reliable that it became hard to justify the price of the high-end servers. Some features, like hot-pluggable hard drives, have now also made their way into consumer-grade desktop computers. Today, when you have cloud database providers, it makes even less sense to run your own high-end server.


It must require massive parallelism and a huge amount of IPC, otherwise clusters will beat your mainframe with a much smaller budget. I don't know many such problems (in fact, I can't think of any). Programmers are usually very eager to optimize IPC out of any parallel algorithm.

I have seen a handful of mainframes in use, but I have never seen one used for something that it is good for.


The death of the UNIX RISC vendors and also minicomputers creates an interesting situation where the server "middle class" is gone. We now have commodity servers where you hardly care about the vendor on one end, and hyper engineered mainframes on the other. Not much between.


That's because, ultimately, mainframes and embedded computers are the only natural computer platforms. I expounded on this at length (yes, this is a self-plug).

http://www.winestockwebdesign.com/Essays/Eternal_Mainframe.h...

Long story, short: We will really have to watch out for our privacy because those two remaining platforms give The Management(tm) an irresistible temptation to stomp on us.


Interesting essay. Personally, I think AWS might be the new mainframe. You're guided to a specific set of proprietary services to build solutions, not terribly unlike the heyday of CICS, VSAM, JCL, MQ, RACF, etc. The playbook of Lambda, Fargate, AWS Batch, DynamoDB, IAM/Cognito, SQS, CloudFormation, etc, looks pretty similar.


AWS is absolutely the new mainframe. The big difference is that when you rent from Amazon instead of IBM they provide the building the mainframe lives in.


In the 1970s there were companies called "mainframe bureaus" that did exactly that.


And that's mostly because our 'dial-up lines' are better than they used to be. IBM would have been happy to rent you the building as well.


Like they say, "plus ça change, plus c'est la même chose." And was it Santayana who said, "those who do not remember the past are condemned to recreate it in Javascript?"

Evidence of the demise of desktop computing abounds, but I do hope you're merely being alarmist, not prescient.

Great essay. Please write more.


That's an interesting essay.

It seems to me that if desktop general-purpose computing becomes a distinctly minority need, then the future of hardware design will bend towards that article's view. Large-scale design and manufacture of hardware platforms will be (mostly) exclusively for central servers and for specialized devices on the edge. I expect there's an awful lot of legacy desktop design that will disappear.


FYI, your very first link out to the "Wheel of Reincarnation" is dead. Maybe it too needs to be reincarnated?


Interesting take, though I don't fully agree with your conclusions. I take some heart that in the time since you wrote the piece, both normal people and non-geek politicians seem to have noticed that all is not well with the way things were going.

We're starting to see borderline-draconian privacy and data protection laws such as the GDPR in Europe, and while I question the implementation of that law and how effective it will be, the very fact that a group of politicians covering all of Europe managed to identify a risk and make actual law to try to deal with it is noteworthy in itself.

I suspect what will really make a difference in the near future though is that now social media and "fake news" and other consequences of the centralisation and commoditisation of online communications are messing with elections and our democratic systems. Politicians, even those who might otherwise give big businesses a pass on questionable ethical behaviour, do care about the systems that get them into positions of power, and they care very much about attempts to compromise those systems in ways that might remove them from power. If there's one thing in life that I have found quite reliable under all circumstances, it is the ability of a class of people in power to recognise threats and take steps to protect itself.

I also take some comfort in a few conversations I've had with non-techie friends in recent years, particularly those who are of the younger, digital native generations. It's become very clear to me that while my slightly older generation have sometimes been quite naive about the implications of new technologies, those behind us are much less so. Things like basic steps to stay safer online are taught in schools now. Social media accounts are ephemeral and kids switch to different networks in a way that would make it very difficult for the likes of Facebook to reach critical mass as it has with the older generations. The constant updates and sometimes breakages of software or access to multimedia content are getting old. And again, perhaps most heartening of all, while the younger generations consider these technologies an integral part of their lives and accept to some extent that there are compromises made in order to use them, that doesn't mean they like those compromises, and they will switch away if better options become available.

All of this is probably bad news for the long term prospects of businesses like Facebook and Google (and all the other big data hoarders we don't see because they run their marketplaces discreetly behind the scenes instead of with big public websites on the front). But it's probably good news for those of us hoping the centralised/distributed pendulum for computing is starting to swing back towards the distributed side again, in part driven by privacy, reliability and longevity concerns. The biggest weakness I saw in the article was quite a big jump to the conclusion that embedded and mainframe are the only two natural kinds of computing. I don't see why personal devices -- or rather, running substantial software and doing substantial data processing locally on those devices instead of just using them as essentially thin clients -- shouldn't be on that list as well. The form factors might change, but I don't see it as inevitable that personal computing will revert to being primarily a hobbyist's endeavour. There are some very good reasons it should not.


Additionally, the mainstream adoption of high-level languages made the OS irrelevant outside some niche use cases like microcontrollers, fintech, and AAA games.

If the language runtime supports the platform (even bare metal), the AOT/JIT does a good enough job (even exploiting some vectorization), and the large majority of the standard library is available, then the OS is just another interchangeable "cattle" instance.

Which, by the way, is how those hyper-engineered mainframes from IBM (and Unisys) are designed, with their "language environments".


>Additionally, the mainstream adoption of high-level languages made the OS irrelevant outside some niche use cases like microcontrollers, fintech, and AAA games.

Has this really been the case though? Apps are still very much OS-specific. Even backend web apps are OS-specific (Linux-specific). I just don't see any "mainstream" adoption of high-level languages that does this.


Java. We develop on Windows and deploy on Linux, BSD, OS X, mainframes, Amazon and Azure Java runtimes.


Even Java has to deal with platform quirks: serving files from NFS (ESTALE), hardlinks, unlinking files that are still mmapped, and stuff like that.

With a large enough codebase you'll accumulate some unix-specific logic where it'll be less painful to just develop under linux rather than trying to keep it running on windows.

Of course those pain points are minor and you could write some fallback code for windows, but that code would only ever be exercised on developer machines.
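
A concrete example of that kind of quirk (a sketch; behaviour depends on the OS and filesystem): deleting a file that's still memory-mapped is fine on Linux but is typically refused on Windows, so code that relies on it quietly becomes unix-only.

    import mmap, os, tempfile

    # Create a small file and map it into memory.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"x" * 4096)
    mapping = mmap.mmap(fd, 4096)

    try:
        # On Linux the name goes away, but the inode lives until the mapping
        # and descriptor are closed.
        os.unlink(path)
    except PermissionError:
        # Windows generally refuses to delete a file with live handles or
        # mappings, so portable code needs a fallback like deferring the delete.
        print("deferring delete until the mapping is closed")

    mapping.close()
    os.close(fd)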


Application servers take care of that.

All data should come from databases, and it doesn't matter which OSes those are running on.


> Even backend web apps are OS-specific (Linux-specific).

The web app may be OS specific, but does the OS really matter? Any unixy thing would probably work fine with at most mild effort for almost all backend apps.


FPGAs can fill that middle path. Sort of.

Microsoft (Azure) is investing heavily in an FPGA-based configurable cloud:

https://www.youtube.com/watch?v=v_4Ap1bjwgs


Not sure why you were downvoted here. Specifically engineered systems with things like FPGA, GPU, TPU, Infiniband, Optane, "minion cores", etc, are probably the best candidates to re-establish something like a proprietary mid-range where you care who the vendor is.


I was eyeballing the Pico Computing boxen for this:

https://www.micron.com/products/advanced-solutions/advanced-...

One could get a lot of mileage in many areas out of a desktop with a good CPU, lots of RAM, a graphics card for general-purpose use, and one or more FPGAs with HLS tooling (or just buy the modules). Especially if the basic components were standardized like the PC, with minimum requirements on cores, slices, etc. for each version of the platform. An app ecosystem could develop or emerge with capabilities regular desktops couldn't match.


I'm unfamiliar with the "minion cores" reference in your comment. Is this the same thing as this RISC-V architecture?

https://www.lowrisc.org/docs/minion-v0.4/overview/


That's what I was referring to. The PRUs in a TI AM335X (like the BeagleBone) are similar.

Basically real-time-capable microcontrollers that share memory with the main CPU, which is probably running Linux.

Allows for interesting use cases, like high-speed data in or out to drive LED displays, sample signals, generate audio, etc.


Thanks, this is really interesting. Might you have any other links or resources on this you could share? Cheers.


The basics: http://www.righto.com/2016/08/pru-tips-understanding-beagleb...

Project that uses PRUs for audio: https://bela.io and how it works: https://hackaday.com/2016/04/13/bela-real-time-beaglebone-au...

Driving an LED matrix display: https://trmm.net/Category:LEDscape

Emulating an old Macintosh SE video board: https://trmm.net/Mac-SE_video

Another sort of "minion core" in Allwinner ARM boards: http://linux-sunxi.org/AR100


Almost a month back there was a post on HN about a company called Upmem, which puts CPUs inside memory.

Post: https://news.ycombinator.com/item?id=20766283

Use cases: https://www.upmem.com/use-cases/


The OpenPOWER RISC systems fit nicely in the middle-tier niche.


Agree, though that's basically the last gasp of the UNIX RISC vendors. I wouldn't be surprised to see them shelved soon. Not that they aren't terrific products, but demand is what it is.


There's an opportunity for someone to sell "reasonably" secure hardware at this price point. Especially since Intel certainly isn't selling anything that fits this description.


I think AMD already has that covered with Epyc 2 (Rome) processors.

POWER is also known for being highly SMT, which could lead to them being even more prone to the issues that plagued Intel's implementation of hyperthreading. A single POWER9 core has 4 to 8 threads.


People who are troubled by security issues in Intel's Management Engine may also be suspicious of the firmware in AMD's roughly equivalent Platform Security Processor, which doesn't have a spotless record either. See, e.g., https://www.theregister.co.uk/2018/01/06/amd_cpu_psp_flaw/


I think we were talking about Spectre, Meltdown, and related attacks. Not the PSP or IME, although those are concerning too.


Well, Spectre affected pretty much all processors except very slow ones (Cortex-A53) :)


It affected AMD's Zen architecture significantly less. A number of exploits that applied to Intel didn't apply at all to Zen, and the ones that did apply had less costly mitigations than Intel.


I would expect that it would actually be much harder to pull off such an attack, because you've got 4 threads fighting over the same resources instead of 2. These attacks essentially require statistical analysis, and the more noise there is, the more difficult the attack would be. If you've got more than just the attacker and the victim threads running on a core, then it's a noisier environment to pull off such an attack.


That’s a good point. Also imagine if the attacker controlled 7 out of the 8 threads on a core... maybe they would have even clearer statistics for analysis.

Or maybe POWER9 is completely secure.

My main point was that Epyc has already proven to be much more resistant to these attacks than Intel’s architecture, and it wouldn’t require dealing with porting your applications to run on a niche ISA, with very limited options to buy server hardware, and very few (if any) options to rent cloud instances on POWER9.


>"POWER is also known for being highly SMT, which could lead to them being even more prone to the issues that plagued Intel's implementation of hyperthreading."

Which SMT issues are you referring to here on Intel's chips?


You’ve got an interesting point. I could see that middle ground being filled by customized silicon. It’s more possible now with things like customizable ARM chips.


More and more PaaS, where you don't care if the server is a mainframe or a Dell 1RU.


Nvidia DGX-1?


I find the mainframe era romantic. I remember when I first got to college, people were sending statistics jobs over to the mainframe.

I wanted a future where people had big mainframes in their basement (like a furnace) as the sole computing source for their house and terminals in every room. Instead we got no one using computers and everyone having a dumbed down phone. :(


Except that the "dumbed down" phone most likely has a whole lot more computing power than any of the big mainframes you might have used in college.


So I can submit jobs to them and have them do arbitrary computations for me? I can’t even get easy unfettered access to the filesystem for christ’s sake.


As it turns out, the vast majority of people don't need to do arbitrary computations on a regular basis.

For those who do need it, you don't need a mainframe sitting in your basement like a furnace when a Raspberry Pi has 500x the processing power of an IBM 4300 from the 1980s for a fraction of the price, size, and power consumption.

There is absolutely no need to shit on the capabilities of modern smartphones when "arbitrary computation" devices are so cheap, small, and ubiquitous.


Comparing apples and oranges. The smartphone or Raspberry Pi may have more CPU power, but it has far less redundancy.


Yes, but dumbed down probably refers more to the capabilities rather than the computing power. Phones are great, but aren't really for real work.


Consumers instead of people, I guess. It's true for many other aspects of life, but here it's a bit worse.


It’s a bad analogy. Phone is the mainframe eras terminal (also dumb btw) and the internet is the time shared mainframe you “can submit jobs to”.

You probably can’t access mainframes file system directly either. That’s a security and reliability feature.



A lot of people here keep mentioning mainframes and reliability/uptime. Mainframes were not particularly great at uptime (it was not a prime concern for batch processing), until Tandem, which was founded specifically to make fault-tolerant computers for on-line processing, started shipping systems in the late 1970s and taking market share away from IBM and the other manufacturers, forcing them to add fault tolerance.


I wonder what kind of machines Amazon is running in AWS to offer huge EC2 instance types. Anyone got insight into that? I feel like they could either go the Google way (cheaper off-the-shelf parts and a lot of replication) or the IBM way (scale up).


I've also wondered this, specifically around the number of vCPUs they offer. I'm sure there is a percentage of over-commitment here, but how do you offer a 96-vCPU instance?

Someone suggested that they used infiniband or something similar to make a single instance span multiple machines, but I don't buy this. There would be performance characteristics that would show this, and it would be documented.


96 vCPUs = 48-core machine with hyperthreading?

I think all sizes they offer are available as single machines, although the really large ones may be somewhat exotic (NUMAlink or other interconnects to get a machine with 8 sockets? Not sure if the top Intel platforms do 8 sockets natively).


1) There are Xeon Scalable processor packages with 48 and 56 cores on the market, thus with 96 or 112 simultaneous threads. Cloud providers here probably sell that thread as "one vCPU", although

2) ... they could also time-share one core to as many virtual cores as they like, even more than the real number of simultaneous threads. This seems to make sense for better utilization of capacity, but I have no idea if they do it. Probably not, for marketing reasons: they don't want to support the idea that their vCPUs are weaker than those of a competitor. Small providers are more likely to do this, I think; they have far fewer concerns about PR.

3) Big GAFA-like corps have access to specialized hardware from Intel. The CPUs they operate may not be publicly known.


https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance...

4 CPUs × 12 cores each × 2 threads per core = 96 vCPUs.


I wonder how many security vulnerabilities persist in the mainframe ecosystem simply because the skillset and platform accessibility are relatively rare.


A few years ago, I had the good fortune of working at IBM's Poughkeepsie location on a mainframe subsystem team. What everyone is saying is technically correct, but it's not completely accurate either. Note that I do not work there anymore; these are thoughts I had when I was there and since then.

A large entity purchases 12 fridge-sized mainframes from IBM for over $100 million. Who might do that? Airlines, banks, governments, logistics, and others needing high levels of reliability.

To understand why this clientele would use a Z-series mainframe, first consider what the "z" in the name stands for: "zero," as in zero downtime. Typical compute providers express their downtime as "#-nines". For example, 5-nines reliability would mean you're down for a bit over five minutes per year, on average. The Z-series mainframes are sold as having zero downtime, period. A remarkable amount of research, development, and engineering effort goes into achieving this level of reliability. Now, these clients usually perform jobs which are not computationally difficult (validating a credit card transaction, for example) but must work, since the economy depends on the availability of these services. The Z-series mainframe shines in processing these loads of many, short jobs.
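
To put numbers on the "#-nines" framing (back-of-the-envelope arithmetic, nothing vendor-specific):

    # Allowed downtime per year for N nines of availability.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    for nines in range(3, 7):
        availability = 1 - 10 ** -nines
        downtime_min = SECONDS_PER_YEAR * (1 - availability) / 60
        print(f"{nines} nines ({availability:.6%}): {downtime_min:8.2f} min/year")

    # 3 nines ~ 526 min/year, 4 nines ~ 53 min, 5 nines ~ 5.3 min, 6 nines ~ 0.5 min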

There's a security angle to mainframes as well. Commodity hardware allows for fast scaling and redundancy. However, commodity hardware also allows for exploits to be shared easily. Once those exploits are discovered, companies need to patch, and there's no guarantee the patch will happen. Now, imagine trying to develop exploits for a system which is not commercially available (governments could still presumably acquire one), is a completely custom computer architecture (Z/Architecture, custom compiler, Z/OS, pretty much every layer below the JVM), and has very few design documents available online. Oh, and consider that, from z14 onwards, any data in the mainframe is encrypted when at rest. (Decryption/encryption is handled beneath the ISA; once an instruction is run, the mainframe uses the central key management chip (tamper-resistant, designed to handle natural disasters, etc.) to decrypt necessary information. The information is processed, then encrypted again before the instruction is completed.) A script-kiddie getting into and exfiltrating data from one of these things is very unlikely. Hacking one of these mainframes would take an intense, coordinated effort.

Another important component is backward-compatibility. Take IBM's two main in-house storage protocols, FICON and FCP (FCP is FICON, minus most support for old systems to get higher throughput). FICON connects mainframes with giant storage arrays from EMC, Teradata, and others. FICON replaced ESCON, which replaced the parallel data communication system from the System/360 era. When a company upgrades their mainframe, knowing that your 20-year-old storage unit can still talk to your new machines relieves stress. Companies WILL pay for this level of backwards compatibility, and there's no reason to hate them for it.

Supporting backwards compatibility has historically not been too much of a problem for IBM. I worked with a person who took a class in IBM Poughkeepsie's now-abandoned Education Building on this hot new programming language called C (this was sometime in the 80's). Multiple people in my department were around for the development of not just the current generation of IBM tech but those before as well. The levels of technical depth they had were immense. I've heard people say, "oh, but that depth is narrow and won't get them jobs outside IBM mainframes." Perhaps, but in my experience, they don't care. They build systems the world depends on, whether the users of those systems realize it or not. I'll also add that in the days of Big Blue, your job was basically secured. Even after the layoffs of the 90's, IBM still needed to retain the old talent. (Imagine a company with lots of employees who've worked there less than 10 years and lots who've worked there more than 30 years. You'd describe IBM's mainframe division well.) Makes me sad to hear that IBM is discriminating against their older employees to push them out.

One commenter asks why IBM doesn't have "micro-mainframes" for smaller companies. For all I know, they could be moving this direction. At the same time, it seems like it wouldn't make much sense for IBM to do this. Why deal in thousands of dollars when you can deal in millions? Why put engineering effort into building computers for non-critical companies when, as long as you keep advancing performance and capabilities, your mainframes will provide you one of the best long-term cash flows possible?

Another commenter said new companies do not consider mainframes because they aren't cost-effective. I think it's for a different reason: new companies come and go. Their services aren't that important to the world, but they're trying to show the world their importance. Because of that, startups whip up an infrastructure concoction which is inefficient, but that's OK because 1) they aren't encountering the issues of scale and 2) their workload and information can run anywhere. They just don't need a mainframe because they don't need that level of reliability.

Happy to answer other relevant questions you might have.


There is no such thing as a zero downtime system, and it is a fiction that makes it difficult to have real conversations about trade offs with a business. Moving towards zero downtime requires rapidly escalating cost to incrementally move the needle.

At a minimum, supporting infrastructure (power/networking/internet/etc) will eventually fail even with backups. On top of that, no mainframe is going to work when under water (flooding) or on fire (remember the delta outage?).


https://news.ycombinator.com/item?id=20977332

> The last time he shut down the mainframe was 15 years ago

And that was a controlled shutdown, not uncontrolled.

If more than fifteen years is your time horizon to validate the zero downtime claim, that's cool. By that point, though, the system will have proven its worth.

Right; unreliable supporting infrastructure is the inherent trickiness with saying the mainframe has zero downtime. The manufacturer isn't being deceptive, though, if, given reliable supporting infrastructure, the system will stay online for as long as is stated. It's just not their problem.

Can't imagine a company would blame the manufacturer for downtime taken to protect the equipment in the event of a flood or earthquake. If a piece of equipment catches fire, that's another story, but I suppose a zero-downtime system assumes it won't catch fire.


Agreed, from a theoretical perspective you are correct. However, with a good implementation of real-time multisite replication you can just about eliminate that variable:

https://en.wikipedia.org/wiki/IBM_Parallel_Sysplex


Thanks for providing this perspective. Very insightful. I think your comment should be pinned to the top, if HN had such a thing.


> imagine trying to develop exploits for a system which is not commercially available

Well, looks like plenty of mainframes are very well exposed to the internet, which.. helps.

https://www.youtube.com/watch?v=Xfl4spvM5DI

https://www.youtube.com/watch?v=5Ra4Ehmifh4

https://mainframed767.tumblr.com/post/29672128939/yall-encou...


I think part of your argument is security-by-obscurity, which is undesirable. But the rest of your post still stands, and is almost self-evident: the best reliability is achieved when it's engineered into the hardware itself. It's the best way to approach the problem if funding allows. A mainframe is really a computing cluster that does all the things redundant databases and other systems do, except implemented at the hardware level.


> Hacking one of these mainframe would take an intense, coordinated effort.

Just as well Soldier of Fortran doesn't exist or this would be a silly assertion.


This one was already posted on HN a little while ago.

http://www.winestockwebdesign.com/Essays/Eternal_Mainframe.h...

The new "cloud" running on racks and racks of cpus with memory and storage on nodes is really close to how mainframes are.

With Google and other cloud vendors making more specific hardware to deal with certain processes, we are entering even closer to similarities with mainframes. They have a lot of supporting processing power that offloads the CPU.

Mainframes now can also run Linux :)


Excuse my utter ignorance.

Aren't mainframes just powerful servers? At least I thought the term meant (or used to mean) mission-critical and massively powerful. Nowadays I could get a 2-socket EPYC 2 with 128 cores and 4 TB of memory. What makes an IBM server with POWER10 any more reliable than a powerful x86 server? After all, many supercomputers are now running on x86 as well.

Or are we now using the word "mainframe" specifically for IBM products, rather than for a category of its own?


The mainframe is a different beast.

Can you physically change CPUs, memory, and internal components without any stop/delay?

Not only is the mainframe very robust, it was made not to stop.

I used to work in a data center with SPARC, Intel, and blade servers, and a mainframe.

The only time I saw the operator afraid was when he had to turn off the mainframe because of electrical maintenance. The last time he shut down the mainframe was 15 years ago.

VMs? The mainframe has had them since the '60s.

I/O? The mainframe is a BEAST at I/O.

Backward Compat? IBM guarantees that your code from the '40s will run today.

The whole research effort behind the mainframe is to be a powerful beast with high availability and safety.

If you get the chance to look into it, it's a part of tech that is still very beautiful.


> Backward Compat? IBM guarantees that your code from the '40s will run today.

I hope this is an exaggeration, because I don't think you'll be running Colossus "code" (from the first programmable digital electronic computer) on your z15 :)

Your point still stands. They’re in a different class from x86 machines.


Mainframes are a category of their own and IBM is the last company in the category.

Mainframes have spectacular capabilities, like running a computation on multiple CPUs in multiple data centers, with integrity and survivability built in so that any software can take advantage of it. They have the lowest transaction-processing costs of any machine. They detect hardware issues and phone home to order repairs without user intervention. And on and on and on.

Quite frankly, a lot more companies should be using IBM mainframes than trying to build a reliable infrastructure on the cloud.


Few orgs want to pay for reliability/quality, whether in hardware or software costs. They are OK with a bunch of 99.5% or 99.9% availability systems, with no thought for high-availability deployment.


I wonder if there are any startups out there that resell mainframe services, presented and organized akin to the AWS type of view.


Unisys still hasn't given up.


It's a mainframe as in the category (nowadays, only IBM and Unisys are left). You are buying a mainframe, not just a CPU and a mobo like AMD EPYC. You expect serious RAS. This mainframe comes with spare CPUs for reliability and redundant power supplies. It has 99.9999% availability. Additionally, they come with a massive I/O subsystem with drawers of PCIe connections and various other connections.


Historically, mainframes have a different architecture from servers. Specifically, they have redundant power supplies, IO buses, memory buses and CPUs. So you can even install or remove memory or CPUs from them without turning them off.

Today, servers are adding redundancy where it matters most, but they still have a different philosophy where you should think about adding or removing servers, instead of components.


"So you can even install or remove memory or CPUs from them without turning them off." - I did that with IBM pSeries servers decades ago and with commodity Linux servers running on x86-64 hardware within the last couple of years. That's not reserved for mainframes.


Agreed, those particular technologies are no longer mainframe-exclusive. However, mainframe-grade systems allow switching/reconfiguring all components online. E.g., plug in a new processor unit (e.g., one of N processor racks plugged into the system bus), a new I/O unit, and a new disk unit; replicate all data, move all jobs, and decouple the old hardware, so that after the operation is complete 0% of the old hardware is running and 100% is new, with no downtime.


POWER is not mainframes, z/Architecture is mainframes.


The Model T Ford and Honda Civic are both cars. It should be obvious that any modern car is massively more reliable, safer, and has capabilities the T can’t even dream of.

Commodity x86 and mainframes are both computers. But are as different as T and Civic.


"Car analogies are the Rolls Royces of analogies."

