Case a) Many years ago, IBM changed the CPU architecture of their System i. Every program we made was transparently recompiled to the new instruction set on first run, and just worked after that.
Case b) Legacy technologies are taken very seriously. Decades-old programs written in obsolete versions of COBOL can still be compiled without issues on modern systems. No unexpected side effects, no new bugs; it just works. It provides a comfortable path to gradual upgrades.
Case c) The tech stack builds upon itself, which means you can devote years to learning the stack and still enjoy the fruits of your effort for decades. You just have to keep up with the changes, not relearn everything every five years.
Case d) The manuals and references are exhaustive and absolutely superb; you can become an expert just by reading them. How many questions about mainframe development are posted daily on Stack Overflow?
Their biggest drawback is their cost. Mainframe-as-a-Service offerings exist, but they need to be affordable enough to run cheap experiments, otherwise they won't be attractive to startups.
So everything he misses is there. It's just that running legacy code is too risky these days unless you are completely isolated from the internet. And modern infrastructure has a lot more essential complexity.
>It is amazing how well things work – the modern assembler, linker, and debugger handles the code generated by GCC 1.27 without any problems.
And the write-up he based his experiments upon:
I'd say this kind of source-based backwards compatibility is superior to what's happening in the Windows world -- not only does it encourage you to keep the source code open, but you also don't have to keep as many hacks in the kernel/system libraries.
Then you get laid off at 50 and have no hope of finding another job at the same salary.
Learning `vi` in 1979 then using it for 40 years seems like a good return on investment.
Both would also be examples of survivorship bias.
It isn't as flexible as a PC, but you shouldn't expect it to be. You learn the design, conventions, and idiosyncrasies of the platform and work with them, not against them.
Maybe this isn't the general standard, or maybe this wouldn't work for multinationals or companies/workforces with different values, but it works here. I suppose it also depends on what your company is doing; we're in fast-moving online retail. If you are in a slower operating environment (e.g. only doing batch processing or bulk messaging with various kinds of ETL), where changes don't give you an edge over the competition, there must be a different set of operational requirements too.
Certainly not the i. There is so much documentation that the bigger issue might be that it's overwhelming. There are multiple active communities sharing code.
This is a system based on DB2, meaning SQL is native. Development tools are varied: COBOL, RPGLE, Java, and more. You can web-face nearly any existing process; Node.js is one area where we have lots of current work. There's no downtime for reorganizing files, indexes are always maintained, and tools to optimize SQL statements are open to everyone on the platform. I have developers using all this and more. Most use an Eclipse-based IDE (RDi for COBOL/RPGLE) or Eclipse itself.
Admin-wise it is a breeze: no need to hunt down a TCP/IP stack, DB tools, schedulers, or the like. It's all built in.
Legacy systems are only limited by the imagination of those using them. The reason these systems persist and grow is that they are not frozen in time; their architecture readily adapts to new technology because they were designed not to be locked down with an inflexible design.
I worked on mainframes for years and cheered when minis came along.
I worked on minis for years and cheered when PC's came along.
I worked on DOS for years and wept when Windows came along.
I worked on windows for years and sang choruses of joy when Linux came along.
I haven't moved off Linux because it just keeps getting better and better.
Would I recommend anyone go back off Linux to ye olde mainframes?
Not on yer Nelly. Never.
Linux and UNIX used to be installed using stacks of diskettes and piles of patience.
> It should not be an acceptable practice to just insert a CD and indiscriminately install software onto a production machine.
This assumes your machine is set up correctly. I've worked on some mainframes where everyone could access customer data and swap out production executables freely. Why? Because no engineer wanted to deal with 6 layers of bureaucracy to debug an issue or update a library. Segmenting the data and isolating specific infrastructure has proven to be more efficient and more secure.
> Nor should it be acceptable to just flip the switch and reboot the server.
These are more about the fact that there is one mainframe. The reality for distributed systems is that anyone can flip the switch at any time, and you'd better plan for that up front. It's a good thing that I can restart my laptop, or production servers, whenever I need to for updates, and don't need to do 6 months of business continuity planning for "update day".
> In mainframe shops there is an entire group of people in the operations department responsible for protecting and securing mainframe systems, applications, and data.
As does any non-negligent mid-sized organization. Having a security team is very standard at this point.
> Ever wonder why there are no mainframe viruses?
Because no hacker can afford to have an IBM mainframe running in their garage?
> There is even a term – the accidental DBA – that has been coined in the SQL Server world for developers who become the DBA because nobody else is doing it. Such a situation is unheard of in the mainframe world – indeed, you’d be laughed at if you even suggested it!
Great, now I need to hire extra people specifically to administer my very expensive database deployments, instead of relying on a cloud-provided database and on general "DevOps" people having enough knowledge to troubleshoot specific problems. The DBA didn't go away for no reason; they've largely outlived their economic usefulness.
> Because no hacker can afford to have an IBM mainframe running in their garage?
There has been IBM mainframe malware in the past, the most famous incident being https://en.wikipedia.org/wiki/Christmas_Tree_EXEC
If an attacker wants to learn to attack IBM mainframes, they don't need a mainframe in their garage. Using the open source Hercules emulator, you can run IBM mainframe operating systems emulated on x86. Legally, you can only run really old versions, but piracy of newer versions is widespread. (IBM will sell you a legitimate way to emulate newer versions on x86 for a few thousand dollars annual subscription - but we can assume an attacker isn't going to be bothered with obeying copyright laws.)
Security through obscurity is a bad practice, but sometimes it really works. Exotic platforms tend not to get much malware, not because they lack security vulnerabilities, but simply because knowledge of how they work (and in many cases, even opportunity to acquire that knowledge) is rare.
More importantly, the return per hour spent writing that malware is lower. There are fewer Macs, even though knowledge of them is widespread; there are more Windows viruses simply because Windows offers a larger pool of victims.
I knew a mainframe engineer, and he certainly didn't have multiple spare mainframes lying around to play with. There might have been some form of partitioned install they called "test", but by the time bureaucracy gets involved, no one is "playing with it", and ergo there are a lot fewer actual experts involved.
Bonus question: was there never a firmware bug that could take down an entire mainframe? A certain server line was advertised as having "mainframe-like redundancy" on Windows, yet a firmware bug taking the whole thing down is exactly what I experienced.
Not sure about specific bugs, but I suspect mainframes have fairly redundant software, with enough hardware and software redundancy to handle unknown bugs.
Yes, there was quite an ugly one not that long ago in fact. A local mega-corporation which still has some mainframes had its main (and redundant) systems just roll over and die suddenly, plus their hot backup systems which were offsite. I don't remember all of the details, but IIRC this was a case of "We [the vendor] knew we had a major bug, and we had a fix for it, but we were remiss about getting it out to all of our customers in a timely manner." This type of thing is shockingly rare in the mainframe world, though.
1. The IT Security Profession is young, in many ways, it's still in its infancy. I say this because today you can still get a B.S. in Computer Science without being taught secure coding best practices or having that be integrated into the curriculum. So given that the state of the art is young, and yet mainframes are quite old and have a reputation of being slow to change, how could they possibly keep up with best practices?
2. The mainframe world tends to be highly proprietary, even with proprietary protocols. I am skeptical that they adopted appropriate information sharing and training mechanisms, and how can they recruit security talent when access to the system is so closed? How would we even know of attacks if they are kept secret and not reported?
As an example of how things are done on the legacy side, I was once dealing with a piece of Power hardware which could be configured as either a Linux system or an iSeries system. There was a USB port in the hardware which was enabled on Linux systems but disabled on iSeries systems, and AFAIK it couldn't be enabled, either. This puzzled me at first, until I found out how USB ports can be used to attack systems and then it made perfect sense - they had chosen to disable the USB port for security reasons.
Hercules is a great teaching tool, but it will limit the kind of exploits you can develop (the most interesting being hardware based ones anyway).
On top of that, mainframes are very observable machines. Because a lot of companies rented out extra capacity and billed it by usage (70's AWS), everything is audited, logged, measured, traced and, unless the intrusion went really deep, you'll have detailed information on how it happened.
There is a course I never took ("Hack the mainframe") by a couple nice folks I know from Twitter that teaches a lot about vulnerabilities in default settings and how to secure mainframe based services.
None of it works for actual distributed systems.
He once mentioned in the pub at lunch that "oh my first boss was Diskaja".
This applies to pretty much any production environment. Why would you have to work on a mainframe to understand this?
As a side note, I just realized I've never physically touched any server I've worked on professionally.
> I just realized I've never physically touched any server I've worked on professionally.
- Loading a custom removable boot drive
- Actually dialing in the hex address of a boot device on a console
- Knowing and using command sequences to bring systems with thousands of simultaneous users up and down
- Using quiesce to halt a mainframe, leaving all processes intact but frozen, so major hardware components could be worked on... then resuming all processes in place
- Bailing somebody out of a screwup that would otherwise require hours and thousands of dollars to redo
- Working on technology that spans 20+ years in the same shift
Best of all, I was mostly on my feet all night rather than sitting at a desk.
As for subsequent realms (UNIX, PC, VAX, etc), I started before servers were banished to racks and always had hands-on access back then. It was a big improvement when we could do DOS builds using cross-compilers on Sun and then serve the binaries over PC-NFS (welcome to 1993).
He had 2 Solaris servers under his desk, and I loved listening to everything he said about them and Sun. Too bad his talent was wasted when his job was outsourced to cheaper locales -- I mean RIF'ed, excuse me.
Sadly, the age of the 3278 is gone.
Many of you came here to talk ish about mainframes, and by doing so you precisely illustrate why new IT people need to learn about mainframes.
It's disingenuous to say that GNU/Linux keeps getting better, when that is hardly the case. It has business-itis. One distro's lessons don't translate well, or sometimes at all, to another's. It's growing subsystems and methods without well-thought-out or even decently documented rationales, and the rate of change doesn't appear to be slowing.
By trying to make examples of some of the worse characteristics of mainframes, some of you are showing that you're not speaking to the article at all.
The point is that what we call modern, with regard to reliability, is simply aggregation and redundancy. It's a collection of solutions to symptoms, and they don't address the underlying problems at all. This is nothing like a mainframe. Not even a little bit.
Heck - a modern virtual machine farm can't even cluster as well as VMS can cluster. Why? Because our virtual machine farms have evolved to fix the symptoms that come from the problems of less reliability, and have taken little or nothing from mainframes.
Mainframes have already learned about the problems of reliability, addressed those problems, then moved on, and they did that half a century ago.
Netflix's 'chaos monkey' software is an example of something that goes around randomly breaking things within clusters just to see if something fails to transparently recover without service interruption.
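The chaos-monkey idea is simple enough to sketch in a few lines (a toy illustration, not Netflix's actual tool; the `Instance` class and fleet names here are made up):

```python
import random

# Hypothetical in-memory "fleet": each instance serves requests while alive.
class Instance:
    def __init__(self, name):
        self.name = name
        self.alive = True

def handle_request(fleet):
    # A load balancer only routes to healthy instances.
    healthy = [i for i in fleet if i.alive]
    if not healthy:
        raise RuntimeError("total outage")
    return random.choice(healthy).name

def chaos_monkey(fleet):
    # Kill one instance at random; the service should keep answering.
    victim = random.choice(fleet)
    victim.alive = False
    return victim.name

fleet = [Instance(f"web-{n}") for n in range(3)]
killed = chaos_monkey(fleet)
served_by = handle_request(fleet)  # still served by a surviving instance
assert served_by != killed
```

If the assertion ever failed (i.e. a request landed on a dead node), you'd have found exactly the kind of non-transparent failure the exercise is designed to surface.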
One of my favorite things about MFs is the auditability. It is actually very transparent system, if you know where to look. There is a lot of information logged by the system (from many different perspectives and on all levels, system/middleware/application, performance/debugging, etc.).
Just one example, since people laugh at security in particular but it is more a cultural thing really. Every data set (file) access and every process execution is logged by default on the mainframe.
Also, MF applications seem to be simpler and more performant. I work for a company which does both MF and distributed enterprise software, and curiously, mean time to ticket resolution is lower (I think by more than 10%) for the mainframe. I don't know why; it seems counter-intuitive to me (given that, on the surface, the development tools for distributed are better), but it is a fact.
In some sense, it's like big airplanes, lot of really good operation procedures are figured out, so most of the things are pretty routine.
This sounds very expensive. How is it handled?
On the other hand, the system also often logs every execution of a DB transaction. These can run into billions of records per day for a large MF installation.
The other part of the story is that the main logging mechanism - SMF data - is a structured binary log. A production system can produce about a TB of this data per day; information about individual transactions is by far the largest portion (70-90%) of that. There are many different sources of this data. And that's just what is collected by default; there are also other collectors targeted at debugging specific problems. (One example is that the zSeries CPU actually has a hardware tracing facility that can log important execution events like system calls, interrupts, I/O operations, and so on. It can also be configured to track memory access. The system can then be configured to save the recent trace if a problem occurs.)
Also, any given action (such as job or transaction execution) is typically logged in several places. So you might find the trace of it in data set access, in security access (system checks if the user is allowed to access the dataset, and it is logged), accounting for the job execution, measurement of system performance (CPU increased for that category of work because it was executed, particular device was accessed)..
And yes, doing all that is somewhat expensive (especially on transaction level), but it's worth doing it, because you get amazing visibility into what the system is doing. In fact the transactions are logged anyway (into transaction log), so it's a relatively small overhead compared to that.
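To make the structured-binary-log idea concrete, here's a toy record layout (invented for illustration -- this is NOT the real SMF record format, just the same general shape: a length-prefixed, fixed-width binary record):

```python
import struct

# Hypothetical record: length, type, timestamp, 8-char job name, CPU ms.
# Big-endian fixed widths, so records can be scanned without a parser.
RECORD = struct.Struct(">HHI8sI")

def write_record(rtype, timestamp, jobname, cpu_ms):
    return RECORD.pack(RECORD.size, rtype, timestamp,
                       jobname.ljust(8).encode("ascii"), cpu_ms)

def read_records(blob):
    # Walk the log using each record's own length prefix.
    offset = 0
    while offset < len(blob):
        length, rtype, ts, job, cpu = RECORD.unpack_from(blob, offset)
        yield {"type": rtype, "ts": ts,
               "job": job.decode().rstrip(), "cpu_ms": cpu}
        offset += length

log = (write_record(30, 1_700_000_000, "PAYROLL", 412)
       + write_record(30, 1_700_000_060, "BILLING", 97))
records = list(read_records(log))
```

The appeal of this style is that every field has a known offset and width, so tooling can slice terabytes of it cheaply -- which is part of why that level of logging stays affordable.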
Another big part of the story of why mainframes are probably more efficient is that today's programming is very oriented toward developer comfort. On the mainframe, almost everything is pre-allocated and usually has a somewhat fixed size. This requires more design and testing up front, but you save on run-time costs, and also on support costs, because everything becomes a lot more predictable. Things usually fail earlier because they crossed some boundary like that, and they don't take down the whole system.
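That pre-allocate-and-fail-fast style can be sketched in a few lines (a toy illustration; the names and sizes are made up, not taken from any real mainframe product):

```python
# A fixed-capacity queue that rejects work at its designed boundary
# instead of growing without bound and degrading the whole system later.
class FixedQueue:
    def __init__(self, capacity):
        self.slots = [None] * capacity  # allocated once, up front
        self.count = 0

    def put(self, item):
        if self.count == len(self.slots):
            # Crossing the boundary fails THIS request early and loudly.
            raise OverflowError("queue full: rejecting work")
        self.slots[self.count] = item
        self.count += 1

q = FixedQueue(capacity=2)
q.put("job1")
q.put("job2")
try:
    q.put("job3")
    overflowed = False
except OverflowError:
    overflowed = True
```

The trade-off is exactly the one described above: you must size the structure correctly at design time, but at run time its behavior (memory use, failure mode) is completely predictable.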
> Ever wonder why there are no mainframe viruses? A properly secured operating system and environment makes viruses extremely unlikely. And with much of the world’s most important and sensitive data residing on mainframes, don’t you think that hackers would just love to crack into those mainframes more frequently? Of course they would, but they can’t because of the rigorous security!
This isn't always true. There are breaches, and a lot of the traffic still seems to be sent unencrypted (or merely in exotic EBCDIC).
The biggest problems we've run into while I've been here (with new development, not maintenance) are external services from other parties being unreliable, or their test environments not being representative of production, and management trying to implement Agile methods when we have 4 yearly releases and a single possible reboot window (one that ruins our delivery target) once every two weeks, in case broken functionality needs patching.
I don't know much about modern development and systems, but I really should get on that. It just feels very difficult to get into with all kind of systems and languages.
Really, it's not the technology that's unique with a mainframe, that gives it the properties it has; rather, it's the scale, the multitenancy, and the SLA of the use-cases that the technology is being put to. In other situations where those same things are true (such as the hypervisor control-planes of big public clouds, or tier-1 network switches, or telecom equipment) you see the same kind of top-down operational planning form around them.
And, I think, this is where we can learn the most from mainframes - they have been where we are now for a long time, and solved some of our problems in vastly different ways.
If you ever get to visit Seattle,
Paul Allen, before his death, set up something called "The Living Computer Museum". I think admission was maybe $4, but you could literally sit down and program a working PDP-1 (via a teletype machine for both input and output), a PDP-11, a Xerox Alto, Windows 3.1, and, spectacularly, a Cray-1 supercomputer.
What I can believe is that it's much easier to mismanage a distributed fleet of consumer PCs, as a result of mainframes mostly only having been used in situations with extensive change management and change audit processes.
This is a huge simplification.
Less capable than the underlying mainframe, you mean. Unless you have tons of separate VMs running on your mainframe, you can have tens of terabytes of RAM available to your job and a fridge-sized cabinet of PCIe channels to do its I/O (which is handled by specialized CPUs).
The gist of the whole article (as already stated a few times) is "work in a properly managed place".
I have had experience on mainframes, AIX platform, AS400, windows, linux, AWS, VSphere, docker, IBM cloud, etc. Heck I have even worked with punch cards!
I have programmed in environment specific tech stacks as well (JCL, REXX, Info Management on z/OS vs Python, JS, Kotlin on linux, etc).
This article is from 2015, and I would have completely agreed with the author then -- in 2015, cloud adoption in corporate America/Canada was minuscule, and general cloud experience was missing in the IT industry.
But now, in 2019, the cloud is the new mainframe, and the reliability of gigantic cloud providers such as AWS easily rivals (and may even exceed) the reliability of an on-prem mainframe setup.
And to top it off, the majority of mainframe staff and managers are really not that sharp or tech-savvy -- you will have a handful of crazy good developers and tech leads, while the rest are still doing the level of programming they learned in their college days... back in the 1970s/1980s.
Also, I will someday write a blog post about how to get into the field, starting with "at first, grow up on a farm". The point being that there are hundreds of ways to get into this field and have serious fun without following some Official way.
Part of the appeal of mainframes, and of the consultancies that focus on them, is the seriously low salaries. Oh, also the hostility to new ideas and the lack of development tools.
Still, some of the other points are more correct than they appear at first sight - the author's failing is comparing mainframes with single servers when, in reality, they behave more like tightly coupled clusters of specialized machines. When you switch to this mindset, a lot of the problems we face managing our clusters have had mainframe counterparts since the mid-'80s.
>> Nor should it be acceptable to just flip the switch and reboot the server. Mainframe systems have safeguards against such practices. And mainframes rarely, if ever, need to be restarted because the system is hung or because of a software glitch. Or put into words that PC dudes can understand: there is no mainframe “blue screen of death.” Indeed, months, sometimes years, can go by without having to power down and re-IPL the mainframe.
This can also be true for a standard cloud server running several web services or other processes. How are any of the author's points specific to mainframes?
Where I disagree with him is that I don't think professionals right out of college should be working in a mainframe environment. As a person who has spent the past 3.5 years (right out of college) working in mainframe capacity planning and performance, I think mainframe technology can be pretty niche, and I nervously wonder whether my very specific mainframe systems engineering experience will translate anywhere else. I feel that a new tech professional would be better off starting their career in an environment that is popular and familiar to them, and working from there.
finds job board posting $60-$80 USD
> we need someone to replace multiple existing mainframe systems...
Doh. Well. Somebody tell the author, someone's out for their job!
There is something to be said about finding work when the client is completely unable to assess your skills. Or just doesn't care because the environment is so restrictive and bulletproof you can't damage it no matter how incompetent you are.
... in order to experience and learn that:
* Max depth of your file system tree is 2
* Password must be changed every month
* Max password length is 8
* Alphanumeric chars only
* No upper/lower case distinction
* There's nothing wrong about using java 1.7 in 2019
* "if <customer == XYZ> then <feature>" is just fine as a code pattern.
* To release uncompiled & untested code is OK.
* To release code with compilation errors is OK, too.
* To use CVS could be a huge step forward. Because the thing you've been using can manage only code files but not configuration files. Albeit "manage" is a huuuge overstatement here.
* Printing out the list of changes and stamping it with a rubber stamp is the way to fix it.
* Permanently -- well, partially: half of the team was printing, the other half was copy-pasting the list of changes into a file.
* For years. On the PCs. Not on the mainframe.
* In MS Word documents, not as plain text. Because reasons.
* OK, several months after my predecessor left, I gained enough confidence to ask for permission to abandon the printing in favor of MS Word. I was injured at the time, and I thought the pitiful picture of me limping between the printer and the desk 4 times a day might help my argument.
* There are mainframe developers who don't even know what, e.g., XML looks like. No kidding.
* And they won't get fired, nor asked to learn anything new. And their salaries have 6 digits.
Every shop I've worked in has had decent change management, builds handled automatically by CI, and restricted access to production DBs. His particular point about "anyone being allowed to pop in a CD" has never been true. Even back in 2002 you would at least have machines in a locked room.
DBA as a job is dead because modern DBs don't need dedicated employees to maintain them.
Super high reliability is not needed for servers because everyone load balances several instances. Often in different data centers. DB's are an exception but often have hot standbys for the same purpose.
Mainframes are dead. At least 10x the cost for the same performance. The world has moved on.
And having a "locked room" doesn't mean anything when you have remote access.
This is the opposite philosophy to Chaos Monkey or failover drills, where you kill your instances/servers/racks/zones/dcs/continents at random and fix whatever broke. Cattle vs pets.
I haven't heard about this, production looks fine. Even if that's the case: the key point being "parts". A meteor can blow up a whole AWS datacenter, and your production systems should still survive. There may be a blip while capacity is provisioned. If you _really_ want to be safe and invest the engineering resources, you can deploy across regions. Or even clouds.
Try that with a mainframe.
...And a cloud-based budget.
In the summer of 2000, I started working as an intern at a Fortune 500 w/ a series MF. Ostensibly, I was to be a dedicated (primarily) web developer for the DBA team (around 8 people, excluding me). The primary RDBMS was DB2 on the MF, plus IMS.
The first year, most of my time was spent doing VB Script in ASP and some occasional DB query tuning. My second year, .Net was new and I convinced my boss to migrate to ASP.net (I'd done a project in school with it, so I had a decent understanding of .Net for the time).
Anyways, a couple of highlights:
Pioneering use of BLOB storage in DB2 at the company (it was a huge pain in the ass working with BLOBs from classic ASP).
I crashed the development LPAR (logical partition, sort of a VM in MF parlance) while tuning a SQL query (I hit a bug in the DB2 query optimizer that somehow knocked out the entire OS). My green-screen terminal disconnected shortly after I ran my query. I got a call from the NOC along the lines of "whatever you did, don't do it again. You just took down the entire dev LPAR." Knocked about 10,000 people off with a SQL query.
I spent an entire summer trying to get DB2 Connect working (required software, at least at the time, to talk to the MF DB from either Windows or *nix) -- 8 hours a day, most days of the week, on calls with IBM support trying to get it running on a Windows server cluster. Followed their documented instructions to the letter. Never got it to work. After the summer ended, they came back with: "this is an unsupported configuration", despite clear instructions on how to set it up in their printed manuals.
Had a bug against DB2 where the "select" columns could cause a different number of records to be returned. It was a difference between using a coalesce vs. "case when null". We happened to have a 3rd-party DB tuning expert on site teaching a class at the time. When I showed her the 2 theoretically equivalent queries, she was stumped. IBM got a copy of our non-sensitive DB to replicate the results. Never did hear about a resolution on that one.
Apparently, about a month or so before I started, they had just retired the last punch card reader. The extra unused punch cards made for great notecards.
Finally, one of my favorites was when I attended a series of meetings at the datacenter. The DC was over 100 miles away from the office, because apparently that got the company huge cost savings on disaster insurance. The building was basically a bunker in the northern Midwest: one window in the entire building, a little more than a foot square, for the security guard -- and it was inch-thick bulletproof glass. No heaters, but around two dozen industrial air conditioners, each the size of about 2 residential refrigerators.
So, along with getting rid of the punch card reader, they had recently upgraded to a tape silo, instead of having human runners fetch and load tapes on request. The new tape silo also had far higher storage density, so all the old tapes were gone; nearly everything was in the silo. Thus the old tape room adjacent to the server room was no longer needed for tapes, and it was turned into a meeting room. The tape shelves were removed, and conference tables and chairs were added. Nothing else changed -- including the automated door shutting and Halon activating in case of a fire. IIRC, they warned us that you had about 15 seconds from the time the fire alarm sounded to evacuate that room before the door forcibly closed and the Halon activated, near-certainly killing you.
I always went for the chair closest to the door.
Because no one reads their personal email or does casual web browsing on a mainframe?
Well, I can't even do that. First, generating a "CD" is too much work. Second, I'm not sure who at AWS or GCP I would have to talk to to get a CD into their datacenter - and once it's there, how to read it without drives.
The equivalent of "inserting a CD" is pulling container images. And EVERYONE in engineering should be able to do that.
Why? Because, if you have sane processes in place, such code will:
- Be peer reviewed
- Be built automatically. Unit tests, integration and whatever else needed (load?) will run automatically.
- Go to some form of staging environment, where QA can do their jobs
- Get moved to production, possibly with a canary, blue/green or what have you, and automated rollbacks if metrics start to get funky.
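The canary step in a pipeline like this boils down to a metric gate: route a small slice of traffic to the new version and roll back automatically if it looks worse than stable. A toy sketch (the thresholds and decision rule here are invented for illustration):

```python
# Compare error rates of the stable and canary versions over the same
# observation window; roll back if the canary is clearly worse.
def canary_decision(stable_errors, canary_errors, requests,
                    max_ratio=2.0, floor=0.01):
    stable_rate = stable_errors / requests
    canary_rate = canary_errors / requests
    # Roll back only if the canary's error rate is both non-trivial
    # (above an absolute floor) and much worse than stable.
    if canary_rate > floor and canary_rate > max_ratio * stable_rate:
        return "rollback"
    return "promote"

assert canary_decision(stable_errors=5, canary_errors=6, requests=1000) == "promote"
assert canary_decision(stable_errors=5, canary_errors=80, requests=1000) == "rollback"
```

Real systems gate on more signals (latency percentiles, saturation), but the shape is the same: the decision is automated, so a bad release backs itself out before anyone pops in a CD.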
> Nor should it be acceptable to just flip the switch and reboot the server
We don't care if said servers are cattle. We shouldn't even care if those servers are part of a primary data store. PG has master/slave replication, and we set up a pool in front of it which manages connections. Same for other SQL systems. Other data stores are inherently distributed and won't care much if a node is lost.
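A minimal sketch of that "pool in front of the database" idea -- try the primary, fall over to replicas when a node is lost. The endpoints, `FakeConn`, and `connect` here are hypothetical stand-ins, not a real driver; in practice a pooler like PgBouncer or HAProxy does this for you:

```python
# Hypothetical endpoints for a primary and two read replicas.
PRIMARY = "db-primary:5432"
REPLICAS = ["db-replica-1:5432", "db-replica-2:5432"]

def run_read_query(connect, query):
    """Try each endpoint in order; a dead node just means 'try the next one'."""
    last_error = None
    for endpoint in [PRIMARY] + REPLICAS:
        try:
            return connect(endpoint).execute(query)
        except ConnectionError as exc:
            last_error = exc  # node lost; fall over to the next endpoint
    raise last_error

# Stand-in "driver" to demonstrate failover: the primary is down.
class FakeConn:
    def __init__(self, name):
        self.name = name
    def execute(self, query):
        return f"{query} @ {self.name}"

def connect(endpoint):
    if endpoint == PRIMARY:
        raise ConnectionError("primary down")
    return FakeConn(endpoint)

result = run_read_query(connect, "SELECT 1")  # served by a healthy replica
```

The point is that losing a node is an ordinary, handled event, not an incident that requires flipping a switch on "the" server.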
> Indeed, months, sometimes years, can go by without having to power down and re-IPL the mainframe.
Yeah, but why do I care? I do care that there are enough active servers to handle requests. I don't usually care about uptime, and we actually actively terminate older ones.
> Security should not be the afterthought that it sometimes can be in the Windows world.
Windows world... Most servers are not running Windows these days. But even if they are, security has improved a lot.
I have never seen a virus in the wild on a server in the cloud. Not even worms, if the security team is doing their jobs.
> Oh, we may turn it on its side and tape a piece of paper on it bearing a phrase like “Do Not Shut Off – This is the Production Server”
Holy. I don't know where this guy has been working before, but this is really bad.
OK, maybe PC servers won't survive if the CPU goes up in smoke (though they are expected to survive if the PSU does), but nowadays we don't care. Shoot a server, and another will take its load.
> The bottom line is that today’s distributed systems – that is, Linux, Unix, and Windows-based systems – typically do not deliver the stability, availability, security, or performance of mainframe systems.
Says who? Most large companies created in the last decade are not running mainframes - the likes of Facebook, Netflix, even Google - and they are doing just fine. It is not clear that any of their outages could have been prevented by having a single point of failure, however robust.
Sounds like the author should move _away_ from mainframes and look into what's been done as of the past couple of decades.