
Why Do Big Irons Still Exist? - diegolo
http://www.di.unipi.it/~nids/docs/why_do_big_irons_still_exist.html
======
ChuckMcM
The general purpose answer to that question is OLTP. The transaction
processing community has a number of benchmarks which look at the cost per
transaction and large mainframes typically "win" in those scenarios. As for
_why_ they win, that is an interesting question.

As a systems enthusiast and someone who has watched as computers got small and
then big and then small and then big again, I believe the fundamental answer
is based in state machine theory. Specifically around how data becomes
"entangled" with other data. That is the essence of what makes transactions
hard.

I first ran into this looking at scaling file systems. Unlike RAID where all
of the blocks in a stripe are related mathematically, a "file" as a sequence
of octets is defined not only by the mutations that happen to it, but the
order in which those mutations take place. So "append 1, 2, 3; back up one;
append 4, 5" leaves 1, 2, 4, 5 if applied in sequence, but leaves 1, 2, 3, 4
if the last two steps are swapped. Thus both the operations and the order of
the operations are important. To hold the state of a complex sequence stable,
you generally have to have it all in memory ready to complete (commit),
rapidly verify it's stable, and then commit it.

Clusters of smaller systems have a hard problem with this. That said, I would
love to play with some of Google's spanner systems to see how well they handle
the OLTP workload with respect to cost/size/power. The paper suggests that
there is a credible path there as flocks of distributed systems get cheaper
and more easily connected.

~~~
etrain
Spanner maxes out at ~10 transactions (batches) per second in that paper. By
comparison, a standard PostgreSQL installation (on littleish iron) is capable
of 50k TPS, depending on the benchmark you're running.

In the Spanner design the maximum throughput is governed by the accuracy of
GPS and atomic clocks and, ultimately, the speed of light. You can only make 7
round-trips per second between a data center in New York and one in Singapore
unless you're willing to bore a hole through the center of the earth.
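A quick back-of-envelope check of that figure, assuming a ~15,300 km great-circle distance between New York and Singapore and light travelling at roughly two-thirds of c in optical fiber (both figures are my assumptions, not from the thread):

```python
GREAT_CIRCLE_KM = 15_300          # approximate NY-Singapore distance
FIBER_LIGHT_SPEED_KM_S = 200_000  # ~0.67c, typical for glass fiber

round_trip_s = 2 * GREAT_CIRCLE_KM / FIBER_LIGHT_SPEED_KM_S
print(f"{round_trip_s * 1000:.0f} ms per round trip")    # ~153 ms
print(f"{1 / round_trip_s:.1f} round trips per second")  # ~6.5
```

So "7 round-trips per second" is right at the physical limit, before you add any switching, queuing, or protocol overhead.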

Order is expensive and, more fundamentally, coordination is expensive. At the
center of traditional OLTP workloads is a promise given by the system to the
developers that their transactions will happen in some order.

Peter Bailis and Joe Hellerstein are two prominent academics working on
addressing this problem, and at the core of their research is giving up pieces
of this promise.

------
nickpsecurity
Here's a nice summary of their advantages:

[http://ezinearticles.com/?Advantages-and-Disadvantages-of-
Ma...](http://ezinearticles.com/?Advantages-and-Disadvantages-of-Mainframe-
Computing&id=7413087)

Hard to toss out a trouble- and hacker-free system that has handled everything
thrown at it for 30+ years, can run any workload, maintains backward
compatibility, and supports new stuff. Channel I/O is also frigging awesome:

[https://en.wikipedia.org/wiki/I/O_channel](https://en.wikipedia.org/wiki/I/O_channel)

Note: I wrote in Ganssle's Embedded Muse that real-time could benefit from a
beefy CPU plus a low-end one for I/O interrupts. He agreed, and one of us
found an SoC that did something like that, with quite good results. :)

Virtualization, pay for what you use, hardware accelerators, I/O offloading...
all these "new" things have been in mainframes since the 70's. Unlike the
modern stuff, the people coding them focus on making them boring, predictable,
and reliable. Plus your old software is future-proof and you can do new stuff.
So, risk-averse businesses think they're worth the HUGE amount they spend on
them.

That said, there is a negative reason many companies stay: lock-in. The older
companies invested decades worth of money in mainframe-specific software and
libraries/tools from companies that no longer exist. Porting all that over to
modern architectures would cost way more than a mainframe plus have risk of
catastrophic failure. So, they just pay the bill each year and accept any
improvements they get.

~~~
zurn
Re "hacker-free", Pirate Bay's anakata was jailed for breaking into
mainframes. See [http://insecurety.net/?p=877](http://insecurety.net/?p=877)
(the doc link there is broken, try [https://wikileaks.org/gottfrid-
docs/](https://wikileaks.org/gottfrid-docs/))

~~~
nickpsecurity
Most of them, not all, avoid serious hacks. There's no major disruption. Plus
there's deniability: "people only hack mainframes in movies, damnit!"

I always thought the ancient interface and barrier to entry (esp. price) added
to it, given it's otherwise hard to explain hackers ignoring such high-value
targets. There's at least one who's publishing mainframe hacks now. Blood in
the water. More sharks will arrive over time.

------
fiatmoney
Big Iron exists to do tons of high-reliability (as in, "fire some buckshot at
the server, swap some parts, zero downtime") transaction processing.

Little known fact: the original high-throughput NoSQL document database was
written by IBM and is still around.

[https://en.wikipedia.org/wiki/IBM_Information_Management_Sys...](https://en.wikipedia.org/wiki/IBM_Information_Management_System)

~~~
sklogic
Before Codd, all the DBMSes were "NoSQL".

------
bmh100
I work with a platform that fits exactly into the big iron use case. A typical
machine will host millions to billions of rows of transactional data in
memory. The round-trip time from a user interaction to fully joined (across
several or dozens of tables), aggregated (across several dimensions), and
visualized data rendered by the client browser is expected to be less than 2
seconds.
Each second of wait is an exponential cost to user experience. Yes, you have a
single machine with dozens to hundreds of GB of RAM. You also have a
responsive analytics experience, and that value makes it all worthwhile.

~~~
nickpsecurity
Curious, what's your hardware configuration for that?

~~~
bmh100
I mainly work on the smaller side of servers, so you would be looking at an
Intel Xeon or i7 at 2.8-3.2 GHz (4-8 cores), 16-32 GB RAM, Windows Server
2008, 100+ GB SSDs in RAID, a 1 TB HDD, with nightly backups. Interestingly,
management analytics applications can handle longer downtime than operational
systems, so a full system restore from the nightly backup is an acceptable
process. Data recovery can just be performed without downtime, with background
processes and zero-downtime database updates.

~~~
nickpsecurity
Interesting. Especially them doing a nightly restore. Many forget to ensure
the restore process will work as well as the backup. That company will
probably have no problem there. :)

~~~
bmh100
The restore is only as-needed, but full system image backups are made every
night.

~~~
nickpsecurity
Ah OK. It was too original and extreme to be true. :)

------
spo81rty
Some database heavy applications were not designed with sharding/scaling out
in mind. Rewriting an entire application to do so is not a trivial task and
takes away the focus of an entire company and development team during the
process of such a major endeavor.

At my last company we had this problem. We had a simple application for
tracking inventory that morphed into a CRM with tons of data from emails,
notes, etc. The database grew to be terabytes in size. Using Fusion-io cards
and replicating to read-only databases was required to keep us going, and it
worked. We had to keep buying bigger iron. A database server with 2TB of RAM!

At my new company, Stackify
([http://www.stackify.com](http://www.stackify.com)), I made sure we gave each
client their own database so we knew we could scale out from the beginning.
Now we manage over 1000 SQL databases and that has its own set of challenges.
We use SQL Azure, so that makes it all pretty simple thanks to their new
elastic pool features. Versioning SQL schemas and using tools from Red Gate
have helped us keep our sanity.

------
Theodores
A linguistic aside...

I always thought it was 'big iron' as in something big and made of 'iron'. The
idea of 'Big Irons' makes me think of ironing shirts with some super-sized,
barely lift-able iron rather than something the size of my Philips iron. I
imagine the steam from a 'big iron' could be quite fearsome.

Having made the link I now have a sensible name for the server room where I
work. We didn't put a sign on that door because it would be helpful for
thieves. 'Ironing Room' as in what hotels have might be more befitting even
though a server and a firewall do not make 'big iron'.

Either way 'big irons' is now added to my lexicon to go along with other
deliberate misspellings including 'nucular' and 'skelington'.

~~~
jacques_chester
The other name I've seen used is "dinosaur pen", and by extension the
operators are "dinosaur herders".

~~~
digi_owl
Sounds like it comes from the same world as pet (one physical server, one job,
carefully watched and maintained by admins) and cattle (containers/VMs/cloud,
where you can go from 1 to 1 million in a matter of seconds) servers.

------
knappador
One reason big machines might make a comeback is that increasing capacity has
pushed the super-linear cost growth out into the realm of >100 GB in memory or
>10 TB on disk. CPU hasn't kept pace unless you count GPGPU or Phi parts.

The super-linear disk costs of the era when disks were already atrociously
slow compared to the rest of the machine have largely gone away, with SSDs
hitting huge capacities and tech like NVMe, solid-state RAM modules, and
Intel's upcoming Optane ensuring that, more than ever, scaling horizontally
can be put off far longer than used to be possible.

If you look at scale-out vs scale-up for any applications that were disk-
limited, disk performance is now ridiculous: >1 GB/s and IOPS measured in the
hundreds of thousands. I'm expecting a bit of a comeback for HA over HP. More
than likely, your app can be served well by a single big machine that is well
within the linear scaling regime, and you need several for durability and geo-
availability.

~~~
Johnny555
Terabyte-sized memory will soon be possible with Xeon (if not already); Amazon
announced x1 instances with 2TB of RAM + 100 cores. They are using Xeon E7
CPUs:

[https://aws.amazon.com/blogs/aws/ec2-instance-
update-x1-sap-...](https://aws.amazon.com/blogs/aws/ec2-instance-
update-x1-sap-hana-t2-nano-websites/)

~~~
lsc
Oh yeah, Supermicro has been selling dual-socket Xeon boards with 1.5TB RAM
capacity for a while.

[http://www.supermicro.com/products/motherboard/Xeon/C600/X10...](http://www.supermicro.com/products/motherboard/Xeon/C600/X10DRi-T4_.cfm)

That thing takes Xeon E5-26xx v3 CPUs, which are pretty middle-of-the-road
server chips; nothing fancy. If you want to go quad-socket with E7 Xeons or
something, you can get even more RAM in one box, but the E5 Xeons are
dramatically more economical than the E7s.

I mean, the linked motherboard would cost real money, sure, but it would be
the sort of cash that my company would be able to come up with.

------
jetsnoc
A single "big iron" server is mostly simple and well understood, whereas
architecting distributed systems gets very complex very fast. It's easy to
have a very large MySQL server and a MySQL slave, whereas sharding gets
complicated quickly and there is always network latency involved.
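A minimal sketch of why that is (hypothetical shard names and hash scheme, not any real tool's API): once you shard, the routing logic the big server used to hide becomes the application's problem.

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(customer_id: str) -> str:
    """Map a customer to a shard via a stable hash of its id."""
    digest = hashlib.sha256(customer_id.encode()).digest()
    return SHARDS[int.from_bytes(digest[:4], "big") % len(SHARDS)]

# Every cross-customer query now becomes a scatter-gather over all shards,
# whereas a single big server would answer it with one SQL statement.
print(shard_for("acme-corp"))
```

And that's before rebalancing when you add a shard, cross-shard joins, or distributed transactions.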

------
n00b101
The dictionary definition cited in the article confirms that the term "Big
Iron" usually refers to High Performance Computing ("supercomputers" as
opposed to database servers): "Used generally of number-crunching
supercomputers such as Crays"

The question is still interesting, why does Big Iron still exist in High
Performance Computing? I'm not completely up to speed, but I think the reason
has a lot to do with specialized network interconnects, such as three-
dimensional toroidal interconnects [1] ... These specialized interconnects
differentiate "Big Iron" from commodity clusters. Another differentiating
feature relates to memory, such as very large memory capacities and unique
memory hierarchies using NVM, SSDs, etc. A third possibility is very large CPU
socket capacities, going beyond the standard dual-socket or quad-socket
configurations. This type of technology can certainly play a role in databases
and it sheds some light on the "database appliance" trend (integrated
hardware/software solutions).

[1]
[https://en.wikipedia.org/wiki/Torus_interconnect](https://en.wikipedia.org/wiki/Torus_interconnect)

------
trhway
Corporate developers (and their tools) can work with a DB on Big Iron - it is
basically an appliance to them with a SQL interface. Distributed/horizontally
scaled systems are far from being an appliance and require Google-, FB-,
etc.-type employees to work with them, and there aren't that many of those
employees around. This is why companies that make horizontally scaled systems
accessible to the typical corp are doing so well.

~~~
Sanddancer
There are other advantages that typically come with a Big Iron type setup.
They tend to have a combination of hot-swap RAM, CPUs, and IO cards, as well
as more extensive circuitry to detect when a piece of hardware has failed,
such as having CPUs and/or RAM operate in lockstep. With real big iron,
they'll have hot spare processors that will automatically activate, plus make
a service call to the hardware vendor to tell them to come and swap out the
bad module. That SQL appliance will work come hell or high water, which is
something that horizontal systems still have trouble with from time to time.

------
derefr
Unrelated question I've been wondering about for a while now: I read a lot of
"Unix-ist" writings growing up, and one constant target of hate was
VAX/VMS—comparing it, usually, to something like an overwrought 747 cockpit
where everything is automated and shiny and nobody can really understand
what's going on underneath.

Given that even Linux is now used in HPC scenarios, are the "true mainframe"
OSes really still so different, either in architecture or in average sysadmin
experience? Or did the microcomputer OSes effectively converge to have all the
same features?

------
wilyk
At first, there was the mainframe and all was good.

Then Sun pushed out SPARC boxes with 3 mouse buttons (don't even LOOK at
those, let alone touch them, sayeth the sysadmin).

Then we moved everything server side to the cloud, and virtualize everything
in our dev environments.

In the future, I predict that things will come around full circle: quantum
computing will bring us back to the Big Iron days of "don't even look at it,
don't even touch it" but given cooling requirements of QCs these days you
won't ever see it.

"All of this has happened before, and it will all happen again."

~~~
stephengillie
Imagine your datacenter is in a container floating in space. Nuclear-powered,
advanced magnetic shielding, physically secure in Jupiter orbit, and networked
to Earth with quantum entanglement networking.

But yes, everything new is old already.

~~~
throwaway2048
and a sleek black outer finish

------
dbhacker23
I want to put my DB in RAM. I just need to keep it under 64TB.
[https://www.sgi.com/products/servers/uv/uv_300_30ex.html](https://www.sgi.com/products/servers/uv/uv_300_30ex.html)

------
jopython
"also Microsoft SQL Server on top of Solaris on Sparc"

MS SQL Server DOES NOT run on Solaris on SPARC. The author should have done
his/her research before making that claim.

------
ljw1001
To press big shirts?

------
gaius
A "cloud" _is_ a mainframe to all intents and purposes.

