There are definitely financial savings to be had by doing it in-house (either literally or by utilizing existing data centers, as here). But what I've experienced is that companies are often attracted to Big Cloud™ not just for financial reasons but to flatten/simplify their corporate reporting structure. It is the ultimate delegation.
This isn't simply about having fewer staff (although that can reduce internal politics and inefficiencies), but making fewer decisions and more importantly reducing the potential for making the wrong decisions (which can be career-costly, regardless of right decisions made previously).
These "soft" costs/savings are rarely discussed because they cannot be measured, unlike a balance sheet. But there's definitely a whole other dimension to infrastructure outsourcing that is worthy of note.
In this case BoA is saving money, but in exchange retains additional layers of staffing that require oversight, and ultimately decision makers willing to take risks.
This is a myth straight from cloud providers' marketing campaigns: that using the cloud simplifies infrastructure to the point that you don't need as many, or as qualified, staff.
In reality it is quite the trap: without experienced system administrators AWS and the like are free to dig their hands deeper and deeper into the pockets of their unsuspecting customers.
I agree the cloud providers like this idea, but I'm not sure I agree it's a myth. Businesses need to choose what competencies to focus on. For nearly every other area, businesses rely on suppliers. Auto manufacturers use machines built by other businesses and parts built by other businesses. They also obviously do some amount of in-house part and tool design. It's a balance and I think that it's a lot more likely that most businesses will want a good partner to provide infrastructure (ultimately at margins less than AWS) instead of building it themselves.
Yeah but they generally aren't renting those machines. Renting critical pieces of business infrastructure that aren't easily replaceable is opening yourself up to rent seeking on the part of your supplier. Remember the Oracle business model--lock people in and then keep raising the price just below what it would cost to rebuild from scratch.
In other examples, many airlines do lease airplanes, and even among the airlines that do own the planes, they still usually lease the engines.
As the GP said, there's a balance to be struck. Delta doesn't have the expertise or the economies of scale necessary to build their own engines from scratch, but they do have the ability to lease those engines and then have their own teams maintain them. Similarly, most companies don't have the know-how or the EoS to run their own data center, so instead they buy cloud services and then have a smaller team to maintain it. It's a win-win. Some companies like BofA might be big enough where they can run their own cloud and it be cheaper than using a cloud provider, but that's definitely not a common thing.
Rental only makes sense if there is a marketplace for used machines. E.g., planes and plane engines are components that aren't really tied to one place or company. (The "that aren't easily replaceable" from the parent post.)
But industrial robots for car manufacturing seems like rather specific hardware. If, say, the Ford plant in Claycomo, Missouri no longer needs a machine for making the F-150, who is going to rent it? How is rental a better cost-savings for Ford than buying the machine and later selling it?
I tried looking this up, but failed. The closest I found was that "over half of industrial robot purchases in North America have been made by automakers" at https://www.robotics.org/blog-article.cfm/The-History-of-Rob... .
Also, '"rent" the maintenance contracts' doesn't make sense to me. I sell maintenance contracts to my customers, I don't rent them, and I don't understand what "rent" could even mean in that context.
The parent comment mentions that Delta rents their airplane engines. I imagine that they have the capability to do general maintenance on the engines, but that the engines need to be periodically sent back to the supplier (probably GE or Rolls-Royce) for a teardown, safety inspection, and rebuild. Without regular inspections an engine can't be flown, and is essentially worthless. Not much point in actually owning the engine if you can't use it.
With that in mind, I can see support contracts as an operational expenditure.
But if "over half of industrial robot purchases in North America have been made by automakers" then that's a lot of capital expenditure. I never got the feel that the companies which make modern industrial robots primarily rented them out (akin to IBM's renting out of tabulation machines 100 years ago). I really thought they mostly sold the robots.
An assembly line isn't really a "product," per se. It's more of amalgamation of several products and technologies, many legacy, that forms a cohesive but loosely-coupled whole. You may have several vendors' products / systems in a line. So what you end up with are ongoing contracts that provide for installation, programming, troubleshooting, maintenance, etc.
There's actually a whole specialized industry that does nothing but design assembly lines. It's pretty neat.
There's Boeing, which also used to have an airline. That was made illegal, IIRC.
And there are things in between these two extremes.
Some airlines have a heavy maintenance organization. Some have outsourced it. Different companies have different market segments and safety records etc.
The point is, there's no one true way for all.
Of course it's a question of scale, but a lot of arguments I've seen usually come down to CapEx vs OpEx accounting which is silly because a dollar is a dollar.
The cloud is powerful, but for long-running businesses that can afford to invest in the competency, it's stupidly expensive. Hybrid cloud installations are ever more the norm, private clouds are increasingly capable (PaaS, IaaS, SaaS), and toolsets like Kubernetes are reshaping how a lot of those legacy apps get looked at.
So no... not necessarily... CapEx is not necessarily paid out all at once. So the accounting ought to be the same whether it's a CapEx purchase under a subscription agreement/loan or an "OpEx" payment for one month of services.
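To put numbers on "a dollar is a dollar": here's a rough sketch (all figures made up) comparing a financed CapEx purchase against a flat monthly bill. Under a loan, the CapEx turns into a monthly payment too:

```python
# Rough sketch with made-up numbers: a $120k hardware purchase financed
# over 36 months at 6% APR vs. a hypothetical flat monthly cloud bill.

def monthly_payment(principal, annual_rate, months):
    """Standard amortized loan payment formula."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)

capex_monthly = monthly_payment(120_000, 0.06, 36)  # roughly $3,650/month
opex_monthly = 3_900  # made-up cloud bill for similar capacity

print(round(capex_monthly, 2), opex_monthly)
```

Either way it's roughly the same cash out the door each month; only the ledger line it shows up on differs.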
I mean. You save the headcount in one area and you consume it in another. And you pay overhead in outsourcing costs.
It’s likely worth it. But you need to determine that on a case by case basis.
My (large) company is too incompetent to do hosting properly so cloud was a blessing for us.
At a certain scale, you need all these things with internal clouds too. Resource provisioning, basic security, and managing "insider risk" don't just go away when you own the infrastructure.
This is like saying that "You also need to have a cashier!" when you're comparing a shopping mall to a corner store.
I think it's a loooot easier to design and build a cost-efficient multi-tiered high-availability application than it is to make reasonable cost projections in whatever micro-currency every secondary and tertiary service uses.
And I'm ever surprised by how time-intensive it is.
Tooling has only improved.
We were able to cram a large set of applications into a group of instances using CapRover.
Instead of 2-3 instances and a whole bunch of RDS, we ended up just recreating the databases there, along with minio for storage (we write a small amount of data to S3 storage).
Colocation is probably ideal for anyone using custom hardware: like GPUs or FPGAs. If you're using "normal" CPUs, just buy dedicated instances instead.
Anyone who has been in the industry since pre-cloud explosion knows that most people just rented bare metal and at the more complicated end they might purchase their own hardware and colocate.
And for those that own their servers, many lease them and HP or Dell or whoever maintains them on site if anything goes wrong.
I'm surprised that more companies aren't playing moneyball with nice, but cheap, locations.
Believe it or not most U.S. programmers don't live in the Bay Area.
You can rent vs. buy at any level: rent only machines, rent 1/2 a rack, rent a full rack, rent a cage, rent 1/2 a DC, and so on.
Same with connectivity: you can plug your machines into a lan managed by the hosting provider, or run your own routers and peer with their in-house ISP, or you can buy peering directly in the building, or you can rent lightpath from a telco with pop in the building, or you can go rent a backhoe and start trenching across the parking lot...
Cloud is awesome for companies where the capital investment needed to deliver the SLA is too expensive.
If you’re big, and you run the numbers, lots of services are better in a datacenter that you manage. Many of the early advantages of cloud tools aren’t a differentiator today. If anything, the granular billing is a nightmare to manage in many large enterprises. There’s a reason AWS heavily targets .gov customers — the Feds probably waste billions on idle services.
I've found that services that are shitty to support, or that have well-known workloads, are best made somebody else's problem. Email is an example of both. Most other things are trading IT complexity for accounting complexity.
Maybe cloud usage can be an indicator whether a company has plans to maintain independence into the future, despite the loyalty-engendering words of leadership.
Situation: some manager is tasked with updating some antiquated COBOL monstrosity. They're given twelve peanuts, some shoe string, bubble gum and a little bit of duct tape. The experienced staff have all been aged out, quit out of frustration or laid off and replaced with fresh college grads for half the price which was a great deal and significantly reduced costs for several consecutive quarters, earning a well deserved healthy bonus. Unfortunately reliability of the system has been suffering and the young engineering team unanimously agrees the old system should be replaced.
Option 1: do it in house with the young team of rockstars the manager has been bragging to management about. Unfortunately, the manager secretly has reservations about the team's experience, let alone its skills. It's too risky. Not too risky for the company, which can easily absorb the loss and move on, but too risky for the middle manager, who will look bad and possibly get fired.
Option 2: outsource the entire thing to IBM or Oracle. Tempting, but not in the budget. Also, significant parts of the old system are on IBM anyway, and the cost of flying IBM people out has become increasingly expensive after the latest round of layoffs saw Scott put out to pasture. No, it's too risky. Not risky to the company, which can have uptime guarantees in the contract, but risky to the manager, who will look bad if he comes in over budget.
Option 3: The Cloud. It's the 'in' thing. FAANG are all in on it, and if it works for them it could work for us. Much of the new team used AWS for their web dev class in college, so the skills are fresh. There's very little risk to it. The staffing requirements will be dramatically reduced, so costs will drop. The manager will be able to deliver an awesome demo next quarter, and the S3 bucket they'll host it on will only cost one or two of their peanuts. The only risk is if the manager doesn't get promoted before the problems start, which shouldn't be a problem because AWS has such great onboarding programs. It is slightly more risky for the company, though, who may find themselves no longer in control of critical infrastructure.
Risk for the company isn't the same as risk for a mid-level decision maker. Small fuckups carry the same penalty as huge fuckups (you get fired). In my fake story, doing it in house has a relatively high chance of being a minor fuckup, and AWS has a relatively low chance of being a major fuckup. The manager is going to choose the low-probability risk, even though the low-impact risk is in the company's best interest.
And let's be honest... a remarkably large fraction of small IT shops are really bad at certain basic competencies, notably backup reliability and timely application of updates. AWS is pretty good at sysadmin and accounting 101, even if it stumbles at 201.
Insightful. Can you clarify what you mean by "low probability risk" and "low impact risk" ?
Impact risk: the odds of a project failure causing a large impact to the company.
There are two things worth considering. The BoA cloud is slow. That is to say, it is hard to procure machines to get your projects going. There is a lot of control over who will be paying for them. There is also A LOT of staff to manage it. So those $2 billion could easily be tweaked downward when you consider the staff and the loss of agility.
On the flip side, AWS is hella expensive. To do the things that they do it would cost A LOT of money. Maybe even more than they think. I have seen (and maybe a lot of people here also) small companies with million dollar AWS bills. So this number can also be tweaked up when you consider dozens of teams each having their way (at BOA scale) on the AWS console.
But I would expect that a large established company would do better to create its own reasonably priced, reasonably managed infrastructure.
Sort of rent vs build/own.
And something that seemingly everyone forgets: it's not black and white... you can have some things in the cloud while you have some things on-prem, leveraging scalable infrastructure and redundancy while also keeping your core assets fully under your control. It's pretty easy to make an internal S3 replacement; it's hard to make a better S3.
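To illustrate the "easy part" of that claim: the basic put/get/list surface of an object store is tiny. Here's a toy in-memory sketch; everything that makes real S3 hard (durability, replication, auth, versioning, multi-region) is deliberately missing:

```python
# Toy in-memory object store exposing an S3-ish put/get/list surface.
# The hard parts of real S3 (durability, replication, auth, versioning)
# are deliberately absent — that's the point.
import hashlib

class ToyObjectStore:
    def __init__(self):
        self._buckets = {}

    def put_object(self, bucket, key, data: bytes):
        self._buckets.setdefault(bucket, {})[key] = data
        return {"ETag": hashlib.md5(data).hexdigest()}  # S3-style content tag

    def get_object(self, bucket, key):
        return self._buckets[bucket][key]

    def list_objects(self, bucket, prefix=""):
        return sorted(k for k in self._buckets.get(bucket, {}) if k.startswith(prefix))

store = ToyObjectStore()
store.put_object("backups", "db/2020-01-01.sql", b"...")
print(store.list_objects("backups", prefix="db/"))  # ['db/2020-01-01.sql']
```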
I think some companies, like BOA, look at this as the safer alternative.
We all assume that large cloud providers are bulletproof. That's just because we haven't seen them taken down yet. Someday we will. Just wait until there's another major world conflict. These global companies with outsourced IT will be in ruins because politics and digital DMZs will keep them from acting cohesively.
For an organization like a bank that's worth over $400b that layer of staffing, equipment, oversight, and all the inefficiencies are a $400b insurance policy that you will be able to continue operating regardless of geographic or political events outside of your control.
I've been at multiple Fortune 50 companies at multiple levels. Many times these types of infrastructure building are just forms of empire building and an attempt to embed ones' services in the organization at a very deep level. You convert one set of costs (AWS/GCP/Azure bills) for another set of costs (Infrastructure + Payroll). You also convert one set of headaches (CSP overcharging/auditing) for another (prima donna internal infra engineers.)
Don't get me wrong, most corporate IT is deeply incompetent at their core job. It's just that you can't fix the deep problems by outsourcing the computer part.
There's this great thing about BOA-sized support contracts where you get to call up the senior product managers and demand that they build the features that you want. And if you're not happy about something, engineers will drop what they're working on to appease you.
I wonder how many customers were interested in setting up a Custom Keystore for KMS on CloudHSM?
Of course, for small or medium customers whose finger snaps are ignored... WYSIWYG. The roadblocks in not-your cloud are still your roadblocks.
But (in a vacuum) when developing a solution to serve diverse needs and stakeholders, the HiPPO effect is definitely an anti-pattern.
With CSPs, at least you have several major players (AWS, GCP, Azure, Oracle Cloud, maybe IBM) competing prices downward and multiple theoretical options. Yes, you have your in-house CSP expert biasing things, but since most of the work is external, at least you don't have a heavy internal weight on a home-built choice.
IMHO, the better option would be a CSP-agnostic Kubernetes based layer. But IMHO we're not there yet.
It was about decision maker's personal (career) risk when making decisions. Keeping things in-house means more decisions are taken in-house which increases personal risk to the management for getting things wrong. In context this is much clearer:
> This isn't simply about having fewer staff (although that can reduce internal politics and inefficiencies), but making fewer decisions and more importantly reducing the potential for making the wrong decisions (which can be career-costly, regardless of right decisions made previously).
The post above doesn't discuss cloud Vs. non-cloud security/safety at all.
To your point, we need only point to recent events to show the universal customer pain that can be caused by a bad code push at Azure or AWS.
If a company (or person) really wants complete visibility and control into their relationship with law enforcement, they have to host their own hardware... ideally, their own data center.
I think the opposite will happen. Cloud providers are becoming commoditized. Kubernetes, blah blah. Developers hate lock-in, and on a long time scale, developers drive tech decisions. Nowadays new software is cloud agnostic, and it will continue to be that way. This is commoditization, which generally drives down prices.
That said, buying commodities from a market with four participants is more like buying from a cartel than a marketplace. I definitely worry about the consolidation of the world’s IT spending going into the coffers of three or four massively profitable and morally corrupt corporations.
You and enterprise software should get together and have an introductory meeting...
There is absolutely value to delivering a product to market a year or more sooner than otherwise.
Separate from all of that there is a valid question whether the software stack at a big cloud provider gives your engineering department the ability to deliver features faster at scale and redundancy, and with lower ongoing maintenance overhead. I think this question is very much up for debate.
At a small scale there are absolutely cloud frameworks that let you deploy services at large scale without an IT department. But then there's a realistic argument of whether the high premium for theoretical scale—when in actuality you're serving dozens, not millions, of customers—is worth the high cost of capital.
E.g. several startups of mine have persisted on a couple of colocated dedicated servers for over a decade at a cost of $300/mo. The equivalent deployment in the cloud is closer to $3,000/mo.
I know this because I got $300k of Azure credits to play around with and ended up not using most of them. The long-run cost was too high to move to Azure. But there were months when I “spent” $6k of credits on a dozen VMs and associated storage and network that had equivalent performance to my rented bare metal.
I've encountered quite a few devs whose only focus is on achieving the specified goal without considering the full security scope. I don't blame them; that's what they are there for. Stepping in to ensure the customer's security and regulatory requirements are met is what I'm here for.
It's like using package managers like Maven and npm. Sure, they make going to market much faster for small organizations. But at companies that have strict security procedures, downloading unvetted stuff from the internets and putting it into production does not fly. You'll have internal repos that work with your package manager tools, but nothing gets into those repos without at least a cursory review.
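As a sketch of that gatekeeping idea (package names and versions here are hypothetical): nothing reaches the internal repo unless its exact name and version pin is on a reviewed allowlist:

```python
# Sketch of the internal-repo gatekeeping idea: only dependencies whose
# exact (name, version) pin has passed review may be mirrored internally.
# The package names and versions below are hypothetical examples.
APPROVED = {("requests", "2.25.1"), ("numpy", "1.19.5")}

def vet(requirements):
    """Return the (name, version) pins that have NOT been approved yet."""
    return [dep for dep in requirements if dep not in APPROVED]

unvetted = vet([("requests", "2.25.1"), ("leftpad", "0.1.0")])
print(unvetted)  # [('leftpad', '0.1.0')] -> needs a review before import
```

In practice the same effect is usually achieved by pointing the package manager at an internal mirror that only contains reviewed artifacts.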
The two companies I've used AWS at both operated this way. In fact, each team has multiple product accounts (for dev, testing, production). So I don't think it's unusual at all.
I've always wondered about this. In IT now instead of having 5 system operators one has 10 "cloud consultants" and nobody knows what they are doing. I've been through a couple of "migrations to the cloud" now and I've never seen a reduction in IT staff or the cost of running IT operations.
About the wrong-decision part: if you choose the wrong "cloud" for your business, the cost to change can be enormous.
The primary motivation for management, from what I've seen so far, is the ability to point at someone else if shit hits the fan, not cost reduction and all that other stuff.
You start wondering if 90% of the corporate world is there to do nothing but erect roadblocks for the people who are actually doing something. But then you start wondering if maybe it's a good thing that all of these people get paid and can spend their money, so that the majority of the country has purchasing power and doesn't live in piss-poor poverty.
For example, if I can project steady growth or decline of my needs, it seems that I benefit less from infrastructure-on-demand like AWS/Azure/GC.
Also, depending on the size of the organization it seems that there at least several options on data centers:
- have my own data center
- collocate my hardware in other's data center
- rent hardware
- use infrastructure-on-demand + software-as-a-service
Many organizations deploy Pivotal's Cloud Foundry, so they are not actually 'building' the infrastructure management layer from scratch.
Also, investment banks (due to exchange connectivity) and telecoms (due to peering points and physical tower locations) are always going to have their 'front-office' applications in data centers that offer the lowest latency for those apps.
Mid-office applications (or BOSS, as they are labeled in telecom) can live in data centers reasonably distant (in terms of ping time) from the front office.
So the infrastructure for those has a different cost profile.
Most of these applications touch trades or trade events that carry millions of dollars' worth of contract value (not Facebook's likes or favorites...).
There is a lot of logging, reconciliation, and post-execution enrichment that's probably going on.
Finally, from my second-hand understanding, investment banks can have from 3K to 10K applications in production, with perhaps 20% of them having more than a million lines of code.
Some of those apps are built in their own proprietary languages developed specifically to manage contracts and the lifecycle of contracts (banks have had 'smart contracts' DSLs and interpreters since the 90s).
A company like BoA will eventually choke on this in the future.
Also, this stuff sounds great until you need improved peering, or software-defined networks or massive bandwidth improvements, etc.
... and I get that the HN community is clever but banks and other institutions like this REALLY struggle to find this talent.
$2B is a lot except when you're losing $10B a year in revenue because you can't compete in the market.
In a way this would be like Amazon arguing it should build its own bank. It's not what they're good at... Focus on your strengths.
I joined just at the point where they were moving from VMware to AWS.
It was never a cost saving exercise. It was a way to completely change the culture of the IT department. As they were one of the very first news websites, they had at the time ~18 years of cruft to deal with.
The drive to the cloud was about flexibility, and if we are honest, shaking the staffing tree to get rid of the ossified staff.
They went from three data centres (a mixture of Sun and Intel blades, all backed by FC) to pure AWS/Heroku. The running costs went up significantly; the opex was close to £3mil in AWS bills alone, all to put text on a web page.
It will continue to rise because it offers flexibility, and as each product now has its own account, standards are difficult to enforce. This means lots of snowflake installs of X that are now critical to Y.
The cloud is more expensive for large companies, for most workloads. For SMEs it's the total opposite.
The cloud was probably always more expensive if you only considered the cost of computers. But it was about so much more than that. Mostly, it was about creating a hostile environment for the BOFHs who were creating a hostile environment for everyone else.
They weren't seen as so unprofitable they had to be eliminated, but unprofitable enough that competition could triangulate their price vs pain.
If you are paying significant cash for network, then whatever you are doing, you need a CDN of some sort (I assume most people here are web people).
VPS bandwidth is not unlimited, and it is a single point of failure.
The way I read this is that the operating costs are higher in the cloud, but presumably dwarfed by recouped opportunity costs. This makes sense to me, given my limited experience working in large companies where every interaction with IT involved a 2-6 week turnaround time. Need a VM? Fill out a bunch of paperwork and then wait a month. We missed a lot of opportunities that way.
The true opportunity cost in large corporations has, IMHO, been the general attitude of "we can't do foo new thing because that's [not best practice, not (barely) supported by some other large corporation, too cheap, too big of a change, etc.]."
The ability to create extremely flexible workloads in minutes means nothing when security has to approve sneezes and containers are only something you'd find in a fridge.
It's not so much the availability of physical servers (although that might be an issue); it was the features, and the time it took to port new bits and bobs.
A classic example is databases. We had a Puppet script that provided an HA MySQL cluster. However, it was very fragile, never really worked, and I'm mostly sure the backups never actually took place.
It would take me (a devops engineer tasked with doing this sort of thing) the best part of 3 days to set up and verify. The whole thing collapsed when the DBA who made it left.
RDS is a one-click operation. Yes, it's more expensive to run long term, but the saving on staff time was enormous. I will never go back to in-house-rolled Puppet unless it's proven to be outstanding.
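For what it's worth, the "one click" is a single API call underneath. Here's a minimal sketch of roughly what that call looks like via boto3 (the parameter values are illustrative, not recommendations; the function just builds the kwargs rather than hitting AWS):

```python
# Roughly what RDS's "one click" amounts to: a single API call.
# Parameter values below are illustrative, not sizing recommendations.
def rds_params(identifier, engine="mysql", instance_class="db.m5.large",
               storage_gb=100, multi_az=True):
    """Build the kwargs for boto3's rds.create_db_instance()."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": instance_class,
        "AllocatedStorage": storage_gb,
        "MultiAZ": multi_az,           # HA standby — the part the Puppet
        "BackupRetentionPeriod": 7,    # script and its backups never got right
    }

params = rds_params("reporting-db")
# boto3.client("rds").create_db_instance(**params)  # the actual "one click"
print(params["MultiAZ"], params["BackupRetentionPeriod"])  # True 7
```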
What was the opex before cloud?
But software licensing costs play a massive factor in the overall opex.
A cracking business opportunity: a cheap "personal cloud" for SMEs that just works? An extension would be an associated "public cloud", allowing customers to seamlessly move between their personal and the public cloud to cover short term scaling issues whilst they buy more private infrastructure.
What is FC?
There was a full rack switch, which must have been easily 400 + ports.
In fact they could have saved even more, and earlier, if they had already been in the public cloud, where you can just spin down the VMs and stop paying for them. In your own data center, a server is already sunk cost that you cannot entirely eliminate just by shutting it down.
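A back-of-envelope illustration with made-up rates: an on-demand VM shut down outside working hours vs. the same capacity implicitly billed 24/7 as owned, sunk-cost hardware:

```python
# Back-of-envelope with made-up rates: an on-demand VM you shut down
# nights and weekends vs. owned hardware that costs the same whether
# or not it's doing anything.
HOURLY_RATE = 0.50           # hypothetical on-demand $/hour
HOURS_BUSY = 50 * 5 * 10     # 50 weeks x 5 days x 10 hours
HOURS_IN_YEAR = 24 * 365

cloud_cost = HOURLY_RATE * HOURS_BUSY        # pay only while running
always_on = HOURLY_RATE * HOURS_IN_YEAR      # what 24/7 would cost

print(cloud_cost, always_on)  # 1250.0 4380.0
```

The gap is the spin-down saving; with owned hardware, most of that 4380 is already spent whether the box is busy or idle.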
> But the results have been dramatic. The company once had 200,000 servers and roughly 60 data centers. Now, it's pared that down to 70,000 servers, of which 8,000 are handling the bulk of the load. And they've more than halved their data centers down to 23.
I'm going with clickbait title. Nothing to see here other than someone getting a temporary savings which we will see as a write off in 5 years.
"Server" implies redundant power supplies, 4-?? disks, an excess of high quality capacitors, a heavy steel case, a motherboard probably 4x the area of a typical ATX motherboard, ECC RAM, far more RAM slot capacity, etc.
Furthermore, extended service contracts are the norm. Five-year same-day on-site service isn't unusual. Five-year 24/7 on-call/next-day replacement is typical. One-year warehouse service is unheard of.
That's just the device itself. Servers also imply a rack to put them in, real estate for said racks, 24/7 HVAC for said real estate set to substantially below room temperature, and an expectation to run it (and consume electricity) continuously. You also need network architecture to support them.
This isn't even getting into the realm of stupidly specced servers. 512GB RAM? Easy. 128 cores? Sure. Disk space and network capacity are cost-limited: if your server is limited by network speed or disk capacity, you can simply punt a briefcase full of money at your supplier and get more. If you want all of that arbitrary-capacity disk space to be SSD, that's just an accounting problem.
It's not like Amazon doesn't know this too (e.g. they try to make the transition harder by offering lots of proprietary services that help you ramp up faster, but can't be used elsewhere).
Were the opposite true, you'd wonder about the economics of the cloud business model.
I think a more interesting way to look at this is, are the other advantages of cloud services (e.g. lots more flexibility and bundled proprietary technologies that you can't economically build yourself) actually worth the extra money?
Yes, by not using a cloud service you might save $2B a year, but does that cost you the opportunity to make even more than that, given you're probably moving slower or at least less efficiently than you otherwise could?
The potential opportunity of fast-moving cloud features needs to be weighed against the opportunity costs of slow-moving cloud features. Where bespoke solutions can immediately provide tailored performance and maximize technical capabilities, a missing feature in any of your cloud provider's services can be a showstopper or an unmitigable roadblock. And while lots of the technology is past the bounds of reasonable economical replacement, some of the technologies being shared through the cloud are nigh unfathomable to recreate.
Which is to say that recreating the black ju-ju behind Windows Update would probably take building a new Microsoft to reach its present maturity, and unless you're a certified "big boy", letting small teams somewhere else fully dictate what you can and can't do at service boundaries probably impacts you in the long run.
Based on that, and IMO/IME: the answer isn't a binary choice but a constantly shifting point on a spectrum between the two, where on-premise/local-cloud and remote-cloud services are aware of one another and maximize capabilities while minimizing costs. Hybrid installations are just stronger, and are easier to reshape according to costs.
There are plenty of reasons to choose a point in the space denominated by the axes make/adapt-existing X insource/outsource X proprietary/open . None are zero cost unless you really think your platform is good enough that enough others will write for it without you doing anything.
It's possible that you would chose to write your own because nothing in the market is even close to good. It's possible that you would open source it in the hopes that it becomes standard and that others will add new features that you will benefit from too (this is not a zero-cost option either). It's possible that you would even keep it proprietary because that's cheaper. But given the way Amazon operates it's pretty clear that they offer their own proprietary services to lower the barriers to adoption ("simply offer a better option") and raise the barriers to switching.
I don't see anything wrong with this strategy legally or morally, though I mostly avoid amazon proprietary tools because the risk of locking is one I'm not willing to embrace. There are of course situations where they aren't a problem (e.g. a tool with an anticipated finite lifetime: the cost of lock in in that case is essentially nil).
The method of encoding production database credentials was rot-13. No joke. In the Quartz interface, you could double click on a starred-out set of credentials, and it would run rot-13 on it and display the password. This was for FX, rates, credit card, mortgage, etc etc etc. Having access to this cloud system gave effective access into all of Bank of America and Merrill Lynch.
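For anyone who hasn't seen it: rot-13 is a fixed letter substitution, not encryption. Applying it twice returns the original, and Python's standard library will "decrypt" it in one line:

```python
# rot-13 is a self-inverse letter substitution, not encryption:
# applying it twice returns the original string.
import codecs

stored = codecs.encode("Sup3rS3cret", "rot_13")
print(stored)                           # 'Fhc3eF3perg' — the "protected" form
print(codecs.decode(stored, "rot_13"))  # 'Sup3rS3cret' — recovered instantly
```

Note that digits and punctuation pass through untouched, which is why the "starred-out" credentials above leaked even their structure.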
They probably save a lot of their money by using very, very bad practices.
Still only the second worst security fail I've seen.
These are probably positives from large org internal perspective.
"We want to switch x from y to z." "We can't. In fact we want to switch u from z to y."
> If they have 50 000 IT employees
BofA has 205k employees in 2019 as per Wikipedia. No way 1/4 of them are in IT.
I did recently wonder if it made sense for banks to put all their stuff on other people's computers, instead of maintaining their own, but for the bank I work at, the old on-premise systems are maintained by IBM and ridiculously restrictive, so either AWS or Azure is already a massive improvement.
A lot of them are managers, analysts, or people who work with tech, but aren't all trained software engineers. A scrum master is considered an engineer. They had a blanket update to a lot of their role titles a couple years back where everyone in tech suddenly became a Software Engineer as their title (without much meaning behind it).
(Note the 2.)
I've copy-pasted it before, but I think you just made the syntax click.
I know of one very large company where any request for a change in their cloud infrastructure always required a minimum two week advanced notice. In AWS such a change is just a mouse click away, and could be done in seconds.
AWS also has a really amazing integration of a large variety of services which is really hard for in-house clouds to match. I wonder how many AWS services the BoA cloud has, and how their own integration of those services matches that of AWS.
I've been in places where the engineering team plugs in an old PC under a desk somewhere, gives it a public IP address, and there's the production server. That's the self-hosted equivalent of the engineering team having full AWS access where any change is a mouse click away.
I've also experienced places where any change does take a week or two of approvals even though it is hosted on public cloud.
There is a time and place for all approaches. What works best with a three person startup putting up a MVP is quite different from what is best in a very large corporation operating in a regulated environment.
This is all on owned hardware. The difference is that we're a SWE driven company (corporate IT is off in its own world, run in the more traditional way, but they don't touch engineering's production datacenters). Infrastructure teams provide APIs, not JIRA forms.
That being said, we control the HW and SW stacks end to end so I don’t have to worry as much about the nightmare scenarios the NordVPN folks went public about today. Critical given we’re in fintech ...
So much of the Fortune 500 lust for public cloud seems to come down to working around inefficient IT procurement and provisioning processes. They haven't automated, they don't maintain enough excess capacity, they haven't managed vendor relationships to assure fast order turn-around, financial controls are too onerous, etc.
Everything's virtualized but otherwise they're still operating like it's 1995.
But all this clout leads to a rather head-in-the-sand mindset on most strategic items: technology choice, programming language choice, cloud choice (or non-choice in this case), version control choice, etc. Everything was done in-house, with the most boring, safest tech possible (mostly Java about three versions behind, some strange Python lib where any function call was automatically logged!, and Excel & Matlab all over quant land). I mostly sftp-ed financial data from some Quartz cloud... felt very quaint to do that sort of thing in 201X. All laptops were locked-down Windows Dell boxes on which you couldn't install anything, running some strange Norton antivirus that hogged all the memory.

My interview itself was old-fashioned. I thought since it was a quant job, I'd get questions on math & finance. Instead they trotted out their "chief developer," who wanted to know how to model a chair with 4 legs using Java OO. You know, the 1990s Grady Booch garbage full of UML, with a parent Table class & child Table & Leg classes & friend functions & all that jazz. I was like, Jesus, this regressive inheritance-based OO shit is still alive!

It's a deeply old-fashioned, slow-moving place, with a very large IT budget and pretty much half of Charlotte working in some capacity for the bank. So yeah, if you had all the personnel & all the money, why wouldn't you build your own cloud? You are paying for all these people anyway; might as well give them something to do. Of all the employers I've worked for in my lifetime, this was the one place where I was personally asked NOT to work so hard, because I stayed at my desk after 5:30 pm.
(Sorry I have a regular 4 digit HN account but the bank doesn't like it if you talk about them. One of their lawyers once tracked me down because I mentioned some harmless datapoint about a technical problem I had worked on.)
Ok so now you said that won’t it be pretty trivial to identify you again? How many “4 digit HN” users have they really tracked down before .. 1?
Am I the only one completely befuddled by this number? What the fuck are they doing with this money and 200k servers? These are facebook numbers. For a bank. What?
- They have a reasonably similar number of users (a fraction, but a large one).
- Mistakes cost a lot, so they have to be a lot more careful. It's a lot easier to make money hacking a bank than hacking facebook.
- They have to comply with all sorts of regulations.
- They probably don't trust their own employees to not be trying to commit fraud.
- They have to parse data on a scale that is likely similar or greater than facebook's. To detect fraud/lost credit cards/.... To decide who to give loans to. To price insurance. To decide how to trade stocks. ...
- They have to run a physical fleet of devices in the field, outside of their control, that have to give people the right amount of money ~100% of the time.
At a glance I see that Facebook has something like 300 petabytes of data. I've worked at a bank; my team had something more like 10, though I don't think much of it was things like video that are just naturally huge. BoA is also roughly an order of magnitude bigger than the bank I was at.
One rumor I heard while there was that there had been a bug in one of our mobile apps that had been costing us a million dollars a day in server time.
As for usability, probably a degree of incompetence, mixed with design-by-committee and legacy. Edit: It's worth pointing out that banks usually don't gain or lose customers based on their UX, so it's not something that the business optimizes much.
I don't think there is anything technically difficult about almost any of the consumer software facebook offers, except scale. The same applies here but exchange "scale" for "scale, reliability, security, and regulatory compliance".
You can get in to my account by verbally relaying my grandfather's first name over the phone. You can open a bank account with a SSN and no photo id. What security? Their "security" is a fraud department, much like our credit card industry.
> Otherwise they'd be hemorrhaging money left right and center to North Korea and the likes.
This is not how transactions work.
> Edit: It's worth pointing out that banks usually don't gain or lose customers based on their UX, so it's not something that the business optimizes much.
All banks offer the same shitty experience. What does differentiate them if not their software? They offer literally nothing my local credit union doesn't offer.
> I don't think there is anything technically difficult about almost any of the consumer software facebook offers, except scale.
No argument here, but facebook at least manages to hire designers and not impose weird, nonsensical auth patterns, like "Look for this image when you log in".
I've talked with info sec employees working in their Charlotte office, and have not been given an indication that they are slacking.
Gluing together layer upon layer of legacy systems that are so old & opaque they are essentially black boxes nobody dares remove or replace.
oh and that small regional bank that was acquired 4 years ago? Yep...they've got an entirely separate stack of legacy systems. And that other jurisdiction with different data laws? Everything is different there too.
It's insane looking under the hood of these things. How banks manage to not lose half the money daily is a complete mystery to me.
13 exchanges x 3000+ securities x 2 quotes for each security x 23,400 seconds in the trading day (with the very very very generous assumption that quotes only change once per second)
That’s the inbound data alone, now consider all the algorithms they have to run on it, the models, the strategies, etc.
And then to each of their several hundred institutional clients, they have to give a unique and dynamic price.
Again, this is only the sell-side investment bank, not even looking at their absolutely gargantuan consumer bank business
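Multiplying the comment's own figures out (a back-of-the-envelope sketch; the once-per-second quote assumption is, as noted, very generous):

```python
# Lower bound on inbound quote updates per trading day,
# using the figures from the comment above.
exchanges = 13
securities_per_exchange = 3_000   # "3000+", so a floor
quotes_per_security = 2           # bid and ask
seconds_per_day = 23_400          # 6.5-hour trading session

updates_per_day = (exchanges * securities_per_exchange
                   * quotes_per_security * seconds_per_day)
print(f"{updates_per_day:,}")  # 1,825,200,000 -> ~1.8 billion updates/day, minimum
```

And that is before fan-out to per-client pricing or any of the models downstream.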
Tech companies also aren't subject to the same level of scrutiny / regulatory requirements as big banks, plus the need to support old processes and software, etc
I could turn this around and ask, why would Facebook ever need infrastructure on par with BofA? I'm not an expert on infrastructure, but given their size, revenue, market position, number of customers etc, there's nothing baffling about these numbers.
When you are collecting that much of an audit trail, that data transitions from being a problem to manage to the problem to manage.
I wonder how many bespoke Kafka work-alikes there are out there that are older than the authors of kafka.
- the instantaneous nature of things in today's world is bleeding into finance, where banks want to advertise/offer near instant access to credit, loans, etc. instead of turn around time of days. in order to do that, it helps a lot to basically unify data across tons of previous disparate orgs, shared infra, etc.
- cybersecurity/auditing/compliance can be very expensive to license or contract for and occasionally has to be run on company hardware due to legal issues
Are you surprised that an organisation handling billions in real money, millions or billions of financial transactions, and decades' worth of legacy systems, can't make do with as little in servers and software development as a social network?
Apparently BOA has 4344 branches so the local infrastructure costs will add up both for branch kit and the networking costs.
Part of the organizational allure of migrating to the cloud is being able to rationalize headcount. Hybrid clouds, while bringing a lot of potential upside, are complex. While you can potentially reduce the scale of labor involved, you still have to retain all of the core skillsets you already have to maintain your on-prem locations. Then add headcount to optimize the cloud deployment, _plus_ the ever elusive skillset of the individuals that can marry the two in such a way where you show operational savings from the hybrid model.
They turned on encryption for an option and did not realize the huge price difference between the encrypted and unencrypted option meant $9000 per day in additional charges. After 1 month (30 days) this came to light, and he was fired.
EDIT: btw, he was not fired for using AWS, he was fired for making unnecessary costs.
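The arithmetic behind the firing, for what it's worth:

```python
# Cost of the unnoticed encryption option, per the story above.
extra_per_day = 9_000     # USD/day price difference
days_unnoticed = 30
total = extra_per_day * days_unnoticed
print(total)  # 270000 -> about $270k before anyone looked at the bill
```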
It's scarily easy to accrue huge bills if you're not watching things closely.
From the Q3 results: "The bank said Wednesday that net income excluding an impairment charge rose 4% to $7.5 billion, or an adjusted 75 cents a share. When including the $2.1 billion charge tied to the end of a partnership with First Data, net income fell to 56 cents a share"
Something doesn't add up here.
I used to work in banking... their compute needs are actually quite basic.
I think what we're really seeing here is some creative accounting around how operations and costs are accounted for. They've got to be talking about costs of managing and operating their cloud as well, and most of those costs would be in terms of training & workforce retooling.
I am glad they shared this situation. We often hear marketing telling us that the public cloud is always better. It's nice to see different perspectives being shared, and some details around them.
It is worth doing this if your in-house engineering team is at least 100+ app service developers and has a high feature churn rate.
But be warned: your systems engineering / shared technology team should be level-headed and mature, and the rest of your app service engineering should be good too, to pull this off well. If not, you will be in serious developer-productivity pain, and there won't be quick and easy fixes once you have put down the capex.
Well, enter typical bank bureaucracy, cost controls, approvals, etc., and now we're back to taking months to provision a VM!
Traditional virtualized hosting environments typically also come with varying levels of technical overheads when setting up new boxes.
What BofA has is a virtually self-service (some business approvals needed) user interface giving devs the ability to set up new virtual servers with little or no knowledge of the underlying hosting infrastructure.
cloud - someone else's computers
private cloud - someone else's computers that you own???? or that nobody else uses, but in this context you also own???
Private cloud usually refers to that same abstraction layer. Hardware provisioning is abstracted away from use and available on-demand from a pool, and likewise deployment works automatically via the same or an equivalent API.
That's very different from the operation of a typical company-owned datacenter from a couple of decades ago.
The 3rd sentence of the article, in bold, explains that the savings were from consolidating on-prem hardware, not from eschewing external cloud.
The article cites BofA itself as saying that savings vs. external cloud are tenuous at best:
> Right now, the bank estimates its private cloud is 25 to 30% cheaper than public providers, though it also recognizes that probably won't last forever. Still, the company believes the architecture it has built will give it leverage in negotiating contracts with these companies
For many companies, hiring and talent retention is a problem. There are some risks that are not as superficial as dollar signs on the balance sheet.
Curious if they are all-in on providers like Red Hat OpenShift or Pivotal Cloud Foundry as their PaaS layer.
(Some context: My last employer had an AWS bill of about $3m/year that was mostly EC2. Running PCF on top of that would have been another $2m/year in licensing. And that was after the volume discount.)
Having Dell and Amazon in a bidding war over your next project is probably the best world you can be in.
For my money, you should run one data center in the same location with most of your tech talent, and a second one geographically distant, and regionally load balanced.
But we typically don’t write our software for this, and you can’t get the business to do a rewrite until they see how stupid expensive a lift and shift ends up being.
I think these narratives about saving money are typically covering up a story of how much was squandered starting five quarters ago...
Q. For many of the reasons given in this thread, etc., I'm leaning to having our own in-house servers, server farm, etc. For that, is there a source of information I will be able to use, say, as consulting, depending on the issue, an hour, day, week, month at a time, to get us past the chuckholes in the road on the usual work -- system and network planning, installation, configuration, monitoring, diagnosis, correction, etc.? E.g., can I just call Microsoft for such issues, CloudFlare, VMWare, Cisco, etc.? Assume that money to pay for the products and consulting will not be a problem.
If the broad answer is "Yes", then that will take a lot of entries off my TODO list and let me sleep better.
Cloud DB with no worry about physical hardware (in addition to S3 buckets which is mostly for files). lambdas when you don't want to manage a server, cloudformation when you don't want to manually start your stack by clicking, API gateway for quick ways to create APIs without worrying about stuff. And powerful orchestration layer that connects all of these together.
In fact, a lot of these services are bootstrapped on AWS itself. If you know your way around AWS, you can really find a sweet spot where it's cheap, scalable, and mostly worry-free. The problem is that AWS domain knowledge is _deep_, and not many people know enough about AWS to avoid the large number of pitfalls and inefficiencies that a knowledgeable person can quickly spot and fix.
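As a purely hypothetical sketch of the "Lambda behind API Gateway, no server to manage" pattern described above (not anyone's actual stack): a minimal CloudFormation template. All names are placeholders, the IAM role ARN is invented, and it omits the `AWS::Lambda::Permission` resource a real deployment would need before the API could invoke the function.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  HelloFn:                         # placeholder name
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.12
      Handler: index.handler
      Role: arn:aws:iam::123456789012:role/lambda-basic-execution  # placeholder ARN
      Code:
        ZipFile: |
          def handler(event, context):
              return {"statusCode": 200, "body": "hello"}
  HelloApi:                        # HTTP API "quick create" pointing at the function
    Type: AWS::ApiGatewayV2::Api
    Properties:
      Name: hello-api
      ProtocolType: HTTP
      Target: !GetAtt HelloFn.Arn
```

The point of the comment stands either way: templates like this are short, but knowing which of the dozens of knobs matter (and cost money) is where the deep domain knowledge comes in.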
They used off the shelf management software that they misconfigured. Getting VMs was so painful and they never worked correctly.
When I came back from paternity leave, they had gotten no response from their automated system and deleted all my VMs even though they could have seen heavy usage. If they had manually followed up they would have gotten an out of office email but they didn't.
Those are just two of the examples of how terrible it was to use their internal cloud.
You are always going to be five years behind if you do it internally. That might be a good cost calculation and I wasn't in a position to argue.
But, it meant innovation was terrible there.
- Access to cheap real estate
- Access to enough human capital, especially a team of 80+ to operate all aspects of a cloud: hardware, switches, openstack / kubernetes
- Requires scale of 20k+ servers = 320k+ VMs and a strong inclination to not host on AWS/Azure/Others.
- Server leases costing about $250/server/mo. With AMD you could halve that, with VMs costing $10 a month. Add network costs of $15 / 2 ports / server / month.
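A quick sketch of the per-VM economics those numbers imply (assuming 16 VMs per server, which is what 20k servers → 320k VMs works out to):

```python
# Implied per-VM cost from the lease and network figures above.
servers = 20_000
vms_per_server = 320_000 // servers   # = 16
server_lease = 250                    # USD / server / month
network = 15                          # USD / server / month (2 ports)

per_vm = (server_lease + network) / vms_per_server
print(round(per_vm, 2))  # 16.56 USD/VM/month; roughly half that with the AMD option
```

That lines up with the "~$10 VM with AMD" figure: (125 + 15) / 16 ≈ $8.75.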
I wonder what the other 62000 servers are doing.
There’s so much more AWS, GCP, and Azure offer these days that trading away all the stuff like metrics, monitoring, DynamoDB, S3, Lambda, scalable RDS, Spark pipelines, and generally a whole suite of products that go into building a modern web app, for some scripts that can provision on demand, seems a poor choice.
Note that he leaves open the option for both public and private cloud.
$2B = 40% savings means their opex was $5B and now it's $3B. That is high expense. Guessing most of it is licensed software costs? Like Oracle on every single core?!?
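Sanity-checking that back-calculation (in $M with integer arithmetic to keep it exact):

```python
# $2B in savings stated to be 40% of prior opex.
savings = 2_000                   # $M per year
prior_opex = savings * 100 // 40  # 40% of prior opex equals the savings
new_opex = prior_opex - savings
print(prior_opex, new_opex)  # 5000 3000 -> $5B down to $3B, as stated
```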