Bank of America's CEO says it's saved $2B per year by building its own cloud (businessinsider.com)
480 points by sogen 28 days ago | 301 comments

I think a strong argument can be made either way.

There are definitely financial savings to be had by doing it in-house (either building from scratch or, as here, utilizing existing data centers). But what I've experienced is that companies are often attracted to Big Cloud™ not just for financial reasons but to flatten and simplify their corporate reporting structure. It is the ultimate delegation.

This isn't simply about having fewer staff (although that can reduce internal politics and inefficiencies), but making fewer decisions and more importantly reducing the potential for making the wrong decisions (which can be career-costly, regardless of right decisions made previously).

These "soft" costs/savings are rarely discussed because they cannot be measured, unlike a balance sheet. But there's definitely a whole other dimension to infrastructure outsourcing that is worthy of note.

In this case BoA are saving money but in exchange retain additional layers of staffing that require oversight, and ultimately decision makers willing to take risks.

> This isn't simply about having fewer staff (although that can reduce internal politics and inefficiencies), but making fewer decisions and more importantly reducing the potential for making the wrong decisions

This is a myth directly from cloud providers' marketing campaigns: that using the cloud simplifies infrastructure to the point that you don't need as much and/or as qualified staff.

In reality it is quite the trap: without experienced system administrators AWS and the like are free to dig their hands deeper and deeper into the pockets of their unsuspecting customers.

> This is a myth directly from cloud providers' marketing campaigns

I agree the cloud providers like this idea, but I'm not sure I agree it's a myth. Businesses need to choose what competencies to focus on. For nearly every other area, businesses rely on suppliers. Auto manufacturers use machines built by other businesses and parts built by other businesses. They also obviously do some amount of in-house part and tool design. It's a balance and I think that it's a lot more likely that most businesses will want a good partner to provide infrastructure (ultimately at margins less than AWS) instead of building it themselves.

> Auto manufacturers use machines built by other businesses and parts built by other businesses.

Yeah but they generally aren't renting those machines. Renting critical pieces of business infrastructure that aren't easily replaceable is opening yourself up to rent seeking on the part of your supplier. Remember the Oracle business model--lock people in and then keep raising the price just below what it would cost to rebuild from scratch.

I don't think the renting example is that true. AFAIK auto manufacturers certainly do "rent" much of the factory machinery they use, and they certainly "rent" the maintenance contracts from the companies who service them (the big exception I'm aware of being Toyota, which makes its own factory robotics).

In other examples, many airlines do lease airplanes, and even among the airlines that do own the planes, they still usually lease the engines.

As the GP said, there's a balance to be struck. Delta doesn't have the expertise or the economies of scale necessary to build their own engines from scratch, but they do have the ability to lease those engines and then have their own teams maintain them. Similarly, most companies don't have the know-how or the EoS to run their own data center, so instead they buy cloud services and then have a smaller team to maintain it. It's a win-win. Some companies like BofA might be big enough that they can run their own cloud and have it be cheaper than using a cloud provider, but that's definitely not a common thing.

Could you clarify "AFAIK auto manufacturers certainly do "rent" much of the factory machinery they use, and they certainly "rent" the maintenance contracts from the companies who service them (the big exception I'm aware of being Toyota, which makes its own factory robotics)."?

Rental only makes sense if there is a marketplace for used machines. E.g., planes and plane engines are components that aren't really tied to one place or company. (The "that aren't easily replaceable" from the parent post.)

But industrial robots for car manufacturing seems like rather specific hardware. If, say, the Ford plant in Claycomo, Missouri no longer needs a machine for making the F-150, who is going to rent it? How is rental a better cost-savings for Ford than buying the machine and later selling it?

I tried looking this up, but failed. The closest I found was that "over half of industrial robot purchases in North America have been made by automakers" at https://www.robotics.org/blog-article.cfm/The-History-of-Rob... .

Also, '"rent" the maintenance contracts' doesn't make sense to me. I sell maintenance contracts to my customers, I don't rent them, and I don't understand what "rent" could even mean in that context.

The difference between renting and purchasing gets murky (sure, not literally) when you are dealing with sensitive hardware that you rely on but cannot maintain yourself. Such systems usually require frequent maintenance by the supplier. Sure, you could end your maintenance contracts, but then the systems you have purchased will very rapidly become useless. The up-front cost of hardware is often very small compared to the ongoing cost of maintenance contracts, and so the relationship ends up looking a lot like rental.

The parent comment mentions that Delta rents their airplane engines. I imagine that they have the capability to do general maintenance on the engines, but that the engines need to be periodically sent back to the supplier (probably GE or Rolls-Royce) for a teardown, safety inspection, and rebuild. Without regular inspections an engine can't be flown, and is essentially worthless. Not much point in actually owning the engine if you can't use it.

I guess I have a naive view. I think of "buy" as something which is a capital expenditure, can be depreciated, etc., while "rent" covers things which fall under operational expenditure.

With that in mind, I can see support contracts as an operational expenditure.

But if "over half of industrial robot purchases in North America have been made by automakers" then that's a lot of capital expenditure. I never got the feel that the companies which make modern industrial robots primarily rented them out (akin to IBM's renting out of tabulation machines 100 years ago). I really thought they mostly sold the robots.

Many of these industrial systems are built from reusable components. In my experience at a Tier 1 OEM for the US domestic auto market, a line could be torn down and re-purposed to another line by adding or removing physical components, modifying the physical layout where appropriate, and then adjusting the PLC programming.

An assembly line isn't really a "product," per se. It's more of an amalgamation of several products and technologies, many legacy, that forms a cohesive but loosely-coupled whole. You may have several vendors' products / systems in a line. So what you end up with are ongoing contracts that provide for installation, programming, troubleshooting, maintenance, etc.

There's actually a whole specialized industry that does nothing but design assembly lines. It's pretty neat.

Thank you for your response and insight!

They probably do sell them. I believe most industrial robotics systems are built custom, or at least heavily customized. But I have no idea (and I'm sure they closely guard) what percentage of their revenue comes from direct sales and what comes from support contracts. Somewhat related, I recall reading that one of the things that caused financial issues for GE Power was broken incentives around support contracts. To meet quarterly targets the sales team started offering to trade larger up-front prices for reduced costs on support contracts so as to frontload revenue. Too many customers saw the golden opportunity and GE ended up losing a ton of money.

It's more common than you think. In industries that are locked into legacy apps, cloud costs are massive. Same for industries where storing massive amounts of data for decades is normal. Healthcare and banking spring to mind. I wouldn't be surprised if big pharma is private cloud too.

There are quite a few different types of airlines. If you show up for some flight, it might actually be operated by personnel who are on the books of some other airline.

There's Boeing, which also used to have an airline; that arrangement was made illegal, IIRC.

And there are things in between these two extremes.

Some airlines have a heavy maintenance organization. Some have outsourced it. Different companies have different market segments and safety records etc.

The point is, there's no one true way for all.

Why is it better to be locked into your own company's IT department rather than AWS? I would argue AWS gives a better ROI than 95% of corporate IT departments, and is generally less hostile to boot.

One hopes you're not arguing that companies shouldn't bother trying to figure out if they're in the 95% or the 5%.

No, it wouldn’t make sense for me to make the 95/5 distinction only to prescribe a 100/0 response. My post was a call for sober judgment, which is admittedly not very popular here.

Making up opinions and then arguing against them is not popular here.

No straw men here Greg. Looks like you’ve confused this thread with another.

Many people don't realize that IT infrastructure is a utility: most companies shouldn't own it all the time, but should purchase it when needed. AWS couldn't exist if it wasn't better than on-prem IT. How "better" is defined differs case by case. Usually it includes TCO, ROI, turnaround time, efficiency in terms of capacity management, and a few more. These are things many small companies either don't have to deal with or can solve easily. Yet there are many companies choosing AWS to solve these problems for them. In my experience, AWS really starts to make sense if you are spending more than $100,000/month on infrastructure.

I think a company that saves $2B (meaning they were able and willing to spend even more before they had this idea) is more than able to have multiple competencies. Acquisition and vertical integration is how many companies became massive.

Of course it's a question of scale, but a lot of arguments I've seen usually come down to CapEx vs OpEx accounting which is silly because a dollar is a dollar.

CapEx vs OpEx isn't about raw numbers, it's about who can make the spending decisions and how quickly they can be made. At most companies getting a couple additional servers is a lot easier when you're running on public cloud vs running on-prem.

The general trend with cloud investment is that it makes the most sense early in a startup or business expansion, when time to market is most important and uncertainty is high; but once the business stabilizes, the assets move in house because of costs.

The cloud is powerful, but for long-running businesses who can afford to invest in the competency it's stupidly expensive. Hybrid cloud installations are ever more the norm, private clouds are increasingly capable (PaaS, IaaS, SaaS), and toolsets like Kubernetes reshape how a lot of those legacy apps get looked at.

The difference is that CapEx is dropped all at once and OpEx is spread over time, so the accounting is not the same.

You can use financing arrangements to avoid dropping the money all at once -- for example: A service provider offers you a monthly subscription or "lease-to-own deal" where you sign a contract to buy the software or hardware in exchange for monthly payments to be made over a period of time; that could be indefinite, or you become the owner of the hardware after the end of its useful life. Or just go to a bank, and they will be happy to create a loan and write a check out to your vendor for the lump sum, and so you are only responsible for making required monthly payments of the interest plus any principal, etc, required by the contract.

So no... not necessarily... CapEx is not necessarily paid out all at once. So the accounting ought to be the same whether it's a CapEx purchase under a subscription agreement/loan, or an "OpEx" payment for one month of services.
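To make the comparison concrete: a financed CapEx purchase amortizes into a fixed monthly payment via the standard annuity formula, which puts it on the same footing as an OpEx subscription. All dollar figures below are invented for illustration; only the formula is standard.

```python
# Hypothetical comparison: financing a $120k hardware purchase over 4 years
# at 6% APR vs. paying a monthly cloud/subscription fee. Numbers are made up.

def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Standard annuity (amortized loan) payment formula."""
    r = annual_rate / 12  # monthly interest rate
    return principal * r / (1 - (1 + r) ** -months)

capex_as_monthly = monthly_payment(120_000, 0.06, 48)
opex_fee = 3_000  # hypothetical monthly cloud bill for equivalent capacity

print(f"financed CapEx:    ${capex_as_monthly:,.2f}/month")
print(f"OpEx subscription: ${opex_fee:,.2f}/month")
```

Once both sides are expressed per month, the remaining differences are residual hardware value, maintenance, and staffing, not the payment schedule itself.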

In the case of computer equipment that needs replacement every 4-5 years (or less, depending on compute and other requirements), you're perpetually in a lease and this starts to become OpEx.

Yeah, but you don’t need data center engineers, real estate specialists, power specialists, etc.

Sure. You just need more people to keep an eye on what your consumption rates are. Or to handle access control systems, manage service accounts and client certificates, configure firewalls and the additional tooling around them to ensure things aren’t getting exposed outside. Countless cost projections on top of a complicated billing system.

I mean. You save the headcount in one area and you consume it in another. And you pay overhead in outsourcing costs.

It’s likely worth it. But you need to determine that on a case by case basis.

My (large) company is too incompetent to do hosting properly so cloud was a blessing for us.

> Sure. You just need more people to keep an eye on what your consumption rates are. Or to handle access control systems, manage service accounts and client certificates, configure firewalls and the additional tooling around them to ensure things aren’t getting exposed outside. Countless cost projections on top of a complicated billing system.

At a certain scale, you need all these things with internal clouds too. Resource provisioning, basic security, and managing "insider risk" don't just go away when you own the infrastructure.

Not nearly in the same way.

This is like saying that "You also need to have a cashier!" when you're comparing a shopping mall to a corner store.

How so? Presumably someone with an internal 'cloud' needs someone to be responsible for a large kube cluster, to handle a uniform way to do scheduling/resource management, etc. Having an engineer who knows how to do this in AWS doesn't seem more expensive than an engineer who can roll their own internal 'cloud' system...

Just curious, do you put 'cloud' in quotes to indicate that you're using a different definition of the word?

> Countless cost projections on top of a complicated billing system

I think it's a loooot easier to design and build a cost-efficient multi-tiered high availability application than it is to make reasonable cost projections in whatever micro-currency every secondary and tertiary service uses.

And I'm ever surprised by how time-intensive it is.

You can outsource the physical data center and connectivity without paying massive premium for a software stack which in many ways requires more expertise to operate at scale and redundancy than the in-house alternative, while costing an order of magnitude more money and locking you into a potential competitor’s ecosystem.

You can rent dedicated servers or go on-prem? People always jump from cloud to building your own CPU in these threads, never anywhere in the middle.

Yeah, we literally turned our $7k-a-month AWS bill into $1k a month by changing to 3 dedicated servers (fully managed!) and Cloudflare. It's better in every conceivable way.

For anyone doing this... look around.

Tooling has only improved.

We were able to cram a large set of applications into a group of instances using CapRover.

Instead of 2-3 instances and a whole bunch of RDS, we ended up just recreating the databases there, along with MinIO for storage (we write a small amount of data to S3 storage).

What instance types were you using? Do the dedis have the same specs as the EC2 instances? What about contracts (do you have one on the dedis)? Are you only using servers, or other AWS services as well?

Colocation is another in-between. Build your own server, but the racks / datacenter / power delivery / real estate / network is all rented from another group.

Colocation is probably ideal for anyone using custom hardware: like GPUs or FPGAs. If you're using "normal" CPUs, just buy dedicated instances instead.

Ya, I feel like this must be coming from the younger crowd who are just used to seeing all the cloud marketing comparing cloud computing to building your own data center.

Anyone who has been in the industry since pre-cloud explosion knows that most people just rented bare metal and at the more complicated end they might purchase their own hardware and colocate.

And for those that own their servers, many lease them and HP or Dell or whoever maintains them on site if anything goes wrong.

Yes, this also seems weird to me. And I think it might be a cultural thing, I noticed that in Europe renting dedicated servers is far more popular than in the US.

Maybe it’s because wages are a lot lower in Europe? Better to outsource to the cloud than hire more people if people are expensive.

IDK where you get the 'wages are a lot lower in Europe' thing. There are lower cost and higher cost places, but people are expensive and talent is mobile.

I don’t know, in the Bay Area 150k is entry level and big companies pay 250k or much higher to senior engineers. Whenever I hear about pay in Europe it’s a fraction of that. It may be worth it for a different lifestyle, I’m not making that argument. Regardless, from the company POV it must lead to different choices about buy vs. build.

This is why you see a lot of insurance companies, for instance, with their tech folks quartered in Ohio. Easily half the cost for 90% of the talent.

I'm surprised that more companies aren't playing moneyball with nice, but cheap, locations.

Bay area or NYC sure, that's like being in London, but what about the Midwest, or any of the places where you can easily set up an IT shop without paying a premium for space?

My impression is that programmer and IT salaries are SF>>NYC>>London, even if living costs aren’t similarly different.

how will you attract the talent?

- Get your own bedroom.
- 30 minute commute.
- Not everyone you meet will be in tech.
- You'll be rich compared to everyone else in town.
- Partner with local academic institutions to offer recognized research projects and training: 20% time or 3-month project stints, etc. Adjunct professorships for people with an existing research record.
- Throw in X free flights to the east/west coast, home town, etc.

There are 30-minute commutes in San Diego, LA, and SF if you live in the right areas. What about weather? Cali has really nice weather. Schools / doctoral candidates / etc. are basically useless in the real world. If I'm making enough money, the free flights are useless. Nothing you said here makes me want to move to the middle of the US. What I'm looking at is exactly the lack of opportunity. Say I decide to move for some job and hate it. Now what? Pick up the family and move again?

The Bay Area is a tiny fraction of the U.S.

Believe it or not most U.S. programmers don't live in the Bay Area.

They 100% are. They're not even close.

More likely about trust/security/privacy issues.

The popularity of the MBA is nearly zero in the EU.

Renting bare metal is exactly how the business worked prior to the rise of EC2, e.g. Rackspace or Linode.

You can rent vs. buy at any level: rent only machines, rent half a rack, rent a full rack, rent a cage, rent half a DC, and so on.

Same with connectivity: you can plug your machines into a lan managed by the hosting provider, or run your own routers and peer with their in-house ISP, or you can buy peering directly in the building, or you can rent lightpath from a telco with pop in the building, or you can go rent a backhoe and start trenching across the parking lot...

If you rent the backhoe... make sure you call before you dig and don't cut one of my fibers, please.

Well spotted.

Sure you do, you’re just blending them into the rate, at a significant margin.

Cloud is awesome for companies where the capital investment needed to deliver the SLA is too expensive.

If you’re big, and you run the numbers, lots of services are better in a datacenter that you manage. Many of the early advantages of cloud tools aren’t a differentiator today. If anything, the granular billing is a nightmare to manage in many large enterprises. There’s a reason AWS heavily targets .gov customers — the Feds probably waste billions on idle services.

I’ve found that services that are shitty to support, or that have well-known workloads, are best left as somebody else’s problem. Email is an example of both. Most other things are trading IT complexity for accounting complexity.

Counterpoint: granular billing of real money can actually drive behavioral changes in ways that fake money, and/or letting one department slide because you’re hoping its head will rubber-stamp your promotion, doesn’t.

If you reduce companies to "developers hiring developers, led by a couple of business people," how resilient will that company be? I'd say it keeps them at a disadvantage for either elimination or acquisition, but dependent nonetheless.

Maybe cloud usage can be an indicator whether a company has plans to maintain independence into the future, despite the loyalty-engendering words of leadership.

If you build your own DC you would outsource that to a consultancy or contractor.

Nor do you have to pay the rent on the buildings where those things happen.

Why does this argument apply to AWS but not to desktop IT providers, paper office supplies, janitors, cafeteria services, and everything else that a business hires vendors for?

There's nothing simple about debugging opaque error messages & client quirks on AWS.

I don't necessarily think you're wrong, but I think you're missing the point.

Situation: some manager is tasked with updating some antiquated COBOL monstrosity. They're given twelve peanuts, some shoestring, bubble gum, and a little bit of duct tape. The experienced staff have all been aged out, quit out of frustration, or been laid off and replaced with fresh college grads at half the price, which was a great deal and significantly reduced costs for several consecutive quarters, earning a well-deserved healthy bonus. Unfortunately, reliability of the system has been suffering and the young engineering team unanimously agrees the old system should be replaced.

Option 1: do it in house with the young team of rockstars the manager has been bragging to management about. Unfortunately, the manager secretly has reservations and has doubts about the team's experience, let alone skills. It's too risky. Not too risky to the company who can easily absorb the loss and move on, but too risky for the middle manager who will look bad and possibly get fired.

Option 2: outsource the entire thing to IBM or Oracle. Tempting, but not in the budget. Also, significant parts of the old system are on IBM anyway, and the cost of hiring IBM people to fly out has become increasingly expensive after the latest round of layoffs saw Scott put out to pasture. No, it's too risky. Not risky to the company, who can have uptime guarantees in the contract, but risky to the manager, who will look bad if he comes in over budget.

Option 3: The Cloud. It's the 'in' thing. FAANG are all in on it, and if it works for them it could work for us. Much of the new team used AWS for their web dev class in college, so the skills are fresh. There's very little risk to it. The staffing requirements will be dramatically reduced so costs will drop. The manager will be able to deliver an awesome demo next quarter and the S3 instance they'll host it on will only cost one or two of their peanuts. The only risk is if the manager doesn't get promoted before the problems start, which shouldn't be a problem because AWS has such great onboarding programs. It is slightly more risky for the company though, who may find themselves no longer in control of critical infrastructure.

Risk for the company isn't the same as risk for a mid level decision maker. Small fuckups carry the same penalty as huge fuckups. (you get fired) From my fake story, doing it in house has a relatively high chance of being a minor fuckup, and AWS has a relatively low chance of being a major fuckup. The manager is going to choose the low probability risk, even though the low impact risk is the company's best interest.
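The mismatch can be made concrete with a toy expected-value calculation. Every probability and dollar figure here is invented purely to illustrate the shape of the argument:

```python
# Toy model of the principal-agent mismatch described above.
# Probabilities and costs are hypothetical, for illustration only.

options = {
    "in-house": {"p_fail": 0.30, "company_cost": 1_000_000},   # likely, but minor
    "cloud":    {"p_fail": 0.05, "company_cost": 50_000_000},  # unlikely, but major
}

for name, o in options.items():
    company_ev = o["p_fail"] * o["company_cost"]  # expected cost to the company
    manager_risk = o["p_fail"]                    # any failure gets the manager fired
    print(f"{name}: expected company cost ${company_ev:,.0f}, "
          f"manager's firing risk {manager_risk:.0%}")
```

With these made-up numbers, the company should prefer in-house ($300k expected loss vs. $2.5M), while the manager, who eats the same personal penalty for any failure, rationally prefers the cloud (5% firing risk vs. 30%).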

And let's be honest... a remarkably large fraction of small IT shops are really bad at certain basic competencies, notably backup reliability and timely application of updates. AWS is pretty good at sysadmin and accounting 101, even if it stumbles at 201.

> Risk for the company isn't the same as risk for a mid level decision maker. Small fuckups carry the same penalty as huge fuckups. (you get fired) From my fake story, doing it in house has a relatively high chance of being a minor fuckup, and AWS has a relatively low chance of being a major fuckup. The manager is going to choose the low probability risk, even though the low impact risk is the company's best interest.

Insightful. Can you clarify what you mean by "low probability risk" and "low impact risk" ?

probability - Odds of a project being a success at achieving its goals.

impact risk - Odds of a project failure causing a large impact to the company.

Most medium-size companies would still be using a data center provider today to handle server/hardware maintenance and security. I think it comes down more to how likely a workload is to change or to interact with cloud services or APIs. If you have a CRM system used by a 15-person sales team growing one person a year, it is probably cheaper to self-host in some ways. At the same time, if it goes down, someone internal needs to fix it. And if all of a sudden you want to run some data analytics project on the prospect and customer data, the project is potentially roadblocked if more server space needs to be provisioned, etc.

If you already have IT staff in-house, it ends up being a duplication of roles and an added layer of complexity. I get that some servers and/or resources can benefit from autoscaling. USGS gets slammed after a major earthquake, so having autoscale capability is important. Or you want to co-locate some servers (payroll) for disaster recovery purposes. There are some legitimate use cases, but often cloud just feels like renting servers that you could own and manage in-house.

Wholeheartedly agree. Also, very few companies need scaling like Netflix. But the hype around AWS makes inexperienced architects choose a stack without looking into pricing and complexity. Later, projects are continued by throwing more resources at the cloud.

I worked on the internal BOA cloud. My team wrote software to procure VMs via a website, versus the older (still current) way of doing it manually.

There are two things worth considering. The BOA cloud is slow. That is to say, it is hard to procure machines to get your projects going. There is a lot of control over who will be paying for them. There is also A LOT of staff to manage it. So those $2 billion could easily be tweaked downward when you consider the staff and the loss of agility.

On the flip side, AWS is hella expensive. To do the things that they do it would cost A LOT of money. Maybe even more than they think. I have seen (and maybe a lot of people here also) small companies with million dollar AWS bills. So this number can also be tweaked up when you consider dozens of teams each having their way (at BOA scale) on the AWS console.

Maybe using AWS can make sense for a small company trying to scale, where time is a factor.

But I would expect that a large established company would do better to create its own reasonably priced, reasonably managed infrastructure.

Sort of rent vs build/own.

That's generally the economic case put out in cloud planning: a startup or expansion does great in the cloud with quick time to market, but once that business has matured and the risk is gone the assets move in-house because it's wildly cheaper.

And something that seemingly everyone forgets: it's not black and white... you can have some things in the cloud while you have other things on-prem, leveraging scalable infrastructure and redundancy while also keeping your core assets fully under your control. It's pretty easy to make an internal S3 replacement; it's hard to make a better S3.
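To illustrate the "easy to make an internal S3 replacement" half of that claim, here's a deliberately minimal sketch of a flat key → blob store. Everything here is a toy of my own construction, not anyone's actual design; it's the missing parts (auth, replication, versioning, lifecycle policies, durability guarantees) that make a *better* S3 hard.

```python
# Toy S3-style key/value blob store backed by the local filesystem.
import hashlib
import os
import tempfile

class ToyBlobStore:
    def __init__(self, root: str):
        self.root = root

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary key names map to safe, flat filenames.
        return os.path.join(self.root, hashlib.sha256(key.encode()).hexdigest())

    def put(self, key: str, data: bytes) -> None:
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()

store = ToyBlobStore(tempfile.mkdtemp())
store.put("reports/2020/q1.csv", b"revenue,1000")
print(store.get("reports/2020/q1.csv"))  # b'revenue,1000'
```

That's the whole happy path; the two decades of engineering in the real thing sit in the failure paths.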

Not necessarily. For each of these “we build our own cloud” stories, there are quite a few more stories of large enterprises saving tons by switching away from their own cloud. BOA themselves know their savings won’t last, and they’ll be switching back. They’re already in negotiations.

> I have seen (and maybe a lot of people here also) small companies with million dollar AWS bills.

Ultimately, it doesn't matter if the value it provides is greater than the AWS bill.

I am not sure I agree with that. Overpaying for something is still overpaying even if it provides value. Do you spend 100% of your salary on food? You will die if you don't eat. But somehow, the food required to keep you alive probably costs less than your salary.

> In this case BoA are saving money but in exchange retain additional layers of staffing that require oversight, and ultimately decision makers willing to take risks.

I think some companies, like BOA, look at this as the safer alternative.

We all assume that large cloud providers are bulletproof. That's just because we haven't seen them taken down yet. Someday we will. Just wait until there's another major world conflict. These global companies with outsourced IT will be in ruins, because politics and digital DMZs will keep them from acting cohesively.

For an organization like a bank that's worth over $400b that layer of staffing, equipment, oversight, and all the inefficiencies are a $400b insurance policy that you will be able to continue operating regardless of geographic or political events outside of your control.

> In this case BoA are saving money but in exchange retain additional layers of staffing that require oversight, and ultimately decision makers willing to take risks.

I've been at multiple Fortune 50 companies at multiple levels. Many times these types of infrastructure builds are just forms of empire building and an attempt to embed one's services in the organization at a very deep level. You convert one set of costs (AWS/GCP/Azure bills) for another set of costs (infrastructure + payroll). You also convert one set of headaches (CSP overcharging/auditing) for another (prima donna internal infra engineers).

What's worse, someone being a prima donna or the unchangeable facts of AWS operation policies? You will always encounter roadblocks and misfeatures, but can you get around them?

Don't get me wrong, most corporate IT is deeply incompetent at their core job. It's just that you can't fix the deep problems by outsourcing the computer part.

> What's worse, someone being a prima donna or the unchangeable facts of AWS operation policies?

There's this great thing about BOA-sized support contracts where you get to call up the senior product managers and demand that they build the features that you want. And if you're not happy about something, engineers will drop what they're working on to appease you.

I wonder how many customers were interested in setting up a Custom Keystore for KMS on CloudHSM?


Can confirm, I'd say 20-30% of all Azure features come directly from the product teams' Partner Advisory Council (read: top customers) meetings. Like "snap your fingers" implementation.

And what's the alternative? Losing your biggest customers because you couldn't be assed to put a new button on some load-balancer configuration?

Of course, for small or medium customers whose finger snaps are ignored... WYSIWYG. The roadblocks in not-your cloud are still your roadblocks.

Sorry, I guess it wasn't clear that I wasn't complaining or anything, just confirming the facts on the ground.

But (in a vacuum) when developing a solution to serve diverse needs and stakeholders, the HiPPO effect is definitely an anti-pattern.

Hopefully you'd think they compared those costs and still found the in-house option would be cheaper in the long run. Who knows, though, or who knows how accurate their estimates were.

Once you are deeply embedded with the in-house option, you have no competition. Sometimes distinct layers bleed into each other and things get even more messy to untangle. Also, sometimes due to embarrassment alone, you continue with the in-house option.

With CSPs, at least you have four major players (AWS, GCP, Azure, Oracle Cloud, ?IBM?) competing prices downward and multiple theoretical options. Yes, you have your in-house CSP expert biasing things, but since most of the work is external, at least you don't have a heavy internal weight on a home-built choice.

IMHO, the better option would be a CSP-agnostic Kubernetes based layer. But IMHO we're not there yet.

I believe you misread the quoted remark.

It was about the decision maker's personal (career) risk when making decisions. Keeping things in-house means more decisions are taken in-house, which increases personal risk to the management for getting things wrong. In context this is much clearer:

> This isn't simply about having fewer staff (although that can reduce internal politics and inefficiencies), but making fewer decisions and more importantly reducing the potential for making the wrong decisions (which can be career-costly, regardless of right decisions made previously).

The post above doesn't discuss cloud Vs. non-cloud security/safety at all.

BOA also didn't (doesn't) really trust the public cloud. During several all hands meetings in the org, the inevitable question of "when are we gonna start using the public cloud" came up and the leaders would always say "we aren't because it isn't secure enough" or "we'd rather maintain our current governance model over what our devs build".

To your point, we need only point to recent events to show the universal customer pain that can be caused by a bad code push at Azure or AWS.

A factor that is easy for developers to overlook is that the legal system still cares who owns the hardware. So if all your stuff is at Amazon, a warrant can be served to Amazon and they will let the cops into your stuff. No contract can keep that from happening.

If a company (or person) really wants complete visibility and control into their relationship with law enforcement, they have to host their own hardware... ideally, their own data center.

Exactly. Jeff Bezos tweets something that gets Xi’s panties in a ruffle and AWS is blocked in China and Bank of America can’t do cross border payments... no way should such a big company have their eggs in such a basket.

...but they are also accumulating know-how which they might be able to sell in decades to come. Even more so if the scenario of mainframes, IBM software, Oracle databases, ... repeats itself, and sooner or later I believe it will. Currently cloud providers are operating on the absolute minimum, to get as many customers as possible into their trap. And for sure, it is highly beneficial for startups that can't afford to buy on-premise hardware, but this won't last forever. History has shown that the moment you are vendor-locked, you will be charged for your dependence. Even more so if people able to build on-premise infrastructure become rare beasts. I would be glad to see I am wrong, but history proves it is the best teacher.

Operating at a minimum? How do you come to that conclusion? From a pure cost of infrastructure perspective (ignoring all the human elements), the cloud is always more expensive than any other option. It’s got some of the most ridiculous margins and price gouging of any business out there.

I think the opposite will happen. Cloud providers are becoming commoditized. Kubernetes, blah blah. Developers hate lock-in, and on a long time scale, developers drive tech decisions. Nowadays new software is cloud agnostic, and it will continue to be that way. This is commoditization, which generally drives down prices.

That said, buying commodities from a market with four participants is more like buying from a cartel than a marketplace. I definitely worry about the consolidation of the world’s IT spending going into the coffers of three or four massively profitable and morally corrupt corporations.

> It’s got some of the most ridiculous margins and price gouging of any business out there.

You and enterprise software should get together and have an introductory meeting...

Great call out for Kubernetes. I’m a full believer that it will turn cloud providers into dumb aggregations of compute/network/storage (like simple utility companies) for the deployment of CRDs and Operators, which will obviate the need for AWS services like EBS, Route 53, RDS, and many many other offerings. This will lower the cost to entry for competitors and lower prices for everyone.

I think Bezos is even colder than Ellison, it's going to be sad and hilarious when he decides to put the squeeze on those running on the cloud.

A good chunk of the internet is already running just because Bezos allows it. The prices will go up substantially the day Lauren Sanchez walks away with his billions.

Also reducing friction in getting products to market. A small team with their own BigCloud account is probably going to deliver a product to market much faster than one that needs to go through the friction of dealing with the various IT departments which invariably exist at large companies.

There is absolutely value to delivering a product to market a year or more sooner than otherwise.

This is a false dichotomy. There is the issue of restrictive IT departments, legal review, compliance review, extended QA, documentation, product literature, etc. all extending a product launch timeframe...

Separate from all of that there is a valid question whether the software stack at a big cloud provider gives your engineering department the ability to deliver features faster at scale and redundancy, and with lower ongoing maintenance overhead. I think this question is very much up for debate.

At a small scale there are absolutely cloud frameworks that let you deploy services without an IT department at large scale. But then there’s a realistic argument of whether the high premium for theoretical scale—when in actuality you’re serving dozens not millions of customers—is worth the high cost of capital.

E.g. several startups of mine have persisted on a couple of colocated dedicated servers for over a decade at a cost of $300/mo. The equivalent deployment in the cloud is closer to $3,000/mo.

I know this because I got $300k of Azure credits to play around with and ended up not using most of them. The long-run cost was too high to move to Azure. But there were months when I “spent” $6k of credits on a dozen VMs and associated storage and network that had equivalent performance to my rented bare metal.

The friction from various IT departments is often beneficial. I’m probably speaking from a point of bias as I’m one of those departments, but we often have to sanity-check the implementation of things and review the security implications.

I’ve encountered quite a few devs whose only focus is on achieving the specified goal without considering the full security scope. I don’t blame them, that’s what they are there for. Stepping in for ensuring the customer security and regulation requirements is what I’m here for.

But at a large company, how likely is it that a team is going to have their own AWS account AND have free rein over it? You'll still have all of the friction for compliance and security. And turf battles.

It's like using package managers like Maven and npm. Sure, they make going to market much faster for small organizations. But at companies that have strict security procedures, downloading unvetted stuff from the internets and putting it into production does not fly. You'll have internal repos that work with your package manager tools, but nothing gets into those repos without at least a cursory review.

> But at a large company, how likely is it that a team is going to have their own AWS account AND have free rein over it?

The two companies I've used AWS at both operated this way. In fact, each team has multiple product accounts (for dev, testing, production). So I don't think it's unusual at all.

> This isn't simply about having fewer staff

I've always wondered about this. In IT now instead of having 5 system operators one has 10 "cloud consultants" and nobody knows what they are doing. I've been through a couple of "migrations to the cloud" now and I've never seen a reduction in IT staff or the cost of running IT operations.

About the wrong-decision part: if you choose the wrong "cloud" for your business, the cost to change can be enormous.

From what I've seen so far, the primary motivation for management is the ability to point to someone else if shit hits the fan, not cost reduction and all that other stuff.

And nobody got fired for purchasing IBM.

You start wondering if 90% of the corporate world is there to do nothing but erect roadblocks for the people who are actually doing something. But then you start wondering whether maybe it's a good thing that all of these people get paid and can spend their money, so that the majority of the country has purchasing power and doesn't live in piss-poor poverty.

Would the cost/savings also depend on the 'predictability' of the infrastructure needs? To me, it seems like there is a dependency.

For example if I can project steady growth or decline of my needs, it seems that I benefit less from the infrastructure-on-demand like AWS/Azure/GC

Also, depending on the size of the organization, it seems that there are at least several options for data centers:

  - have my own data center
  - colocate my hardware in someone else's data center
  - rent hardware
  - use infrastructure-on-demand + software-as-a-service
AWS/Azure/GC are in the 4th category, but cost savings in terms of electricity/building capacity/peering points are available in the first 3 as well.

Many organizations deploy Pivotal's https://pivotal.io/industries/financial-services

or Redhat/OpenStack

So they are not actually 'building' the infrastructure management layer from scratch.

Also, investment banks (due to exchange connectivity) and telecoms (due to peering points, physical tower locations) are always going to have their 'front-office' applications in data centers that offer the lowest latency for those apps.

Mid-office applications (or BOSS as they are labeled in telecom) can live in data centers reasonably distant (in terms of ping time) from the front office. So the infrastructure for those has a different cost profile.

Most of these applications touch trades or trade events that carry millions of dollars' worth of contract value (not Facebook's likes or favorites...).

There is a lot of logging, reconciliation, and post-execution enrichment that's probably going on.

Finally, from my 2nd-hand understanding, investment banks can have from 3K to 10K applications in production, with perhaps 20% of them having more than a million lines of code.

Some of those apps are built in their own proprietary languages developed specifically to manage contracts and the lifecycle of contracts (banks have had 'smart contracts' DSLs and interpreters since the 90s).


IMO there's just a lot of factors that will vary from business to business.

There's also the argument that can be made for rapid iteration.

A company like BoA will eventually choke on this in the future.

Also, this stuff sounds great until you need improved peering, or software-defined networks or massive bandwidth improvements, etc.

... and I get that the HN community is clever but banks and other institutions like this REALLY struggle to find this talent.

$2B is a lot except when you're losing $10B a year in revenue because you can't compete in the market.

In a way this would be like Amazon arguing it should build its own bank. It's not what they're good at... Focus on your strengths.

I used to work at a very large financial publishing company.

I joined just at the point where they were moving from VMware to AWS.

It was never a cost saving exercise. It was a way to completely change the culture of the IT department. As they were one of the very first news websites, they had at the time ~18 years of cruft to deal with.

The drive to the cloud was about flexibility, and if we are honest, shaking the staffing tree to get rid of the ossified staff.

They went from three data centres (a mixture of SUN and Intel blades, all backed by FC) to pure AWS/Heroku. The running costs went up significantly; the opex was close to £3mil in AWS bills alone, all to put text on a web page.

It will continue to rise because it offers flexibility, and as each product now has its own account, standards are difficult to enforce. This means lots of snowflake installs of X that are now critical to Y.

The cloud is more expensive for large companies, for most workloads. For SMEs it's the total opposite.

I used to work at a very large and entrenched publishing-cum-tech company, too. Sounds like they were different companies, but very much the same rationale for migrating to a well-known cloud service provider. The ossified and entrenched IT department was becoming one of the biggest challenges to quickly reacting to market challenges and opportunities, and was long overdue for a few boots planted firmly to the rear end.

The cloud was probably always more expensive if you only considered the cost of computers. But it was about so much more than that. Mostly, it was about creating a hostile environment for the BOFHs who were creating a hostile environment for everyone else.

Another way of framing that: BOFHs went from being necessary to the business to preventing the business from achieving its goals, and got replaced by competition that was more amenable at a higher price.

They weren't seen as so unprofitable they had to be eliminated, but unprofitable enough that competition could triangulate their price vs pain.

It is also more expensive for very small companies and individual developers. You need to pay extra for inbound/outbound traffic and any persistent disk space with cloud providers, whereas these are all included in the plans provided by most VPS providers.

It's not all that clear-cut.

If you are paying significant cash for network, then whatever you are doing you need a CDN of some sort (I assume most people here are doing web).

VPS bandwidth is not unlimited, and it is a single point of failure.

> It will continue to rise because it offers flexibility ... The cloud is more expensive for large companies, for most workloads.

The way I read this is that the operating costs are higher in the cloud, but presumably dwarfed by recouped opportunity costs. This makes sense to me, given my limited experience working in large companies where every interaction with IT involved a 2-6 week turnaround time. Need a VM? Fill out a bunch of paperwork and then wait a month. We missed a lot of opportunities that way.

A private cloud can fix that turnaround time just as easily.

The true opportunity cost in large corporations has, IMHO, been the general attitude of "we can't do foo new thing because that's [not best practice, not (barely) supported by some other large corporation, too cheap, too big of a change, etc.]."

The ability to create extremely flexible workloads in minutes means nothing when security has to approve sneezes and containers are only something you'd find in a fridge.

Pretty much.

It's not so much the availability of physical servers (although that might be an issue); it was the features, and the time it took to port new bits and bobs.

A classic example is databases. We had a puppet script that provided a HA-mysql cluster. However, it was very fragile, never really worked and I'm mostly sure the backups never actually took place.

It would take me (a devop who was tasked with doing this sort of thing) the best part of 3 days to setup and verify. The whole thing collapsed when the DBA that made it left.

RDS is a one-click operation. Yes, it's more expensive to run long term, but the saving on staff time was enormous. I will never go back to in-house rolled Puppet, unless it was proven to be outstanding.

> the opex was close to £3mil in AWS bills alone

What was the opex before cloud?

Data centers and the contracting companies’ overhead of managing them. If they’re not that great at it, you lose a lot in terms of availability, business agility, etc. Like, imagine using Softlayer as your provider except even worse. You have to manage ticketing systems for these folks specifically, you need to handle leases for colocation, ad infinitum when you have to manage an enterprise class datacenter and even then it doesn’t mean you get enterprise results.

That's difficult to work out. Pure hosting was easily less than £1M.

But software licensing costs play a massive factor in the overall opex.

> For SMEs its the total opposite.

A cracking business opportunity: a cheap "personal cloud" for SMEs that just works? An extension would be an associated "public cloud", allowing customers to seamlessly move between their personal and the public cloud to cover short term scaling issues whilst they buy more private infrastructure.

> mixture of SUN and Intel blades all backed by FC

What is FC?

Sorry, yes, it's fibre channel.

There was a full rack switch, which must have been easily 400+ ports.

I'd guess fibre channel, to connect the blade servers to storage, like a SAN.

does that account for the zero capex?

Looks like $2B of savings are from reducing the number of servers (200,000 servers earlier to 70,000 now). That has nothing to do with the advantages of a private cloud over public, etc. They were just over-deployed and by reducing the number of servers they would have enjoyed similar savings no matter where they were running them.

In fact they could have saved even more, and earlier, if they had already been in the public cloud, where you can just spin down the VMs and stop paying for them. In your own data center, a server is already sunk cost that you cannot entirely eliminate just by shutting it down.

  But the results have been dramatic. The company once had
  200,000 servers and roughly 60 data centers. Now, it's 
  pared that down to 70,000 servers, of which 8,000 are 
  handling the bulk of the load. And they've more than 
  halved their data centers down to 23.
They say they save 25%-30% over the cloud, which is to be expected since the cloud has a profit margin. They also say that might not last.
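Running the article's numbers (the per-server and implied-cloud figures below are my own back-of-the-envelope arithmetic, not from the article):

```python
# Figures quoted above; everything derived from them is an estimate.
savings = 2_000_000_000                    # claimed annual savings, USD
servers_before, servers_after = 200_000, 70_000

per_server = savings / (servers_before - servers_after)
print(f"implied savings per decommissioned server: ${per_server:,.0f}/yr")  # ~$15,385/yr

# "25-30% cheaper than the cloud" implies a cloud bill of savings / 0.25..0.30
for pct in (0.25, 0.30):
    print(f"at {pct:.0%} savings, implied cloud cost: ${savings / pct / 1e9:.1f}B/yr")
```

That ~$15k/server/year is all-in (data centers, power, staff), which lines up with the "Is $15,000 per server normal?" question below.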

I'm going with clickbait title. Nothing to see here other than someone getting a temporary savings which we will see as a write off in 5 years.

Is $15,000 per server normal?


"Server" implies redundant power supplies, 4-?? disks, an excess of high quality capacitors, a heavy steel case, a motherboard probably 4x the area of a typical ATX motherboard, ECC RAM, far more RAM slot capacity, etc.

Furthermore, extended service contracts are the norm. Five-year same-day on-site service isn't unusual. Five-year 24/7 on-call/next-day replacement is typical. One-year warehouse service is unheard of.

That's just the device itself. Servers also imply a rack to put them in, real estate for said racks, 24/7 HVAC for said real estate set to substantially below room temperature, and an expectation to run it (and consume electricity) continuously. You also need network architecture to support them.

This isn't even getting into the realm of stupidly specced servers. 512GB RAM? Easy. 128 cores? Sure. Disk space and network capacity are cost-limited: if your server is limited by network speed or disk capacity, you can simply punt a briefcase full of money at your supplier and get more. If you want all of that arbitrary-capacity disk space to be SSD, that's just an accounting problem.

When you are a large enterprise ordering from the likes of HP, $15k is a budget server. I'd see bills for "Hadoop nodes" that were $650k/rack.

I would imagine switches, firewalls, UPS, temp control, fixtures, and facility security are also significant cost factors.

A top tier Xeon CPU is $10k alone.

Once you get into the TB range with memory and disk, and 64+ cores, easily.

I'm not really sure why this should be a surprise. Just as big companies use a mixture of owned and rented real estate for various reasons, the same is true of other large expenses. If you have a large, predictable, core workload it makes sense to bring it in house, and use the elastic (rented) resources for unpredictable stuff (e.g. new efforts).

It's not like Amazon doesn't know this too (e.g. they try to make the transition harder by offering lots of proprietary services that help you ramp up faster, but can't be used elsewhere)

Agreed - past a certain (probably fairly massive) threshold of predictable long term load, it seems clear renting resources is going to be more expensive than the all-in cost of owning and operating your own.

Were the opposite true, you'd wonder about the economics of the cloud business model.

I think a more interesting way to look at this is, are the other advantages of cloud services (e.g. lots more flexibility and bundled proprietary technologies that you can't economically build yourself) actually worth the extra money?

Yes, by not using a cloud service you might save $2B a year, but does that cost you the opportunity to make even more than that, given you're probably moving slower or at least less efficiently than you otherwise could?

> but does that cost you the opportunity to make even more than that, given you're probably moving slower or at least less efficiently than you otherwise could?

The potential opportunity of fast-moving cloud features needs to be weighed against the opportunity costs of slow-moving cloud features. Where bespoke solutions can immediately provide tailored performance and maximize technical capabilities, a missing feature in any of your cloud providers services can be a showstopper or unmitigatable roadblock. And while lots of the technology is past the bounds of reasonable economical replacement, some of the technologies being shared through the cloud are nigh unfathomable to recreate.

Which is to say that the black ju-ju behind Windows update probably takes making a new MS to build up to present maturity, and unless you're a certified "big boy" letting small teams somewhere else fully dictate what you can and can't do at service boundaries probably impacts you in the long run.

Based on that, and IMO/IME: the answer isn't a binary choice but a constantly shifting point on a spectrum between the two, where on-premise/local-cloud and remote-cloud services are aware of one another and maximize capabilities while minimizing costs. Hybrid installations are just stronger, and are easier to reshape according to costs.

I don’t think proprietary is offered to make transition harder. In a lot of cases they simply offer a better option.

Well that's definitely what they will tell you!

There are plenty of reasons to choose a point in the space denominated by the axes make/adapt-existing X insource/outsource X proprietary/open . None are zero cost unless you really think your platform is good enough that enough others will write for it without you doing anything.

It's possible that you would chose to write your own because nothing in the market is even close to good. It's possible that you would open source it in the hopes that it becomes standard and that others will add new features that you will benefit from too (this is not a zero-cost option either). It's possible that you would even keep it proprietary because that's cheaper. But given the way Amazon operates it's pretty clear that they offer their own proprietary services to lower the barriers to adoption ("simply offer a better option") and raise the barriers to switching.

I don't see anything wrong with this strategy legally or morally, though I mostly avoid amazon proprietary tools because the risk of locking is one I'm not willing to embrace. There are of course situations where they aren't a problem (e.g. a tool with an anticipated finite lifetime: the cost of lock in in that case is essentially nil).

I used to work at Bank of America as a level 2 app analyst back when they first started building Quartz. At the time, it was advertised internally as a system to be used for reporting, and so it had lots of built-in functionality to connect to databases, etc. Pretty neat.

That said.

The method of encoding production database credentials was rot-13. No joke. In the Quartz interface, you could double click on a starred-out set of credentials, and it would run rot-13 on it and display the password. This was for FX, rates, credit card, mortgage, etc etc etc. Having access to this cloud system gave effective access into all of Bank of America and Merrill Lynch.

They probably save a lot of their money by using very, very bad practices.
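For anyone who hasn't seen rot-13: it's a fixed 13-letter rotation, not encryption, and it's its own inverse, so "decoding" a stored credential is one line (the password here is made up for illustration):

```python
import codecs

# rot-13 shifts each letter 13 places; applying it twice returns the original,
# so anyone who can read the stored string can recover the password instantly.
stored = codecs.encode("Hunter2Prod!", "rot_13")
print(stored)                           # Uhagre2Cebq! -- looks obfuscated
print(codecs.encode(stored, "rot_13"))  # Hunter2Prod! -- and it's back
```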

Still only the second worst security fail I've seen.

I'll bite. And the first?

Don't leave us hanging, whats #1?

Could you share the winner?

I work for banks, and I can tell you that when you are doing devops with their "own cloud", you are miles away from a real cloud experience: no OS choice, no hardware choice, slow provisioning, no access to repos, low and inconsistent virtual disk (EBS) speed... I guess you get what you pay for, and maybe the $2B saved on cloud is spent on IT services that suffer from such a poor own-cloud experience. If they have 50,000 IT employees paid $100k, that's $5B a year. Include professional services and you are maybe at $10B. Just increase the productivity of that $10B by 20% and the money "saved" is not such a good deal...
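Spelling out that arithmetic (all inputs are this comment's guesses, not reported figures):

```python
# Hypothetical inputs from the comment above, not BofA's actual numbers.
it_staff = 50_000
avg_cost = 100_000                        # USD per IT employee per year
payroll = it_staff * avg_cost             # $5B/yr
total_with_services = 2 * payroll         # "maybe at 10b" with professional services
productivity_value = 0.20 * total_with_services

print(f"payroll: ${payroll / 1e9:.0f}B/yr")
print(f"20% productivity on ${total_with_services / 1e9:.0f}B: ${productivity_value / 1e9:.0f}B/yr")
```

Under those assumptions, a 20% productivity swing is worth $2B/yr, i.e. the entire claimed savings.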

Probably more than half of the engineers writing the code for their cloud are either based out of India, making $40k a year, or in the USA, making 50% of the market rate.

> no os choice, no hardware choice, slow provisioning

These are probably positives from large org internal perspective.

"We want to switch x from y to z." "We can't. In fact we want to switch u from z to y."

> If they have 50 000 IT employees

BofA has 205k employees in 2019 as per Wikipedia. No way 1/4 of them are in IT.

I was surprised by how many IT employees there are in banks. But please, provide your own estimate.

Having worked at several banks, I would not be surprised at all if a 200k employee bank had 50,000 developers.

If you work at a bank that uses real cloud services (AWS and Azure in my case), they still lock down most of the options. They want control, ease of maintenance, and most of all: auditability. More options creates more overhead. Simple is good.

I did recently wonder if it made sense for banks to put all their stuff on other people's computers, instead of maintaining their own, but for the bank I work at, the old on-premise systems are maintained by IBM and ridiculously restrictive, so either AWS or Azure is already a massive improvement.

50,000 engineers just to run a bank? I don't think so.

JP Morgan has about 50k "engineers."

A lot of them are managers, analysts, or people who work with tech, but aren't all trained software engineers. A scrum master is considered an engineer. They had a blanket update to a lot of their role titles a couple years back where everyone in tech suddenly became a Software Engineer as their title (without much meaning behind it).

250,000 engineers at JPMorgan.

There are ~250k employees at JP Morgan, not engineers.



  s/employees/engineers/2

(Note the 2.)

What tool is this for? Sed?

I've copy-pasted it before, but I think you just made the syntax click.

Yep, sed supports that syntax (substitute).

  echo 'There are ~250k employees at JP Morgan, not employees.' | sed 's/employees/engineers/2'

Thanks :)

Oh yeah, 40k engineers, apologies

Why not? It's a lot, but I don't see this as an unreasonable number.

They don't have any choice in public clouds either.


Many large companies save money by having their own cloud, but those clouds sometimes really suck compared to AWS.

I know of one very large company where any request for a change in their cloud infrastructure always required a minimum two week advanced notice. In AWS such a change is just a mouse click away, and could be done in seconds.

AWS also has a really amazing integration of a large variety of services which is really hard for in-house clouds to match. I wonder how many AWS services the BoA cloud has, and how their own integration of those services matches that of AWS.

That's not a comparison of external cloud vs. self hosted though. It is a comparison of different operating models with respect to change control. Either one can exist in both external vs. self-hosted infrastructure.

I've been in places where the engineering team plugs in an old PC under a desk somewhere, gives it a public IP address, and there's the production server. That's the self-hosted equivalent of the engineering team having full AWS access where any change is a mouse click away.

I've also experienced places where any change does take a week or two of approvals even though it is hosted on public cloud.

There is a time and place for all approaches. What works best with a three person startup putting up a MVP is quite different from what is best in a very large corporation operating in a regulated environment.

Change control and service delivery time-frames are two different things. When I left BOA in 2017, you could get most production changes implemented with only two days of lead time from a formal change control perspective. But, requesting a new Virtual IP on a load-balancer could easily take two weeks, just for every layer of bureaucracy to wet its beak. And it was impossible to request something so basic without an online service request, and then follow-up emails because the standard service offerings left all kinds of details undetermined and no structured way to provide the information.

Most production changes require a second set of eyes, sometimes from a particular team, but it's all "just" code review. You put your change in the team's queue, their oncall engineer reviews it the same day, you land the change and it gets executed automatically. Most have implemented namespacing so that changes that only affect your own team's stuff can be approved within your team.

This is all on owned hardware. The difference is that we're a SWE driven company (corporate IT is off in its own world, run in the more traditional way, but they don't touch engineering's production datacenters). Infrastructure teams provide APIs, not JIRA forms.

our internal cloud requires months of lead time to get changes done :(

It should be named 'server in a remote place' instead of cloud.

That sounds like a Cloud in name only.

You guys are making me feel better about our private cloud.

Hey cool trick thank you.

The appeal of the cloud is less IT politics, which in a big company is a serious problem. Business units can sidestep IT, go "self-service", and spare a lot of pain and time in dealing with IT. I don’t think any serious manager going into the cloud is doing it for the savings. But of course this is a bank and technology is their future, so it makes sense that they want to keep a tight rein on their cloud activities.

You’re also neglecting TTM as a consideration. As an example, I’m stuck having to wait 6 weeks to get new hardware provisioned into an available slot in one of my data centers to spin up a new Hadoop cluster, something my team could probably do in a couple of hours in the cloud.

That being said, we control the HW and SW stacks end to end so I don’t have to worry as much about the nightmare scenarios the NordVPN folks went public about today. Critical given we’re in fintech ...

Friend of mine at a payroll processor you've all heard of was excited to migrate to AWS because it took 4-6 weeks for IT to provision a new VM. On their app team's dedicated hardware!

So much of the Fortune 500 lust for public cloud seems to come down to working around inefficient IT procurement and provisioning processes. They haven't automated, they don't maintain enough excess capacity, they haven't managed vendor relationships to assure fast order turn-around, financial controls are too onerous, etc.

Everything's virtualized but otherwise they're still operating like it's 1995.

I used to work for BofA as a quant in their Charlotte HQ. Their cloud decision is the least surprising. You really have to see it from their pov. My boss used to say - "Charlotte is a 2-horse town. You either work for Bank of America, or you walk across the street to Wells Fargo"! Compared to these 2, the other companies are much smaller along most axes (market cap, employee count, or just sheer heft). There's an uptown walkway (like a private overhead glass tunnel) that safely escorts bank workers from downtown to the bank without coming into contact with riffraff :) It's just a whole new level of planned design. Imagine if all the FAANGs had an interconnected private glass tunnel walkway that looked down upon the unfortunate denizens of MV/PA/SF, while the chosen ones swiftly segwayed from FAANG to FAANG whilst checking in their latest git commit into kubernetes or whatever it is you guys do :) That's what it was like. Quite unreal.

But all this clout leads to a rather head-in-sand mindset on most strategic items, like technology choice, programming language choice, cloud choice (or non-choice in this case), version control choice, etc. Everything was done in-house, with the most boring, safest tech possible (mostly Java about 3 versions behind, some strange Python lib where every function call was automatically logged!, and Excel & Matlab all over the quant land). I mostly sftp-ed financial data from some Quartz cloud... felt very quaint to do these sorts of things in 201X. All laptops were locked-down Windows Dell boxes on which you couldn't install anything, & ran some strange Norton antivirus which hogged all the memory.

My interview itself was so old fashioned. I thought since it was a quant job, I'd get questions on math & finance. Instead they trotted out their "chief developer" who wanted to know how to model a chair with 4 legs using Java OO. You know, the 1990s Grady Booch garbage full of UML, with parent Table class & child Table & Leg class & friend functions & all that jazz. I was like, Jesus, this regressive inheritance-based OO shit is still alive!

It's a deeply old fashioned, slow-moving place. Very large IT budget, with pretty much half of Charlotte working in some capacity for the bank. So yeah, if you had all the personnel & all the money, why wouldn't you build your own cloud? You are paying for all these people anyway, might as well give them something to do. Of all the employers I've worked for in my lifetime, this was the one place where I was personally asked NOT to work so hard, because I stayed at my desk after 5:30 pm.

(Sorry I have a regular 4 digit HN account but the bank doesn't like it if you talk about them. One of their lawyers once tracked me down because I mentioned some harmless datapoint about a technical problem I had worked on.)

> One of their lawyers once tracked me down because I mentioned some harmless datapoint about a technical problem I had worked on.

OK, so now that you've said that, won't it be pretty trivial to identify you again? How many “4 digit HN” users have they really tracked down before... 1?

A fascinating insiders view, thank you for taking the time to post it.

> The bank, which has a $10 billion annual tech budget

Am I the only one completely befuddled by this number? What the fuck are they doing with these money and 200k servers? These are facebook numbers. For a bank. What?

I'm not sure why you find this surprising; a bank has a much more complicated problem space than Facebook.

- They have a reasonably similar number of users (a fraction, but a large one).

- Mistakes cost a lot, so they have to be a lot more careful. It's a lot easier to make money hacking a bank than hacking facebook.

- They have to comply with all sorts of regulations.

- They probably don't trust their own employees to not be trying to commit fraud.

- They have to parse data on a scale that is likely similar or greater than facebook's. To detect fraud/lost credit cards/.... To decide who to give loans to. To price insurance. To decide how to trade stocks. ...

- They have to run a physical fleet of devices in the field, outside of their control, that have to give people the right amount of money ~100% of the time.

At a glance I see that Facebook has something like 300 petabytes of data [0]. I've worked at a bank, my team had something more like 10, but I don't think much of it was things like video that are just naturally huge. BOA is also approximately an order of magnitude bigger than the bank I was at.

One rumor I heard while there was that there had been a bug in one of our mobile apps that had been costing us a million dollars a day in server time.

[0] https://www.brandwatch.com/blog/facebook-statistics/

How do you spend that much on software and have such an abysmal usability and security story? I don't think there is anything technically difficult about the consumer software they offer, namely https://www.bankofamerica.com/.

I think you would find that their security is better than you think. Otherwise they'd be hemorrhaging money left right and center to North Korea and the likes.

As for usability, probably a degree of incompetence, mixed with design-by-committee and legacy. Edit: It's worth pointing out that banks usually don't gain or lose customers based on their UX, so it's not something that the business optimizes much.

I don't think there is anything technically difficult about almost any of the consumer software facebook offers, except scale. The same applies here but exchange "scale" for "scale, reliability, security, and regulatory compliance".

> I think you would find that their security is better than you think.

You can get into my account by verbally relaying my grandfather's first name over the phone. You can open a bank account with an SSN and no photo ID. What security? Their "security" is a fraud department, much like our credit card industry's.

> Otherwise they'd be hemorrhaging money left right and center to North Korea and the likes.

This is not how transactions work.

> Edit: It's worth pointing out that banks usually don't gain or lose customers based on their UX, so it's not something that the business optimizes much.

All banks offer the same shitty experience. What does differentiate them if not their software? They offer literally nothing my local credit union doesn't offer.

> I don't think there is anything technically difficult about almost any of the consumer software facebook offers, except scale.

No argument here, but facebook at least manages to hire designers and not impose weird non-sensical patterns of auth, like "Look for this image when you log in".

Why do you say they have bad security?

I've talked with info sec employees working in their Charlotte office, and have not been given an indication that they are slacking.

Their security questions are a massive security hole. Credit cards don't require a PIN. I don't see much indication that it is difficult to steal from people.

>What the fuck are they doing with these money and 200k servers?

Gluing together layer upon layer of legacy systems that are so old & opaque they are essentially black boxes nobody dares remove or replace.

oh and that small regional bank that was acquired 4 years ago? Yep...they've got an entirely separate stack of legacy systems. And that other jurisdiction with different data laws? Everything is different there too.

It's insane looking under the hood of these things. How banks manage to not lose half the money daily is a complete mystery to me.

The sell-side investment bank alone is a huge consumer of this tech budget.

13 exchanges x 3000+ securities x 2 quotes for each security x 23,400 seconds in the trading day (with the very very very generous assumption that quotes only change once per second)

That’s the inbound data alone, now consider all the algorithms they have to run on it, the models, the strategies, etc.

And then to each of their several hundred institutional clients, they have to give a unique and dynamic price.

Again, this is only the sell-side investment bank, not even looking at their absolutely gargantuan consumer bank business
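A quick sanity check of the quote-volume arithmetic above, taking the comment's own figures at face value (the "3000+" securities figure is treated as a floor):

```python
# Back-of-the-envelope inbound market-data volume, using the parent's figures.
exchanges = 13
securities = 3000          # "3000+", so this is a floor
quotes_per_security = 2    # bid and ask
seconds_per_day = 23_400   # a 6.5-hour US trading day

messages_per_day = exchanges * securities * quotes_per_security * seconds_per_day
print(f"{messages_per_day:,} quote updates per day")  # 1,825,200,000
```

Even with the "very very very generous" one-update-per-second assumption, that's roughly 1.8 billion inbound messages a day before any models, strategies, or per-client pricing run on top of it.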

And how much data visualization do they do? Software devs vastly underestimate the gravity of the data visualization situation.

Pen-and-paper banking doesn't exist. Everything is done electronically, and BofA is a massive behemoth of a bank.

Tech companies also aren't subject to the same level of scrutiny / regulatory requirements as big banks, plus the need to support old processes and software, etc

> These are facebook numbers. For a bank.

I could turn this around and ask, why would Facebook ever need infrastructure on par with BofA? I'm not an expert on infrastructure, but given their size, revenue, market position, number of customers etc, there's nothing baffling about these numbers.

When I wrote software for telcos, we had levels of logging that cloud users are only starting to reach ten, fifteen years later. From what I hear banks are even worse.

When you are collecting that much of an audit trail, that data transitions from being a problem to manage to the problem to manage.

I wonder how many bespoke Kafka work-alikes there are out there that are older than the authors of kafka.

Any software the bank is licensing can run into the tens of millions per year, for anything they use: loan origination, underwriting, etc. They also have a lot of users, so just think about their basic workspace cost per user for all the software necessary.

- some banks are spending huge on internal infra in order to tie together various arms of the bank, e.g. consumer lending with small business lending with personal banking, quantitative finance getting better feature vectors from other parts of the bank, ... . banks understand that they have mondo data, so want to make it accessible to everyone. sort of like the API craze, where people build cool stuff with APIs once they are exposed; you can't quite imagine what but know value added stuff will bubble up

- the instantaneous nature of things in today's world is bleeding into finance, where banks want to advertise/offer near instant access to credit, loans, etc. instead of turn around time of days. in order to do that, it helps a lot to basically unify data across tons of previous disparate orgs, shared infra, etc.

- cybersecurity/auditing/compliance can be very expensive to license or contract for and occasionally has to be run on company hardware due to legal issues

> " What the fuck are they doing with these money and 200k servers? These are facebook numbers. For a bank. What?"

Are you surprised that an organisation handling billions of real money, millions or billions of financial transactions, and decades' worth of legacy systems needs at least as much in servers and software development as a social network?

Might well have small systems at branch level that they are counting as servers.

Apparently BOA has 4344 branches so the local infrastructure costs will add up both for branch kit and the networking costs.

BofA has 205,000 employees and millions of accounts.

If you know your traffic well, private cloud saves a lot of money. A few years back I bet hybrid clouds would take over, and for some reason that didn't happen. I still believe hybrid cloud is the solution for mid-size and up companies. You definitely need cloud providers for handling traffic spikes.

> for some reason it did not happen.

Part of the organizational allure of migrating to the cloud is being able to rationalize headcount. Hybrid clouds, while bringing a lot of potential upside, are complex. While you can potentially reduce the scale of labor involved, you still have to retain all of the core skillsets you already have to maintain your on-prem locations. Then add headcount to optimize the cloud deployment, _plus_ the ever elusive skillset of the individuals that can marry the two in such a way where you show operational savings from the hybrid model.

AWS and Azure are moving into the hybrid space. Check out AWS Outposts: your own cloud, AWS software (or something similar).

Nobody ever got fired for using AWS.

I am not at liberty to say the company name, but i do directly know of one person at a director level who was fired for using AWS.

They turned on an encryption option and did not realize the huge price difference between the encrypted and unencrypted options meant $9,000 per day in additional charges. After one month (30 days) this came to light, and he was fired.

that's not what people mean by the phrase "Nobody ever got fired for using AWS." http://wiki.c2.com/?NobodyEverGotFiredForBuyingMicrosoft

Before that it was 'IBM' ...

Yes, this happens with AWS. Controlling costs is a very important part of using any cloud, let alone AWS where the charges are... not as transparent as one would wish. :)

EDIT: btw, he was not fired for using AWS, he was fired for incurring unnecessary costs.

I'm sure at least a few companies have gone bankrupt or people have gotten pink-slipped for using AWS and not paying attention to auto-scaling settings...

It's scarily easy to accrue huge bills if you're not watching things closely.

IT decisions that get people fired are usually not well known to the public anyways.


If the 2 billion were saved over two years, that is an increase in profit of ~4% over those 8 quarters ((75-56)÷56÷8)§. That's a pretty good outcome.

§ = Q3: "The bank said Wednesday that net income excluding an impairment charge rose 4% to $7.5 billion, or an adjusted 75 cents a share. When including the $2.1 billion charge tied to the end of a partnership with First Data, net income fell to 56 cents a share"

That implies their business would have data center costs beyond most tech companies (Lyft for example is spending $300 million with AWS over three years, Snap is spending $3 billion over five years).

Something doesn't add up here.

Tech companies don't necessarily spend more on computing resources than non-tech companies. The article says they had 200,000 servers in 60 datacenters. That is certainly more than most tech companies, and is not surprising. A large bank probably has more computing needs than someone like Lyft.

200,000 servers isn't actually that many. 200,000 c5.xlarge's at on demand rates (which nobody with 200,000 servers pays) will cost you about $300 million a year. So they are realizing savings at about an order of magnitude higher than you'd expect cost wise.
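Rough math behind that $300 million figure. The ~$0.17/hour on-demand rate for a c5.xlarge is an assumption (us-east-1 ballpark; actual rates vary by region, and as the parent notes, nobody at that scale pays on-demand prices):

```python
servers = 200_000
on_demand_per_hour = 0.17   # assumed c5.xlarge on-demand rate, $/hr
hours_per_year = 24 * 365   # 8,760

annual_cost = servers * on_demand_per_hour * hours_per_year
print(f"${annual_cost / 1e6:.0f}M per year")  # ~$298M
```

So $2B/year of claimed savings is nearly an order of magnitude above what the raw compute would cost at list price, which is the parent's point.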

I used to work in banking... their compute needs are actually quite basic.

I think what we're really seeing here is some creative accounting around how operations and costs are accounted for. They've got to be talking about costs of managing and operating their cloud as well, and most of those costs would be in terms of training & workforce retooling.

> Right now, the bank estimates its private cloud is 25% to 30% cheaper than public providers, though it also recognizes that probably won't last forever.

I am glad they shared this. We often hear marketing telling us that public cloud is always better. It's nice to see different perspectives being shared, with some details around them.

If you have $50 million+ that you can deploy as capex in one shot (economies-of-scale negotiation doesn't work well at lower numbers), and you can hire systems engineers (C/C++/Go/Rust, 5+ years of systems experience, x 20 engineers), you can put together the single-tenant compute cloud needed to run a typical Internet-facing web/mobile application with all the bells and whistles (lots of app server clusters, DB clusters, big data clusters, GPU clusters, etc., with HA, scalability, security, DR, etc.). At this point, capex/opex cost will easily be 30% lower than the best negotiated bulk price from a public cloud.

It is worth doing this if your in-house engineering team is at least 100+ app service developers and have high feature churn rate.

But be warned, your systems engineering / shared technology team should be level-headed and mature and rest of your app service engineering should be good too to pull this off well. If not, you will be in serious developer productivity pain and there won't be quick and easy fixes once you have put down the capex.

People seem to be posting hard-paywalled links more and more around here. They were supposed to be banned, but I guess the rules are selectively enforced by @dang here nowadays. The link to the article should be changed to this Outline link.

Thanks, I couldn't read it because of the pay wall but at this link I could read it without any hassle.

Thank you for the link (article was unreadable on original posted link)

All the flexibility that the cloud offers is wasted on a large bank. I remember one of the selling points of cloud computing for developers was that it takes months to provision a physical server in a large organisation. With [enter cloud name here], a developer can create a VM in only a few minutes!

Well, enter typical bank bureaucracy, cost controls, approvals, etc. And now we are back to taking months to provision a VM!

Is this a story about virtualization? I fail to see the distinction between data center and private cloud in this context.

I think in this context it relates to the overheads of provisioning new services.

Traditional virtualized hosting environments typically also come with varying levels of technical overheads when setting up new boxes.

What BofA has is a virtually self-service (some business approvals needed) user interface that lets devs set up new virtual servers with little or no knowledge of the underlying hosting infrastructure.

I was going to explain how there's no difference, but then I thought about the more facetious terminology:

cloud - someone else's computers

private cloud - someone else's computers that you own???? or that nobody else uses, but in this context you also own???

Cloud also means access on-demand by API.

Private cloud usually refers to that same abstraction layer. Hardware provisioning is abstracted away from use and available on-demand from a pool, and likewise deployment works automatically via the same or an equivalent API.

That's very different from the operation of a typical company-owned datacenter from a couple of decades ago.
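To make the distinction concrete, here is a toy sketch of that pool abstraction. This is entirely hypothetical Python, not any real provider's (or BofA's) API; the point is only that capacity lives in a shared pool and is handed out by a call, not a ticket:

```python
# Toy model of "on-demand by API": a shared capacity pool that allocates
# immediately, instead of a service request routed through a ticket queue.
class ComputePool:
    def __init__(self, capacity_vms):
        self.capacity = capacity_vms
        self.allocated = {}

    def provision(self, name, vms=1):
        """Self-service allocation: succeeds immediately or fails fast."""
        if vms > self.capacity:
            raise RuntimeError("pool exhausted")
        self.capacity -= vms
        self.allocated[name] = vms
        return name

    def release(self, name):
        self.capacity += self.allocated.pop(name)

pool = ComputePool(capacity_vms=1000)
pool.provision("team-a-web", vms=8)  # minutes of work, not a six-week ticket
print(pool.capacity)  # 992
```

The data structure is trivial on purpose; what makes it a "cloud" is that `provision()` returns right away, while the traditional datacenter equivalent is a JIRA form and weeks of lead time.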

ITT: people who didn't read the article and believed the false headline that Business Insider made up.

The 3rd sentence of the article, in bold, explains that the savings were from consolidating on-prem hardware, not from eschewing external cloud.

The article cites BofA itself as saying that savings vs. external cloud are tenuous at best:

> Right now, the bank estimates its private cloud is 25 to 30% cheaper than public providers, though it also recognizes that probably won't last forever. Still, the company believes the architecture it has built will give it leverage in negotiating contracts with these companies

Public cloud is definitely a great value for startups. When you are small and want the flexibility and speed and the sophistication that cloud providers make possible. The premium they charge is well worth up to a certain point at least. When you are an established business with mature tech and of the size of BoA, it is a whole different math. Your investment in your own cloud if done right can be a huge saving that can give more mileage (on Wall Street) than your competitors speed in execution, esp if you are a bank, where time to market isn’t exactly a winner.

Cloud will always be more expensive than bare metal, period. Cloud is great to start out with, when you have no clue how much capacity you will need to satisfy your customers and want to be flexible enough to cover unexpected spikes. But once you're out there earning, the only reason to stay on cloud is comfort. Business-wise it is a black hole for money.

Many will argue that the cost of personnel to manage your own infra tips the balance toward cloud, but if you do the math you will see that the savings from renting only three bare metal servers each month will pay a full-time salary for a sysadmin. You don't even need to own your own hardware: bare metal rentals these days are insanely cheap, and if you truly want to save each penny, you buy your own, because in a matter of months you get your investment back on saved fees alone.

The thing is, these big tech companies have convinced so many people that cloud is the only way that they cannot even fathom how anybody can run on their own hardware. The new sysadmins are already "brainwashed" and inexperienced this way, and the old-school guys who have experienced managing their own hardware are slowly "dying" out. Soon, the mere mention of managing your own hardware will be a joke to these newcomers, when in reality the cloud was never the cheaper option in the first place. Not to mention that these days we have so much technology to make things easy (Kubernetes, Rancher, Proxmox, Docker, LXC...) that it is quite laughable to fear bare metal and religiously praise the cloud like some kind of savior.

I continue to be amazed that any business of more than a few hundred employees would ever consider using off-prem hosting. Cloud services are there to bootstrap a business, not maintain it.

Well the issue is that you bootstrap on AWS and then have a hard time going on premise without a major rewrite or initiative. That and you want to keep adding features and driving the product forward... it is easy to see how this happens. Plus you can have 200 people focused on things other than running ESX boxes or docker or whatever you need.

Using cloud services spares a business from having to maintain a whole vertical stack of expertise in-house. And for banks, those infrastructure engineers have to be available 24/7.

For many companies, hiring and talent retention is a problem. There are some risks that are not as superficial as dollar signs on the balance sheet.

"Building its own cloud" = leasing data centers from Equinix or other large data center providers. Financial services have challenges using public cloud due to regulatory and compliance requirements. Most of these challenges are self made - artifacts of moving their teams from "This is how we currently do it" to a shared services model.

Curious if they are all in on providers like Redhat Openshift or Pivotal Cloud Foundry as their PaaS layer.

Given how little money Pivotal was making from Cloud Foundry and how expensive each license was, I’d bet a large sum that the answer is no.

(Some context: My last employer had an AWS bill of about $3m/year that was mostly EC2. Running PCF on top of that would have been another $2m/year in licensing. And that was after the volume discount.)

“Right now, the bank estimates its private cloud is 25 to 30% cheaper than public providers, though it also recognizes that probably won't last forever.”

After a meetup this spring, someone talked my ear off about how Dell is highly motivated to have a private cloud solution that works for people.

Having Dell and Amazon in a bidding war over your next project is probably the best world you can be in.

For my money, you should run one data center in the same location with most of your tech talent, and a second one geographically distant, and regionally load balanced.

But we typically don’t write our software for this, and you can’t get the business to do a rewrite until they see how stupid expensive a lift and shift ends up being.

I think these narratives about saving money are typically covering up a story of how much was squandered starting five quarters ago...

Unfortunately, Dell’s private cloud solution exists today, and it’s an 18 layer shit sandwich of all the disparate companies that Michael Dell has stitched together into his holding empire.

I have a question: I'm doing an ambitious Web site startup; current status, rushing to alpha test; and from this thread and more I wonder about servers and our server farm and in-house network, that is, for the options here, in-house, co-location, AWS, Azure, etc. I'm using Microsoft's software, Windows, .NET, SQL Server, etc.

Q. For many of the reasons given in this thread, etc., I'm leaning to having our own in-house servers, server farm, etc. For that, is there a source of information I will be able to use, say, as consulting, depending on the issue, an hour, day, week, month at a time, to get us past the chuckholes in the road on the usual work -- system and network planning, installation, configuration, monitoring, diagnosis, correction, etc.? E.g., can I just call Microsoft for such issues, CloudFlare, VMWare, Cisco, etc.? Assume that money to pay for the products and consulting will not be a problem.

If the broad answer is "Yes", then that will take a lot of entries off my TODO list and let me sleep better.


For what it's worth, I really don't like AWS - aside from their S3/Route53/CloudFront products. There is a layer of abstraction that is so minimal that you may as well roll out the services yourself & save a ton of money. Of course, if you like that layer, then that's fine too.

If you look beyond that, AWS offer layers of services that you might not realize existed.

A cloud DB with no worries about physical hardware (in addition to S3 buckets, which are mostly for files), Lambdas when you don't want to manage a server, CloudFormation when you don't want to start your stack manually by clicking, API Gateway for quick ways to create APIs without worrying about infrastructure. And a powerful orchestration layer that connects all of these together.

In fact, a lot of these services are bootstrapped on AWS itself. If you know your way around AWS, you can really find a sweet spot where it's cheap, scalable, and mostly worry-free. The problem is that AWS domain knowledge is _deep_, and not many people know enough about AWS to avoid the many pitfalls and inefficiencies that a knowledgeable person can quickly spot and fix.

My experience working at an aging e-commerce company that has their own cloud: totally shitty.

They used off the shelf management software that they misconfigured. Getting VMs was so painful and they never worked correctly.

When I came back from paternity leave, they had gotten no response from their automated system and deleted all my VMs even though they could have seen heavy usage. If they had manually followed up they would have gotten an out of office email but they didn't.

Those are just two of the examples of how terrible it was to use their internal cloud.

You are always going to be five years behind if you do it internally. That might be a good cost calculation and I wasn't in a position to argue.

But, it meant innovation was terrible there.

Something does not make sense here. The main cost savings seem to come from using fewer servers. Why could you simply not do this on Azure or GCP or AWS? Are the advantages coming from leveraging bare metal servers? I'm a little confused.

For BoA we need to consider that it is a bank, and much of the information it stores may be too sensitive to pass to a third party, which as we know will always have security issues. Better to deal with those internally. Many factors to consider in this case: real estate, in-house expertise to build data centers, budgeting, required technology. In our case, without the cloud it would have been very difficult to bootstrap our business and manage it, as most of the budget went to hiring engineers to work on our product and paying the bills.

In my opinion, a company can save money on a cloud if it has these things:

- Access to cheap real estate

- Access to enough human capital, especially a team of 80+ to operate all aspects of a cloud: hardware, switches, openstack / kubernetes

- Requires scale of 20k+ servers = 320k+ VMs and a strong inclination to not host on AWS/Azure/Others.

- Leased servers costing about $250 / server / mo. With AMD, you could go to half of that, with VMs costing $10 a month. Add network costs of $15 / 2 ports / server / month.
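Plugging the parent's own numbers into a per-VM cost sketch (16 VMs per server follows from the stated 20k servers = 320k VMs; every figure here is the parent's assumption, not a quoted price):

```python
servers = 20_000
vms = 320_000
vms_per_server = vms // servers                        # 16

lease_per_server_mo = 250.0                            # leased server, $/mo
amd_lease_per_server_mo = lease_per_server_mo / 2      # "with AMD, half of that"
network_per_server_mo = 15.0                           # 2 ports, $/mo

intel_vm_cost = (lease_per_server_mo + network_per_server_mo) / vms_per_server
amd_vm_cost = (amd_lease_per_server_mo + network_per_server_mo) / vms_per_server
print(f"Intel: ${intel_vm_cost:.2f}/VM/mo, AMD: ${amd_vm_cost:.2f}/VM/mo")
```

That works out to about $16.56/VM/month on the Intel lease and $8.75 on the AMD one, consistent with the roughly $10/VM figure claimed above.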

> "Now, it's pared that down to 70,000 servers, of which 8,000 of those are handling the bulk of the load."

I wonder what the other 62000 servers are doing.

Lots of dev/test, storage, R&D, BI/reporting, and miscellaneous internal business apps most likely.

Indeed. One of the big advantages of the cloud is you provision just the number of servers you need. And scale out to handle temporary spikes in load.


So one thing people seem to be ignoring is that cloud != automated provisioning of VMs by engineers.

There’s so much more that AWS, GCP, and Azure offer these days. Trading away all the stuff like metrics, monitoring, DynamoDB, S3, Lambda, scalable RDS, Spark pipelines, and generally a whole suite of products that go into building a modern web app, in exchange for some scripts that can provision on demand, seems like a poor choice.

Bank of America should really not brag about the quality of the software it makes based on its consumer website. Maybe they should consider spending more.

JPM's Jamie Dimon talks about his cloud strategy in the most recent shareholder letter, page 34. (https://www.jpmorganchase.com/corporate/investor-relations/d...)

Note that he leaves open the option for both public and private cloud.

Counterfactual: Suppose BofA wasted $2B per year by building its own cloud. Or suppose BofA decided to migrate from its own cloud to AWS and wasted $2B in the process. Would a CEO make a public statement about that? Or would they fiddle the numbers (which are impossible for an outsider to verify) to make their decision look like a winner?

70,000 servers in 23 datacenters sounds awful. Why have it spread out so much? $350 million capex for a 70k high-end-server DC setup may make sense.

$2B = 40% savings means their opex was $5B and now it's $3B. That is high expense. Guessing most of it is licensed software costs? Like Oracle on every single core?!?
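The opex inference above, as arithmetic (the 40% figure is the parent's reading of the reported savings):

```python
savings = 2e9        # "$2B per year"
savings_pct = 0.40   # claimed 40% savings

opex_before = savings / savings_pct    # $5B
opex_after = opex_before - savings     # $3B
print(f"opex before: ${opex_before / 1e9:.0f}B, after: ${opex_after / 1e9:.0f}B")
```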

I'm not so convinced they did a great job. Bank of America has by far the worst user experience I've ever seen. I had an experience recently where I changed my password, and after two days my new password stopped working and my old one did.

That's not good, but it's also only one of the many, many things a bank does, though. The most important one is keeping the money safe of course. It's entirely possible for a bank to do some things right and other things wrong.

This is simply a fixed-cost vs. variable-cost decision. Fixed costs give you operating leverage, which amplifies your earnings in good times and bad. Variable costs don't have that downside, but they impact your margins.

Do banks using third-party cloud services just trust that the provider won't abuse their data? Do they actually store financial data there, or just use them for front-end stuff like websites?

The trust will come from very stringent contracts (beyond the regular contracts that other customers would use), third-party certifications that the provider achieves, and the provider being covered by certain aspects of the regulations that cover banks (or healthcare or whatever).

Most of the cloud providers are PCI compliant: https://aws.amazon.com/compliance/pci-dss-level-1-faqs/

That BofA had >$2B/year of IT spending is the real story here.
