Ask HN: How often is hardware at the big tech companies refreshed?
73 points by webmaven on Jan 11, 2017 | 72 comments
Does anyone have data that would suggest how long it takes an org at the scale of an Amazon, Google, or Facebook to entirely replace their HW?

I know that current practice at that scale is to not replace an individual chassis, only whole racks when a certain % of servers have failed, but that can't be the whole story, since presumably servers/racks are replaced if they are old enough even if they happen to still be running OK.

Google, for example, almost certainly has no servers (or switches, etc.) running that date back to Y2K, even though statistically some 16-year-old HW would still be working if it hadn't been yanked.

So, assuming the HW hasn't actually failed, how old is too old for a "web-scale" company to keep around? And is the answer different for these companies' own services vs. externally facing IaaS/PaaS?

Once you own data centers, energy consumption becomes a major consideration.

I'm not doing anything like that, but I assume the reasoning about the economics of energy consumption goes like this: you buy new hardware, and from experience you know it will last a few years on average. Over that lifetime the new hardware saves you some amount of money on electricity, because it is more efficient than the old hardware. So it makes sense to hit the buy button for the new hardware when you come out ahead:

(Savings in electricity over the lifetime) - (Price of new hardware) > 0

I assume after a few years the savings may become significant.
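A minimal sketch of that break-even test, assuming both setups run the same workload 24/7 at a constant average draw, and ignoring migration, repair, and disposal costs. All names and figures here are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def net_saving(old_watts: float, new_watts: float, lifetime_years: float,
               price_per_kwh: float, hardware_price: float) -> float:
    """(Savings in electricity over the lifetime) - (price of new hardware).

    old_watts is the average draw of everything being retired (possibly
    several old machines); new_watts is the draw of the replacement.
    Assumes constant 24/7 load; migration and repair costs are ignored.
    """
    kwh_saved = (old_watts - new_watts) * lifetime_years * HOURS_PER_YEAR / 1000
    return kwh_saved * price_per_kwh - hardware_price


if __name__ == "__main__":
    # Hypothetical: one 400 W box replaced by a 250 W box, 4 years, $0.10/kWh, $2,000.
    print(f"{net_saving(400, 250, 4, 0.10, 2000):.2f}")
    # Hypothetical: three old boxes (1200 W total) consolidated onto one 250 W box.
    print(f"{net_saving(1200, 250, 4, 0.15, 3000):.2f}")
```

In the first case the saving is about 5,256 kWh, or roughly $526, which does not cover a $2,000 server; consolidating several old machines onto one new one is what tips the second case positive.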

It's not so simple. Deprovisioning hardware takes work to move the services away (yes, even in Google) and breaks things. It also costs to physically remove the hardware and ship it and refurbish the DC to host new hardware.

Big tech companies often run hardware for longer than you think; they just don't use the old hardware for cloud hosting.

I assume the costs of repairs and spare parts would also enter that equation.

An aside: it can be a good exercise to use the above reasoning to decide when to buy a new car.

That reasoning will postpone buying a new car until the moment the old one completely fails, and not a second before.

For cars it is much more important to calculate the risk of it failing in some moment you need it, and the costs of being suddenly carless. Also, for cars that go on a road, the most relevant factor is safety.

Having done the maths, and living in a country with very high fuel prices, I have to say that everything seems to be in favour of putting up with small old cars as long as possible!

It's also a good reason to replace major appliances regularly, from washers that use too much water to fridges that are inadequately insulated.

Lots of companies lease their hardware, including servers. When the lease expires, that hardware goes on the reseller market dirt cheap. That's how I built my own 17-node server cluster for my (failed) startup, turning what would have been a $90K-per-month AWS expense into a $600 colocation expense. Owning and maintaining your own servers is not hard at all, folks.

I'm interested in buying hardware for my own nefarious purposes. Can you post links?

You can get it given to you if you print up some "free e-waste collection service" flyers.

Here's one (Australian?) guy and his small business, with some examples of the stuff he collects: https://www.youtube.com/watch?v=BbMmuLDMx4Y

> I'm interested in buying hardware for my own nefarious purposes. Can you post links?

Check out some of the insane setups on DSLReports [0], especially the pics. Here's an example [1].

[0] https://www.dslreports.com/forum/homephotos

[1] https://www.dslreports.com/forum/r30877673-Darby-Weaver-The-...

You can find tons of used servers on eBay. That's where I bought all my homelab server stuff. Just don't buy very old stuff.

Places I found info were Reddit's r/homelab [1] and ServeTheHome [2].

[1] https://reddit.com/r/homelab

[2] https://www.servethehome.com/

Google "refurbished servers".

...I'm reasonably certain that GOOG, AMZN, MSFT, FB, etc. don't lease their hardware, but would be fascinated to learn otherwise.

He's not talking about them, he's talking about normal companies, lots of places run their own servers.

AWS/GCE/Azure are an order of magnitude more expensive than running your own, and usually have poor and unpredictable performance to boot.

> He's not talking about them, he's talking about normal companies, lots of places run their own servers.

Then his comment was tangential (at best, and non-responsive at worst) to the OP.

> org at the scale of an Amazon

> Owning and maintaining your own servers is not hard at all, folks.

Sysadmins are useless /s

Well, technically, maintaining a colocated rack of servers is not that different from maintaining a pile of virtual machines from your favorite provider.

GitLab is exploring moving its cloud setup to bare metal.


Interested as well. Where did you buy it?

You can find used servers on eBay.

Interesting question. I'm wondering if there's some hand-wringing over at Intel about refresh cycles. Moore's law isn't what it used to be. My experience in large, non-tech companies is that they've tended to follow a 5-year refresh cycle. But a 5-year-old server is still useful, and not terribly power hungry these days. I could easily see a decision to stretch that out to 6.

As a data point, at my current place (a large, university-affiliated R&D lab), the upgrade cycles are different for "servers" that do some externally visible / useful stuff and workstations.

Servers generally get upgraded when the next functionality upgrade outgrows the current hardware, or when the hardware physically dies (often WELL over 5 years).

Workstations / laptops are upgraded whenever a user wants a new one. Many folks (myself included) prefer to keep our current systems for a long time. Components and peripherals get upgraded much more frequently, as people keep buying hard drives, GPUs and other new toys.

Yes, a fixed refresh cycle is more common in corporate environments, and it's more of a budgetary thing. IT has some flexibility to keep old servers, but its refresh budget is calculated on an X-year cycle, which limits how many existing servers it can replace each year. Net new functionality or applications typically fund the first cycle for servers that have a net new use.

> Moore's law isn't what it used to be.

No, it isn't, at least in its original formulation ("the number of transistors in a dense integrated circuit doubles approximately every twenty-four months"), since by that measure the doubling period is now around thirty months (at least through the 2017 product roadmaps).
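To see what that slowdown means over a typical refresh window, here is the doubling arithmetic, assuming a clean exponential (a simplification; real roadmaps are lumpy). Over five years, a 24-month doubling gives 2^2.5, about 5.7x the transistors, while a 30-month doubling gives exactly 4x.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplier after `years` if the quantity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


for months in (24, 30):
    print(f"{months}-month doubling over 5 years: {growth_factor(5, months):.2f}x")
```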

"Number of transistors per integrated circuit" is becoming less important as power costs rise. As software gets designed and written to take advantage of more cores (which, latency aside, might be on different ICs, in different computers, or even in different data centers), FLOPS/Watt-Hour starts to seem much more relevant, especially for the type of loads the Microogazonbooks of the world typically run.

Meanwhile, for the niches where IC speed really is crucial, we see GPGPU designs (with thousands of simpler cores per IC) taking over.

Servers pulled from prod in many companies will do just fine for dev/test/staging environments. I'd bet the answer here is more complicated than any fixed number of years.

It does usually boil down to a number of years from a budget and amortization view: which year the IT department gets to refresh a specific segment or number of servers.

Edit: Of course, there are increasingly fewer companies that actually buy, or even lease server hardware.

> Of course, there are increasingly fewer companies that actually buy, or even lease server hardware.

The OP was specifically about those few companies that definitely are buying their own HW, and massive amounts of it, at that.

It used to be customary for a company to deploy a new server to accommodate a new program or initiative. Groups or departments in the organization would be hesitant to share a host and often wanted their own machines. Depending on the organization, this often came out of IT's budget rather than the various departments' budgets.

That's part of the reason why virtualization has been so rapidly adopted.

There are a few factors here.

Depreciation: typically three years for hardware that is purchased rather than leased. At the end of 3 years the item has zero book value, so it can be replaced.

Useful life: hopefully longer, up to five years, so an item can have no book value but still be in use. This is more typical with networking hardware than compute.
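A minimal sketch of those two clocks, assuming straight-line depreciation to zero salvage value over 3 years and a 5-year useful life. Both defaults are just the figures from this comment, not an accounting standard, and the function names are hypothetical.

```python
def book_value(cost: float, age_years: float, depreciation_years: float = 3) -> float:
    """Straight-line book value, reaching zero at the end of the depreciation period."""
    remaining = max(depreciation_years - age_years, 0)
    return cost * remaining / depreciation_years


def asset_status(age_years: float, depreciation_years: float = 3,
                 useful_life_years: float = 5) -> str:
    """Where an asset sits relative to the depreciation and useful-life clocks."""
    if age_years < depreciation_years:
        return "depreciating"
    if age_years < useful_life_years:
        return "fully depreciated, still useful"
    return "past useful life"


for age in (1, 4, 6):
    print(age, book_value(9000, age), asset_status(age))
```

The gap between the two clocks (years 3 to 5 here) is the window where hardware costs nothing on the books but still occupies power and rack space, which is where the power argument below comes in.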

The other poster who mentioned power consumption is totally correct. It can make sense to renew hardware even before it has fully depreciated in order to get better datacentre density and lower power draw.

This is what is driving most refreshes in my experience.

Working for Big Company...

Three-year rolling cycles for servers. That is, you're not replacing hardware all at once but incrementally.

Similarly for dev machines: if a dev has two dev boxes and a laptop, those are spread over the three-year cycle.
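A rolling cycle can be sketched as a schedule keyed on purchase year: each machine comes due `cycle_years` after it was bought, so a fleet bought in even annual batches turns over roughly a third per year on a 3-year cycle. The fleet data below is hypothetical.

```python
from collections import Counter


def refresh_schedule(purchase_years: list[int], cycle_years: int = 3) -> dict[int, int]:
    """Map each calendar year to the number of machines due for replacement that year."""
    due = Counter(year + cycle_years for year in purchase_years)
    return dict(sorted(due.items()))


fleet = [2014] * 100 + [2015] * 100 + [2016] * 100
print(refresh_schedule(fleet))  # {2017: 100, 2018: 100, 2019: 100}

# One dev's two dev boxes and laptop, bought a year apart: one device per year.
print(refresh_schedule([2014, 2015, 2016]))
```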

Is your Big Company one of the Really Big Companies (e.g. AMZN/GOOG/FB/MSFT/KRX/etc.) my question was about?

I suspect that, as with most businesses, it's not a decision that's driven directly by technology, but rather by the operational economics of new technologies versus the capital costs (where the capital costs include depreciation and resale in addition to acquisition and installation costs).

When looking at what large companies do in regard to acquisition, the view through the lens is diffracted by the relationship those companies have with hardware suppliers...e.g. Intel discusses its roadmap with the big consumers of chips and to some degree designs around those companies' anticipated needs -- the chip in my laptop contains features that only benefit someone running a big data center.

Absent explicit data, one way to measure the turnover would be to look at the used server CPU market. The big companies usually don't dispose of Xeons at the landfill, and changes in the price and availability of specific chips over time are likely to correlate with upgrade cycles.

You're missing something: Compute HW for AMZN/GOOG/MSFT isn't a cost center, it's a profit center. Reportedly AWS profit margins on a CPU-hour were something like 80% at one point.

I'd classify profits as part of the operational economics. Someone else might not.

Ah, I seem to have glossed over what you actually wrote and apprehended it as "operational expenditures" vs. "capital expenditures" (i.e. OpEx vs. CapEx). My apologies.

Datacenters are power (heat) limited. If a device does half the work of a new one using the same power, it is probably on the way out. Certainly by the time it hits say 30% it's time to go unless it's a special device.

So from a power POV, how many months does it take for a new device to hit 2x-3x FLOPS/Watt-Hour of the old hardware?
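The question can be answered mechanically once you assume a rate: if performance per watt doubles every D months (D is an assumed input here, not a measured figure), reaching a ratio r takes D·log2(r) months. The sketch below also encodes the rough 50% and 30% thresholds from the parent comment; the function names are hypothetical.

```python
import math


def months_to_ratio(ratio: float, doubling_months: float) -> float:
    """Months until new hardware reaches `ratio` times the old performance per watt,
    assuming performance per watt doubles every `doubling_months` (an assumption)."""
    return doubling_months * math.log2(ratio)


def verdict(old_perf_per_watt: float, new_perf_per_watt: float) -> str:
    """Parent comment's rule of thumb: at half the work per watt a device is on
    its way out; at around 30% it is time to go."""
    relative = old_perf_per_watt / new_perf_per_watt
    if relative <= 0.3:
        return "retire"
    if relative <= 0.5:
        return "on the way out"
    return "keep"


for r in (2, 3):
    print(f"{r}x with a hypothetical 24-month doubling: {months_to_ratio(r, 24):.1f} months")
```

With a hypothetical 24-month doubling, 2x takes 24 months and 3x about 38 months; plug in whatever rate your own hardware generations actually show.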

Refresh cycles depend on three dates: end of warranty, end of life, and end of support.

If you're end of support, you have no bug fixes, no technical support, and very limited part availability. You probably won't be running EOS hardware anywhere in your datacenter without a very defined plan to migrate off of it quickly or it's in a lab somewhere no one cares if it dies.

End of Life is when you can't buy the hardware anymore, so you start (or have already started) with the next generation of hardware. Depending on the policy, this could be the oldest hardware you have. You'll get the new stuff in and migrate over, and get rid of these machines most likely once they're out of warranty. Maybe you'll keep this hardware until EOS for low-priority work.

End of Warranty is the last major date, and it depends on date of purchase of a particular piece of hardware, rather than a product line. If you're on a frequent refresh cycle, once the product costs you money to repair then you'll get rid of it. You'll buy a warranty that most likely matches your depreciation cycle - 4 or 5 years. If you choose to keep machines that are out of warranty they start getting downgraded to less critical roles, or are scrapped as soon as they break. Repairs are variable costs in terms of parts and labor, and that doesn't play nicely with budgeting to manage a machine.

In general the harder it is to replace a single piece of equipment or the more expensive it is, the more likely you'll be running until EOL or EOS. Fileservers and networking (especially core networking) will be kept longer than cheap web servers.

EOW, EOS, and EOL all vary depending on vendors and product lines. It's the reason Dell and HP have a separate business line of desktops and laptops: the EOL and EOS dates are known in advance, so you can standardize your equipment even when purchasing in multiple batches over 2 years. This reduces how much compatibility testing you have to perform, which is really beneficial for huge deployments.
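A minimal sketch of how the three dates might be checked for a single machine, most severe first. The stage names, and ranking an out-of-warranty machine ahead of EOL, are an interpretation of the comment above rather than any vendor's definition.

```python
from datetime import date


def lifecycle_stage(today: date, end_of_warranty: date,
                    end_of_life: date, end_of_support: date) -> str:
    """Classify one machine by its EOS, EOW, and EOL dates (most severe first)."""
    if today >= end_of_support:
        # No fixes, no support, scarce parts: migrate off or relegate to a lab.
        return "end_of_support"
    if today >= end_of_warranty:
        # Repairs now cost money: downgrade to less critical roles or scrap on failure.
        return "out_of_warranty"
    if today >= end_of_life:
        # Can no longer buy this model: new capacity comes from the next generation.
        return "end_of_life"
    return "in_service"


# Hypothetical dates for one server.
print(lifecycle_stage(date(2017, 1, 11), end_of_warranty=date(2019, 1, 1),
                      end_of_life=date(2016, 6, 1), end_of_support=date(2021, 1, 1)))
```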

Interesting answer, but how do those values affect the sort of massive DCs that AMZN/GOOG run? They don't even replace individual machines, they replace whole racks once a designated % of machines in it are failing.

Presumably the HW within a rack is homogeneous, though, so perhaps it is treated as a single large computer from this perspective?

Yes, when you're buying enough hardware you treat the rack or row as a purchase unit. A rack is typically hardware plus switching for that rack, so just a few network and power cables are needed to connect a rack to the rest of the infrastructure.

If these companies replace a whole rack when a percentage of machines fail that's a business decision based on cost of hardware versus repair costs. Where I worked we had warranties, so the hardware would get fixed (or die in place for things past warranty) and the entire rack would get rolled out at one time, since we bought in rack increments of capacity and thus they all had the same warranty dates. Letting things die in place obviously depends on how much spare capacity you have in your DC, which could be site-specific within large companies.
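The rack-as-purchase-unit policy can be sketched as a simple threshold rule: machines fail in place until a rack's failed fraction reaches the threshold, and then the whole rack is rolled out. The 25% threshold and rack data in the example are hypothetical; the thread only says "a designated %".

```python
from dataclasses import dataclass


@dataclass
class Rack:
    name: str
    machines: int
    failed: int

    @property
    def failed_fraction(self) -> float:
        return self.failed / self.machines


def racks_to_replace(racks: list[Rack], threshold: float) -> list[str]:
    """Names of racks whose failed fraction has reached the replacement threshold.
    Racks below the threshold keep running, with dead machines left in place."""
    return [rack.name for rack in racks if rack.failed_fraction >= threshold]


racks = [Rack("r1", 40, 2), Rack("r2", 40, 10), Rack("r3", 40, 20)]
print(racks_to_replace(racks, threshold=0.25))  # ['r2', 'r3']
```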

> Letting things die in place obviously depends on how much spare capacity you have in your DC,

By "capacity" do you mean room? I associate capacity in this context more with power and cooling (neither of which is consumed by disabled machines).

Or do you mean that if the DC's capacity is constrained, letting systems die in place and only replacing "mostly dead" racks on a rolling basis keeps capacity utilization below a critical threshold?

Generally dead systems don't hurt the datacenter in terms of infrastructure utilization, as they don't use power and they don't use cooling, but they could prevent you from bringing in new compute capacity. So you have to figure out whether you can run both the old and slightly broken systems at the same time as the new and faster systems.

Capacity could mean physical space, since if floor tiles are occupied by a rack you'll have to put the new one somewhere else. But even if you have physical space to put another rack somewhere, it needs to be connected to power and cooling.

Depending on your cooling layout (hot boxing, cold boxing, vented floors, etc) you may not have appropriate cooling for a new rack, especially a newer, higher density rack. Or you have sufficient cooling overall, but not for that density in that location because all your high density stuff was planned to go in a designated area.

Same thing with power: you may not have enough power cables for the power distribution units within a rack (you'd typically have 2, so the redundant power supplies don't share a PDU as a single point of failure), the cabling may not reach, the electrical panel could be maxed out on amperage or breaker slots, or you'd push the load on the electrical phases too far out of balance. (A previous manager of mine was an electrical engineer; I'm not too sure about the real-world technicalities of this, other than that he tried to keep everything balanced.)
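Keeping phases balanced can be sketched as a greedy assignment: place the largest rack loads first, each on whichever phase currently carries the least current. This is a simplified illustration (each rack treated as a single-phase load in amps, hypothetical figures), not an electrical design method.

```python
def assign_to_phases(rack_loads_amps: dict[str, float],
                     phases: tuple[str, ...] = ("A", "B", "C")):
    """Greedy largest-first assignment of rack loads to the least-loaded phase."""
    totals = {phase: 0.0 for phase in phases}
    assignment = {}
    for rack, amps in sorted(rack_loads_amps.items(), key=lambda kv: kv[1], reverse=True):
        phase = min(totals, key=totals.get)  # currently least-loaded phase
        totals[phase] += amps
        assignment[rack] = phase
    return assignment, totals


def imbalance(totals: dict[str, float]) -> float:
    """Largest deviation from the mean phase load, as a fraction of the mean."""
    values = list(totals.values())
    avg = sum(values) / len(values)
    return max(abs(v - avg) for v in values) / avg


assignment, totals = assign_to_phases({"r1": 20, "r2": 10, "r3": 10, "r4": 10})
print(assignment, totals, f"{imbalance(totals):.0%}")
```

Adding one more rack to a nearly full panel can leave no placement that keeps the imbalance acceptable, which is one more way old racks that are kept around end up blocking new ones.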

Networking could also be a limiting factor. You could run out of switch ports or SFPs because you only planned for N connections per rack on however many racks, and now you want to keep hardware around longer.

Would also be interesting to know what happens with the hardware afterwards (assuming it still works). Is it just trashed? Or something more sustainable? Sold? Given to poor people?

It is definitely sold. It is believed that the price of E5-2670 v1 Xeons crashing below $100 on eBay (at one point, below $65) was due to Facebook liquidating some servers.

There are a number of alternative practices, typically it is sold to companies that refurbish and resell, sometimes it's donated to companies that refurbish and donate to charities. It is generally not done directly by the company that owns the hardware, as selling or donating hardware accrues legal liabilities for maintenance and support should you sell or donate it directly.

Not sure if it's a practice, but given the large number of first-generation Xeon E5s that came on the market last year (and they would have been 4-5 years old coming out of those DCs), and also the large number of Dell C6100 servers that arrived on the market (I have one in-house that came out of Yahoo), my guesstimate would be around the 4-5 year mark... but, again, just a guess...

Some of those servers are fantastic value when you get them from an ex-stock supplier. I was seeing 8-thread, 64 GB RAM boxes for ~£125. If you are building a dummy cloud, these are great to play around with.

If you can snag Westmere socket boards for E5/E7, hold on to them.

The old Westmere E5/E7 socket upgrade CPUs from third parties are dirt cheap: literally ~$120 for a 10-core/20-thread Xeon. Sure, it's still first gen, so no AVX, AVX2, etc. But it is still a lot of compute power.

The issue is a 4-8 socket CPU board to fit them in will run you ~$3000.

Those old 20 thread xeons DRINK power. I wouldn't recommend them unless you're getting your energy for free. (Though there is no such thing as a free lunch.)

The Westmere E7s seem to be the equivalent of the Xeon X5600 series [1]... so between 105 and 130 W each... sticking 4 or 8 of those in a box will blow most power budgets!
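The arithmetic behind that warning, using the 105-130 W range quoted above. TDP is a thermal design figure for the CPU alone, so treat this as a rough proxy: actual wall draw varies with load, and RAM, disks, fans, and PSU losses come on top.

```python
def cpu_tdp_range(sockets: int, tdp_min_w: float = 105, tdp_max_w: float = 130):
    """Combined CPU TDP range in watts for a box with `sockets` CPUs."""
    return sockets * tdp_min_w, sockets * tdp_max_w


def annual_kwh(watts: float) -> float:
    """Energy used in a year at a constant draw of `watts`."""
    return watts * 24 * 365 / 1000


for sockets in (4, 8):
    low, high = cpu_tdp_range(sockets)
    print(f"{sockets} sockets: {low}-{high} W CPU TDP, up to {annual_kwh(high):.0f} kWh/year")
```

At 8 sockets the CPUs alone can be rated for over 1 kW, roughly 9,100 kWh a year if run flat out, before counting anything else in the box.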


Yeah, the C6100 I got has 8 Xeons (4 nodes) and 48 GB RAM (4x12 GB), and cost about £300... we got some for the office (64 GB per node) and they were about £600... given what you are getting, can't complain!

Out of interest, did you encounter any driver issues? I was warned by a few people that some of these servers are designed for a specific client and therefore have non-standard NICs, so finding drivers could be a problem. I've not encountered this and found Windows, Linux and BSD (FreeNAS) all run from standard installs.

I have Windows Server 2016 running on them currently, and no problems; it worked out of the box. I haven't tried any other OS, so can't say about those...

Hmm. I wonder when all the Tesla P100 GPU servers being sold now will get dumped on the market...

Does GPU-compute server hardware obsolesce faster or slower than standard CPU-compute hardware?

I do remember seeing older (not sure which generation) NVIDIA CUDA gear going cheap... when I saw it, it was about 5 years old... so possibly the same, maybe less...

Data centers usually "lease" hardware from other companies, typically for 3-5 years.

I wonder what the lifetime of hardware is across leases, and how it moves between companies.

It seems to be a very different problem from car leasing or plane leasing.

I'd hate to be the guy who rents a hard drive from the wrong guy. It's like getting the car from the guy who heard sugar is good for gas longevity.

3-5 years looking at the refresh cycle at AWS.

Looking at the stuff I keep in racks at the DC... I try to replace the stuff that's basically from the time when servers didn't do low-power idling, so stuff like Dell 1950/2950s. The newer systems like 9th-gen HP and 10th-gen-and-later Dell systems are usually quite low-power when idle. Since pricing is mostly based on power usage (bandwidth being cheap or free), it's just a matter of calculating the trade-off between running current hardware and replacing it. But about 5 years would be the max anyway, simply because getting replacement parts starts getting more difficult.

Four of the top five comments do not answer the question in any way.

Yeah, I noticed that. The headline got rewritten to de-emphasise the focus on the AMZN/GOOG/FB/MSFT's of the world. "The big tech companies" is much vaguer.

At my organization, hardware refreshes usually coincide with other major changes like application or software upgrades. Often a clone is made of the server to test out the upgrade process for an application, and when this occurs the clone is usually on newer hardware.

That being said, this happens less frequently now as there's been a huge push to move from dedicated hardware to virtual instances in a cluster and now to cloud resources.

4 to 5 years, depending on the company's depreciation cycle; that also tends to be the end of general support for most OS/software unless you pay for extended support. Tech refreshes generally align with the finance depreciation cycle for server-type assets. Desktops: 3 years.

Longest time I worked in the same company was 7 years and I got one new PC in that time.

Can someone post links on where you guys bought the hardware second hand?

eBay, I suppose.

I'm also curious if anyone has suggestions/knowledge about how large companies go about refreshing not just racks/switches, but employee machines and stuff as well.

If the company owns the hardware and it is a capital expense, then it depends on the accounting policy. Most will depreciate it over 3 years, I think.

Once in 5 years.
