
Ask HN: How often is hardware at the big tech companies refreshed? - webmaven
Does anyone have data that would suggest how long it takes an org at the scale of an Amazon, Google, or Facebook to entirely replace their HW?

I know that current practice at that scale is to not replace an individual chassis, only whole racks when a certain % of servers have failed, but that can't be the whole story, since presumably servers/racks are replaced if they are old enough even if they happen to still be running OK.

Google, for example, almost certainly has no servers (or switches, etc.) running that date back to Y2K, even though statistically _some_ 16-year-old HW would still be working if it hadn't been yanked.

So, assuming the HW hasn't actually failed, how old is _too_ old for a "web-scale" company to keep around? And is the answer different for these companies' own services vs. externally facing IaaS/PaaS?
======
QuantumRoar
Once you own data centers, energy consumption becomes a major consideration.

I don't work on this myself, but I assume the train of thought concerning the
economics of energy consumption goes like this: you buy new hardware, and from
experience you know it's going to last a few years on average. During the
lifetime of your new hardware you can save some amount of money on electricity,
because the new hardware is more efficient than the old. So it makes sense to
hit the buy button for the new hardware when you come out ahead:

(Savings in electricity over the lifetime) - (Price of new hardware) > 0

I assume after a few years the savings may become significant.
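
A minimal sketch of that break-even check, with entirely made-up figures for
the power draws, electricity price, and hardware cost (real estimates would
also fold in cooling overhead and the repair/spares point below):

    # Break-even sketch for the inequality above; all numbers are
    # illustrative assumptions, not real data-center figures.
    HOURS_PER_YEAR = 24 * 365

    def electricity_cost(watts, years, usd_per_kwh):
        """Electricity cost of running a device continuously."""
        return watts / 1000 * HOURS_PER_YEAR * years * usd_per_kwh

    # Assumed: a 400 W old server vs. a 250 W new one doing the same
    # work, $0.10/kWh, and a 4-year expected lifetime for the new box.
    savings = electricity_cost(400, 4, 0.10) - electricity_cost(250, 4, 0.10)
    new_hw_price = 1500.0  # assumed purchase price

    print(f"Savings over lifetime: ${savings:.0f}")  # ~$526
    print("Buy" if savings - new_hw_price > 0 else "Keep the old one")

With these made-up numbers, electricity savings alone don't justify the
purchase; in practice, performance and density gains enter the same inequality.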

~~~
rerx
I assume the costs of repairs and spare parts would also enter that equation.

~~~
tomahunt
An aside: it can be a good exercise to use the above reasoning to decide when
to buy a new car.

~~~
marcosdumay
That reasoning will push the purchase of a new car out to the moment the old
one completely fails, and not a second before.

For cars it is much more important to weigh the risk of a failure at a moment
when you need the car, and the cost of being suddenly carless. Also, for cars
that go on a road, the most relevant factor is safety.

------
bsenftner
Lots of companies lease their hardware, including servers. When that lease
expires, the hardware goes on the reseller market dirt cheap. That's how I
built my own 17-node server cluster for my (failed) startup, turning what
would have been a $90K-per-month AWS expense into a $600 colocation expense.
Owning and maintaining your own servers is not hard at all, folks.

~~~
HalcyonicStorm
I'm interested in buying hardware for my own nefarious purposes. Can you post
links?

~~~
DanBC
You can get it given to you if you print up some "free e-waste collection
service" flyers.

Here's one (Australian?) guy and his small business, with some examples of the
stuff he collects:
[https://www.youtube.com/watch?v=BbMmuLDMx4Y](https://www.youtube.com/watch?v=BbMmuLDMx4Y)

------
tyingq
Interesting question. I'm wondering if there's some hand-wringing over at
Intel about refresh cycles. Moore's law isn't what it used to be. My
experience in large, non-tech companies is that they've tended to follow a
5-year refresh cycle. But a 5-year-old server is still useful, and not
terribly power-hungry these days. I could easily see a decision to stretch
that out to 6.

~~~
ptero
As a data point, at my current place (a large, university-affiliated R&D lab),
the upgrade cycles are different for "servers" that do some externally visible
/ useful stuff and for workstations.

Servers generally get upgraded when the next functionality upgrade topples the
current hardware or when the hardware physically dies (often WELL over 5
years).

Workstations / laptops are upgraded whenever a user wants a new one. Many
folks (myself included) prefer to keep our current systems for a long time.
Components and peripherals get upgraded much more frequently, as people keep
buying hard drives, GPUs and other new toys.

~~~
tyingq
Yes...a fixed refresh cycle is more common in corporate environments. And it's
more of a budgetary thing. IT has some flexibility to keep old servers, but
the budget they have to refresh is calculated on some X years cycle. So they
are limited in the number they can purchase each year to refresh existing
servers. Net new functionality or applications typically fund the first cycle
for servers that have a net new use.

------
dc2447
There are a few factors here.

Depreciation: typically three years for hardware that is purchased rather than
leased. At the end of 3 years the item has zero book value and thus can be
replaced.

Useful life: hopefully longer, up to five years. So an item can be without
book value but still in use. This is more typical with networking hardware
than compute.

The other poster who mentioned power consumption is totally correct. It can
make sense to renew hardware even before it has fully depreciated in order to
get better datacentre density and lower power draw.

This is what is driving most refreshes in my experience.
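
For concreteness, here's what a straight-line three-year schedule looks like
for a hypothetical $9,000 server (accounting details vary by policy and
jurisdiction):

    # Straight-line depreciation sketch: a hypothetical $9,000 server
    # written down to zero book value over a 3-year schedule.
    purchase_price = 9000.0
    schedule_years = 3

    annual_writedown = purchase_price / schedule_years  # $3,000/year
    for year in range(1, schedule_years + 1):
        book_value = purchase_price - annual_writedown * year
        print(f"End of year {year}: book value ${book_value:,.0f}")
    # After year 3 the book value is $0, even though the machine may
    # still have useful life left -- the "useful life" point above.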

------
rufius
Working for Big Company...

Three-year rolling cycles for servers. That is, you're not replacing hardware
all at once but incrementally.

Similar for dev machines. If you imagine a dev has two dev boxes and a laptop,
spread that over the three-year cycle.

~~~
webmaven
Is your Big Company one of the Really Big Companies (eg.
AMZN/GOOG/FB/MSFT/KRX/etc.) my question was about?

------
brudgers
I suspect that, as with most businesses, it's not a decision that's driven
directly by technology, but rather by the operational economics of new
technologies versus the capital costs (where the capital costs include
depreciation and resale in addition to acquisition and installation costs).

When looking at what large companies do in regard to acquisition, the view
through the lens is diffracted by the relationship those companies have with
hardware suppliers...e.g. Intel discusses its roadmap with the big consumers
of chips and to some degree designs around those companies' anticipated needs
-- the chip in my laptop contains features that only benefit someone running a
big data center.

Absent explicit data, one way to measure the turnover would be to look at the
used server CPU market. The big companies usually don't dispose of Xeons at
the landfill, and changes in the price and availability of specific chips over
time are likely to correlate with upgrade cycles.

~~~
webmaven
You're missing something: Compute HW for AMZN/GOOG/MSFT isn't a cost center,
it's a profit center. Reportedly AWS profit margins on a CPU-hour were
something like 80% at one point.

~~~
brudgers
I'd classify profits as part of the operational economics. Someone else might
not.

~~~
webmaven
Ah, I seem to have glossed over what you actually wrote and apprehended it as
"operational expenditures" vs. "capital expenditures" (ie. OpEx vs. CapEx). My
apologies.

------
jzwinck
Datacenters are power (heat) limited. If a device does half the work of a new
one using the same power, it is probably on the way out. Certainly by the time
it's down to, say, 30%, it's time to go unless it's a special device.
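
A minimal sketch of that rule of thumb, using the 50% and 30% thresholds from
the comment above (the perf-per-watt figures are placeholders, not benchmark
data):

    # Power-limited replacement heuristic: flag a device once it does
    # less than `threshold` of the work per watt of current hardware.
    def should_replace(old_perf_per_watt, new_perf_per_watt, threshold=0.5):
        return old_perf_per_watt / new_perf_per_watt < threshold

    print(should_replace(10.0, 25.0))  # 0.40 -> True, on the way out
    print(should_replace(10.0, 18.0))  # ~0.56 -> False, keep for now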

~~~
webmaven
So from a power POV, how many months does it take for a new device to hit
2x-3x the FLOPS/watt of the old hardware?

------
caw
Refresh cycles depend on 3 values -- end of warranty, end of life, and end of
support.

If you're at end of support, you have no bug fixes, no technical support, and
very limited parts availability. You probably won't be running EOS hardware
anywhere in your datacenter without a very definite plan to migrate off of it
quickly, unless it's in a lab somewhere where no one cares if it dies.

End of Life is when you can't buy the hardware anymore, so you start (or have
already started) with the next generation of hardware. Depending on the
policy, this could be the oldest hardware you have. You'll get the new stuff
in and migrate over, and get rid of these machines most likely once they're
out of warranty. Maybe you'll keep this hardware until EOS for low-priority
work.

End of Warranty is the last major date, and it depends on the purchase date of
a particular piece of hardware, rather than on a product line. If you're on a
frequent refresh cycle, once a product starts costing you money to repair, you
get rid of it. You'll buy a warranty that most likely matches your
depreciation cycle -- 4 or 5 years. If you choose to keep machines that are
out of warranty, they get downgraded to less critical roles, or are scrapped
as soon as they break. Repairs are variable costs in terms of parts and labor,
and variable costs don't play nicely with budgeting.

In general the harder it is to replace a single piece of equipment or the more
expensive it is, the more likely you'll be running until EOL or EOS.
Fileservers and networking (especially core networking) will be kept longer
than cheap web servers.

EOW, EOS, and EOL all vary depending on vendor and product line. It's the
reason Dell and HP have separate business lines for desktops and laptops: the
EOL and EOS dates are known in advance, and you can standardize your equipment
even when purchasing in multiple batches over 2 years. This reduces how much
compatibility testing you have to perform, which is really beneficial for huge
deployments.
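
To make the three dates concrete, here's a tiny sketch treating them as
independent flags on a given day (all dates are hypothetical; note that EOW is
tied to the purchase, while EOL and EOS are tied to the product line):

    from datetime import date

    # Hypothetical dates for one purchase of one product line.
    END_OF_LIFE     = date(2020, 1, 1)   # vendor stops selling the model
    END_OF_WARRANTY = date(2021, 6, 1)   # specific to this purchase
    END_OF_SUPPORT  = date(2023, 1, 1)   # no more fixes, support, or parts

    def refresh_flags(today):
        """Independent checks, since the three dates need not be ordered."""
        flags = []
        if today >= END_OF_LIFE:
            flags.append("can't buy more; standardize on the next generation")
        if today >= END_OF_WARRANTY:
            flags.append("repairs now cost money; downgrade or scrap on failure")
        if today >= END_OF_SUPPORT:
            flags.append("no fixes or parts; migrate off quickly")
        return flags

    print(refresh_flags(date(2022, 3, 1)))  # past EOL and EOW, before EOS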

~~~
webmaven
Interesting answer, but how do those values affect the sort of massive DCs
that AMZN/GOOG run? They don't even replace individual machines; they replace
whole racks once a designated % of machines in them are failing.

Presumably the HW _within_ a rack is homogeneous, though, so perhaps it is
treated as a single large computer from this perspective?

~~~
caw
Yes, when you're buying enough hardware you treat the rack or row as a
purchase unit. A rack is typically hardware plus switching for that rack, so
just a few network and power cables are needed to connect a rack to the rest
of the infrastructure.

If these companies replace a whole rack when a percentage of machines fail,
that's a business decision based on cost of hardware versus repair costs.
Where I worked we had warranties, so the hardware would get fixed (or die in
place for things past warranty) and the entire rack would get rolled out at
one time, since we bought in rack increments of capacity and thus they all had
the same warranty dates. Letting things die in place obviously depends on how
much spare capacity you have in your DC, which could be site-specific within
large companies.

~~~
webmaven
_> Letting things die in place obviously depends on how much spare capacity
you have in your DC,_

by "capacity" do you mean room? I associate capacity in this context more with
power & cooling (neither of which are consumed by disabled machines).

Or do you mean that if the DC's capacity is constrained, letting systems die
in place and only replacing "mostly dead" racks on a rolling basis keeps
capacity utilization _below_ a critical threshold?

~~~
caw
Generally dead systems don't hurt the datacenter in terms of infrastructure
utilization, as they don't use power and they don't use cooling, but they
could prevent you from bringing in new compute capacity. So you have to figure
out whether you can run both the old and slightly broken systems at the same
time as the new and faster systems.

Capacity could mean physical space, since if the floor tiles are occupied by a
rack you'll have to put the new one somewhere else. But even if you have
physical space to put another rack somewhere, it still needs to be connected
to power and cooling.

Depending on your cooling layout (hot boxing, cold boxing, vented floors, etc)
you may not have appropriate cooling for a new rack, especially a newer,
higher density rack. Or you have sufficient cooling overall, but not for that
density in that location because all your high density stuff was planned to go
in a designated area.

Same thing with power -- you may not have enough power cables for the power
distribution units within a rack (you'd typically have 2, so the redundant
power supplies don't share a PDU as a point of failure), the cabling may not
reach, the electrical panel could be maxed out on amperage or breaker slots,
or you'd throw the load on the electrical phases too far out of balance (a
previous manager of mine was an electrical engineer; I'm not too sure about
the real-world technicalities of this, other than that he tried to keep
everything balanced).

Networking could also be a limiting factor. You could run out of switch ports
or SFPs because you only planned for N connections per rack across however
many racks, and now you want to keep hardware around longer.
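
On the phase-balance point, the basic bookkeeping is just summing the load
assigned to each of the three electrical phases and keeping them close; a toy
sketch (the rack wattages are invented):

    # Toy three-phase load-balance check; rack wattages are invented.
    rack_watts = {
        "A": [4200, 3800, 4000],  # racks fed from phase A
        "B": [4100, 4300],        # phase B
        "C": [3900, 4000, 4100],  # phase C
    }

    loads = {phase: sum(watts) for phase, watts in rack_watts.items()}
    imbalance = max(loads.values()) - min(loads.values())
    print(loads)                        # {'A': 12000, 'B': 8400, 'C': 12000}
    print(f"Imbalance: {imbalance} W")  # 3600 W -- phase B has headroom,
    # so a new rack would go there rather than onto A or C.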

------
stinos
It would also be interesting to know what happens with the hardware afterwards
(assuming it still works). Is it just trashed? Or something more sustainable?
Sold? Given to poor people?

~~~
chx
It is definitely sold. It is believed that E5-2670 v1 Xeon prices crashing
below $100, heck at one point below $65, on eBay were due to Facebook
liquidating some servers.

------
tiernano
Not sure if it's a set practice, but given the large number of 1st-generation
Xeon E5s that came on the market last year (and they would have been 4-5 years
old coming out of those DCs), and also the large number of Dell C6100 servers
that arrived on the market (I have one in-house that came out of Yahoo), my
guesstimate would be around the 4-5 year mark... but, again, just a guess...

~~~
alex_hitchins
Some of those servers are fantastic value when you get them from an ex-stock
supplier. I was seeing 8-thread, 64 GB RAM boxes for ~£125. If you are
building a dummy cloud, these are great to play around with.

~~~
valarauca1
If you can snag Westmere socket boards for E5/E7, hold on to them.

The old Westmere E5/E7 socket-upgrade CPUs from third parties are _dirt_
cheap, literally ~$120 for a 10-core/20-thread Xeon. Sure, it's still
first-gen, so no AVX or AVX2, but it is still a lot of compute power.

The issue is that a 4-8 socket board to fit them in will run you ~$3000.

~~~
shiftpgdn
Those old 20 thread xeons DRINK power. I wouldn't recommend them unless you're
getting your energy for free. (Though there is no such thing as a free lunch.)

~~~
tiernano
The Westmere E7s seem to be the equivalent of the Xeon X5600 series [1]... so,
between 105 and 130 W each... sticking 4 or 8 of those in a box will blow most
power budgets!

[1]:[https://en.wikipedia.org/wiki/Westmere_(microarchitecture)#S...](https://en.wikipedia.org/wiki/Westmere_\(microarchitecture\)#Server_.2F_Desktop_processors)

------
sairamkunala
Usually data centers lease hardware from other companies, typically for 3-5
years.

~~~
raverbashing
I wonder what the lifetime of hardware is across leases, and how it moves
between companies.

It seems to be a very different problem from car leasing or plane leasing.

~~~
gravypod
I'd hate to be the guy who rents a hard drive from the wrong guy. It's like
getting the car from the guy who heard sugar is good for gas longevity.

------
unkoman
3-5 years, judging by the refresh cycle at AWS.

~~~
sigio
Looking at the stuff I keep in racks at the DC... I try to replace the
machines that basically date from before servers did low-power idling, so
stuff like Dell 1950/2950s. The newer systems, like 9th-gen HP and
10th-gen-and-up Dell, are usually quite low-power when idle. Since pricing is
mostly based on power usage (bandwidth being cheap or free), it's just a
matter of calculating the trade-off between running current hardware and
replacing it. But about 5 years would be the max anyway, simply because
getting replacement parts starts getting more difficult.

------
HowDoesItWork
Four of the top five comments do not answer the question in any way.

~~~
webmaven
Yeah, I noticed that. The headline got rewritten to de-emphasise the focus on
the AMZN/GOOG/FB/MSFTs of the world. "The big tech companies" is much vaguer.

------
cptskippy
At my organization, hardware refreshes usually coincide with other major
changes like application or software upgrades. Oftentimes a clone of a server
is made to test out the upgrade process for an application; when this occurs,
the clone is usually on newer hardware.

That being said, this happens less frequently now, as there's been a huge push
to move from dedicated hardware to virtual instances in a cluster, and now to
cloud resources.

------
jsudhams
4 to 5 years, depending on the company's depreciation cycle; that also lines
up with the end of general support for most OS/software, unless you want
extended support. A tech refresh generally aligns with the finance
depreciation cycle for server-type assets. Desktops -- 3 years.

------
k__
The longest I've worked at the same company was 7 years, and I got one new PC
in that time.

------
HalcyonicStorm
Can someone post links on where you guys bought the hardware second hand?

~~~
ju-st
eBay, I suppose.

------
kzisme
I'm also curious whether anyone has suggestions/knowledge about how large
companies go about refreshing not just racks/switches, but employee machines
and the like as well.

------
jaxondu
If the company owns the hardware and it is a capital expense, then it depends
on the accounting policy. Most will depreciate over 3 years, I think.

------
batina
Once every 5 years.

