
Think much harder about power and cooling. A few points:

1. Talk to your hosting providers and make sure they can support 32kW (or whatever max number you need) in a single rack, in terms of cooling. At many facilities you will have to leave empty space in the rack just to stay below their W per sq ft cooling capacity.

2. If you're running dual power supplies on your servers, with separate power lines coming into the rack, model what will happen if you lose one of the power lines, and all of the load switches to the other. You don't want to blow the circuit breaker on the other line, and lose the entire rack.

3. Thinking about steady state power is fine, but remember you may need to power the entire rack at full power in the worst case. Possibly from only one power feed. Make sure you have excess capacity for this.
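Points 2 and 3 come down to a few lines of arithmetic. Here is a minimal sketch, assuming single-phase feeds, dual power supplies that share load evenly in normal operation, and an 80% continuous-load limit on each breaker. These are all assumptions to check with your facility; the function name and the example numbers are hypothetical.

```python
def failover_check(server_peak_watts, volts, breaker_amps, derate=0.8):
    """Model a rack with A+B feeds when one feed is lost.

    Assumes single-phase power, PSUs that split load evenly across both
    feeds, and a breaker that may carry only `derate` of its rating
    continuously (assumptions, not facts about any particular facility).
    """
    total = sum(server_peak_watts)            # worst case: every server at peak
    usable = volts * breaker_amps * derate    # continuous capacity of ONE feed
    return {
        "per_feed_normal_watts": total / 2,   # both feeds up, load shared
        "surviving_feed_watts": total,        # one feed lost, all load moves over
        "usable_watts_per_feed": usable,
        "survives_failover": total <= usable,
    }


# Illustrative numbers only: 12 servers peaking at 450 W, 208 V / 30 A feeds.
result = failover_check([450] * 12, volts=208, breaker_amps=30)
for key, value in result.items():
    print(key, value)
```

In this made-up example each feed looks comfortable in normal operation (2700 W), but the full 5400 W lands on one feed after a failure and exceeds what that feed can carry continuously, which is exactly the case point 2 warns about.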

The first time I made a significant deployment of physical servers into a colo facility, power and cooling was quite literally the last thing I thought about. I'm guessing this is true for you too, based on the number of words you wrote about power. After several years of experience, power/cooling was almost the only thing I thought about.

Something you should do to understand the actual power consumption of a server (AC power, in watts, at its input):

Build one node with the hardware configuration you intend to use. Same CPU, RAM, storage.

Put it on a watt meter accurate to 1W.

Install Debian amd64 on it in a basic config and run 256 threads of the 'cpuburn' package while simultaneously running iozone disk bench and memory benchmarks.

This will give the figure of the absolute maximum load of the server, in watts, when it is running at 100% load on all cores, memory IO, and disk IO.

Watts = heat, since all electricity consumed in a data center is either being used to do physical work like spinning a fan, or is going to end up in the air as heat. Laws of physics. As whatever data center you're using will be responsible for cooling, this is not exactly your problem, but you should be aware of it if you're going to try to do something like 12 kilowatts density per rack.

Then multiply the wattage of your prototype unit by the number of servers and compare the total against the capacity of the circuit. This will tell you how many fully loaded systems will fit in one 208V 30A circuit on the AC side.
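A minimal sketch of that calculation, assuming a single-phase circuit and an 80% continuous-load derating (an assumption here; ask your facility what applies). The function name and the 400 W measurement are made up for illustration.

```python
import math


def servers_per_circuit(measured_peak_watts, volts=208, amps=30, derate=0.8):
    """How many servers at the measured worst-case draw fit on one circuit.

    Assumes single-phase power and that only `derate` of the breaker
    rating is available for continuous load (both are assumptions).
    """
    usable_watts = volts * amps * derate
    return math.floor(usable_watts / measured_peak_watts)


# Hypothetical prototype that measured 400 W at the wall under full load.
print(servers_per_circuit(400))              # with 80% derating
print(servers_per_circuit(400, derate=1.0))  # if the full rating were usable
```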

Also, use the system BIOS option for power recovery to stagger bootup times by 10 seconds per server, so that in the event of a total power loss, an entire rack of servers does not attempt to power up simultaneously.

This is good advice. I'd consider not having them auto power on, however. This allows them to bring things up in a controlled manner.

That brings me to my next point: GitLab should also be mindful of which services run on which hardware. Performing heroics to work around circular dependencies is the last thing you want to be doing when recovering from a power outage.
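One way to catch circular dependencies before an outage is to write the dependencies down and sort them. A minimal sketch using Python's standard `graphlib` module (Python 3.9+); the service names and the `boot_order` helper are hypothetical, not anything GitLab actually runs.

```python
from graphlib import CycleError, TopologicalSorter


def boot_order(depends_on):
    """Return services in an order where each starts after its dependencies.

    `depends_on` maps a service name to the set of services it needs first.
    Raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical services, for illustration only.
services = {
    "dns": set(),
    "storage": {"dns"},
    "database": {"storage", "dns"},
    "app": {"database"},
}
print(boot_order(services))

# A circular dependency: DNS config lives on storage that needs DNS to mount.
try:
    boot_order({"dns": {"storage"}, "storage": {"dns"}})
except CycleError as err:
    print("circular dependency:", err.args[1])
```

If the sort raises, you have found the heroics in advance, while the power is still on.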

Yes, this is a choice that depends on the facility, and what sort of recovery plan you have for total power failure. In many cases you would want to have everything remain off, and have a remote hands person power up certain network equipment and key services before everything else. On the other hand, you may want to design everything to recover itself, without any button pushing from humans.

Depends a lot on server/HA/software architecture.

> Think much harder about power and cooling. A few points:

I've only ever built desktop machines, and this top comment drew a surprising parallel to most "help me with my desktop build" type posts. Granted, I'm sure as you dig deeper, the reasoning may be much different, but being ignorant about a proper server build myself, it was somehow reassuring to see power and cooling at the top!

Nowadays once you get past 10 racks or 50kW, you generally only pay for power. The space/cooling/etc is "free", as your limiting rate is power and the vendor's ability to move heat. You'll likely want a chimney cabinet like the Chatsworth GlobalFrame [1].

[1] - http://www.chatsworth.com/products/cabinet-and-enclosure-sys...

This does depend a lot on geographical location. You're not going to get free racks in a major carrier exchange point in Manhattan, or downtown Seattle, or close to the urban core of San Francisco. There will definitely be a significant quantity discount, as you increase in numbers, and a lot of your cost will be power and not racks.

Somewhere that is topologically very close (OSI layer 1) to a major traffic interexchange point, you will definitely be paying somehow for the monthly cost of the square footage occupied. For example, a colo in 60 Hudson or 25 Broadway in NYC, or in One Wilshire in LA.

Large-scale colocation pricing that is based on power only will be found in locations that are not also major traffic interexchange points. For example, Quincy, WA or the many data centers in suburban New Jersey.

I think your point is right for 111 8th or Equinix/SV1/Great Oaks.

But for other sites with high-value peering, including EQIX in Ashburn and Coresite, your limiting rate is likely to be the power, not the space or cooling. I.e. they'll "give" you 1 cabinet for every 15kW you buy.

So my assertion assumes you're doing a large number of dense cabinets.

If one is doing a large number of dense cabinets, they almost certainly should not do it at a high value peering point, and should backhaul it. You should be able to get diverse metro dark fiber for <$5k/mo if not substantially less. Put a single cabinet (or pair for redundancy) of $2k/mo cabinets in Equinix or Fisher and off you go.

Fully agreed here. Especially leave yourself room for when you have misjudged some resource utilization (and need more). Nothing is worse than having a resource crunch (CPU/mem/IO) and not being able to resolve it because your rack is out of power/cooling/etc. You'll come to appreciate how easy it was in the cloud, just clicking the button and turning out your wallet.

Great points. We'll make sure to wire to separate power feeds that can both handle the entire load. Suggestions on how to calculate this? Taking the maximum rated load seems over the top.

I recently had to do this. The server I was putting up was rated for 3kW. To determine the expected load, I put it under a dummy load that I reasonably considered the maximum for what I would expect on the server (this was a dev machine, so I picked compiling the Linux kernel as a benchmark). I ran that until the power stabilized (SuperMicro servers can measure power consumption in hardware and expose this via IPMI, which is very handy), because power consumption may keep creeping up for a few minutes as the fans adjust to the new operating temperature. I then repeated the same exercise with a CPU torture test (Prime95), just to see what the maximum I could possibly get out of the machine was. The numbers turned out to be about 1.8kW for the Linux kernel benchmark, and 2.3kW for the torture test. What I ended up doing was to provision for 2kW and use the BIOS' power limiting feature to enforce this. That would kill the machine if it exceeded its designed load, but that's usually better than tripping the breaker and killing the entire circuit. You may also want to talk to your data center provider about overages. Some of the quotes I got wouldn't kill your circuit when you went over, but would just charge you (a crazy amount, but better than losing all your servers).

Hope that helps. Your deployment is larger than ours, so there may be other techniques, but that's what we did.
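The "run until the power stabilizes" step can be automated once you have a series of readings, however you collect them. A minimal sketch, assuming readings arrive as a plain list of watts; the window size, tolerance, function name, and sample values are all made up for illustration, and nothing here talks to IPMI.

```python
def stabilized_reading(samples_watts, window=5, tolerance_watts=10):
    """Return the mean of the first `window` consecutive samples whose
    spread (max - min) is within `tolerance_watts`, or None if the
    readings never settle."""
    for start in range(len(samples_watts) - window + 1):
        chunk = samples_watts[start:start + window]
        if max(chunk) - min(chunk) <= tolerance_watts:
            return sum(chunk) / window
    return None


# Made-up readings that creep up as the fans spin up, then level off.
readings = [1500, 1620, 1700, 1750, 1780, 1795, 1800, 1802, 1798, 1801]
print(stabilized_reading(readings))
```

Taking the figure only after the readings flatten out avoids under-provisioning from an early sample taken before the fans have ramped.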

I love the idea of killing the server instead of the circuit, thanks!

Might be better to just throttle the CPU/etc when too much power is being used.

Prime95! I haven't seen that since 2000-2001! Solid tool for burning in a box and putting the CPU under maximum load.

Having designed and run colo data centers for many years, my rule of thumb for calculating this for customers was to use the vendor's tools if available, or take 80% of the power supply rating. Keep in mind that most servers will have a peak load during initial power-on as the fans and components come online and run through testing. If a rack ever comes up on a single channel and the circuits are not rated right, that breaker will just pop right back offline, and you'll have to bring the rack up by unplugging servers and powering them on in sets. Also, most breakers are only rated for 80% continuous load, so you have to derate them for your load. So, e.g., a 20 A, 208 V three-phase circuit really only has about 5.8 kW of constant power draw available (20 A × 208 V × √3 × 0.8).
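The derating arithmetic is easy to get wrong for three-phase circuits, so here is a minimal sketch. It assumes the usual balanced three-phase formula (√3 × line-to-line volts × amps), unity power factor so that VA and watts coincide, and the 80% continuous-load figure; the function name is hypothetical. Under those assumptions a 20 A circuit comes out near 5.8 kW and a 30 A circuit near 8.6 kW.

```python
import math


def usable_three_phase_watts(volts_line_to_line, breaker_amps, derate=0.8):
    """Continuous power available from a balanced three-phase circuit.

    Assumes unity power factor and that only `derate` of the breaker
    rating may be used continuously (both are assumptions).
    """
    return math.sqrt(3) * volts_line_to_line * breaker_amps * derate


for amps in (20, 30):
    watts = usable_three_phase_watts(208, amps)
    print(f"{amps} A at 208 V three-phase: about {watts / 1000:.1f} kW continuous")
```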

+1. As tempting as it is to say "oh this server only needs full power when it boots" this may come back to bite you. As you grow usage in CPU and Disk, the power used by the server will increase substantially.

Note that you can get around the 80% breaker limit by having your DC hardwire the power, if you have enough scale to have them do this for you.

Circuit breakers aren't just there for the amusement value of watching a clustered system go into split-brain mode...

I.e. you would also want to be sure that your wiring was rated for 100% utilization, and that other circuit-breaker-like functions exist.

Fire is an actual thing, and figuring out the best way to recharge a halon system isn't exactly what you want to be doing.

Yes. That's why they hardwire and use a breaker rated at 100%. In most jurisdictions, code requires an 80% breaker if a plug/receptacle is used. If you are hardwired you can use 100% of your capacity before tripping the breaker. You have incorrectly assumed that I suggested that you ignore breakers.

When you hardwire the circuit the electrical code allows you to use a 100% breaker.

Thank you for adding this detail... what you said makes WAY more sense to me now.

What exactly gets hardwired to what? This is surprisingly hard to search for details on.

Instead of the plug of your PDU going into a receptacle, the wires that would go into the plug are hardwired to a panel circuit breaker.

This has historically been less common in DCs, but is becoming more common as folks do 208V 3-phase 100A circuits.

This really is only a concern when you are paying a monthly recurring charge (MRC) by the breaker amp with many power drops.

For a deployment of this scale it should be metered power (for example, one or more 3-phase A+B drops to each cabinet), where you only pay a non-recurring setup charge (NRC) and then the MRC is based on actual power draw.

3-phase also means fewer physical PDUs (uses less space), but more physical breakers. Over-building delivery capability will eliminate any over-draw concerns for startup cycles.

Not really, although I agree with your reasoning. The other issue is capex. When I deploy 240kW pods, if I use 80% breakers, I have to deploy 25% more PDUs than if I have 100% breakers.

Since my cabinet number is usually evenly divisible by N*PDUs, this impacts overall capital.
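The 25% figure follows from 1 / 0.8 = 1.25. A minimal sketch of the count, assuming a hypothetical PDU rated at 10 kW (the rating, the function name, and the integer-percent parameter are made up for illustration; only the 240kW pod size and the 80%/100% breaker ratings come from the comment):

```python
def pdus_needed(pod_watts, pdu_rated_watts, breaker_percent):
    """Number of PDUs needed to carry `pod_watts` when each PDU's breaker
    allows only `breaker_percent` of its rating continuously.

    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    usable_watts = pdu_rated_watts * breaker_percent // 100
    return -(-pod_watts // usable_watts)  # ceiling division on integers


pod_watts = 240_000   # pod size from the comment
pdu_watts = 10_000    # hypothetical PDU rating

full = pdus_needed(pod_watts, pdu_watts, 100)
derated = pdus_needed(pod_watts, pdu_watts, 80)
print(full, derated, f"{(derated - full) / full:.0%} more")
```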

We are talking about 1 to 2 cab density here so capex doesn't carry that much weight.

Having a little headroom on your power circuits is also incredibly important, and not every facility will sell 100% rated breakers. It may make more sense to be in a facility with 80% rated breakers than 100%, even with the added capex of an extra PDU or two.

Goes back to my previous comment. What is important to you, at the pod / multiple pod level, isn't as important to the 1-2 cab deployment.

That's fair and accurate.

Similarly, ensure spare room in the cabinet for adjustments, that thing you forgot, and small growth. Much better to be 70% full and not need the space than to have no free RU and need the space.

Most DC PDUs will stagger your outlet power on to stagger the initially large power draws.

Some DCs provide PDUs and some don't.

At RS we would burn the servers in and measure the load with a clamp meter. For large-scale build-outs (swift, cloud servers) we would burn in entire racks and measure the load reported by the DC power equipment.

I would recommend verifying everything is fault tolerant/HA as expected every step of the way. We ran into issues where the power strips on both sides were plugged into the same circuit (D'oh), wrong SST's, redundant routers getting cabled up to the same power strips, and so on. You name it.

After a rack is set up, have people at the DC (your own employees or the DC's techs) help simulate (create) failures in power, networking, and systems to verify everything is set up correctly. It sounds like you have people coming onboard with experience provisioning/delivering physical systems though, so I would expect them to be on the ball with most of this stuff.
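Some of these mistakes can be caught on paper before anyone pulls a cable, if you keep an inventory of what is plugged in where. A minimal sketch, assuming a hypothetical inventory that maps each device to the circuits behind its A and B power cords; the device and circuit names are made up. It complements, and does not replace, the physical failure testing described above.

```python
def find_power_redundancy_faults(devices):
    """Return the names of devices whose A and B cords end up on the
    same circuit, i.e. devices with no real power redundancy.

    `devices` maps a device name to a (feed_a_circuit, feed_b_circuit) pair.
    """
    return sorted(name for name, (a, b) in devices.items() if a == b)


# Hypothetical inventory, for illustration only.
inventory = {
    "router-1": ("circuit-A1", "circuit-B1"),
    "router-2": ("circuit-A1", "circuit-A1"),  # both cords on the same circuit
    "db-1": ("circuit-A2", "circuit-B2"),
}
print(find_power_redundancy_faults(inventory))
```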

Thanks, that is very helpful. And we haven't made these hires yet but we hope we can in the coming months.

You can certainly do some math/estimates based on looking at individual component specifications, but I like using a power meter (built in to some PDUs - which is a feature worth having, or you can buy one, they are quite inexpensive).

A system at idle vs full CPU vs full CPU + all disk will produce very different measurements.

Also keep in mind 80% derating - many electrical codes will state that an X amp circuit should only be used at 0.8X on a continuous basis (and the circuit breakers will be sized accordingly).

For HP gear, use HP Power Advisor utility. For Dell, see the Data Center Capacity Planner. Not sure what SuperMicro has -- check with your VAR.

You want 2*N power feeds where N = 1 or 2. I recommend ServerTech PDUs. You'll want to factor in the maximum load, but with the number of servers and the amount of power you're doing, you should be able to negotiate metered power at under $250/kW/month. Give them a commit that's 1/3 of your TDW/reported power load, and then pay for what you use.
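To get a feel for what a commit like that costs, here is a minimal sketch. It assumes a simple billing model where you pay the per-kW rate on whichever is larger, your commit or your actual draw. That model is an assumption, since real contracts vary, and the function name and the 30 kW example are made up; $250/kW/month is used as the upper bound mentioned above.

```python
def monthly_power_bill(used_kw, committed_kw, rate_per_kw=250):
    """Monthly charge under an assumed 'pay for the greater of commit or
    usage' model. Real colo contracts may bill differently."""
    billable_kw = max(used_kw, committed_kw)
    return billable_kw * rate_per_kw


# Hypothetical deployment: 30 kW reported load, so commit to a third of it.
reported_kw = 30
commit_kw = reported_kw // 3

for used_kw in (6, 10, 18):
    print(used_kw, "kW used ->", monthly_power_bill(used_kw, commit_kw), "USD/month")
```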

I just remembered a blog post I wrote a while back about exactly these points:



These are all points a competent engineer would raise :)

I've found that being considered a 'competent engineer' merely means 'never too arrogant to learn'.

I didn't know half of the stuff in the grandparent's post.

I was attempting to confirm that these points are all important. Being too arrogant to learn and being incompetent are two entirely different things.

Sorry, I misunderstood. To me, your response came across as a "well, duh!".

No problem; as I read it back to myself I can see how it came across that way.
