1. Talk to your hosting providers and make sure they can support 32kW (or whatever max number you need) in a single rack, in terms of cooling. At many facilities you will have to leave empty space in the rack just to stay below their W per sq ft cooling capacity.
2. If you're running dual power supplies on your servers, with separate power lines coming into the rack, model what will happen if you lose one of the power lines, and all of the load switches to the other. You don't want to blow the circuit breaker on the other line, and lose the entire rack.
3. Thinking about steady state power is fine, but remember you may need to power the entire rack at full power in the worst case. Possibly from only one power feed. Make sure you have excess capacity for this.
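A quick way to sanity-check point 2 is to model the failover arithmetic directly. This is a minimal sketch; the feed rating, derating factor, and per-server wattage are illustrative assumptions, not recommendations:

```python
# Hedged sketch of the dual-feed failover check; all numbers are
# made-up assumptions, not a recommendation.

FEED_VOLTS = 208
FEED_AMPS = 30
DERATE = 0.8            # continuous-load derating common in electrical codes
SERVERS = 10
WATTS_PER_SERVER = 450  # worst-case draw per box, measured on a watt meter

single_feed_capacity = FEED_VOLTS * FEED_AMPS * DERATE  # 4992 W usable
rack_load = SERVERS * WATTS_PER_SERVER                  # 4500 W at full load

# Normally the load splits across the A and B feeds; after one feed
# fails, the surviving feed must carry all of it without tripping.
survives_failover = rack_load <= single_feed_capacity
print(survives_failover)
```

If that prints False, you're one upstream fault away from losing the whole rack.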
The first time I made a significant deployment of physical servers into a colo facility, power and cooling was quite literally the last thing I thought about. I'm guessing this is true for you too, based on the number of words you wrote about power. After several years of experience, power/cooling was almost the only thing I thought about.
Build one node with the hardware configuration you intend to use. Same CPU, RAM, storage.
Put it on a watt meter accurate to 1W.
Install Debian amd64 on it in a basic config and run 256 threads from the 'cpuburn' package while simultaneously running an iozone disk benchmark and memory benchmarks.
This gives you the absolute maximum load of the server, in watts, when it is running at 100% on all cores, memory I/O, and disk I/O.
Watts = heat: all electricity consumed in a data center is either doing physical work like spinning a fan, or ending up in the air as heat. Laws of physics. As whatever data center you're using will be responsible for cooling, this is not exactly your problem, but you should be aware of it if you're going to try to do something like 12 kilowatts of density per rack.
Then multiply the wattage of your prototype unit by the number of servers to get total rack draw, and divide the capacity of a 208V 30A circuit on the AC side by the per-server figure. That tells you how many fully loaded systems will fit on one circuit.
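That circuit-fit arithmetic can be sketched like this; the 450 W per-server figure is a made-up stand-in for whatever your watt meter actually shows:

```python
import math

def circuits_needed(servers, watts_each, volts=208, amps=30, derate=0.8):
    """Servers per circuit and circuits needed, at worst-case draw.

    derate=0.8 reflects the common 80% continuous-load rule; adjust
    for your facility's actual breaker ratings.
    """
    per_circuit_watts = volts * amps * derate          # 4992 W usable
    per_circuit = int(per_circuit_watts // watts_each) # whole servers only
    return per_circuit, math.ceil(servers / per_circuit)

per_circuit, n_circuits = circuits_needed(servers=40, watts_each=450)
print(per_circuit, n_circuits)  # 11 servers per circuit, 4 circuits for 40 boxes
```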
Also, use the system BIOS power-recovery option to stagger boot times by 10 seconds per server, so that after a total power loss an entire rack of servers does not attempt to power up simultaneously.
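The reason staggering matters is that startup draw (disks spinning up, fans at full) typically exceeds steady-state draw. A rough illustration, with all wattages assumed for the sake of the example:

```python
# Why staggered power-on matters: inrush/startup draw per server usually
# exceeds steady-state draw. All figures below are assumptions.

STEADY_WATTS = 450     # measured steady-state per server
STARTUP_WATTS = 700    # disks spinning up, fans at full (assumed)
SERVERS = 10
CIRCUIT_WATTS = 208 * 30 * 0.8  # derated 208V/30A circuit, 4992 W

# Everything booting at once: peak is SERVERS * STARTUP_WATTS.
simultaneous_peak = SERVERS * STARTUP_WATTS  # 7000 W: trips the breaker

# Staggered by enough seconds that at most one box is in its startup
# phase while the rest have settled to steady state.
staggered_peak = STARTUP_WATTS + (SERVERS - 1) * STEADY_WATTS  # 4750 W: fits

print(simultaneous_peak > CIRCUIT_WATTS, staggered_peak <= CIRCUIT_WATTS)
```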
That brings me to my next point: GitLab should also be mindful of which services live on which hardware; performing heroics to work around circular dependencies is the last thing you want to be doing when recovering from a power outage.
Depends a lot on server/HA/software architecture.
I've only ever built desktop machines, and this top comment drew a surprising parallel to most "help me with my desktop build" type posts. Granted, I'm sure as you dig deeper the reasoning may be much different, but being ignorant about a proper server build myself, it was somehow reassuring to see power and cooling at the top!
 - http://www.chatsworth.com/products/cabinet-and-enclosure-sys...
Somewhere that is topologically very close (at OSI layer 1) to a major traffic interexchange point, you will definitely be paying somehow for the monthly cost of the square footage you occupy. For example, a colo in 60 Hudson or 25 Broadway in NYC, or One Wilshire in LA.
Large-scale colocation pricing that is based on power alone will be found in locations that are not also major traffic interexchange points. For example, Quincy, WA, or the many data centers in suburban New Jersey.
But for other sites with high-value peering, including EQIX in Ashburn and Coresite, your limiting factor is likely to be the power, not the space. I.e. they'll "give" you 1 cabinet for every 15kW you buy.
So my assertion assumes you're doing a large number of dense cabinets.
If one is doing a large number of dense cabinets, they almost certainly should not do it at a high-value peering point, and should backhaul instead. You should be able to get diverse metro dark fiber for <$5k/mo, if not substantially less. Put a single $2k/mo cabinet (or a pair for redundancy) in Equinix or Fisher and off you go.
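Back-of-envelope on that trade-off, using the thread's figures plus an assumed per-cabinet price at the cheaper facility (the $1k/mo number is hypothetical):

```python
# Rough monthly-cost comparison: all cabinets at a high-value peering
# point vs. cheap space backhauled over dark fiber. The $2k cabinet and
# $5k fiber figures come from the comment above; the $1k cheap-facility
# cabinet price is an assumption for illustration.

DENSE_CABINETS = 20
PEERING_CABINET_MRC = 2_000   # $/mo per cabinet at e.g. Equinix
CHEAP_CABINET_MRC = 1_000     # $/mo per cabinet at a non-peering facility (assumed)
DARK_FIBER_MRC = 5_000        # diverse metro dark fiber
PEERING_EDGE_CABINETS = 2     # redundant pair kept at the peering point

all_at_peering = DENSE_CABINETS * PEERING_CABINET_MRC
backhauled = (DENSE_CABINETS * CHEAP_CABINET_MRC
              + PEERING_EDGE_CABINETS * PEERING_CABINET_MRC
              + DARK_FIBER_MRC)

print(all_at_peering, backhauled)  # 40000 vs 29000 per month
```

The gap only widens as cabinet count grows, since the fiber and edge-cabinet costs are fixed.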
Hope that helps. Your deployment is larger than ours, so there may be other techniques, but that's what we did.
I.e. you would also want to be sure that your wiring was rated for 100% utilization, and that other circuit-breaker-like functions exist.
Fire is an actual thing, and figuring out the best way to recharge a halon system isn't exactly what you want to be doing.
When you hardwire the circuit, the electrical code allows you to use a 100% breaker.
This is historically less common in DCs, but it's becoming more common as folks deploy 208V 3-phase 100A circuits.
For a deployment of this scale it should be metered power (for example, one or more 3-phase A+B drops to each cabinet), where you pay only a non-recurring setup charge (NRC) and the monthly recurring charge (MRC) is based on actual power draw.
3-phase also means fewer physical PDUs (less space used), but more physical breakers. Over-building delivery capacity will eliminate any over-draw concerns from startup cycles.
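For intuition on why 3-phase means fewer PDUs: one 3-phase drop delivers several times the power of one single-phase circuit. A sketch, assuming the common 80% continuous derating (use your facility's actual ratings):

```python
import math

def three_phase_watts(volts_ll, amps, derate=0.8):
    """Usable watts on a 3-phase circuit (volts_ll = line-to-line voltage)."""
    return math.sqrt(3) * volts_ll * amps * derate

def single_phase_watts(volts, amps, derate=0.8):
    """Usable watts on a single-phase circuit."""
    return volts * amps * derate

print(round(three_phase_watts(208, 100)))  # 28821 W from one 3-phase 100A drop
print(round(single_phase_watts(208, 30)))  # 4992 W from one 208V/30A circuit
```

So a single 3-phase 100A drop replaces roughly half a dozen 208V/30A circuits, and correspondingly fewer in-rack PDUs.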
Since my cabinet count is usually evenly divisible by N*PDUs, this impacts overall capital cost.
Having a little headroom on your power circuits is also incredibly important, and not every facility will sell 100%-rated breakers. It may make more sense to be in a facility with 80%-rated breakers than one with 100%, even with the added capex of an extra PDU or two.
Goes back to my previous comment. What is important to you, at the pod / multiple pod level, isn't as important to the 1-2 cab deployment.
Similarly, ensure spare room in the cabinet for adjustments, that thing you forgot, and small growth. Much better to be 70% full and not need the space than to have no free RU and need the space.
I would recommend verifying everything is fault tolerant/HA as expected every step of the way. We ran into issues where the power strips on both sides were plugged into the same circuit (d'oh), wrong SSTs, redundant routers cabled up to the same power strips, and so on.
After a rack is set up, have people at the DC (your own employees or the DC's techs) help simulate (create) failures in power, networking, and systems to verify everything is set up correctly. It sounds like you have people coming on board with experience provisioning/delivering physical systems, though, so I would expect them to be on the ball with most of this stuff.
A system at idle vs. full CPU vs. full CPU + all disks will produce very different measurements.
Also keep in mind 80% derating - many electrical codes will state that an X amp circuit should only be used at 0.8X on a continuous basis (and the circuit breakers will be sized accordingly).
I didn't know half of the stuff in the grandparent's post.