Google joins Open Compute Project to drive standards in IT infrastructure (googleblog.com)
152 points by rey12rey on March 9, 2016 | 41 comments



Finally, racks go metric.

The 19 inch rack is one of the oldest standards in computing. ENIAC used 19 inch racks. Open Compute, though, uses wider, and metric, racks. 19 inch rack gear can be mounted in Open Rack with an adapter.

The Open Compute spec says that shelves of IT gear are provided with 12 VDC power. There's power conversion in the base of the rack. Facebook has standards for distribution to the racks at 54VDC, 277VAC 3Ø, and 230 VAC (Eurovolt). Apparently Google wants to add 48VDC, which was the old standard for telephone central offices.

Facebook's choice of 54VDC distribution is strange. Anyone know why they picked that number?


54VDC and 48VDC are roughly the same thing in telco land. E.g. I hooked up a Juniper MX960 last week with "-48 VDC Nominal" power supplies. The operating voltage range is -40 to -72VDC.

A "48V" rectifier will output 54VDC to the power bus (feeding batteries and systems) since that is the battery float voltage. If the rectifier goes offline (loss of AC power), the batteries will power the bus, keeping systems online.

I'm curious what the actual difference in practice is between Facebook's 54VDC and Google's 48VDC.


The historical telecom standard is -48 VDC. 48 is easy to understand (using 4 of the banks of 12 VDC lead-acid batteries in the basement of the CO building). The DC part is also easy to understand these days. But why a negative voltage? Think about it before you look it up!

And why -54 VDC? Well, a typical lead-acid cell floats at about 2.25 V -- 24 of those is 54, so it's another legacy artifact. Modern battery chemistry is different, but so is much of the other tech. Here's a modern power supply I just found (http://www.tdipower.com/PDF/archive/10A_dcdc_conv.pdf) that supplies between -47 and -54 ... hmm, wonder why?
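
For anyone who wants to check the arithmetic behind the 48 V and 54 V figures, here is a minimal sketch. The 2.25 V per-cell figure is the one quoted above; the 2.0 V nominal cell voltage and the six cells per 12 V battery are the usual lead-acid numbers and are assumptions here, as are the function and constant names.

```python
# Illustrative names only; not taken from any spec or library.
NOMINAL_VOLTS_PER_CELL = 2.0   # assumption: usual nominal lead-acid cell voltage
FLOAT_VOLTS_PER_CELL = 2.25    # per-cell figure quoted in the comment above
CELLS_PER_12V_BLOCK = 6        # assumption: a "12 V" battery is six cells in series


def string_voltage(blocks, volts_per_cell, cells_per_block=CELLS_PER_12V_BLOCK):
    """Voltage of `blocks` 12 V batteries wired in series."""
    return blocks * cells_per_block * volts_per_cell


# Four 12 V batteries in series give 24 cells.
nominal = string_voltage(4, NOMINAL_VOLTS_PER_CELL)
floating = string_voltage(4, FLOAT_VOLTS_PER_CELL)
print(nominal, floating)
```

So the same 24-cell string is "48 V" by its nominal rating and 54 V when it is being held at float by the rectifier.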

Thankfully, telecom is one of the most backwards-compatible industries in the world, which is why the whole rickety edifice continues to work well (and support tons of innovation) even when people use it in ways never intended by its designers.

BTW I'd be disappointed to learn that any reasonable 21st century EE education no longer includes this stuff.


> Finally, racks go metric.

How is that a benefit for anyone other than hardware makers?


> energy efficient and more cost effective ... engaging the industry to identify better disk solutions for cloud based applications

My pet issue w/IT infrastructure is the management modules. Finding a server w/a management module that works every time is nigh impossible. Do Google and Facebook design their own or do they somehow just work around their quirks?


Google and Facebook don't bother with management modules; they're more likely on switched power for each machine, with significant work in automatic netboot reprovisioning when things go wrong on these systems. In the event that fails, they swap machines out for diagnosis... if they bother with the diagnosis part.


Ahh yes, the tried and true "turn it off and on again" philosophy...

All kidding aside, if it's a random rare issue (some bit flipped somewhere), chances are that power cycling it may fix the issue (upon initialization). If it's anything more serious, the issue will likely persist, in which case taking the machine out of production is likely the best immediate course of action.


AFAIK Facebook tried reboot-on-LAN for one generation then went back to BMCs. I haven't seen any explanation about why, though.


Facebook apparently is just getting to this recently:

https://code.facebook.com/posts/1471778586452119/openbmc-for...


I haven't followed this project too closely, but, it seems interesting that it has taken this long for Google to join (and the conspicuous absence of Amazon). Anyone able to speculate on why now?


Pure speculation: Google was the first to build very high density server farms. As such, they had various proprietary designs and approaches, and became heavily invested.

And then it took them a while to decide that leveraging the open source ecosystem would be better than maintaining those various proprietary elements.


I don't think it is speculation. Google has always considered its infrastructure work a competitive advantage: super-top-secret, siloed stuff that even other employees don't know about.

And to your second point, it is interesting that OCP (some of it perhaps influenced by people who had worked at Google and then later worked at Facebook) appears to have minimized some of the advantages Google had, and now they find themselves possibly getting behind (one of the nice things, and bad things, about open projects like this is that you get a lot more resources applied to making things better than one company can muster).

For similar reasons I wonder if ARM will displace x86, if only because there are probably 15 to 20 independent teams of smart people with ARM architecture licenses making better ARM processors, and perhaps 4 such teams at Intel making improvements in the x86 architecture. At some point, "open" seems to eventually overwhelm "walled garden." Although I'm totally open to counterexamples where that hasn't been the case.


From what I've seen the challenge ARM faces in the datacenter is support for ECC ram.


I agree with that, I would love to see an ARM server chip with a full ECC memory path. However, I note that ECC memory subsystems are "well understood" from a hardware perspective, so one of the teams working on ARM processors will no doubt apply that to their version and we'll see if there is more demand than just the two of us :-)


I certainly agree with the underlying idea, and maybe it's not an apples-to-apples comparison, but OpenStack would be an example of an open cloud platform that has failed to provide such benefits.

Rackspace has largely abandoned the project after initially opening it up, and public clouds based on OpenStack seem to be getting shut down left and right (I could go on a long rant about why that is the case, since I am an OpenStack contributor :)


This makes sense. Perhaps Amazon hasn't come far enough along in that process, or they see their proprietary designs as a competitive advantage.


If you watch Google's moves in externalizing internal technologies closely, you can see a clear trend toward reducing the secrecy around them.

The reason, apparently, is that Google can no longer sit comfortably behind the scenes and watch others play catch-up. Previously Google did not care, because they didn't want to share their internal cloud with you. Now they need the profit from Cloud. Google needs to show that it is still the leader. Previously, Google did not need to show anything and everyone believed they were the leader. After missing the cloud movement in past years, Google's reputation has been tainted, and they have to show hard proof.

For example, when choosing cloud providers, AWS is still the clear No. 1. People probably still believe that GCP is better than AWS, but GCP does not have the proof. If Google could somehow catch up and beat AWS decisively overnight, it still would not need to share anything, and no one would question that Google is the leader. Apparently, Google is not able to do that. It might be that Google is not focused enough. But Google cannot let the impression that "Google's technologies might not be as good as we thought" grow in people's minds.

Can you imagine what would happen if Google had started the Cloud thingy at the same time as AWS? Then Google certainly would not need to share a damn thing with the outside world...


I kind of see where you are going with this, but perception may not be as important here as you are suggesting.

To address your last 'if' scenario - notice that Amazon, as a large online retailer, had a compelling reason to offer up the use of the infrastructure they had to build anyway, which was over-provisioned for most of the year. I don't have numbers, but I would be surprised if Google has anything close to the kind of seasonality Amazon experiences.

Once it became clear that Amazon was onto something with this whole utility computing/cloud thing, the rest had to play catch up, and continue to do so to this day.


It doesn't make sense to me that sharing these details necessarily improves the reputation of GCP. The point of cloud platforms is that you don't have to know or think about how the server racks are made or stored.


Why would you choose GCP over AWS at this point? Other than the potential that in the near future GCP can provide services that are cheaper, more reliable, and more performant (note that none of these are true today).


Just yesterday Quizlet posted a long exposé that, in their opinion, shows that GCP is cheaper, more reliable (live migration), and more performant than AWS, with details to boot:

https://news.ycombinator.com/item?id=11260137

And last month Spotify did the same thing.

The price difference between EC2 and GCE, especially with Custom VMs, is massive. Up to 40% according to this:

http://www.zdnet.com/article/what-google-says-to-aws-price-c...


It does not sound long at all. This initiative was started by Facebook relatively recently. From the sounds of the article Google have worked with Facebook on this 48v power supply aspect, and having proven some results in a Facebook server, they are now able to contribute that to the standard. They are not adding much else by the sounds of it. I doubt Google will build servers to a common standard, theirs will be far better.


This is great news and just another nail in the coffin of what Wired calls the Fucked By Cloud vendors: http://www.wired.com/2015/10/meet-walking-dead-hp-cisco-dell...


He fails to explain why "Amazon" is the future. He starts off by insinuating it's because traditional vendors are more expensive... I guess I'd ask to see the raw data he's using.

I've run the numbers, AWS isn't cheaper AT ALL unless you're talking bursting workloads that run for less than a month, or a company that only needs one or two servers but still needs the reliability of a larger environment. Anything outside of that is cheaper to do on-prem 9 times out of 10.

Either he's got his head in the sand, has bought into the "cloud is cheaper hype", or he has other justifications he fails to list.


But cloud, because cloud and cloud cloud!

We recently did the math on some cloud database solutions vs. warehousing data ourselves and using Glacier as a last-resort backup. Turns out it's orders of magnitude cheaper to DIY and would remain so unless we needed -- as you say -- a whole lot of burstable compute. In that case we could always upload (or ship) all the data to the cloud if we wanted.

Cloud is cheaper for some work loads and use cases but not others. You have to analyze your own problem and see what works best.

What cloud vendors have managed to do is put a ton of marketing hype out there to the point that cloud has captured the entire discourse. Today if it's not cloud it's not cloud and that means it's not cloud, because cloud. I've heard many stories recently of companies spending 2-4X what they currently spend on IT to put it in the cloud in order to save money. It's hilarious.


While there are arguments on both sides for TCO of Cloud, I'll offer one idea - a solution like BigQuery is zero-Ops, lets you scale to thousands of cores in a second, and you pay per-second. This is cheaper, faster, and easier to use, simply because it breaks many assumptions folks make when doing TCO comparisons:

https://cloud.google.com/blog/big-data/2016/02/understanding...


For our app the killer was storage costs.

Of course our app is probably different from your app. For us we are doing real-time network testing and diagnostics of a bunch of distributed networks and dumping the data into a series of tables for analysis and diagnostic use. We can also use that data to improve our software in the long term.

A lot of data is generated, but if we lose a little bit it won't kill us. The analytics we want to do right now are fairly consistent in terms of load.

For this work load the cheapest option by far is to buy a RAID array and warehouse it locally. It can also be backed up very cheaply in Glacier so that if we do lose some we can get it back if we want it.

But of course like I said our app is probably not like yours. My point was that you have to do the analysis for your exact specific problem to decide what approach -- cloud or not -- is the best fit.


I saw this at RBI (part of Reed Elsevier). I had to port a small system I built from Linode to AWS and it was 10x the cost on AWS.


But it was on AWS, so it was enterprise.


Ah, the old "spend millions outsourcing to save hundreds of thousands of internal budget."


He implies that it's cheaper than, for example, buying HP+EMC+Oracle...

And then he roughly says, though not directly, that scaling up on cheap commodity-type hardware is only possible because of the efforts of Amazon/Google/Facebook.

He doesn't directly compare AWS versus, say, Rackspace.


I'm not comparing it to Rackspace. I'm telling you I've run the numbers, and I can do it cheaper on prem with HP + EMC or Cisco + NetApp than I can in AWS. Unless I'm running compute at less than 50% capacity and storage at less than 60% capacity, it's far cheaper to do it on prem.
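
To make the utilization argument concrete, here is a rough sketch of a break-even calculation. It assumes a deliberately simplified model: on-prem cost is fixed per month regardless of load, and cloud cost scales linearly with utilization. The dollar figures are hypothetical, picked only so that the break-even lands at 50%; they are not real AWS or vendor pricing, and the function names are made up for illustration.

```python
def breakeven_utilization(onprem_monthly, cloud_monthly_at_full_use):
    """Utilization (0..1) above which fixed-cost on-prem beats pay-per-use cloud."""
    return onprem_monthly / cloud_monthly_at_full_use


def cheaper_option(utilization, onprem_monthly, cloud_monthly_at_full_use):
    """Compare a fixed on-prem bill with a cloud bill proportional to utilization."""
    cloud_cost = utilization * cloud_monthly_at_full_use
    return "on-prem" if onprem_monthly < cloud_cost else "cloud"


# Hypothetical monthly costs, for illustration only.
ONPREM = 5000
CLOUD_FULL = 10000

threshold = breakeven_utilization(ONPREM, CLOUD_FULL)
busy = cheaper_option(0.8, ONPREM, CLOUD_FULL)   # steady, well-used capacity
idle = cheaper_option(0.3, ONPREM, CLOUD_FULL)   # mostly idle capacity
print(threshold, busy, idle)
```

Real comparisons would also need staffing, facility, discount, and egress costs, which this sketch leaves out.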


Well, Netflix just migrated all their servers to Amazon EC2. I'm sure it was cheaper for them.


And Spotify moved to Google Cloud. However, these are isolated cases that don't reflect on the average experience. For one, they most likely got a much, much better deal because of sheer volume. It still doesn't change the reality for companies with a handful, or even a few tens or hundreds of servers.


Netflix is also scaling up and down capacity based on time of day. Not to mention what they're paying isn't anything close to what you're going to pay. They negotiated pricing that would be cost effective for them, and I guarantee Amazon isn't making anywhere near the margin - they just want the brand recognition.


Or the finance director bought into the hype and there are some nasty costs lurking that Netflix hasn't thought about.


Interesting that it looks like they are dumping 12V and going back to the telecoms standard 48V :-)


And, as someone who is not an employee of a multi-billion-pound company, how can I get involved?!

I ask every time, and this project is amazing, but, it feels just for the big guys!


You can buy OCP compatible stuff, promote it, join the mailing list, blog about it, and so on. Otherwise, you can't. It's basically the club of kids who build this kind of hardware stuff (do R&D on infrastructure).

When Facebook started OCP it seemed like an awkward initiative, after all, who would manufacture OCP-compatible stuff, when you can't find anything OCP-compatible at all? (It needs custom racks, for shame! Madness!) But slowly, it turned out, that there are multiple groups working on custom designs, trying to get out of the world of half-assed firmwares, crazy sales calls and useless support vectors associated with traditional vendors.

So initially it only made sense if you had at least a few thousand servers and a team to work on those few extra percent of efficiency. Nowadays, it seems a much more diverse club. And especially after the network stuff got opened up - the timing with SDN was right anyhow - they seem doomed to succeed.


The equipment is available from various vendors, but the biggest hurdle is finding a datacenter facility that can handle OCP gear. Retail colocation is pretty much 19" racks only and 120V/208V power only. Even small scale wholesale datacenter deployments are tough to do if you want to go whole hog on OCP specs.

You can still get OCP hardware in standard 19" form factors for these environments, but I don't personally see much point in it. You're seriously limiting your vendor and hardware selections, and if you aren't seeing the purported power efficiency gains then it's not worthwhile. The allure of eliminating the pain of classic vendors is still there, but there are paths through the Dell / HP / Supermicro maze that are tolerable without burning it down and going to OCP.

There are other challenges with OCP's overall design in standard facilities beyond the server cabinet level too. Their (optional) battery cabinet design -- where you save on the facility cost by omitting the UPS bank in favor of battery cabinets for every triplet rack -- is explicitly forbidden at basically every top tier datacenter provider that I'm aware of. No batteries allowed on the floor because of the fire suppression system. You can definitely get a provider to build you a room where they'll let you have batteries on the floor, but that is a multi-megawatt sort of problem, not a 50 kW sort of problem.


Yes, you need at least a cage, so hundreds/thousands of servers minimum. And that's where power conversion gains might mean you get ~10-20 servers' electricity for "free".
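
As a back-of-the-envelope check on that "~10-20 servers for free" figure, here is a small sketch. The 2% power conversion gain is purely an assumption chosen to show how fleet sizes of 500 to 1000 servers would produce that range; it is not a measured OCP number, and the function name is made up.

```python
def free_server_equivalents(server_count, efficiency_gain_percent):
    """Servers' worth of electricity saved by a given percentage efficiency gain."""
    return server_count * efficiency_gain_percent / 100


ASSUMED_GAIN_PERCENT = 2  # assumption for illustration, not a measured value

small_cage = free_server_equivalents(500, ASSUMED_GAIN_PERCENT)
large_cage = free_server_equivalents(1000, ASSUMED_GAIN_PERCENT)
print(small_cage, large_cage)
```

At a few dozen servers the same gain rounds to less than one machine, which is why this only starts to matter at cage scale.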

But that means you need to basically run your own DC operations. Cables, batteries, monitoring, hot spares, and good luck finding a DC that even allows you to speak about putting fire hazardous stuff "there".

Yet it can be done. Money talks, hence the seemingly amazing success of OCP out of nothing.

... ah I should have read your last paragraph, before writing anything :)



