Hacker News
Amazon EC2 Elastic GPUs (amazon.com)
139 points by madmax108 on Nov 30, 2016 | 33 comments

For people who don't have huge budgets for this stuff but need a GPU in production for some light work, I wanted to mention that with a little effort you can do this a lot cheaper with gaming/mining GPUs and a 4U/midtower rack space. This also gets you a video card attached to an X interface for doing things like WebGL screenshots with SlimerJS (my use case). I'm not sure the GPU compute instances let you do that; you usually need a display emulator to make X work with headless GPUs.

I'm not mentioning this to be smug - if you don't need massive GPU clusters on demand, the cost difference is substantial. I can build a GPU rig for $1,200 and it's going to cost $40/mo to host it. Compare that to the $500+ per month you're going to spend at a cloud provider.
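To put numbers on it, here's the break-even arithmetic as a quick sketch (all three figures are my rough estimates, not quoted prices):

```python
# Back-of-the-envelope cost comparison: self-built GPU rig vs. cloud.
build_cost = 1200     # one-time hardware cost ($)
colo_monthly = 40     # colocation hosting ($/month)
cloud_monthly = 500   # comparable cloud GPU instance ($/month)

# The rig wins once: build_cost + colo_monthly * m < cloud_monthly * m
break_even_months = build_cost / (cloud_monthly - colo_monthly)
print(f"Break-even after {break_even_months:.1f} months")  # ~2.6 months
```

After that point, every month of colo is pure savings relative to the cloud bill.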

I'm actually shipping one out today. It looks like this https://twitter.com/neocitiesweb/status/804065175045218304

Are there easy services out there that let you mail them a hardware server and they will host it in their rack? Never heard of doing this on a server-by-server basis... But I'm also not a DevOps guy :P

As someone who previously worked in an ISP / webhosting datacenter and now sees most younger devs as people who don't actually know how their stuff runs ("it's in the cloud"), this is an amusing comment.

"The cloud" is just marketing on top of shared hosting and virtual private servers. Those things have been around for decades. At the end of the day, your code is still executing on some poor Xeon out there, but maybe it's using some kernel features to restrict memory access and you deploy it using a fancy whale with a shipping container. We've had LXC, cgroups, jails, and Xen for many years as well (LXC was 2008, Xen 2003). Solaris had zones in ~2004.

We have SSDs, DDR4, 10GbE, and 32-core-per-socket Xeons now, so you don't really have to give two shits about crazy amounts of random reads/writes, storing state in 4 separate network-connected NoSQL systems with REST APIs, 4 layers of reverse proxies, 3 layers of virtualization, and 60-frame-deep call stacks; most stuff just works well enough with little thought put into the systems aspect of it.

Despite all of this, Google Sheets is less responsive than the 1985 release of Excel, and takes longer in wall-clock time to enter 100 rows of integers and sum up a column. See also: https://en.m.wikipedia.org/wiki/Wirth's_law


The only thing which is really new is a generation of people who don't understand anything system-related deeper than the first abstraction layer.

I'm well familiar with shared & dedicated hosting, and with colocation as a concept - I guess I've just thought of colo as something only big companies with big budgets and lots of servers do. I like the idea of building a single rack server in my apartment for a project, and mailing it away never to worry about it again.

Good points re: abstraction of the deeper system layers, though. I guess my position as a frontend developer is that every brain cell I spend thinking about things like hardware and memory allocation is one less spent thinking about the UX of my app. Abstracting layers away is a good thing, except when they leak, which is often...

"Never to worry about it again" .. I guess this is why I stopped renting a cabinet at Level 3. When a hard drive dies, who wants to go fix it?

Standard colocation. Used to be the thing all the kids did before the cloud.

This is really trippy. GPUs generally need a fast interface like PCI-E and won't work over a network like a disk.

So to elastically provide GPUs over a rack, how does that work? Given that you can attach GPUs to some common instance types at any time, how do you not have a ton of GPUs just sitting around due to the physical constraints of PCI-E? And how do you not run out of capacity and just have to say no?

You can deploy PCIe with switches over distance and keep the GPUs in an external enclosure:



*There are newer/better/faster versions of the above available; these were just two examples I had handy as my guess of how to make this happen. I'm confident that even if AWS is doing something similar, it's being done on customized hardware.

As a gamer I was initially surprised that this works. It makes sense if server-side use cases are more batch-oriented/compute-intensive.

Latency to start and stop "jobs" is critical in gaming, as you are trying to hit a 60-144 Hz frame time.

The added latency of a PCIe switch and some cables is 1 us or less; it should be negligible for almost any workload.
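To put that in perspective against the frame budgets mentioned above (illustrative arithmetic only, using 1 µs as the upper bound):

```python
# Compare a ~1 microsecond PCIe-switch hop to a game frame budget.
switch_latency_us = 1.0
for hz in (60, 144):
    frame_budget_us = 1_000_000 / hz
    pct = 100 * switch_latency_us / frame_budget_us
    print(f"{hz} Hz: {frame_budget_us:.0f} us/frame; "
          f"switch adds {pct:.4f}% of the budget")
```

Even at 144 Hz the hop eats well under a hundredth of a percent of the frame time.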

GPUs in the cloud aren't targeted at gamers. They're targeted at people doing things like running render farms and training deep learning models.

What about companies interested in real-time streaming of 3D content? That use case is extremely sensitive to latency, probably more so than normal gaming is.

Fortunately the amount of additional latency introduced is likely to be negligible (another comment cites PCI-E switches incurring <= 1 µs).

The problem is that transferring data to/from GPUs is already problematic performance-wise. I guess it depends on your application whether the data-transfer hit is acceptable or not.

Looks like the use case they have in mind is focused on CAD/CAM. I don't see much mention of deep learning, and I would guess it isn't being targeted at all.

I'm in the process of moving our local graphics workstations to GRID-powered workstations on AWS, with Raspberry Pi 3 "heads" and 1080p 60fps streams.

The goal is to cut our capital costs, allow access to GPU-based graphics apps "anywhere", and host all of our virtual film production assets on AWS, where they are accessible for rendering, simulation, etc.

> How do you not run out of capacity and just have to say no?

I don't know the answer to your other questions but EC2 can and often does refuse our run instance requests because they don't have the resources available in the AZ we've requested.

One thing you can do is put 4-8 GPUs in each server and just try to bin pack different customers properly.
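As a toy illustration of that idea (purely hypothetical; I have no idea what AWS's real scheduler looks like), a first-fit packing sketch:

```python
# Toy first-fit packing of customer GPU requests onto hosts that each
# hold a fixed number of GPUs (8 here, per the range in the comment).
GPUS_PER_HOST = 8

def pack(requests):
    """Assign each request (a GPU count) to the first host with room."""
    hosts = []       # remaining free GPUs per host
    placement = []   # host index chosen for each request
    for need in requests:
        for i, free in enumerate(hosts):
            if free >= need:
                hosts[i] -= need
                placement.append(i)
                break
        else:  # no existing host fits; bring up a new one
            hosts.append(GPUS_PER_HOST - need)
            placement.append(len(hosts) - 1)
    return placement, len(hosts)

placement, n_hosts = pack([4, 2, 8, 1, 3, 6])
print(placement, n_hosts)  # [0, 0, 1, 0, 2, 3] 4
```

First-fit wastes some capacity (the 6-GPU request forces a fourth host even though 3 GPUs sit idle elsewhere), which is exactly the fragmentation problem real schedulers fight.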

Or you could use PCIe switches and cables to dynamically reassign GPUs to different physical servers.

Some other sort of ultra-high-bandwidth interconnect between instances, such as InfiniBand (or whatever, I don't know that space well)?

(This is speculation/questioning, not implying knowledge. I noticed the placement of the ? makes that a bit ambiguous)

PCIe is really the only option; the question is just whether it's in one box or cabled.

I was thinking that you could package the commands across some other wire to load onto the GPU. As long as your dataset fits on the GPU (so you don't have to access main memory, which would be horrifically slow), it seems like it might fit the GPGPU use case.

I agree that external PCIe is the obvious good/best solution. I didn't realize that external PCIe switch boxes existed.

That's only for graphics; for GPGPU, people routinely downgrade the PCIe link to x1, etc.

I was wondering this too. The best I could think is that they're trunking PCI-E over TCP or something? Or maybe all of their new instance types have something like Infiniband?

You can buy 10 m PCIe cables commercially. In fact, some equipment uses them for its backplane.

Scale. AWS is huge.

Does this enable remote render farms? That would be huge for independent animation studios.

Most render farms at animation studios are CPU-based renderers. VFX companies have been using cloud rendering here and there for a few years now[1]. There has been chatter for 10+ years that GPU-based rendering is on the cusp of changing everything (predating the cloud push). Anecdotally, small shops (one or two people) have had luck with onsite GPU rendering when they have a lot of simple things to do. Larger places need more flexibility than GPU renderers have offered, and places with large in-house render farms don't have any GPUs installed in them.

Although, recently, GPU rendering has gotten some traction in larger facilities[2]. Cloud rendering makes it easier to move towards these kinds of things because you don't have to commit to the hardware upfront. However, the big problem with cloud rendering at even modestly sized animation/VFX houses is transferring the terabytes to and from the cloud (the consensus is to leave it in the cloud or use a local cloud).

[1] http://www.postmagazine.com/Publications/Post-Magazine/2012/...

[2] https://www.redshift3d.com/blog/blizzards-overwatch-animated...

Exciting. Hopefully, the GPUs are made by Nvidia and support CUDA.

I'm curious what the pricing will be.

Yep. I use an Amazon EC2 Linux AMI with Nvidia GRID GPUs using CUDA and OpenCV, and it works like a charm. Of course it is headless, so I tunnel X11 through ssh and use xv to look at the images I am rendering. Saves me from buying hardware. Costs me $0.68 an hour; well worth it.

They don't mention any manufacturers and do not support CUDA, just OpenGL.

It's clear what they are doing: you call Amazon's OpenGL library, which applies some batching and compression when talking to a remote GPU somewhere else in the cluster. You aren't allowed to know, and don't even need to know, what kind of hardware is on the other side; they could even mix manufacturers. And because of this proxying, you can only use an open, vendor-neutral API like GL, hence no CUDA support.

I read another comment on /r/machinelearning saying it's likely that this won't be for CUDA; they would have highlighted CUDA support instead of just showing OpenGL. Furthermore, they only showed Windows, so Linux might not be available?

Does anyone have hardware recommendations for getting as much GPU performance as possible for $7,500 or so (maybe with 8 or 12 GTX 1080s/1070s)? And how does it compare with the K80s or P100s that AWS and Google are using?

Just looking at the spot pricing, I see that a p2.xlarge is $0.57 an hour, while a p2.8xlarge is $72/hour and a p2.16xlarge is $144/hour. That tells me there is extreme demand for heavy GPU instances, and a home cluster would be one way to insulate myself from that.
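Working out the per-GPU spot price from those numbers (GPU counts per instance size taken from AWS's p2 specs: 1, 8, and 16 K80 GPUs):

```python
# Spot price per GPU for the p2 family, using the prices quoted above.
spot = {
    "p2.xlarge":   (0.57, 1),    # ($/hour, GPU count)
    "p2.8xlarge":  (72.00, 8),
    "p2.16xlarge": (144.00, 16),
}
for name, (price, gpus) in spot.items():
    print(f"{name}: ${price / gpus:.2f}/hour per GPU")
```

The bigger instances are going for roughly 16x the per-GPU spot rate of the single-GPU size, which is what suggests the demand imbalance.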


Also check out http://www.bitfusion.io (no affiliation)

Vulkan is definitely interesting, but OpenGL 4.4 or later on Linux is a must for me.
