The author says Sutherland was dissuaded by having to specify a spot price upfront on AWS, but I don't see how that is any different from what Google is doing.
Google Spot Instances (preemptibles) are 80% off and that's it. It's simple.
In AWS you tell them how much you're willing to pay to keep the instance uninterrupted. If someone is willing to pay more, they get the instance and yours is shut down.
By comparison, if you are able to create a preemptible VM you are guaranteed to pay 0.2x retail and then likely (but not guaranteed) to have 24h to do your work.
tl;dr: A maximum bid of 0.2x retail isn't equivalent to preemptible. There's almost certainly a bid for a given instance for a given runtime that would result in the same price, but it varies over time, space, runtime and instance shape.
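To make the difference concrete, here's a toy comparison, with entirely made-up numbers (the retail price, the market price series, and the bid are all hypothetical): a flat 0.2x discount versus paying a fluctuating spot price that preempts you when it exceeds your bid.

```python
RETAIL = 1.00          # hypothetical on-demand price per hour
FLAT_DISCOUNT = 0.20   # preemptible-style: always pay 0.2x retail

# Hypothetical hourly spot market prices over an 8-hour run.
market = [0.12, 0.15, 0.18, 0.25, 0.30, 0.14, 0.10, 0.22]

def flat_cost(hours_needed):
    # Fixed discount: the price never moves, though the VM may still
    # be preempted for capacity reasons.
    return hours_needed * FLAT_DISCOUNT * RETAIL

def spot_cost(bid):
    # Spot-style: pay the market price each hour while it stays at or
    # below your bid; a higher market price shuts your instance down.
    total, hours_run = 0.0, 0
    for price in market:
        if price > bid:
            break          # outbid -> preempted
        total += price
        hours_run += 1
    return total, hours_run

print(flat_cost(len(market)))   # -> 1.6
cost, hours = spot_cost(0.20)
print(round(cost, 2), hours)    # -> 0.45 3
```

In this toy series the spot run is cheaper per hour but gets cut off after 3 hours; the flat discount costs more but runs the full window. That's the predictability trade-off in one example.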
Disclosure: I work on Google Cloud (and launched preemptible VMs)
In my several years experience using GCE it is indeed extremely likely. Two years ago it wasn't very likely, but it became very likely about a year ago. Incidentally, I'm the guy that convinced Drew Sutherland of this article to use GCE in the first place :-).
I think AWS might be cheaper in this regard but less predictable, so it is a tradeoff.
I haven't looked yet into "Spot Fleets".
Then it is definitely interesting :)
That said, there is still a "market" here: your work vs Google's work. It just shows up as a private floating probability instead of a public floating price.
Most of the time you get the full 24-hour maximum window.
Disclosure: I work on Google Cloud (and launched Preemptible VMs).
Economically, because the price is fixed and everyone has the same status, wouldn't that have the effect of oversubscription if that price is always below the Amazon spot price? Are you, e.g., legally prevented from using a spot-like bid? I'd imagine spot would balance supply and demand better.
If they can sustain a lower price, picking a fixed discount gives them a marketing edge, even if AWS would come out lower overall. Most humans will pay a premium for certainty -- the certainty effect or "taxicab effect".
The distinction, as you've surmised, is that predictability is awfully useful. This 220k run (and Drew's quick 400k run last Sunday) had a predictable price. I doubt that such a run on Spot would have left the market price untouched ;).
That said, we absolutely have excess capacity that an auction clearing mechanism could perhaps fill. But I don't think it's worth the customer pain. Moreover, even Drew came to us from Spot, suggesting that simplicity at a usually-fair/good price can take market share away (and a reminder: this is a fast-growing market!).
The basis of budgeting and capacity planning is that everybody who participates in a free market secretly wishes they didn't have to.
If I were to launch a Kubernetes cluster (via Container Engine) with such VMs, would it have an issue? (As in: if I had more load, more preemptible VMs would be provisioned automatically up to a previously set maximum, and if a VM was removed, another would be assigned based on my minimum.)
Am I wrong in thinking that this is a much better option than running dedicated VMs?
The Langlands program is the mathematicians' version of "string theory". And what this guy is doing with Google Cloud is comparable to the LHC: searching for new particles (of mathematics) in order to find the deeper unity therein.
Maybe, but one does not necessarily follow from the other. Consider the task of compiling 1 million separate C++ projects. That is obviously embarrassingly parallel, but not well suited for a GPU. It's trivial to do many compilations at once, but compiling itself is not easy to parallelize.
That example is obviously contrived, but I think it demonstrates the principle that it's the computational profile of the core problem that will determine if you can use a GPU. If the core problem requires 10s of GB of RAM, or it's excessively branchy code, it may not be well suited for a GPU.
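As a sketch of that contrived compile-farm example (the project names and worker count are made up, and the "compile" is a stand-in function): the parallelism lives *between* jobs, not inside them, which is why a CPU farm fits and a GPU doesn't.

```python
from concurrent.futures import ThreadPoolExecutor

def compile_project(name):
    # Stand-in for one sequential, branchy job; a real compile farm
    # would do something like subprocess.run(["c++", ...]) here.
    return f"{name}: ok"

projects = [f"project-{i}" for i in range(1000)]

# Embarrassingly parallel: each job is independent, so any pool of
# workers can churn through them with no coordination.
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(compile_project, projects))

print(len(results))   # -> 1000
print(results[0])     # -> project-0: ok
```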
1. SIMD (parallel lines)
2. Fork-Join (a directed acyclic graph of operations)
3. Message-Passing (a graph of operations)
GPUs are great at SIMD, but bad at the other sorts of parallelism.
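Rough Python-level caricatures of the three models (illustrative only, not GPU code; the data and tasks are arbitrary):

```python
from concurrent.futures import ThreadPoolExecutor
import queue
import threading

data = [1, 2, 3, 4]

# 1. SIMD: one operation applied to every element in lockstep.
simd_result = [x * 2 for x in data]   # a GPU runs one lane per element

# 2. Fork-join: fork independent tasks, join when all complete.
with ThreadPoolExecutor() as pool:
    fork_join_result = sum(pool.map(lambda x: x * x, data))

# 3. Message passing: independent workers communicating explicitly.
q = queue.Queue()
worker = threading.Thread(target=lambda: q.put("done"))
worker.start()
worker.join()
msg = q.get()

print(simd_result, fork_join_result, msg)  # -> [2, 4, 6, 8] 30 done
```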
On Nvidia GPUs, 16 to 32 warps per SM x 60 SMs on a P100 gives a lot of hardware threads (1 thread == 1 warp) in flight at once; these are allowed to branch completely independently of each other (I forget the maximum occupancy of a P100's SM in warps at lowest resource use). Furthermore, you can use global memory atomics and spin-locks for event-driven programming, work-stealing, etc. This kind of stuff is used in, e.g., persistent kernels. Of course, the single kernel that is being run must handle all of the code for all of the tasks. Not easy to write, but possible.
At the level of a single node, TensorFlow uses Eigen. Eigen is like BLAS, but it's a C++ template library rather than Fortran. It compiles to various flavors of SIMD. Nvidia's proprietary CUDA is the SIMD flavor most commonly used by TensorFlow programs.
At the level of multiple nodes, TensorFlow derives a program graph from your Python source code, using high level "ops", in the style of NumPy. Then it distributes the ops across a cluster using a scheduler:
Quote: Its dataflow scheduler, which is the component that chooses the next node to execute, uses the same basic algorithm as Dryad, Flume, CIEL, and Spark. 
Python is the "control plane" and not the "data plane" -- it describes the logic and dataflow of the program, but doesn't touch the actual data. When you use NumPy, the C code and BLAS code are the data plane. When you use TensorFlow, the Eigen and GRPC/protobuf distribution layer are the data plane.
So you can have a big-data dataflow system WITHOUT SIMD, like the four systems mentioned in the quote. And you can have SIMD without dataflow, e.g. if you are doing it in pure Eigen or procedural/functional R/Matlab/Julia on a single machine. Languages like R and Julia may have dataflow extensions, but they're single-threaded/procedural by default as far as I know.
A mathematical way to think of the DAG model is that your program uses a partial order on computations rather than a total order (the procedural model) -- this is what gives you parallelism.
So TensorFlow uses both SIMD and dataflow.
(I work on a dataflow language and system.)
Put it together: by using Preemptible VMs (and yes, apologies, we still don't offer GPUs as preemptible), it's economically rational to use spare CPUs.
Disclosure: I work on Google Cloud.
I guess since it's discrete math it probably doesn't use Fortran/BLAS?
Disclosure: I work on Google Cloud, launched Preemptible (and approved Drew's quota requests!)
How big a house would be needed for such an array of 'cores'? How much electricity is required, and again can that be compared to aeroplanes, Teslas or even toasters?
At 100W per CPU that's 22kW, which feels low; about a tenth of a Tesla and a fraction of a plane. That doesn't account for the cooling you'll need!
(You can probably do a lot better, but that's my piano-tuner estimate)
That makes about 30kW. At $0.15 / kWh we're talking about $4.5 per hour for the electricity.
Other costs dwarf the energy costs.
This is more in line with what I would have guessed for 220k cores.
A 737-700 burns approximately 750 gallons per hour at approximately $2.50/gallon for Jet-A1 purchased wholesale. That's $1875/hr for the fuel.
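For anyone who wants to check the arithmetic in this subthread, here it is spelled out (the wattage, electricity price, burn rate, and fuel price are all the estimates quoted above, not measured values):

```python
# Electricity estimate from upthread: ~30 kW at $0.15/kWh.
power_kw = 30
usd_per_kwh = 0.15
electricity_usd_per_hour = round(power_kw * usd_per_kwh, 2)

# 737-700 fuel estimate from upthread: 750 gal/hr at $2.50/gal.
gallons_per_hour = 750
usd_per_gallon = 2.50
fuel_usd_per_hour = gallons_per_hour * usd_per_gallon

print(electricity_usd_per_hour)  # -> 4.5
print(fuel_usd_per_hour)         # -> 1875.0
```

So on these estimates the plane's hourly fuel bill is roughly 400x the cluster's hourly electricity bill.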
First, preemptible VMs don't preempt each other. So if you got shot it wasn't Drew directly.
However, when we're full and someone needs to get shot, that could be you or it could be someone else. Drew being there actually makes it more likely that he would take the heat. But Drew runs all out well off peak (weekend mornings, like our docs encourage!), so unless you had a bad weekend it wasn't him :).
Disclosure: I work on Google Cloud (and have gone back and forth with Drew over email).
At worst the revelation here is that someone at Google thinks the customer might be using what they've publicly said they used before. And that instance type is the largest high-CPU version available that's not in beta.
I fully expect that Drew will add a simple override map to say "use 64 threads here, 48 here, 32 here and so on" but with a goal to minimize preemptions.