
It takes a lot to be able to find something in yourself like that and admit it to yourself and everyone. I always appreciate people like that.

I also tried the playground and was impressed that it was free! It must be a sizable chunk of money to run that.




Believe it or not, it's completely free.

It's thanks to TFRC. It's the most world-changing program I know of. It's why I go door to door like the proverbial religious fanatic, singing TFRC's praises, whether people want to listen or not.

Because for the first time in history, any capable ML hacker now has the resources they need to do something like this.

Imagine it. This is a legit OpenAI-style model inference API. It's now survived two HN front page floods.

(I saw it go down about an hour ago, so I was like "Nooo! Prove you're production grade! I believe in you!" and I think my anime-style energy must've brought it back up, since the API works fine now. Yep, it was all me. Keyboard goes clackclackclack, world changes, what can I say? Just another day at the ML office oh god this joke has gone on for like centuries too long.)

And it's all thanks to TFRC. I'm intentionally not linking anything about TFRC, because in typical google fashion, every single thing you can find online is the most corporate, soulless-looking "We try to help you do research at scale" generic boilerplate imaginable.

So I decided to write something about TFRC that wasn't: https://blog.gpt4.org/jaxtpu

(It was pretty hard to write a medieval fantasy-style TPU fanfic, but someone had to. Well, maybe no one had to. But I just couldn't let such a wonderful project go unnoticed, so I had to try as much stupid shit as possible to get the entire world to notice how goddamn cool it is.)

To put things into perspective, a TPU v2-8 is the "worst possible TPU you could get access to."

They give you access to 100.

On day one.

This is what originally hooked me in. My face, that first day in 2019 when TFRC's email showed up saying "You can use 100 v2-8's in us-central1-f!": https://i.imgur.com/EznLvlb.png

The idea of using 100 theoretically high-performance nodes of anything, in creative ways, greatly appealed to my gamedev background.

It wasn't till later that I discovered, to my delight, that these weren't "nodes of anything."

These are 96-CPU, 330GB-RAM Ubuntu servers.

That blog post I just linked to is running off of a TPU right now. Because it's literally just an ubuntu server.

This is like the world's best kept secret. It's so fucking incredible that I have no idea why people aren't beating down the doors, using every TPU that they can get their hands on, for as many harebrained ideas as possible.

God, I can't even list how much cool shit there is to discover. You'll find out that you get 100Gbit/s between two separate TPUs. In fact, I'm pretty sure it's even higher than this. That means you don't even need a TPU pod anymore.

At least, theoretically. I tried getting Tensorflow to do this, for over a year.

kindiana (Ben Wang), the guy who wrote this GPT-J codebase we're all talking about, casually proved that this was not merely theoretical: https://twitter.com/theshawwn/status/1406171487988498433

He tried to show me https://github.com/kingoflolz/swarm-jax/ once, long ago. I didn't understand at the time what I was looking at, or why it was such a big deal. But basically, when you put each GPT layer on a separate TPU, it means you can string together as many TPUs as you want, to make however large of a model you want.
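To make that concrete, here's a toy JAX sketch of the idea (this is not the swarm-jax code, just an illustration with made-up layer sizes): pin each layer's weights to its own device, ship the activations to that device, and run the layer there.

  import jax
  import jax.numpy as jnp

  devices = jax.devices()   # 8 TPU cores on a v2-8 (or 1 device elsewhere)
  dim = 1024
  keys = jax.random.split(jax.random.PRNGKey(0), len(devices))

  # One weight matrix per "layer", each pinned to its own device.
  layers = [jax.device_put(0.02 * jax.random.normal(k, (dim, dim)), d)
            for k, d in zip(keys, devices)]

  def forward(x, layers, devices):
      for w, d in zip(layers, devices):
          x = jax.device_put(x, d)   # move the activations to this layer's device
          x = jnp.tanh(x @ w)        # this matmul runs on that device
      return x

  print(forward(jnp.ones((8, dim)), layers, devices).shape)   # (8, 1024)

The real swarm-jax version adds actual transformer layers, pipelining, and TPUs on separate hosts, but the core trick really is that simple: layer weights live on device N, activations hop from N to N+1.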

You should be immediately skeptical of that claim. It shouldn't be obvious that the bandwidth is high enough to train a GPT-3 sized model in any reasonable time frame. It's still not obvious to me. But at this point, I've been amazed by so many things related to TPUs, JAX, and TFRC, that I feel like I'm dancing around in willy wonka's factory while the door's wide open. The oompa loompas are singing about "that's just what the world will do, oompa-loompa they'll ignore you" while I keep trying to get everybody to stop what they're doing and step into the factory.

The more people using TPUs, the more google is going to build TPUs. They can fill three small countries entirely with buildings devoted to TPUs. The more people want these things, the more we'll all have.

Because I think Google's gonna utterly annihilate Facebook in ML mindshare wars: https://blog.gpt4.org/mlmind

TPU VMs just launched a month ago. No one realizes yet that JAX is the React of ML.

Facebook left themselves wide open by betting on GPUs. GPUs fucking suck at large-scale ML training. Why the hell would you pay $1M when you can get the same thing for orders of magnitude less?

And no one's noticed that TPUs don't suck anymore. Forget everything you've ever heard about them. JAX on TPU VMs changes everything. In five years, you'll all look like you've been writing websites in assembly.

But hey, I'm just a fanatic TPU zealot. It's better to just write me off and keep betting on that reliable GPU pipeline. After all, everyone has millions of VC dollars to pour into the cloud furnace, right?

TFRC changed my life. I tried to do some "research" https://www.docdroid.net/faDq8Bu/swarm-training-v01a-pdf back when Tensorflow's horrible problems were your only option on TPUs.

Nowadays, you can think of JAX as "approximately every single thing you could possibly hope for."
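If you want a concrete taste of what that means, here's a minimal sketch (my own toy example, nothing to do with GPT-J's actual code) of the data-parallel pattern you'd use on a single v2-8: replicate a toy model across the 8 cores with pmap, shard the batch, and all-reduce the gradients with pmean.

  import functools
  import jax
  import jax.numpy as jnp

  n_dev = jax.local_device_count()   # 8 on a v2-8
  dim = 128

  def loss(w, x, y):
      return jnp.mean((x @ w - y) ** 2)

  @functools.partial(jax.pmap, axis_name="batch")
  def step(w, x, y):
      g = jax.grad(loss)(w, x, y)
      g = jax.lax.pmean(g, axis_name="batch")   # all-reduce across the cores
      return w - 0.01 * g

  kx, ky = jax.random.split(jax.random.PRNGKey(0))
  w = jnp.broadcast_to(jnp.zeros((dim, 1)), (n_dev, dim, 1))  # replicated weights
  x = jax.random.normal(kx, (n_dev, 32, dim))                 # sharded batch
  y = jax.random.normal(ky, (n_dev, 32, 1))

  for _ in range(10):
      w = step(w, x, y)
  print(w[0, :3, 0])   # weights stay identical on every core, thanks to pmean

That's basically the whole mental model for data parallelism in jax. The real GPT-J codebase also shards the model itself across cores, but it's built from the same handful of primitives.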

GPT-J is proof. What more can I say? No TFRC, no GPT-J.

The world is nuts for not noticing how impactful TFRC has been. Especially TFRC support. Jonathan from the support team is just ... such a wonderful person. I was blown away at how much he cares about taking care of new TFRC members. They all do.

(He was only ever late answering my emails one time. And it was because he was on vacation!)

If you happen to be an ambitious low-level hacker, I tried to make it easier for you to get your feet wet with JAX:

1. Head to https://github.com/shawwn/jaxnotes/blob/master/notebooks/001...

2. Click "Open in Colaboratory"

3. Scroll to the first JAX section; start reading, linearly, all the way to the bottom.

I'd like to think I'm a fairly capable hacker. And that notebook is how I learned JAX, from zero knowledge. Because I had zero knowledge, a week or two ago. Then I went from tutorial to tutorial, and copied down verbatim the things that I learned along the way.

(It's still somewhat amazing to me how effective it is to literally re-type what a tutorial is trying to teach you. I'd copy each sentence, then fix up the markdown, and in the process of fixing up the markdown, unconsciously osmose the idea that they were trying to get across.)

The best part was, I was connected remotely to a TPU VM the whole time I was writing that notebook, via a jupyter server running on the TPU. Because, like I said, you can run whatever the hell you want on TPUs now, so you can certainly run a jupyter server without breaking a sweat.

It's so friggin' nice to have a TPU repl. I know I'm just wall-of-text'ing at this point, but I've literally waited two years for this to come true. (There's a fellow from the TPU team who DMs with me occasionally. I call him TPU Jesus now, because it's nothing short of a miracle that they were able to launch all of this infrastructure -- imagine how much effort, from so many teams, was involved in making all of this possible.)

Anyway. Go read https://github.com/shawwn/website/blob/master/mlmind.md to get hyped, then read https://github.com/shawwn/website/blob/master/jaxtpu.md to get started, and then read https://github.com/shawwn/jaxnotes/blob/master/notebooks/001... to get effective, and you'll have all my knowledge.

In exchange for this, I expect you to build an NES emulator powered by TPUs. Do as many crazy ideas as you can possibly think of. This point in history will never come again; it feels to me like watching the internet itself come alive back in the 80's, if only briefly.

It's like having a hundred raspberry pis to play with, except every raspberry pi is actually an ubuntu server with 96 CPUs and 330GB of RAM, and it happens to have 8 TPU cores instead of GPUs, along with a 100Gbit/s link to every other raspberry pi.


Appreciate the enthusiasm! Mind doing some ELI5 for a (what feels like dinosaur-aged) hacker in his mid thirties who kinda missed the last decade of ML but is very curious?

> That blog post I just linked to is running off of a TPU right now. Because it's literally just an ubuntu server.

It's not literally running on a TPU, is it? I assume it's running on that Ubuntu server that has good ol' CPU that is running the web service + a TPU accelerator doing the number crunching. Or is my world view out of date?

> The best part was, I was connected remotely to a TPU VM the whole time I was writing that notebook, via a jupyter server running on the TPU. Because, like I said, you can run whatever the hell you want on TPUs now

Again, I have some hesitations interpreting this literally. I assume what you're saying is "Google runs a Jupyter server somewhere in the cloud and it gives you access to TPU compute". I don't think I could run, say, a Linux Desktop app with a GUI (falls under "whatever the heck I want") on a TPU if I wanted to, correct? But, in case I could, how would I get that kind of direct / low level access to it? Are they just giving you a pointer to your instance and you get complete control?


My friend, you've come to the right place. I happen to be a 33yo fellow dinosaur. If you thought I was some ML guru, know that I spent the last few months watching an 18yo and a 24yo scale GPT models to 50B parameter sizes -- 'cause they work 16 hours a day, dealing with all of tensorflow's BS. So yeah, you're not alone in feeling like a dinosaur-aged mid thirties hacker, watching the ML world fly by.

That being said, though, it's so cool that TFRC is available to people like you and me. I was nobody at all. Gwern and I were screwing around with GPT at the time -- in fact, I owe Gwern everything, because he's the reason we ended up applying. I thought TFRC was some soulless Google crap that came with a pile of caveats, just like lots of other Google projects. Boy, I was so wrong. So of course I'll ELI5 anything you want to know; it's the least I can do to repay TFRC for granting me superpowers.

>> That blog post I just linked to is running off of a TPU right now. Because it's literally just an ubuntu server.

> It's not literally running on a TPU, is it? I assume it's running on that Ubuntu server that has good ol' CPU that is running the web service + a TPU accelerator doing the number crunching. Or is my world view out of date?

Your confusion here is entirely reasonable. It took a long, long time for me to finally realize that when you hear "a TPU," you should think "a gigantic ubuntu server with 8 GPUs attached."

It's that simple. I thought TPUs were this weird hardware thing. No no, they're just big Ubuntu servers that have 8 hardware accelerators attached. In the same way that you'd use GPUs to accelerate things, you can use Jax to accelerate whatever you want. (I love how friggin' effortless it feels to use TPU accelerators now, thanks to jax.)

So the ELI5 is, when you get your hands on a TPU VM, you get a behemoth of an Ubuntu server -- but it's still "just an Ubuntu server":

  $ tpu-ssh 71
  [...]
  Last login: Sun Jul  4 00:26:35 2021 from 47.232.103.82
  shawn@t1v-n-0f45785c-w-0:~$ uname -a
  Linux t1v-n-0f45785c-w-0 5.4.0-1043-gcp #46-Ubuntu SMP Mon Apr 19 19:17:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Good ol' x86_64.

Now, here's the crazy part. Until one month ago, it was impossible to SSH into TPUs at all, let alone run your own code right next to the accelerators. That means nobody yet has had any time to integrate TPU accelerators into their products.

What I mean is -- you're absolutely correct, my blog is merely running on "an Ubuntu server," whereas I was claiming that it's being "powered by a TPU." It's not using any of the TPU accelerators for anything at all (at least, not for the blog).

But it's easy to imagine a future where, once people realize how effortless it is to use jax to do some heavy lifting, people are going to start adding jax acceleration all over the place.

It feels like a matter of time till one day, you'll run `sudo apt-get install npm` on your TPU, and then it'll turn out that the latest nodejs is being accelerated by the MXU cores. Because that's a thing you can do now. One of the big value-adds here is "libtpu" -- it's a C library that gives you low-level access to the MXU cores that are attached directly to your gigantic Ubuntu server (aka "your TPU".)

Here, check this out: https://github.com/tensorflow/tensorflow/blob/master/tensorf...

Wanna see a magic trick? That's a single, self-contained C file. I was shocked that the instructions to run this were in the comments at the top:

  // To compile: gcc -o libtpu_client libtpu_client.c -ldl
  // To run: sudo ./libtpu_client
... so I SSH'ed into a TPU, ran that, and presto. I was staring at console output indicating that I had just done some high performance number crunching. No python, no jax, nothing -- you have low-level access to everything. It's just a C API.

So, all of that being said, I feel like I can address your questions properly now:

> Again, I have some hesitations interpreting this literally. I assume what you're saying is "Google runs a Jupyter server somewhere in the cloud and it gives you access to TPU compute".

A TPU is just an ubuntu server. The MXU cores are hardware devices attached directly to that server (physically). So when you SSH in, you get a normal server you're familiar with, and you can optionally accelerate anything you can imagine.

(Till recently, it was a total pain in the ass to accelerate anything. Jax changes all that, and libtpu is going to shock the hell out of nvidia when they realize that TPUs are about to eat away at their DGX market. 'Cause libtpu gives you everything nvcc/CUDA gives you -- it's just a matter of time till people build tooling around it and package it up nicely.)

So nope, there's no separate "TPU compute" service off in the cloud somewhere. It's just ye ole Ubuntu server, and it happens to have 8 massive hardware accelerators attached physically. You'd run a jupyter server the same way you run anything else.

So when that jupyter server executes `import jax; jax.devices()`, it's literally equivalent to you SSH'ing in, typing `python3` and then doing the same thing. Jax is essentially a convenience layer over the APIs that libtpu gives you at a low level.
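Concretely, the whole "high level" path is about this much code (assuming jax is installed with TPU support on the VM; the exact pip incantation may differ):

  import jax
  import jax.numpy as jnp

  print(jax.devices())    # on a v2-8: a list of eight TPU cores

  @jax.jit                # compiled by XLA, executed on a TPU core
  def f(x):
      return jnp.tanh(x @ x.T).sum()

  x = jax.random.normal(jax.random.PRNGKey(0), (2048, 2048))
  print(f(x))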

Man, I suck at ELI5s. The point is, you can go as low as you want ("just write C! no dependencies! no handholding!") or as high as you want ("jax makes everything easy; if you want to get stuff done, just `import jax` and start doing numerical operations, 'cause every operation by default will be accelerated by the MXU cores -- the things attached physically to the TPU").

This might clarify things:

  shawn@t1v-n-0f45785c-w-0:~$ ls /dev | grep accel
  accel0
  accel1
  accel2
  accel3
That's where all the low-level magic happens. I was curious how libtpu worked, so I spent a night ripping it apart in the Hopper debugger. libtpu consists of a few underlying libraries which interact with the /dev/accel* devices to do the low-level communication. Theoretically, you could reverse engineer libtpu and send signals directly to the hardware yourself. You'd need ~infinite time to figure it out, but it is indeed theoretically possible.

> I don't think I could run, say, a Linux Desktop app with a GUI (falls under "whatever the heck I want")

You can!

> on a TPU if I wanted to, correct?

You should want to! It's easy!

> But, in case I could,

You can! (Sorry for being super annoying; I'm just so excited that it's finally possible. I've waited years...)

> how would I get that kind of direct / low level access to it?

SSH in, then use libtpu for low-level access via C APIs, or jax in python for high-level convenience.

> Are they just giving you a pointer to your instance and you get complete control?

I get total control. I've never once felt like "Oh, that's weird... it blew up. It works on a regular Ubuntu server. Must be some unfortunate TPU corner case..."

It's the opposite. Everything works by default, everything is candy and unicorns and rainbows, and hacking on all of this stuff has been the best damn two years of my life.

Now, I'll calm down and make sure I'm answering your questions properly. The truth is, I'm not quite sure what "complete control" means. But if I wanted to, I could SSH in right now and set up an instance of Hacker News, and then expose it to the world. Hell, I'll just do that:

https://tpucity.gpt4.org/item?id=1

That took like ... 10 minutes to set up. (It's also the world's shittiest HN instance. I'll shut it down soon.)

Here's an archive url:

https://web.archive.org/web/20210704132824/https://tpucity.g...

So yes. You have total control. And as I say there:

> This is such a stupid demo. But suffice to say, if you can get Lisp running on a TPU, you can get anything to run.

> Theoretically, arc could use the MXU cores to accelerate its numerical operations, thanks to libtpu.

Have fun.

DM me on twitter if you run into any roadblocks whatsoever: https://twitter.com/theshawwn (Happy to help with anything; even basic questions are more than welcome.)


Hey man, thanks! HN, every once in a while, is a magical place :)

> Man, I suck at ELI5s.

Nah, I enjoyed reading this. I got it now.


As I scroll, and scroll some more, I begin to wonder if some of it is generated. That's a lot of text :P


Just happy I won a big blue gorilla at a carnival. https://twitter.com/theshawwn/status/1411519063432519680

Plus it's looking more and more like I'll be getting a job in finance with a fat salary. First interview's on Monday. Tonight I felt "This is it -- if getting a few dozen people to sign up for TFRC is the only way I can make an impact, then at least I'll be ending my ML streak on a high note."

It's truly amazing to me that the world hasn't noticed how incredible TFRC is. It's literally the reason Eleuther exists at all. If that sounds ridiculous, remember that there was a time when Connor's TPU quota was the only reason everyone was able to band together and start building GPT neo. https://github.com/EleutherAI/gpt-neo

At least I was able to start a discord server that happened to get the original eleuther people together in the right place at the right time to decide to do any of that.

But the root of all of it is TFRC. Always has been. Without them, I would've given up ML long ago. Because trying to train anything on GPUs with Colab is just ... so frustrating. I would have fooled around a bit with ML, but I wouldn't have decided to pour two years of my life into mastering it. Why waste your time?

Five years from now, Jax + TPU VMs are going to wipe pytorch off the map. So I'll be making bank at a finance company, eating popcorn like "told ya so" and looking back wistfully at days like today.

Everyone in ML is so cool. Was easily the best two years of my life as a developer. I know all this is kind of weird to pour out, but I don't care -- everyone here owes everything to the geniuses that bequeathed TFRC unto the world.

For now, I slink back into the shadows, training tentacle porn GANs in secret, emerging only once in a blue moon to shock the world with weird ML things. Muahaha.

</ml>


I love the enthusiasm, but is this another Google thing that's for researchers only? Yes, fantastic technology and all that, but say you develop something on the infrastructure and then go to commercialise -- what do you do?

I don't know much about the ML space, but is this a bit like Google Earth Engine? Amazing tech, very generous resources free for researchers and development, but it can't be ported elsewhere -- so when you commercialise, you're locked into this very environment, which is not cheap. I recently reached out to Google for pricing on GEE; 3 weeks later I got a response. 3 weeks.


NVIDIA used CUDA to establish industry-wide vendor lock-in on GPU compute.

Google uses TPUs to try and establish an industry-wide vendor lock-in on deep learning.

Same old, same old.


Your view here is entirely reasonable. It was my view before I ever heard about TFRC. I was every bit as skeptical.

That view is wrong. From https://github.com/shawwn/website/blob/master/jaxtpu.md :

> So we're talking about a group of people who are the polar opposite of any Google support experience you may have had.

> Ever struggle with GCP support? They took two weeks to resolve my problem. During the whole process, I vividly remember feeling like, "They don't quite seem to understand what I'm saying... I'm not sure whether to be worried."

> Ever experience TFRC support? I've been a member for almost two years. I just counted how many times they failed to come through for me: zero times. And as far as I can remember, it took less than 48 hours to resolve whatever issue I was facing.

> For a Google project, this was somewhere between "space aliens" and "narnia" on the Scale of Surprising Things.

[...]

> My goal here is to finally put to rest this feeling that everyone has. There's some kind of reluctance to apply to TFRC. People always end up asking stuff like this:

> "I'm just a university student, not an established researcher. Should I apply?"

> Yes!

> "I'm just here to play around a bit with TPUs. I don't have any idea what I'm doing, but I'll poke around a bit and see what's up. Should I apply?"

> Heck yeah!

> "I have a Serious Research Project in mind. I'd like to evaluate whether the Cloud TPU VM platform is sufficient for our team's research goals. Should I apply?"

> Absolutely. But whoever you are, you've probably applied by now. Because everyone is realizing that TFRC is how you accomplish your research goals.

I expect that if you apply, you'll get your activation email within a few hours. Of course, you better get in quick. My goal here was to cause a stampede. Right now, in my experience, you'll be up and running by tomorrow. But if ten thousand people show up from HN, I don't know if that will remain true. :)

I feel a bit bad for talking at such length about TFRC. But then I remembered that none of this is off-topic in the slightest. GPT-J was proof of everything above. No TFRC, no GPT-J. The whole reason the world can enjoy GPT-J now is that anyone can show up and start doing as many effective things as they can possibly learn to do.

It was all thanks to TFRC, the Cloud TPU team, the JAX team, the XLA compiler team -- hundreds of people, who have all managed to gift us this amazing opportunity. Yes, they want to win the ML mindshare war. But they know the way to win it is to care deeply about helping you achieve every one of your research goals.

Think of it like a side hobby. Best part is, it's free. (Just watch out for the egress bandwidth, ha. Otherwise you'll be talking with GCP support for your $500 refund -- and yes, that's an unpleasant experience.)


Thanks for posting this. As someone who was almost religiously excited about GPT-3, then got progressively more annoyed that I could never get access, to the point of giving up, this is wonderful news. Your blog post is an invaluable starting point. Seriously, thanks.




