
I don't really agree, at least from a longer term perspective. It's early days yet, but XLA seems to be a promising intermediate representation for letting the DL frameworks run on a wider array of hardware without user-facing software changes. It has traction with Google, NVIDIA, and IIRC Intel, and maybe more (others are definitely using the same approach of splitting the compute graph and scheduling subgraphs, but I'm not certain whether they're using XLA specifically; I know some, like MindSpore, aren't).

XLA has already proven its value by allowing PyTorch to run on TPUs (shittily, but that appears to be more of a VM/GCP infra problem than an XLA problem). The work done for TPUs (and to a lesser extent for GPU optimization) has started to expose some of the major issues, so work can begin on addressing them: the cost of dynamic XLA recompilation as tensor shapes change, and the fact that a lot of important code assumes accelerator-to-CPU communication isn't too expensive, which becomes a huge issue when you try to compile the graph into machine-specific code with XLA or similar, because it forces you to compile only small subgraphs.
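To make the recompilation point concrete, here's a rough sketch of what it looks like with PyTorch/XLA's lazy tensors and metrics report (I'm writing the torch_xla API from memory, so names may differ slightly between versions):

    import torch
    import torch_xla.core.xla_model as xm
    import torch_xla.debug.metrics as met

    device = xm.xla_device()
    model = torch.nn.Linear(128, 10).to(device)

    # Every distinct input shape produces a new lazily-traced graph,
    # and each new graph has to be compiled by XLA from scratch.
    for seq_len in (128, 256, 512):
        x = torch.randn(4, seq_len, 128, device=device)
        loss = model(x).sum()
        xm.mark_step()  # cut the lazy graph here and force compilation

    # The CompileTime counter in this report grows with each new shape.
    print(met.metrics_report())

Eager-mode PyTorch on a GPU just shrugs at new shapes; under XLA each shape is effectively a new program, which is where the compilation latency I'm describing comes from.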

It's early, but the rise of a really effective IR in XLA, combined with the huge amount of resources that Google and NVIDIA can pour into it, makes me very bullish on purpose-built hardware for AI training. It will take a while, I admit.




Mr/Mrs anonymous HN person, please put some info in your profile. You clearly have some deep knowledge of TPUs that I didn't expect to pop up offhandedly on HN. You're correct on all counts: dynamic tensor shapes are more or less impossible with XLA, making it more or less impossible to train a model on arbitrary image-size inputs, even though the math would allow for it; the PyTorch XLA work on TPUs is indeed kind of shitty, and I'm surprised as heck that anyone besides me has said so; and XLA as an IR is promising for portability. Now I'm curious what you've been doing to have experienced these things, since there don't seem to be many others who have (or at least, who are vocal about it).

I agree with you, but I think we differ on our timetables. I am bearish for the next two years, at which point I’ll awaken from my slumber and become a flaming bull. (It helps to remember that “we overestimate the impact of years, but underestimate the impact of decades.” I try to plan accordingly.)

In other words, if you’re bullish that two years from now we’ll start seeing portability implemented in the field across various HPC chips, then we fully agree. But that’s also a glacial pace; GPT-2 changed the world almost two years ago now, and DALL-E seems to be the next frontier for doing interesting generative work. So, we’ll split the difference and say that the bears and bulls will meet in two years for a deep learning hackathon. As a bonus, the pandemic will be over by then, so it can be an in-person meetup.


Ah I see - I think we're pretty much on the same page in terms of timetables. Although if you include TPUs, I think it's fair to say that custom accelerators are already a moderate success.

Updated my profile. I've been working on DL training platforms and distributed training benchmarking for a bit so I've gotten a nice view into the GPU/TPU battle.

Shameless plug: you should check out the open-source training platform we are building, Determined[1]. One of the goals is to take our hard-earned expertise in training infrastructure and build a tool where people don't need to have that expertise themselves. We don't support TPUs, partly because of a lack of demand/TPU availability, and partly because our PyTorch TPU experiments were so unimpressive.

[1] GH: https://github.com/determined-ai/determined, Slack: https://join.slack.com/t/determined-community/shared_invite/...



+1 please put some info in your profile.


I'm dying to know what they work on, either officially or in their spare time. https://news.ycombinator.com/item?id=26586151


> from a longer term perspective

Yes.

Longer term, new hardware will also make it practical to train large models in a fully parallelized, fully distributed manner -- i.e., without having to backpropagate gradients, which requires a lot of complex bookkeeping and plumbing for distributed training.

Recent progress suggests this will happen. See, for example:

https://arxiv.org/abs/2006.04182

https://arxiv.org/abs/2103.03725

https://arxiv.org/abs/2010.01047

I for one am excited to see what happens over the next decade as it becomes trivial to train/use models with 1K, 1M, or 1B times more dense connections than present state-of-the-art models.
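For anyone wondering what "training without backpropagating gradients" can look like in practice, here's a toy sketch of one such family of approaches, feedback-alignment-style updates, where every layer receives the global output error through a fixed random matrix instead of a backpropagated gradient. This is just my own NumPy illustration of the general idea, not the specific method from any of the papers above:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 2-hidden-layer MLP trained with feedback-alignment-style updates:
    # each layer gets the global output error through a fixed random matrix
    # (B1, B2), so no gradient is backpropagated layer by layer.
    d_in, d_h, d_out = 32, 64, 10
    x = rng.normal(size=(256, d_in))
    y = np.tanh(x @ rng.normal(0, 0.5, (d_in, d_out)))  # learnable targets

    W1 = rng.normal(0, 0.1, (d_in, d_h))
    W2 = rng.normal(0, 0.1, (d_h, d_h))
    W3 = rng.normal(0, 0.1, (d_h, d_out))
    B1 = rng.normal(0, 0.1, (d_out, d_h))   # fixed feedback matrices
    B2 = rng.normal(0, 0.1, (d_out, d_h))

    lr = 0.05
    for step in range(500):
        h1 = np.tanh(x @ W1)                # forward pass
        h2 = np.tanh(h1 @ W2)
        y_hat = h2 @ W3

        e = y_hat - y                       # global output error
        d2 = (e @ B2) * (1 - h2 ** 2)       # error broadcast directly
        d1 = (e @ B1) * (1 - h1 ** 2)       #   to each hidden layer

        W3 -= lr * h2.T @ e / len(x)
        W2 -= lr * h1.T @ d2 / len(x)
        W1 -= lr * x.T @ d1 / len(x)

    print("final MSE:", float(((y_hat - y) ** 2).mean()))

The relevant property for hardware is that d1 and d2 don't depend on each other, so the per-layer updates can in principle be computed in parallel across devices, which is exactly the kind of bookkeeping-free distribution the comment above is pointing at.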



