thatguysaguy's comments

That is quite the claim, any source?

Edit: I guess I'm not sure whether large training runs count as prod or not. They're certainly expensive and mission-critical.




I have worked on a team which trains foundation models.


And no one’s used anything other than Python? The core of TensorFlow is written in C++.


Many parts of TensorFlow required Python. At least when I worked there a few years ago, it was nearly impossible to compile XLA into a saved model and execute it from pure C++ code.


The claim was that Python is only used for prototyping at Google, not that people are writing the AI frameworks themselves in pure Python.

JAX is obviously implemented in C++, but the scientists running the training runs that cost millions of dollars are writing lots of Python.



Not sure about the specific combination, but since everything in JAX is functionally pure, it's generally really easy to compose libraries. E.g. I've written code which embedded a Flax model inside a Haiku model without much effort.
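
For illustration, something like the following sketch works (the module, shapes, and names here are made up, not from any real codebase): because a Flax module's apply is just a pure function, you can call it from inside a Haiku-transformed function and thread its parameters through explicitly.

    import jax
    import jax.numpy as jnp
    import flax.linen as nn
    import haiku as hk

    # A hypothetical Flax module we want to reuse inside a Haiku model.
    class FlaxMLP(nn.Module):
        @nn.compact
        def __call__(self, x):
            return nn.Dense(16)(nn.relu(nn.Dense(32)(x)))

    flax_mlp = FlaxMLP()

    def forward(x, flax_params):
        h = hk.Linear(32)(x)                                # Haiku layer
        return flax_mlp.apply({"params": flax_params}, h)   # Flax module as a pure fn

    model = hk.transform(forward)

    rng = jax.random.PRNGKey(0)
    x = jnp.ones((4, 8))
    flax_params = flax_mlp.init(rng, jnp.ones((4, 32)))["params"]
    hk_params = model.init(rng, x, flax_params)
    out = model.apply(hk_params, rng, x, flax_params)       # shape (4, 16)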


Sounds like these guys didn't use custom kernels, but BitNet did.


That's correct. Only the dequantization is done in CUDA; the matmul is done with PyTorch. If they open-source their kernels, we could reuse them!
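
For anyone curious, the pattern described here (dequantize the packed weights in a custom kernel, then hand off to a plain PyTorch matmul) looks roughly like this sketch; the names, dtypes, and scale handling are placeholders, not their actual code.

    import torch

    def quantized_linear(x, q_weight, scale):
        # q_weight: low-bit integer weights. In the setup described above,
        # this dequantization step is the part that runs as a custom CUDA kernel.
        w = q_weight.to(x.dtype) * scale
        # The matmul itself is just ordinary PyTorch, no custom kernel involved.
        return x @ w.t()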


To think of non-evil versions, just consider cases where there's currently no voice actor to replace, but you could add a voice, e.g. indie games.


I'm 100% going to clone my voice and use it on my discord bot.


Completely agree. He ends up sounding kinda amateurish but that's only because (unlike most other podcasters) he's willing to ask questions deep inside the domain of the interviewee.

(Amateurish w.r.t. the domain, not as an interviewer I mean)


At the end of the day all the arrays are 1 dimensional and thinking of them as 2 dimensional is just an indexing convenience. A matrix multiply is a bunch of vector dot products in a row. Higher tensor contractions can be built out of lower-dimensional ones, so I don't think it's really fair to say the hardware doesn't support it.
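
A toy sketch to make that concrete (plain NumPy, obviously not how a real kernel is written): a matmul done entirely over flat 1-D buffers, where the "2-D" shapes only exist as index arithmetic.

    import numpy as np

    # Matmul over flat 1-D storage: each output entry is a dot product,
    # and the 2-D structure is just row-major index arithmetic.
    def matmul_flat(a_flat, b_flat, m, k, n):
        out = np.zeros(m * n)
        for i in range(m):
            for j in range(n):
                out[i * n + j] = sum(a_flat[i * k + p] * b_flat[p * n + j]
                                     for p in range(k))
        return out

    A = np.arange(6.0)    # a 2x3 matrix, stored flat
    B = np.arange(12.0)   # a 3x4 matrix, stored flat
    assert np.allclose(matmul_flat(A, B, 2, 3, 4),
                       (A.reshape(2, 3) @ B.reshape(3, 4)).ravel())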


Well, in the transformer forward pass there are a bunch of 4-dimensional arrays being used.


Came in to say this.

The Einsum notation makes it desirable to formulate your model/layer as multi-dimensional arrays connected by (loosely) named axes, without worrying too much about breaking it down to primitives yourself. Once you get used to it, the terseness is liberating.
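
As a concrete example (the shapes and axis names are invented, and the softmax is deliberately naive), here's scaled dot-product attention written as two einsums over loosely named axes; it's also an instance of the 4-dimensional arrays mentioned upthread.

    import numpy as np

    # b=batch, h=heads, q/k=sequence positions, d=head dimension
    b, h, q, k, d = 2, 4, 8, 8, 16
    queries = np.random.randn(b, h, q, d)
    keys    = np.random.randn(b, h, k, d)
    values  = np.random.randn(b, h, k, d)

    # Contract over the head dimension, keeping batch/head/query/key axes.
    scores  = np.einsum("bhqd,bhkd->bhqk", queries, keys) / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # naive softmax
    # Contract over the key axis to mix values for each query position.
    out     = np.einsum("bhqk,bhkd->bhqd", weights, values)                # 4-D output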


Welllll, there seems to at least be some mathematical cheating, in that this represents a non-well-founded set. (Declare each pixel to be the set of subpixels at the next level down that define it; this forms an infinite descending membership chain, which the axiom of foundation rules out.)


Unimportant, but if you're citing Moore's paper, I feel like you're just trying to pad out the references to make it look like you're serious.


At a high level, it is the right answer to the data center electricity demand problem, which is that we need to make AI hardware more efficient.

Pragmatically, it doesn't make much sense given that it would take years for this approach to have any real-world use cases even in a best-case scenario. It seems way more likely that efficiency gains in digital chips will happen first, making these chips less economically valuable.

