
Julia has some pretty swell cross-GPU packages. I was really hoping that it would catch on in the ML community, but I think we're past that point: the inferior solution has more momentum.



We recently showed DiffEqGPU.jl generating customized ODE solver kernels for NVIDIA CUDA, AMD GPUs, Intel oneAPI, and Apple Metal. For CUDA it matches the state of the art (MPGOS), which is roughly 10x-100x faster than something like Jax/PyTorch (the performance gap comes from the inefficiency of vmap-ing array operations versus actually writing and calling a kernel). It's all in https://arxiv.org/abs/2304.06835. So this stuff exists and people are using it. Of course, the caveat is that this is in the context of engineering applications, so someone would need to do something similar for LLMs to fully relate it back to the article, but it shows the tools are ready to a large extent for someone to step up in the ML space.
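
For anyone curious what that looks like in practice, here's a minimal sketch in the style of the DiffEqGPU.jl docs (assuming a recent DiffEqGPU.jl and a CUDA device; swapping CUDA.CUDABackend() for the AMDGPU, oneAPI, or Metal backend is the only change needed to retarget it):

    # Lorenz system written out-of-place with StaticArrays so the whole
    # solve can be compiled into a single GPU kernel per trajectory.
    using DiffEqGPU, OrdinaryDiffEq, StaticArrays, CUDA

    function lorenz(u, p, t)
        σ, ρ, β = p
        du1 = σ * (u[2] - u[1])
        du2 = u[1] * (ρ - u[3]) - u[2]
        du3 = u[1] * u[2] - β * u[3]
        return SVector{3}(du1, du2, du3)
    end

    u0 = @SVector [1.0f0, 0.0f0, 0.0f0]
    p  = @SVector [10.0f0, 28.0f0, 8.0f0 / 3.0f0]
    prob = ODEProblem{false}(lorenz, u0, (0.0f0, 10.0f0), p)

    # EnsembleGPUKernel generates one fused solver kernel for the chosen
    # backend instead of vmap-ing array operations over trajectories.
    monteprob = EnsembleProblem(prob, safetycopy = false)
    sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend());
                trajectories = 10_000, saveat = 1.0f0)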


Are there examples of using this for SGD? Like "Here is a tutorial on how to do a nanoGPT using DiffEqGPU.jl"?


There is an example of using this with gradient-based optimization here: https://docs.sciml.ai/SciMLSensitivity/dev/tutorials/data_pa....
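
For reference, the pattern in that tutorial boils down to something like this (a rough sketch rather than the tutorial verbatim; the decay model and data are made up for illustration, and the names follow the Optimization.jl/SciMLSensitivity docs):

    # Fit an ODE parameter by gradient descent through the solver.
    using OrdinaryDiffEq, SciMLSensitivity, Optimization, OptimizationOptimisers, Zygote

    decay(u, p, t) = -p[1] .* u
    prob = ODEProblem(decay, [1.0], (0.0, 1.0), [0.5])

    ts = 0.0:0.1:1.0
    data = exp.(-0.8 .* ts)          # synthetic data from a "true" rate of 0.8

    function loss(p, _)
        sol = solve(prob, Tsit5(); p = p, saveat = ts,
                    sensealg = InterpolatingAdjoint())
        return sum(abs2, Array(sol)[1, :] .- data)
    end

    # Adjoint gradients via Zygote, optimized with Adam.
    optf = OptimizationFunction(loss, Optimization.AutoZygote())
    optprob = OptimizationProblem(optf, [0.5])
    res = solve(optprob, OptimizationOptimisers.Adam(0.05); maxiters = 200)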

As an ODE solver, you wouldn't do nanoGPT with it, though; you'd need to go back to KernelAbstractions and write a nanoGPT based on that same abstraction layer. Again, this is a demonstration of the cross-GPU tools for ODEs, but for LLMs you'd need to take these tools and implement an LLM.
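
To make that concrete, here's the kind of elementwise building block such a port would be assembled from, written against KernelAbstractions.jl (a hypothetical sketch, not an existing nanoGPT port; the GELU kernel is just an illustrative op):

    # One kernel definition that runs on CPU, CUDA, AMDGPU, oneAPI, or Metal,
    # depending only on the array type passed in.
    using KernelAbstractions

    @kernel function gelu!(y, @Const(x))
        i = @index(Global)
        # tanh approximation of GELU, applied elementwise
        @inbounds y[i] = 0.5f0 * x[i] * (1f0 + tanh(0.7978845608f0 * (x[i] + 0.044715f0 * x[i]^3)))
    end

    x = rand(Float32, 1024)            # swap for a CuArray/ROCArray/etc. to run on a GPU
    y = similar(x)
    backend = get_backend(x)           # backend is inferred from the array type
    gelu!(backend)(y, x; ndrange = length(x))
    KernelAbstractions.synchronize(backend)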


Thanks!



