It doesn't scale to what? 405B LLMs? No, probably not. But I have plots showing not-unreasonable solve times for CNNs with ~30k tensors (~10 minutes) using Gurobi.
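For concreteness, the offline DSA instance I mean is the standard big-M MIP: minimize the slab size subject to buffers whose live ranges overlap in time not overlapping in address space. A minimal gurobipy sketch (the buffer list is made up for illustration; a real instance comes from the compiler's liveness analysis):

```python
import gurobipy as gp
from gurobipy import GRB

# Hypothetical instance: (size_bytes, live_start, live_end) per buffer,
# with half-open [start, end) intervals in program order.
buffers = [
    (1024, 0, 4),
    (2048, 1, 3),
    (512,  2, 6),
    (1024, 5, 8),
]

m = gp.Model("dsa")
n = len(buffers)
big_m = sum(s for s, _, _ in buffers)  # safe upper bound on any offset

# offset[i]: byte offset of buffer i within one shared slab
offset = m.addVars(n, lb=0.0, ub=big_m, name="offset")
# peak: total slab size to minimize
peak = m.addVar(lb=0.0, name="peak")

for i, (size_i, a_i, b_i) in enumerate(buffers):
    m.addConstr(offset[i] + size_i <= peak)
    for j in range(i + 1, n):
        size_j, a_j, b_j = buffers[j]
        # Only temporally-overlapping buffers need spatial disjointness.
        if a_i < b_j and a_j < b_i:
            # z == 1 => buffer i sits entirely below j; z == 0 => j below i
            z = m.addVar(vtype=GRB.BINARY, name=f"below_{i}_{j}")
            m.addConstr(offset[i] + size_i <= offset[j] + big_m * (1 - z))
            m.addConstr(offset[j] + size_j <= offset[i] + big_m * z)

m.setObjective(peak, GRB.MINIMIZE)
m.optimize()

for i in range(n):
    print(f"buffer {i}: offset {offset[i].X:.0f}")
print(f"slab size: {peak.X:.0f}")
```

The quadratic number of pairwise disjunctions is what drives solve time, which is why the ~30k-tensor CNNs land around 10 minutes rather than seconds.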
> There's no 1-1 mapping between tensors and allocations. Buffer re-use is important.
Yes, I'm aware... that's why I made the comment above that you can't pull this trick in, e.g., PyTorch.
> 3. There's room for offline DSA instances as well. IREE's Stream dialect has a related transform (LayoutSlices.cpp).
Yes, again, I'm aware - that's why I made the comment above that IREE is the only place you could pull this trick. LayoutSlices is one place to do it, but hooking the allocator in the HAL is simpler if you don't want to fight IREE's various transformations that happen after that point (rough sketch of the idea below).
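To illustrate what "hooking the allocator" means here, a toy sketch, not IREE's actual HAL API (all names below are hypothetical): the shim ignores the runtime's allocation requests' placement and instead replays a precomputed plan of `(offset, size)` pairs against one big slab.

```python
# Hypothetical allocator shim replaying an offline plan. The plan could
# come from a solve like the MIP above; the order of allocate() calls
# must match the trace the plan was built for.

class PlannedAllocator:
    def __init__(self, slab: bytearray, plan: list[tuple[int, int]]):
        self.slab = slab       # one big backing allocation
        self.plan = plan       # (offset, size) per allocation, in call order
        self.next_call = 0

    def allocate(self, size: int) -> memoryview:
        offset, planned_size = self.plan[self.next_call]
        assert size <= planned_size, "plan was built for a different trace"
        self.next_call += 1
        return memoryview(self.slab)[offset : offset + size]

    def free(self, _buf) -> None:
        pass  # lifetimes were already baked into the plan offline
```

This only works when the allocation trace is deterministic, which is exactly why a whole-program compiler like IREE can do it and an eager framework like PyTorch can't.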
> DSA is much more generic than deep learning compilers.
Yes, that's why I posted the OR literature first...
> I can't wait to graduate myself and never hear the involved keywords again.
amen