We have a bunch of bounties on it, and we're getting 94%+ now! Mostly not me who wrote this; see the commit history. Still have to switch to float16 and add Winograd convs. We have a branch with multi-GPU too.
Well, you do seem to be extremely even-handed and fair on that front, so my respect to you for that. Don't be too hard on yourself, please!
I'll keep an eye on that; anyone working on it can shoot me a message if any snags or questions about the particulars come up. Weirdly enough, my email is best.
If you're looking for the biggest glaring performance edge over PyTorch, I'd note that MaxPooling is probably where to go: the PT version is extremely slow for some reason, and done properly it should be a simple indexing operation that's fusible in either direction.
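Not the author's code, but a minimal sketch of what "a simple indexing operation" could mean here, assuming NCHW layout and the non-overlapping 2x2/stride-2 case; the max_pool_2x2 name is hypothetical:

    import numpy as np

    def max_pool_2x2(x: np.ndarray) -> np.ndarray:
        # Split H and W into (blocks, 2) via a pure reshape, then reduce
        # over the two window axes -- no sliding-window loop needed.
        n, c, h, w = x.shape
        return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

    x = np.random.randn(512, 3, 32, 32)
    assert max_pool_2x2(x).shape == (512, 3, 16, 16)

Because it's just a reshape plus a max reduction, the op composes cleanly with whatever comes before or after it, which is what makes it fusible in either direction.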
If whoever fulfills the bounty can beat me to writing a custom mega-fused kernel with the max pooling, convs, activation, etc., then y'all have a pretty good shot at taking the WR crown.
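For reference, here's roughly the op chain such a fused kernel would cover, written against tinygrad's public Tensor API (conv2d, relu, max_pool2d); the shapes are my own CIFAR-ish placeholders, not the actual model:

    from tinygrad.tensor import Tensor

    x = Tensor.randn(512, 3, 32, 32)   # hypothetical batch of 32x32 RGB images
    w = Tensor.randn(64, 3, 3, 3)      # hypothetical 64 output channels, 3x3 filters
    # tinygrad is lazy: chaining these ops builds one graph, and realize()
    # is where the scheduler gets its shot at fusing them into fewer kernels.
    out = x.conv2d(w, padding=1).relu().max_pool2d(kernel_size=(2, 2))
    out.realize()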
https://github.com/tinygrad/tinygrad/blob/master/examples/hl...
The goal is to beat an A100 in speed on a tinybox.