
I wonder how close to something like 3FS you can get by mounting SeaweedFS with s3fs-fuse, which exposes an S3 bucket as a FUSE filesystem.

https://github.com/s3fs-fuse/s3fs-fuse
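
For what it's worth, here's a rough, untested sketch of what that setup might look like, assuming SeaweedFS's S3 gateway on its default port 8333 (bucket name, mount point, and credentials are placeholders):

    # start SeaweedFS with its S3-compatible gateway enabled
    weed server -s3
    # (or point a standalone gateway at an existing filer: weed s3 -filer=localhost:8888)

    # give s3fs-fuse credentials matching whatever the gateway is configured with
    echo "ACCESS_KEY:SECRET_KEY" > ${HOME}/.passwd-seaweedfs
    chmod 600 ${HOME}/.passwd-seaweedfs

    # mount a bucket through FUSE, pointed at the local gateway
    s3fs mybucket /mnt/seaweed \
        -o url=http://127.0.0.1:8333 \
        -o use_path_request_style \
        -o passwd_file=${HOME}/.passwd-seaweedfs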


I'd estimate that there would be two orders of magnitude of difference in 4K random IOPS. If not three.

I'm assuming in favour of the DeepSeek FS?

Always a huge pleasure when Oona posts something. Her posts are the sort of magic you get when a genuinely curious person has the competence to satisfy and explore those idle curiosities. Glad she's still going strong after all these years.

Leaving music streaming services has been a great excuse for me to rediscover music blogs like Gorilla vs. Bear and Stereogum, or even local culture magazines.

Another great way I've found to discover music is just perusing Bandcamp, which is where I buy most of my music anyway. I love finding local artists, so I just put in some genre filters and the location filter, and I've found multiple great bands this way.

As for keeping abreast of new releases, Bandcamp is pretty good for that too. You can just follow artists and you'll get emails when new releases, merch, or tours come around.


That was mind-blowing.

I actually saw a talk by Gabrielle Corso at a workshop held at Google's Toronto offices this week. Very cool to see this posted here.

I primarily use Brave Search. I'll yap with Claude for ideation every so often though.

I wonder what the accessibility implications are.

I don't usually make these comments, but between the references to Dr. Dre and Napoleon Dynamite gifs, this article feels like it was written in 2012...

I am the author of this blog post.

Napoleon Dynamite is one of my favorite movies, and the same goes for Billy Madison and Austin Powers, so that's what came to mind; they're classics.

I figured I had to put something in there so I wouldn't bore people to tears, as it can get pretty dry pretty quickly talking about register allocation and low-level Linux kernel primitives.

Nonetheless, I hope you at least enjoyed it.


People familiar with exotic RNNs and improvements to LSTMs know this problem all too well. The moment your LSTM isn't a bog-standard LSTM, it loses all the speed-ups from cuDNN and becomes borderline unusable for anything but toy models.
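
To make that concrete, here's a rough PyTorch sketch (sizes are arbitrary and it assumes a CUDA device): the stock module hands the whole sequence to one fused cuDNN kernel, while any custom variant typically degenerates into a Python-level loop over a cell, with one small kernel launch per timestep.

    import torch
    import torch.nn as nn

    device = "cuda"  # the cuDNN fast path only kicks in on GPU
    x = torch.randn(128, 64, 256, device=device)  # (seq, batch, features)

    # fast path: stock LSTM -> single fused cuDNN kernel for the whole sequence
    fast = nn.LSTM(input_size=256, hidden_size=512).to(device)
    out, _ = fast(x)

    # slow path: the moment you need a non-standard cell, you're back to
    # stepping the recurrence manually, one launch per timestep
    cell = nn.LSTMCell(input_size=256, hidden_size=512).to(device)
    h = torch.zeros(64, 512, device=device)
    c = torch.zeros(64, 512, device=device)
    outs = []
    for t in range(x.size(0)):
        h, c = cell(x[t], (h, c))
        outs.append(h)
    out_manual = torch.stack(outs)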


These would be inherently temporary problems though, right? If it eventually became clear that alternate methods were the way forward, NVIDIA would be highly motivated to do the optimization work, wouldn't they? Any new step functions that can forestall the asymptotic plateauing of AI progress are things they desperately need.


That follows reason, but in practice I find that's often not the case. My suspicion is that it's hard to establish that your method is superior to another if, for example, it takes 10-100x the compute to train a model. This is in large part because machine learning is currently a deeply empirical field.

Nvidia isn't likely to start releasing updated firmware for an obscure architecture for which there is limited evidence of improvement, and even less adoption.


Indeed. Especially when a lot of papers are using cherry-picked results that show some improvement just so they can publish something, but their method doesn't work that well when it comes in contact with reality (e.g. see the deluge of papers which claim to have come up with an optimizer better than AdamW), and when the majority of people are not even properly benchmarking their new methods with respect to the time overhead (no, it doesn't matter if your method achieves 1% better loss if it takes 10% longer to train, because if I'd trained for 10% longer without your method I'd get an even better loss; and don't even get me started on people not tuning their baselines).

I've been burnt way too many times by fancy new methods that claimed improvement, where I spent a ton of effort to implement them, and they ended up being poop.

Every person working in the field and pushing papers should read this blog post and apply what's written in it: https://kellerjordan.github.io/posts/muon/#discussion-solvin...
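
To make the time-overhead point concrete: the fair comparison is to give the baseline the same wall-clock budget as the new method, not the same number of steps. A toy sketch (the helper and step functions here are hypothetical):

    import time

    def train_for_budget(train_step, budget_seconds):
        # keep calling train_step until the wall-clock budget is spent
        start = time.monotonic()
        loss = None
        while time.monotonic() - start < budget_seconds:
            loss = train_step()
        return loss

    # budget = wall-clock time the fancy method needed for its run
    # fancy_loss    = train_for_budget(fancy_step, budget)
    # baseline_loss = train_for_budget(adamw_step, budget)  # same budget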


Yep. Offline RL is especially full of these types of papers too. The sheer number of alternatives to the KL divergence for keeping the learned policy from diverging too far from the collected data distribution... There's probably one method for each person on earth.
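
For anyone outside the area, the basic recipe those papers riff on looks roughly like this (a toy PyTorch sketch; the names and the use of a plain Gaussian policy are my own assumptions):

    from torch import distributions as D

    def actor_loss(policy_dist, behavior_dist, q_value, beta=1.0):
        # KL(pi || pi_beta): how far the learned policy strays from the
        # policy that generated the offline dataset
        kl = D.kl_divergence(policy_dist, behavior_dist).sum(-1)
        # maximize Q while staying close to the data distribution; most
        # "alternatives" swap this KL term for some other divergence,
        # constraint, or implicit regularizer
        return (-q_value + beta * kl).mean()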


Check out "The Hardware Lottery" [1], which drove a lot of discussion a few years ago.

[1]: https://arxiv.org/abs/2009.06489


The map of the Montreal metro (which is, in fairness, _way_ smaller) that you are most likely to see is also diagrammatic [0], but every station has a much more detailed geographic map, which includes bus stops, etc. [1]

I think having both around is a good balance.

[0] https://www.stm.info/sites/default/files/media/Stminfo/image...

[1] https://www.stm.info/sites/default/files/media/Stminfo/image...


This was Vignelli's original plan, which the MTA never followed through on.

