IMO this is the problem with all storage clusters that you run yourself, not just Ceph. Ultimately, keeping data alive through instance failures is just a lot of maintenance that needs to happen (even with automation).
Of course that is true but it undeniably helps. I wonder if postwar Britain watching its hegemony decline will be anything like the current decline of the US.
Honestly it is mostly caused by people from the machine learning end of the ecosystem not understanding how cooperative multitasking works and trying to bolt a web framework to their model.
That, coupled with the relative immaturity of the Python async ecosystem, leads to lots of rough edges, especially when these things get deployed into heavily abstracted cloud environments like Kubernetes.
FastAPI also tries to help by making it "easy" to run sync code, but that too is an abstraction that is not well documented and has limitations.
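For context, a minimal sketch of what that "easy" path looks like, assuming default FastAPI/Starlette behavior (the endpoint names here are just illustrative): plain def endpoints get offloaded to a worker threadpool so they don't block the event loop, while async def endpoints run on the loop itself and must never block.

    import asyncio
    import time

    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/sync")
    def sync_endpoint():
        # Plain def: FastAPI runs this in a threadpool, so blocking here only
        # ties up one worker thread (the pool is finite, so it is still a limit).
        time.sleep(1)
        return {"mode": "sync"}

    @app.get("/async")
    async def async_endpoint():
        # async def: runs on the event loop itself; a blocking call like
        # time.sleep(1) here would stall every in-flight request.
        await asyncio.sleep(1)
        return {"mode": "async"}

That threadpool hand-off is exactly the kind of behavior that bites people who never see it spelled out.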
Everything you say here is true, but if you run benchmarks on non-toy projects you'll quickly find that async Python is a bad choice in virtually every use case. Even for workloads that are extremely IO-bound and use almost no compute, async Python ends up dramatically increasing the variance in your response times and lowering your overall throughput.

Give me an async Python solution and I will bet you whatever you want that I can reimplement it using threads: fewer lines of code, far more debuggable and readable, far easier to reason about how a given change affects response times, and far more resistant to small, seemingly inconsequential code changes having dramatic, disastrous consequences when deployed. Plus you won't have a stupid coloring problem with two copies of every function, and no more spending entire days trying to figure out why some callback never runs or why it fires twice.

Async Python is only for people who don't know what they are doing and therefore can't see how badly their solution performs, how much more effort they are spending building and debugging than they should have to, and how far short its performance falls of what it could be.
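To make the thread-based alternative concrete, here is a rough sketch under stated assumptions (the fetch function and URL list are illustrative, not from the parent comment): ordinary blocking I/O fanned out over a ThreadPoolExecutor, with no event loop and no second colored copy of each function.

    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    def fetch(url: str) -> int:
        # Ordinary blocking I/O; exceptions propagate with a normal stack trace.
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status

    urls = ["https://example.com"] * 8

    with ThreadPoolExecutor(max_workers=8) as pool:
        # map() preserves input order and re-raises worker exceptions at the
        # call site, which keeps debugging straightforward compared to callbacks.
        for url, status in zip(urls, pool.map(fetch, urls)):
            print(url, status)

Throughput scales by raising max_workers, and the same fetch function can be called from anywhere, threaded or not.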
The main problem with those is that memory controller performance degrades a lot when you use all 128 of those cores.
We were doing some testing and found that >96 cores at 100% caused a massive degradation in performance. We ended up going with dual 32C/64T Epycs (which cost twice as much) as a result. If they fix it in the Altra One chips they will be back on the table though, because they were very good power-wise for our workload and quite price competitive in a Supermicro chassis.
It is bad in the sense that we need to reach for external libraries to avoid manually writing all the error-handling boilerplate, and any async runtime works, as long as it is tokio.
Probably don't even need to work that hard. The Saudis got a bunch of nuclear secrets the first round, so I am sure F-35 info can be brought to Mar-a-Lago.
If my systems guys are telling me the truth, it is a real time sink to run and can require an awful lot of babysitting at times.