Full reproducibility isn't easy; there is a cost to it.
However, the payoff is rather significant, so if you can temper that cost a bit and make it less inconvenient to achieve, then you have a winning solution.
I'm also looking at using OCI at $DAY_JOB for model distribution to fleets of machines, so it's good to see it getting some traction elsewhere.
OCI has some benefits over other systems: tiered caching/pull-through is already pretty battle-tested, as is signing, which beats more naive distribution methods on reliability, performance and trust.
Combined with eStargz or zstd:chunked, it's also pretty nice for distributed systems, as long as you can slice things up into files in such a way that not every machine needs to pull the full model weights.
Failing that, there are P2P distribution mechanisms for OCI (Dragonfly, etc.) that can lessen the burden without resorting to DIY on BitTorrent or similar.
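To make "slice things up into files" a bit more concrete, here is a minimal Python sketch. It assumes a Hugging Face-style sharded safetensors checkpoint, whose `model.safetensors.index.json` has a `weight_map` from tensor name to shard file, and it assumes each shard ends up as its own file (or layer) in the image. The function `shards_needed` and the tensor prefixes are hypothetical; the point is that a machine only has to pull the shards its tensors actually live in.

```python
import json


def shards_needed(index_json: str, wanted_prefixes: tuple[str, ...]) -> set[str]:
    """Return the shard files holding any tensor whose name starts with one of wanted_prefixes."""
    # weight_map: tensor name -> shard file name (Hugging Face sharded-checkpoint index layout)
    weight_map = json.loads(index_json)["weight_map"]
    return {shard for tensor, shard in weight_map.items() if tensor.startswith(wanted_prefixes)}


if __name__ == "__main__":
    index = json.dumps({
        "weight_map": {
            "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
            "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
            "model.layers.1.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
            "lm_head.weight": "model-00003-of-00003.safetensors",
        }
    })
    # A hypothetical pipeline stage that only serves decoder layer 1.
    print(sorted(shards_needed(index, ("model.layers.1.",))))
```

Note the trailing dot in prefixes like `"model.layers.1."`: without it, layer 1 would also match layers 10 through 19 and pull shards the machine doesn't need.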
That is exactly the feature we are using. Right now you need to be on a beta release of containerd, but before long it should be pretty widespread.
In combination with lazy pull (eStargz), it's a pretty compelling implementation.
Damn, that's handy. Now I wonder how much trouble it would be to write a CSI driver that does this, for backporting to 1.2x clusters (since I don't think Kubernetes does backports for anything).
Not too hard. If you happen to be on CRI-O, this has been implemented for a while, but if you are like us and on containerd, you need the new 2.1 beta release.
That does most of the heavy lifting. I don't think implementing a CSI driver that mounts these as PVs would be super hard, and you could borrow liberally from the volume source implementation.
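For reference, consuming a model image through the image volume source looks roughly like the following. This is a minimal Python sketch that emits a Pod manifest as JSON (which `kubectl apply -f` accepts). It assumes a cluster with the `ImageVolume` feature gate enabled and a runtime that supports it (CRI-O, or the containerd 2.1 beta mentioned above); the helper name and all image references are hypothetical.

```python
import json


def pod_with_model_volume(name: str, app_image: str, model_ref: str, mount_path: str = "/models") -> dict:
    """Build a Pod manifest that mounts an OCI image (e.g. model weights) as a read-only volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "server",
                "image": app_image,
                "volumeMounts": [{"name": "model", "mountPath": mount_path, "readOnly": True}],
            }],
            "volumes": [{
                "name": "model",
                # Image volume source: the container runtime pulls model_ref and mounts its filesystem.
                "image": {"reference": model_ref, "pullPolicy": "IfNotPresent"},
            }],
        },
    }


if __name__ == "__main__":
    manifest = pod_with_model_volume(
        "inference",
        "registry.example.com/inference-server:1.0",  # hypothetical
        "registry.example.com/models/llm-weights:v1",  # hypothetical
    )
    print(json.dumps(manifest, indent=2))
```

A CSI driver backport would essentially have to reproduce what the runtime does for that `image` volume: pull the artifact and expose its filesystem at the mount path.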
I've been pretty disappointed with eStargz performance, though... Do you have any numbers you can share? All over the internet people cite numbers from 10 years ago, from workloads that don't seem realistic at all. In my experiments it didn't provide a significant enough speedup.
In our case some machines only need to access less than 1% of the image, but being able to ship the entire model weights as a single artifact is an important feature in and of itself. In our specific scenario, even if eStargz is slow by filesystem standards, it's competing with network transfer anyway, so if it's within the same order of magnitude as rsync, that will do.
I don't have any perf numbers I can share, but I can say we see ~30% compression with eStargz, which is already a small win at least, heh.
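Lazy pulling can pay off when a machine touches under 1% of the image because eStargz compresses each file as its own gzip member and records offsets in a table of contents, so a client can fetch just the byte range it needs. Below is a toy Python sketch of that idea. It is not the real eStargz format (no tar headers, no footer, and the TOC lives in memory), and the helper names are hypothetical.

```python
import gzip
import io


def build_seekable_blob(files: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Compress each file as an independent gzip member; return the blob and a TOC of (offset, length)."""
    blob = io.BytesIO()
    toc: dict[str, tuple[int, int]] = {}
    for name in sorted(files):  # sorted so the output is deterministic
        member = gzip.compress(files[name], mtime=0)  # mtime=0 keeps the gzip header stable
        toc[name] = (blob.tell(), len(member))
        blob.write(member)
    return blob.getvalue(), toc


def read_entry(blob: bytes, toc: dict[str, tuple[int, int]], name: str) -> bytes:
    """Decompress one file using only its own byte range (a registry client would issue a Range request)."""
    offset, length = toc[name]
    return gzip.decompress(blob[offset:offset + length])


if __name__ == "__main__":
    files = {"config.json": b'{"layers": 2}', "weights.bin": bytes(1_000_000)}
    blob, toc = build_seekable_blob(files)
    print(read_entry(blob, toc, "config.json"))
    print(f"compressed to {len(blob) / sum(map(len, files.values())):.2%} of original size")
```

Because the members are plain concatenated gzip streams, a client that doesn't know about the TOC can still decompress the whole blob front to back; the overall ratio you get (like the ~30% above) depends entirely on the data.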
A corollary to the Gell-Mann amnesia effect: with government spending, it all sounds important and reasonably priced unless the spending is in your circle of expertise.
Is there a reciprocal Gell-Mann, where ignorant outsiders assume things are unreasonable and wasteful but anyone with expertise knows better? (Examples that come to mind: Sarah Palin ridiculing fruit fly genetics work, or DOGE’s press conferences about children receiving Social Security when they were just receiving survivors benefits.)
Well... unless it's for temporary projects, the government should just build the capacity in-house. Having to bring in consultants for everything because there is no know-how left is pointless.
This right here tells you everything you need to know. Follow the money. Take the companies that pay the most in lobbying/trips, Venn diagram them against those who got cut, and you won't find much overlap. I'm not sure why reporters aren't tapping into these stories that write themselves. Also, Thiel is tight with this administration.
It's hard to track, because it's all nice websites with generic messages like "we make life better" and pictures with the obligatory diversity faces. E.g. $4.5 billion here: chemonics.com. Or $3 billion here: devalt.org. Where did this money go? How can we measure the ROI?
USAID is a corrupt wasteful sinkhole, good riddance to it.
It was DOGE's job to tell us why and what, and they did none of that. Why is cutting millions in aid to starving people in 3rd world countries a good thing? Why is cutting AIDS funding in similar situations a good thing? Save millions now, pay billions later as these situations destabilize and breed new terrorists and wars.
How much of these "millions" that were cut is actual food for starving people in the 3rd world? If it's that good, how much did their closer neighbors contribute?
USAID looking for an ROI???? That's NOT the purpose of aiding other countries. You don't need to look for a return on investment. USAID was feeding children in places. You're looking for an ROI on saving human lives. Learn empathy.
Empathy seems to be in rare supply these days. I've made the mistake of talking to these people about the human collateral of these decisions, and it's waved off like I'm bringing up an unpleasant experience in a checkout line.
It surprises me that more of them don't object to Radio Liberty and its siblings getting ratfucked, since those channels are both rather cheap and obviously in the interest of whatever regime holds the White House.
Running Twitter bots can't compete with the deep reach and influence that was built up there over many decades.
Isn't saving lives and feeding children an ROI? I think the parent comment was saying that these things were not measured, and that the ROI should absolutely be known.
You should always measure the impact of the things you do and the money you spend; you should never just say "we're trying to do this, so the money should always flow". If you are trying to feed children, you should be able to estimate the number of meals you provide, and if that number is too low, you should stop getting money.
I'm not commenting on USAID cuts in particular here, but your comment struck me as being rather strange.
Oh interesting, look it's our empathy beacon right here.
Have you ever tried to figure out where this money went? People like you keep parroting "feeding children", but when I read the actual list of orgs funded by USAID, it's all nice words and stock pictures with very little actual data.
Yes, I look for metrics like "for that N million dollars, M thousand kids were fed". Without that it looks like a corrupt sinkhole, which it actually is.
Learn empathy: send me $100 monthly, I promise to spend it for good.
Losing all instruments with no visibility is still grounds for ejecting, even if he thought the engine was still running. He was disorientated and relying on his instruments; when flying under IFR (instrument flight rules), loss of instruments is tantamount to loss of control. The likely outcome in those situations is controlled flight into terrain at 350+ mph.
With low altitude as an aggravating factor, he was always 100% correct to eject, and whatever the plane did afterwards is largely irrelevant.
Even if he had held the glide without visibility (harder than it sounds), he would have had less than 60 s before eating dirt (the rate of descent was ~800 ft/min).
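The 60 s figure is just height divided by descent rate. A small Python sketch follows; the helper names are mine, the descent rate is assumed constant, and the altitude is not stated in the comment, so all the arithmetic tells us is that "less than 60 s at ~800 ft/min" implies the jet was below roughly 800 ft above the ground.

```python
def seconds_to_impact(height_ft: float, descent_fpm: float) -> float:
    """Time to ground contact at a constant rate of descent (no flare, terrain or wind)."""
    return height_ft / descent_fpm * 60.0


def max_height_for(seconds: float, descent_fpm: float) -> float:
    """Largest height from which the ground is reached within `seconds` at descent_fpm."""
    return seconds / 60.0 * descent_fpm


if __name__ == "__main__":
    # ~800 ft/min from the comment: "less than 60 s" means starting below ~800 ft.
    print(max_height_for(60, 800))
    print(seconds_to_impact(400, 800))  # hypothetical 400 ft height
```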
My personal favourite solution is Bazel specifically because it can be so isolated from those layers.
No need for Docker (or Docker in Docker, as many of these solutions end up requiring) or other exotic stuff; it can produce OCI image artifacts directly with `rules_oci`.
By requiring so little of the runner, you really don't care about runner features, so you can choose your CI/CD runner purely on reliability, cost, performance and ease of integration.
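Producing an OCI image really needs no Docker daemon, only files and hashes. Here is a minimal Python sketch (not how rules_oci is implemented) that writes a single-layer OCI image layout directory: content-addressed blobs, an image config, a manifest, `index.json` and the `oci-layout` marker. It assumes an uncompressed layer and a linux/amd64 config; the function names are hypothetical.

```python
import hashlib
import io
import json
import tarfile
import tempfile
from pathlib import Path

MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"


def _canonical(obj) -> bytes:
    """Serialize JSON with stable key order so identical inputs give identical digests."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def _write_blob(root: Path, data: bytes, media_type: str) -> dict:
    """Store data under blobs/sha256/<hex> and return an OCI descriptor for it."""
    hexdigest = hashlib.sha256(data).hexdigest()
    blob_dir = root / "blobs" / "sha256"
    blob_dir.mkdir(parents=True, exist_ok=True)
    (blob_dir / hexdigest).write_bytes(data)
    return {"mediaType": media_type, "digest": f"sha256:{hexdigest}", "size": len(data)}


def write_oci_layout(root: Path, layer_tar: bytes, entrypoint: list[str]) -> str:
    """Write a single-layer image as an OCI image layout directory; return the manifest digest."""
    layer = _write_blob(root, layer_tar, "application/vnd.oci.image.layer.v1.tar")
    config = {
        "architecture": "amd64",  # assumption
        "os": "linux",
        "config": {"Entrypoint": entrypoint},
        # For an uncompressed layer, the diff_id is the same as the layer digest.
        "rootfs": {"type": "layers", "diff_ids": [layer["digest"]]},
    }
    config_desc = _write_blob(root, _canonical(config), "application/vnd.oci.image.config.v1+json")
    manifest = {"schemaVersion": 2, "mediaType": MANIFEST_TYPE, "config": config_desc, "layers": [layer]}
    manifest_desc = _write_blob(root, _canonical(manifest), MANIFEST_TYPE)
    index = {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.index.v1+json",
        "manifests": [manifest_desc],
    }
    (root / "index.json").write_bytes(_canonical(index))
    (root / "oci-layout").write_bytes(_canonical({"imageLayoutVersion": "1.0.0"}))
    return manifest_desc["digest"]


if __name__ == "__main__":
    # Build a tiny one-file tar to use as the layer.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    with tempfile.TemporaryDirectory() as tmp:
        print(write_oci_layout(Path(tmp), buf.getvalue(), ["/bin/sh"]))
```

Everything here is ordinary file I/O, which is why a hermetic build tool can do it on almost any runner.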
At least two of the fast-growing companies I have been part of have had serious layoffs that were very successful at cutting dead weight that had accumulated through fast and loose hiring and poor middle management. So it can be done well.
By and large, though, layoffs are a sign to jump from the soon-to-be-sinking ship... so it's very important to know which kind it is and act accordingly.
We recently did a pretty big rollout of Tailscale, and tbh I am pleasantly surprised by how well it works. Between subnet routing to our bare-metal stuff and the Kubernetes operator (especially the ability to expose services to the Tailnet), it has been a big win.
I was a bit of a doubter as to how it would work at a bigger org, but so far it has been rock solid, easy to set up and a great user experience.
I have cooked up a reproducible image setup based on Bazel, rules_oci and rules_distroless: https://github.com/josephglanville/images
Specifically, this file is a BusyBox-based image with some utilities included from a Debian snapshot: https://github.com/josephglanville/images/blob/master/toolbo...
More difficult than a Dockerfile? Sure. However, it's better in pretty much every other way, including actual simplicity.
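Much of the cost of reproducibility is chasing down nondeterminism such as timestamps, file ownership and entry order. Here is a minimal Python sketch of a tar layer whose bytes, and therefore its sha256 digest, depend only on file names and contents; the helper names are hypothetical, and this normalization is the kind of thing the Bazel packaging rules are meant to handle for you.

```python
import hashlib
import io
import tarfile


def reproducible_tar(files: dict[str, bytes]) -> bytes:
    """Pack files into a tar whose bytes depend only on the file names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in sorted(files):  # fixed entry order, independent of dict insertion order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # no build timestamps
            info.mode = 0o644
            info.uid = info.gid = 0  # no builder identity
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def layer_digest(layer: bytes) -> str:
    """OCI-style content digest of the layer bytes."""
    return "sha256:" + hashlib.sha256(layer).hexdigest()


if __name__ == "__main__":
    a = reproducible_tar({"etc/motd": b"hi\n", "bin/tool": b"\x7fELF"})
    b = reproducible_tar({"bin/tool": b"\x7fELF", "etc/motd": b"hi\n"})
    print(layer_digest(a) == layer_digest(b))
```

Identical digests are what let caches, signatures and pull-through registries treat two independent builds as the same artifact.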