I think the most depressing thing is how unsurprising this is.
This is why free trials require a credit card upfront: cards are harder to fake. It's not because you're about to be stealth-billed. It's thanks to people like this.
It's practically trivial to bypass this if you really want to. Capital One in the US offers virtual cards that pass verification but can be deleted and blocked at any time for free, as long as you hold one of their credit cards. I'm sure the practice discourages casual users from gaming trials, but it feels like it makes life miserable for paying customers while doing almost nothing to stop bad actors.
If you also ban virtual and pre-paid cards it cuts this to almost zero.
There is a difference: this rocket company isn't really going to generate a new virtual card every time, is it? Do you think their business bank account even supports that?
They are detectable only if the issuer has a dedicated BIN for virtual cards. If they issue them under the same BIN as regular cards, there's no way to detect them without issuer cooperation, which would defeat the point.
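To make the point concrete, here's a toy sketch of the only check a merchant can actually run; the BIN values are fabricated for illustration, not real issuer ranges:

```python
# Hypothetical sketch: a merchant can only flag virtual cards whose issuer
# uses a dedicated BIN range for them. The BINs below are made up.
KNOWN_VIRTUAL_BINS = {"411111", "552233"}  # dedicated virtual-card BINs (fabricated)

def looks_like_virtual_card(pan: str) -> bool:
    """Return True only when the card's BIN (first 6 digits) is a known
    dedicated virtual-card BIN. A virtual card issued under the same BIN
    as physical cards is indistinguishable to the merchant."""
    return pan[:6] in KNOWN_VIRTUAL_BINS

print(looks_like_virtual_card("4111115555555557"))  # True: dedicated BIN
print(looks_like_virtual_card("4000005555555556"))  # False: shared BIN, slips through
```

A shared-BIN virtual card returns False here, which is exactly the detection gap described above.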
Outside of the big clouds, just buying a one-year lease (say) on a dedicated server is so cheap that you wouldn't be saving much versus spot instances, and with spot instances you need code to manage them and you're introducing the risk of slowdowns. Probably not worth the trade-off.
To illustrate: a server with 128 GB of RAM, 20 cores, a 10 Gbps NIC, and some small SSD storage will probably cost you under $2,000 USD for a year's rental.
They've got usage that plummets 80% two days a week, and the other five have a broadly predictable time-based pattern where usage drops ~66%, judging by the graph.
If that works out to the same price as keeping compute at literally your peak requirement level around the clock, then something is very wrong somewhere. Maybe the issue isn't in-house at Blacksmith; perhaps spot pricing is a joke. But something there doesn't check out.
Loads of companies do scaling with much less predictable patterns.
>risk of slowdowns
Yeah, you probably do want the scaling to be super conservative... but an 80% fluctuation is a comically large gap not to actively scale.
>To illustrate
A better view, I'd say: that chart looks like ~4.5 at peak. So you're paying for 730 hours of peak capacity a month and using all of it for about 90 hours.
Given that they wrote a blog post about this topic, they probably have a good reason for doing it this way. It just doesn't really make sense to me based on the given info.
No, I'm not. I'm looking at the 5-day average-across-fleet graph right at the bottom. That shows very roughly a 2/3 drop from peak to lowest, while the 80% figure is from the text and is fleet-wide.
>whereas renting the server is about half that per hour.
If you're at capacity only 90 hours out of 730 a month, then paying 2x for spot to cover those peaks is a slam dunk.
Ceph would be a theoretical option, but a) we don't have a lot of experience with it and b) it's relatively complex to operate. We'd really love to add a lighter option to our stack that's under the stewardship of a foundation.
Try expanding a cluster, changing the erasure coding configuration, using anything that needs random access within a file (Parquet), or any day-2 operation.
Even some basic S3 storage patterns weren't considered when the core storage scheme was designed. It lacks an index and depends on the filesystem to organize objects, then crumbles into lock contention when too many versions are stored, or into walkdir calls when anything is listed. It also can't support writing to the same sets of keys that S3 allows, since it implicitly depends on underlying filesystem paths.
They might have added an index by now, but gatekept it to their enterprise AIStor offering, since they've abandoned any investment in open source at this point, or any appearance of caring about it. Their initial inclination in response to this issue says everything: https://github.com/minio/minio/issues/20845#issuecomment-259...
Guessing you're referring to MinIO, not Ceph? Have they still not figured out how to do day 2? I mainly avoid them because of their license and the way they interpret it.
They are not efficient; they use a one-time static hash to create a cluster. After that, it's all duct tape and glue. Want to expand? Add another cluster (pool), then search for the pool that contains the object: they don't know which one has it, and performance doesn't scale well with additional pools. Want to decommission a single node? Drain the whole cluster. They refer to multiple pools as a single cluster, but it's essentially a set of static hashes that lack the intelligence to locate objects. Got the initial EC configuration not quite right? Sorry, you need to redo the entire cluster.
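A toy sketch of the "set of static hashes" problem described above (not MinIO's actual code): each pool hashes keys independently and placement is fixed at creation, so after bolting on a new pool, reads have to probe pools until the object turns up.

```python
import hashlib

class Pool:
    """One statically hashed pool: placement is fixed at creation time."""
    def __init__(self, n_nodes: int):
        self.nodes = [dict() for _ in range(n_nodes)]

    def _node(self, key: str) -> dict:
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]  # static placement, never rebalanced

    def put(self, key, value): self._node(key)[key] = value
    def get(self, key): return self._node(key).get(key)

pools = [Pool(4)]
pools[0].put("bucket/obj1", b"data")

pools.append(Pool(8))  # "expansion": bolt on another pool with its own hash

def get(key):
    # No global index: walk every pool until one has the object.
    for p in pools:
        v = p.get(key)
        if v is not None:
            return v
    return None

print(get("bucket/obj1"))  # found, but only by probing each pool in turn
```

Every pool you add lengthens that probe loop, which is the "performance doesn't scale with additional pools" complaint in miniature.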
MinIO is a good fit if you want a small cluster that doesn't require day-2 operational complexity and you only store a few TBs.
I have not looked into them recently, but I doubt the core has changed.
Being VC-funded and looking for an exit makes them overinvest in marketing and storytelling.
Genuinely, most "AI" DCs spend less than 9 kW on cooling for every 100 kW of servers. If you were that bothered about getting that to zero, you could literally sink them into the ocean, build a heat network so the town can take the heat for free, or use any of a dozen more established and practical approaches.
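Quick arithmetic on the 9 kW per 100 kW claim, expressed as the cooling overhead ratio and the PUE-style figure it would imply if cooling were the only overhead:

```python
# The claim above: 9 kW of cooling per 100 kW of IT load.
it_load_kw = 100.0
cooling_kw = 9.0

overhead_ratio = cooling_kw / it_load_kw           # 9% cooling overhead
pue_ish = (it_load_kw + cooling_kw) / it_load_kw   # 1.09, if cooling were the only overhead

print(overhead_ratio, pue_ish)  # 0.09 1.09
```

In other words, the ceiling on savings here is single-digit percent, which is the point of the comment above.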
It's a bit demoralizing how many suggestions in this thread would have significant environmental effects beyond what large scale AI training already has.
I'm talking about the above (albeit hypothetical) proposals to cover a pole of our planet in solar, and the other ocean-based proposals, not solar in general.
It's a bit demoralizing that people talk about AI training as if it had even 1/100th the environmental impact of the personal automobile or frequent airplane trips.