
So I've created ~300k EC2 instances with SadServers, and my experience was that starting an EC2 VM from stopped took ~30 seconds and creating one from an AMI took ~50 seconds.

Recently I decided to actually look at boot times, since I store in the db when the servers are requested and when they become ready, and it turns out for me it's really bimodal: some take about 15-20s and many take about 80s. See the graph at https://x.com/sadservers_com/status/1782081065672118367

Pretty baffled by this (same region, pretty much same everything). Any idea why? Definitely going to try this trick in the article.




My guess is that it's related to AWS Spot capacity.

The second and third spikes at 80 and 140 seconds line up nicely with this kind of behavior.

The second spike would be optimised workloads that can respond to spot interruption in under 60 seconds.

The third spike would be Spot workloads that are being force-terminated.

The reason it falls on those boundaries is that whatever is scheduling your workload only re-checks for free capacity once a minute.

I used to be able to spin up spot instances and basically never get interruptions. They'd stay on for weeks/months.

In my experience, it used to be fairly safe to have Spot instances for most workloads. You'd almost never get Spot interruptions. Now, some regions and instance types are difficult to run Spot instances at all.


Thanks, Spot capacity being scheduled differently would explain the behavior.

Almost all my EC2 instances are Spot, so I can actually compare the distribution with the on-demand ones.

My Spot instances are very short-lived (15-30 mins max), and AFAIK I've never seen a Spot instance force-terminated (though this would be hard to find, I think).


When I say "force-terminated" I mean when you don't voluntarily shut down in response to a SpotInterruption event.

When the event is sent, they give you two minutes to shut down.

If you either don't subscribe to the events, or don't shut down fast enough, they kill the instance.
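The interruption notice is exposed via the instance metadata endpoint (http://169.254.169.254/latest/meta-data/spot/instance-action), which returns a small JSON payload with the action and the deadline. As a minimal sketch of working out how long you have left — using a hypothetical sample payload rather than a live metadata call:

```python
import json
from datetime import datetime, timezone

# Sample payload in the shape returned by the Spot instance-action
# metadata endpoint (values here are made up for illustration).
sample = '{"action": "terminate", "time": "2024-04-21T12:00:00Z"}'

def seconds_until_termination(notice_json, now):
    """Seconds remaining before the instance is force-terminated."""
    notice = json.loads(notice_json)
    deadline = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()

# If the notice arrived just now, you get the full two-minute warning.
now = datetime(2024, 4, 21, 11, 58, 0, tzinfo=timezone.utc)
print(seconds_until_termination(sample, now))  # 120.0
```

In practice a shutdown handler would poll that endpoint every few seconds (it returns 404 until a notice is issued) and start draining work as soon as the payload appears.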


Perhaps in one case you are getting a slice of a machine that is already running, versus AWS powering up a machine that was offline and getting a slice of that one?


Yes, some internal (AWS operations) explanation like the one you suggest makes sense.



