
Designing a scalable API on AWS spot instances - iwitaly
https://blog.adapty.io/designing-scalable-api-on-aws-stop-instance/
======
arecurrence
There's an excellent implementation using AWS Lambda to manage spot instances
at
[https://github.com/AutoSpotting/AutoSpotting](https://github.com/AutoSpotting/AutoSpotting)

What's fantastic about the AutoSpotting implementation is:

1. It dynamically replaces existing instances with spot instances just by
setting a tag.

2. Rather than replacing the instance with a fixed spot instance type, it
will choose the cheapest one that fits the requirements.

3. If there are no spot instances available that fit the requirements, it
will spin up on-demand instances until spot instances are available.

If you have experience working with spot, then you will know that these are
really outstanding features that hopefully Amazon will bake in in the future.
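
The selection behaviour in points 2 and 3 can be sketched roughly as below. This is not AutoSpotting's actual code: the `Offer` type, the `pick_instance` helper and all prices are made up for illustration, and a `spot_price` of `None` is assumed to mean "no spot capacity right now".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    instance_type: str
    vcpus: int
    memory_gib: float
    spot_price: Optional[float]  # None: assumed to mean no spot capacity
    on_demand_price: float


def pick_instance(offers, min_vcpus, min_memory_gib):
    """Return (instance_type, market) for the cheapest offer that fits."""
    fitting = [o for o in offers
               if o.vcpus >= min_vcpus and o.memory_gib >= min_memory_gib]
    if not fitting:
        raise ValueError("no instance type satisfies the requirements")
    spot = [o for o in fitting if o.spot_price is not None]
    if spot:
        best = min(spot, key=lambda o: o.spot_price)
        return best.instance_type, "spot"
    # No spot capacity for any fitting type: fall back to on-demand.
    best = min(fitting, key=lambda o: o.on_demand_price)
    return best.instance_type, "on-demand"


# Hypothetical prices in USD per hour, for illustration only.
OFFERS = [
    Offer("m5.large", 2, 8.0, 0.035, 0.096),
    Offer("c5.xlarge", 4, 8.0, 0.068, 0.170),
    Offer("m5.xlarge", 4, 16.0, None, 0.192),
]
```

With these made-up offers, a request for 2 vCPUs and 8 GiB lands on spot, while a request for 4 vCPUs and 16 GiB falls back to on-demand because the only fitting type has no spot capacity.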

------
ocdnix
Turns out this is about EC2 spot instances for ECS. How would it compare to
ECS Fargate spot these days?

I'm also missing a discussion about designing for interruption, either by not
keeping state, or by being able to shed state quickly, to be picked up by
other instances.
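
One possible building block for that, as a sketch: EC2 publishes a spot interruption notice (about two minutes ahead) in the instance metadata under `/latest/meta-data/spot/instance-action`, as a small JSON document with `action` and `time` fields. The helpers below only parse such a document; fetching it, and the names `seconds_until_interruption` and `should_drain`, are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body, now):
    """Parse an instance-action document; return seconds left, or None."""
    if not notice_body:
        # Assumed convention: the caller passes None/"" when the metadata
        # endpoint answered 404, i.e. no interruption is scheduled.
        return None
    notice = json.loads(notice_body)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return max(0.0, (deadline - now).total_seconds())


def should_drain(notice_body, now):
    """True once a notice exists, so the worker can stop taking new work."""
    return seconds_until_interruption(notice_body, now) is not None
```

A worker could poll this every few seconds and, once `should_drain` is true, deregister from the load balancer and hand off or flush whatever state it holds.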

Also, if you set up EC2 spot with a launch template or ASG with very
differently-sized instance types (to reduce risk of running out), is there a
way to even out the load coming through an ALB? The least-connections
scheduling can help in some cases, but a connection might not map 1:1 to one
unit of load. The ALB can use weighted balancing, but on the target group
level. Dunno how easy it would be to allocate different instance sizes to
different target groups and weigh them accordingly.
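
A rough sketch of that last idea, assuming one target group per instance type and assuming vCPU count is a usable proxy for capacity (it may well not be for a given workload). The helper name and the instance mix are hypothetical.

```python
from collections import defaultdict


def target_group_weights(instances, vcpus_by_type):
    """Weight each per-type target group by the total vCPUs it contains."""
    totals = defaultdict(int)
    for instance_type in instances:
        totals[instance_type] += vcpus_by_type[instance_type]
    return dict(totals)


# Hypothetical fleet: two small instances and one large one.
VCPUS = {"m5.large": 2, "m5.2xlarge": 8}
FLEET = ["m5.large", "m5.large", "m5.2xlarge"]
```

Here the `m5.2xlarge` group would get weight 8 and the `m5.large` group weight 4, so the single large instance receives twice the traffic of the two small ones combined. The weights would have to be recomputed whenever the fleet composition changes.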

~~~
ollyculverhouse
AFAIK with Fargate a lot of this is handled for you, as long as you have the
auto scaling group.

We have this set up with two capacity providers (FARGATE_SPOT and FARGATE) with
a 75/25% split, meaning that even if there is no spot capacity available we
will still be up.
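
As a sketch of what such a split looks like: an ECS capacity provider strategy is a list of entries with a `capacityProvider` name and a `weight`, so 75/25 would correspond to weights of 3 and 1. The exact values we use are not shown here, so the weights below are an assumption, and `split_tasks` is only an approximation of how tasks get distributed, not ECS's real placement logic.

```python
# Assumed weights for a 75/25 split; the optional "base" field is omitted.
STRATEGY = [
    {"capacityProvider": "FARGATE_SPOT", "weight": 3},
    {"capacityProvider": "FARGATE", "weight": 1},
]


def split_tasks(desired_count, strategy):
    """Approximate how many tasks each capacity provider would run."""
    total = sum(item["weight"] for item in strategy)
    counts = {item["capacityProvider"]: desired_count * item["weight"] // total
              for item in strategy}
    # Hand any rounding leftovers to the highest-weight providers first.
    leftover = desired_count - sum(counts.values())
    for item in sorted(strategy, key=lambda i: -i["weight"]):
        if leftover == 0:
            break
        counts[item["capacityProvider"]] += 1
        leftover -= 1
    return counts
```

For 8 desired tasks this gives 6 on FARGATE_SPOT and 2 on FARGATE.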

The benefit of Fargate is that we don't need to care if certain instance
sizes are not available, as that is handled by AWS.

~~~
l33tman
Cool, when Fargate launched they didn't have a spot option (AFAIK), and
since we run ECS on Spot instances it would just have been a massive increase
in cost to switch to Fargate. But if it can now use underlying spot instances,
it might be worth looking at again.

~~~
Epixors
Yeah spot capacity providers for Fargate only got added a few months ago, been
running well for us in production.

------
jakozaur
For service-type workloads (e.g. an API service with a 99.99% uptime SLA) we
keep comparing on-demand vs. spot.

In reality, you would want to compare against Reserved Instances, where you
can get up to a 60% discount.

So in us-east-1:

- spot costs: 35-43% of on-demand

- RI 1 year standard: 60% of on-demand

- RI 3 year convertible: 46% of on-demand

So if you have some base load that you can commit to running for 3 years, the
price often gets into spot range without having to worry about losing
capacity.

In reality, combining reserved instances for base capacity (60%) with spot for
spiky load (40%) seems to be the winning combination for many companies.
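
To put rough numbers on that mix, here is a small sketch using the rates listed above. It assumes the 60/40 figures are capacity shares and that the spot part runs all the time, which overstates the cost of genuinely spiky load; `blended_cost` is just an illustrative helper.

```python
def blended_cost(reserved_share, reserved_rate, spot_rate):
    """Cost as a fraction of all-on-demand; the remainder runs on spot."""
    return reserved_share * reserved_rate + (1 - reserved_share) * spot_rate


# Rates as fractions of the on-demand price, taken from the list above.
RI_1Y_STANDARD = 0.60
RI_3Y_CONVERTIBLE = 0.46
SPOT_LOW, SPOT_HIGH = 0.35, 0.43

low = blended_cost(0.6, RI_3Y_CONVERTIBLE, SPOT_LOW)
high = blended_cost(0.6, RI_3Y_CONVERTIBLE, SPOT_HIGH)
print(f"3-year convertible + spot: {low:.1%} to {high:.1%} of on-demand")
```

With the 3-year convertible rate the blend comes out at roughly 42% to 45% of on-demand; with the 1-year standard rate it is roughly 50% to 53%.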

~~~
kpotehin
Good point, thanks! We're looking at Savings Plans and will probably use them
too. But when you're doing prototypes it's pretty much impossible to commit to
1 year of usage, let alone 3 :)

~~~
zerubeus
You can still reserve a classic instance for one year, and when you don't need
it anymore you can sell it on the Reserved Instance Marketplace.

------
WatchDog
Seems like Amazon has long been transitioning spot instances away from being a
method of efficiently utilizing excess capacity, towards being a discounted
service for less risk-averse businesses (businesses that can accept the risk of
their service being terminated at any time).

Even if in practice AWS never sees large spikes in compute demand and
corresponding large scale instance preemption, most businesses I've worked
with won't accept the risk of having OLTP systems be taken down at any time.

No longer does spot seem to be a service where one can get a bargain for their
compute intensive offline/batch workloads that are much more tolerant of
preemption.

Given that spot prices seem to be very flat and preemption is rare, Amazon
presumably has a fair bit of underutilized capacity. Does anyone know if
Amazon uses this capacity themselves, or offers more aggressive spot pricing
to select clients?

------
MrPowers
Great article. Cutting EC2 costs is important, especially for companies with
heavy data engineering / data science workflows. Spark service providers make
it easy to spin up huge clusters (100+ nodes) to perform ad hoc analyses. The
costs can quickly spiral out of control, even if you're getting 3x cost
savings on the spot market.

Some tangential thoughts:

* Is there an AWS API that returns the cheapest availability zone in a region for a given instance type? Or is the GUI that's screenshotted in the blog post the only way to see it?

* I have seen 90%+ cost savings for certain instance types

* Sometimes you lose a spot instance, look at the pricing history graph to confirm the price spike, and don't see any spike that was above your bid price... it can be frustrating
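
On the first question: EC2 does have a `DescribeSpotPriceHistory` call (`describe_spot_price_history` in boto3) that returns price records per availability zone. The sketch below picks the cheapest zone from records shaped like that response; `cheapest_zone` is a hypothetical helper and the sample prices are invented.

```python
from datetime import datetime, timezone


def cheapest_zone(price_records, instance_type):
    """Return (zone, price) using the latest record per availability zone."""
    latest = {}
    for rec in price_records:
        if rec["InstanceType"] != instance_type:
            continue
        zone = rec["AvailabilityZone"]
        if zone not in latest or rec["Timestamp"] > latest[zone]["Timestamp"]:
            latest[zone] = rec
    if not latest:
        return None
    # SpotPrice comes back as a string in the API response.
    best = min(latest.values(), key=lambda r: float(r["SpotPrice"]))
    return best["AvailabilityZone"], float(best["SpotPrice"])


# Invented sample records, shaped like entries of "SpotPriceHistory".
# Real ones could come from something like:
#   boto3.client("ec2").describe_spot_price_history(
#       InstanceTypes=["m5.large"], ProductDescriptions=["Linux/UNIX"]
#   )["SpotPriceHistory"]
SAMPLE = [
    {"InstanceType": "m5.large", "AvailabilityZone": "us-east-1a",
     "SpotPrice": "0.0300", "Timestamp": datetime(2020, 6, 1, tzinfo=timezone.utc)},
    {"InstanceType": "m5.large", "AvailabilityZone": "us-east-1a",
     "SpotPrice": "0.0400", "Timestamp": datetime(2020, 6, 2, tzinfo=timezone.utc)},
    {"InstanceType": "m5.large", "AvailabilityZone": "us-east-1b",
     "SpotPrice": "0.0350", "Timestamp": datetime(2020, 6, 2, tzinfo=timezone.utc)},
]
```

Note that the cheapest zone is not necessarily the one least likely to be interrupted, so this only answers the price half of the question.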

~~~
luhn
I'm not aware of any API (surprising for AWS), but you can use Spot Fleet to
get an array of spot instances optimized for cost.

Having a bid above the spot price does not guarantee you'll keep the spot
instance. AWS can terminate a spot instance at any time if they need the
capacity; that's the deal. It used to be more closely tied to your bid price,
but they've been moving away from that.

------
makkesk8
"Our backend system is built on AWS. Today I’m going to tell you how we had
cut costs"

This is a recurring topic here on HN. It boggles my mind and makes me wonder
if people know that there are platforms other than AWS, Azure and Google Cloud
out there that are very capable and much, much cheaper.

Unless one of the big 3 has a feature or certification you need, I don't see
any reason to use them at all, given the insane complexity and cost.

So if you or your company had to cut costs at some point, why use any of the
big 3 in the first place?

~~~
CSDude
This poor argument always pops up. 1st AWS and big 3 is much more reliable.
2nd AWS is not only a VPS provider. Changing vendors is not the answer to all
cost-related problems.

~~~
gridlockd
> 1st AWS and big 3 is much more reliable.

I think that's a myth. People assume it's true because it _should_ be true. I
don't think it is true.

> 2nd AWS is not only a VPS provider.

It's a _glorified_ VPS provider. Most of the stuff doesn't matter to most of
the people using AWS, but they go for it so they can put it on their resume
and because they don't want to get fired for choosing something that's not a
big name.

~~~
jungturk
EC2 might be a glorified VPS provider, but you seem to be ignoring the vast
array of managed services in modern cloud providers (or unaware of their
utility beyond padding resumes).

Load balancing, fault tolerance, high availability, arbitrary scale,
messaging, orchestration, autoscaling, warehousing, big data processing,
identity management, desktop management, secrets management, container
registries, source code management, build tools, hardware test suites, gpu
hardware, observability tools...

Those of us who use cloud providers know full well why we use them (and
certainly know when not to).

~~~
gridlockd
If you _really_ need any of that stuff you probably shouldn't use Amazon's
managed version of it.

~~~
rospaya
Why? I run both on prem and cloud workloads (on more than one provider) so I'm
wondering what's wrong with Amazon's managed services?

------
sunilkumarc
Very well written article!

If someone is interested in learning all the AWS concepts, here's an awesome
e-book which is written by the legend Daniel Vassallo himself.

[https://gumroad.com/a/238777459/MsVlG](https://gumroad.com/a/238777459/MsVlG)

------
ditansu
Very useful article! Waiting for a Terraform best practices follow-up.

------
lyalu
interesting article!

