
Frugal Computing - yarapavan
http://muratbuffalo.blogspot.com/2019/10/frugal-computing.html
======
chickenpotpie
I've been wondering why none of the serverless providers have offered the
ability to get better pricing if I let them control when my timed functions
run. For example, I tell them that these functions are daily/monthly/whatever
tasks and I don't care when they run as long as they run at least once during
that time period. They can run my functions when server usage is low. They get
better compute utilization and save money (hopefully passing some of those
savings on to me).
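
No provider offers that contract as far as I know, so here's a purely
hypothetical sketch of what it would look like from the caller's side (the
low-utilization signal is made up; only the provider would really have it):

    import random
    import time
    from datetime import datetime, timedelta

    def utilization_is_low():
        # Stand-in for a signal only the provider would actually have.
        return random.random() < 0.2

    def run_flexible(task, window=timedelta(days=1), poll=timedelta(hours=1)):
        """Run task once within `window`, preferring low-utilization slots."""
        deadline = datetime.utcnow() + window
        while datetime.utcnow() < deadline:
            if utilization_is_low():
                return task()              # got a cheap slot
            time.sleep(poll.total_seconds())
        return task()                      # deadline hit: still runs once per window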

~~~
bbgm
Like most things, this sounds good in theory but falls apart in practice.
There are many AWS services that started with this question (Spot instances,
for one). But what people want to do with them is always something else when
push comes to shove.

I've always felt there are a lot of background tasks that just need to get
done and have no time sensitivity, which could fit this model, but in practice
it's never been something you can generalize to the point where it's worth
building out. Still haven't given up on the idea though :).

------
twotwotwo
In a cloud context, something like AWS spot instances or GCE preemptible
instances is a neat tool for this specific niche of minimizing cost when you
can accept delays.

The context: providers provision more instances than they can sell at list
price, so they sell access at substantial discounts if you accept that a
customer paying full fare can bump you out. (Note that instance shutdowns
might be correlated, not random.) I've definitely heard of folks running tools
like Spark that way. From the software angle, being able to use various
machine types efficiently, and being able to minimize the work lost when a
worker dies (i.e. to checkpoint), helps there.
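
A minimal sketch of that kind of checkpointing, assuming a worker that chews
through a list of items and can be killed at any moment (the file name and
checkpoint interval are made up):

    import json, os

    CHECKPOINT = "progress.json"   # hypothetical checkpoint file

    def load_done():
        # Resume from whatever a previous (possibly preempted) worker finished.
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                return set(json.load(f))
        return set()

    def save_done(done):
        # Write via a temp file so a shutdown mid-write can't corrupt progress.
        tmp = CHECKPOINT + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp, CHECKPOINT)

    def run(items, process):
        done = load_done()
        for i, item in enumerate(items):
            if i in done:
                continue           # already handled before the last preemption
            process(item)
            done.add(i)
            if len(done) % 100 == 0:
                save_done(done)    # bound lost work to ~100 items
        save_done(done)

The more often you checkpoint, the less work you lose when an instance gets
reclaimed, at the cost of extra I/O.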

I've heard of analogous setups for companies that own their boxes, e.g.
Google's Borg paper talks about the tricks they do to make production and non-
production jobs share the same box effectively.

I think sometimes cost optimization will lead you to counterintuitive answers.
For example, obviously the post is right that memory is pricier than disk is.
(Also, external memory algorithms are cool; more people should read about
them.) But when box time itself isn't free, and more RAM or faster storage
allows you to use less of it, the pricier computing resources can end up
paying for themselves. Like a cost version of the race to idle in power
management.
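
A toy version of that trade-off, with made-up hourly prices and runtimes, just
to show the shape of the math:

    # Hypothetical numbers: the small box spills to disk and runs slowly,
    # the big box holds everything in RAM and finishes fast.
    small = {"price_per_hr": 0.25, "hours": 30}
    big   = {"price_per_hr": 1.00, "hours": 5}

    for name, box in (("small", small), ("big", big)):
        print(name, box["price_per_hr"] * box["hours"])
    # small 7.5, big 5.0 -- the 4x pricier box is cheaper per job.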

And the (good) discussion of the costs of going distributed may lead you to
want big enough boxes available that fewer jobs need to be distributed at all,
even when those boxes cost (more than linearly) more than smaller ones.
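
Same kind of toy arithmetic for that case, again with invented numbers,
assuming the distributed run loses some wall-clock time to shuffle and
coordination:

    # One 64-core box priced more than linearly vs. eight 8-core boxes.
    big_cost   = 2.50 * 4              # $2.50/hr (linear would be 8 * $0.25 = $2.00), 4 hr job
    small_cost = 8 * 0.25 * (4 * 1.5)  # $0.25/hr each, 50% longer from coordination overhead
    print(big_cost, small_cost)        # 10.0 vs 12.0: the premium box still wins here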

I'm not saying this to advocate for a particular strategy; more want to say
the right approach can depend in super tricky ways on the specific workloads
and constraints you have.

------
gumby
There’s a good reference in a comment on the original post about read, write,
and space amplification from choosing the wrong index structure for the data
set: http://smalldatum.blogspot.com/2019/05/crum-conjecture-read-write-space-and.html

------
crb002
With proper error correction codes, and perhaps the upfront cost of a prebuilt
DAG, you should only have to redo the work of the failed nodes.
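
A minimal sketch of the "only redo the failed nodes" part, assuming each
finished node's output is kept somewhere durable (the in-memory dict here is
just a stand-in for that):

    # Tiny DAG runner: node -> (dependencies, function). Completed results are
    # cached, so a re-run after a failure only recomputes the missing nodes.
    def run_dag(dag, cache):
        def compute(node):
            if node in cache:
                return cache[node]              # already done, no rework
            deps, fn = dag[node]
            cache[node] = fn(*(compute(d) for d in deps))
            return cache[node]
        for node in dag:
            compute(node)
        return cache

    # Usage: if "join" failed last time, only "join" gets recomputed.
    dag = {
        "load_a": ((), lambda: [1, 2, 3]),
        "load_b": ((), lambda: [4, 5, 6]),
        "join":   (("load_a", "load_b"), lambda a, b: a + b),
    }
    results = run_dag(dag, cache={"load_a": [1, 2, 3], "load_b": [4, 5, 6]})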

------
theamk
I think if you want slow and frugal, you stop using the cloud. The cloud's
advantages don't matter that much for batch jobs that can tolerate long wait
times.

Need more machines to handle unusual load? Just let the queue grow; if things
are still bad next week, we'll order a few more nodes.

Machine went down? We can wait until tomorrow when sysadmins can troubleshoot.

~~~
rhinoceraptor
It's almost as if there are economic advantages to owning your own hardware...

