
Amazon Elastic File System on Kubernetes - matyix
https://banzaicloud.com/blog/aws_provision_efs/
======
dlthrowaway1
EFS is terrible; the latency and throughput issues make it useless for
anything involving scientific computing/data processing. I could not disagree
more: using it in conjunction with GPUs makes zero sense. S3 and GCP/GS, on the
other hand, are far better. TensorFlow "natively" supports S3 and GS, i.e. if
your path begins with "s3://" or "gs://" it will transparently pull from S3/GS.
When used in conjunction with tf.data, it allows you to have stateless
training/processing. TensorFlow also supports S3/GS for storing trained models.

Recommending EFS is a great way of signaling your ignorance about I/O.

------
nemothekid
Like others have mentioned, EFS has its own surprises. You will see a lot of
recommendations to store at least 1TiB on EFS to get decent performance (so if
your working set is 100MiB, you need to store nearly 1TiB of garbage), meaning
you will spend $300/mo for capacity you don't need.
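
A rough sketch of that arithmetic, for anyone who wants to plug in their own numbers. The price of $0.30 per GiB-month is an assumption picked to line up with the ~$300/mo figure above, and `padding_cost` is a hypothetical helper; check current pricing for your region before relying on it.

```python
MIB_PER_GIB = 1024


def padding_cost(working_set_mib, target_gib=1024.0, price_per_gib_month=0.30):
    """Estimate filler data and monthly bill needed to reach a target size.

    price_per_gib_month is an assumed figure, not an authoritative price.
    Returns (padding_gib, monthly_bill_usd).
    """
    working_set_gib = working_set_mib / MIB_PER_GIB
    # Filler needed on top of the real working set to reach the target size.
    padding_gib = max(0.0, target_gib - working_set_gib)
    # You are billed for everything stored, real data plus filler.
    stored_gib = max(target_gib, working_set_gib)
    return padding_gib, stored_gib * price_per_gib_month


padding_gib, monthly_bill = padding_cost(working_set_mib=100)
print(f"filler: {padding_gib:.1f} GiB, bill: ${monthly_bill:.2f}/mo")
```

Under these assumptions a 100MiB working set needs just under 1024GiB of filler, and the bill is dominated entirely by data you never read.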

~~~
notyourwork
Can you elaborate on why storing more data would fix your issues? I assume it's
because it moves you to a different tier, but the claim seems strange to those
unfamiliar with EFS.

~~~
nemothekid
You are right - the amount of throughput you get is based on how much data you
store. What's unintuitive about EFS is that you get an "allotment" of throughput.
Let's say you have 100MiB of data; you would get 100MiB of throughput (example
numbers). So you read that 100MiB of data really quickly, and then the next
time you read it, EFS is as slow as molasses.

So let's say you are using EFS for application deploys. Your application might
only be 50MiB, but if you are doing multiple deploys a day, you might find
that your deploy time shoots through the roof after 20 deploys. You find out
that you have used up your EFS "credits" and the only way to get more is to
store more data on EFS. I believe the first tier is at 100GiB.
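
A toy model of the credit behaviour described above, as a minimal sketch. Every constant here (the starting balance, the refill rate per stored MiB, one deploy per hour) is made up for illustration, just like the example numbers in the comment; this is not AWS's actual formula, and `count_fast_deploys` is a hypothetical name.

```python
def count_fast_deploys(stored_mib, deploy_mib=50.0, deploys=30,
                       hours_between=1.0, initial_credits_mib=1000.0,
                       refill_divisor=100.0):
    """Count how many deploys are served while credits remain.

    Assumptions (all illustrative): credits are measured in MiB of reads,
    they refill at stored_mib / refill_divisor MiB per hour, and the
    balance can never exceed initial_credits_mib.
    """
    credits = initial_credits_mib
    fast = 0
    for _ in range(deploys):
        if credits >= deploy_mib:
            # Enough credits banked: the read runs at burst speed.
            credits -= deploy_mib
            fast += 1
        # Otherwise the read crawls at baseline speed and spends nothing
        # (a simplification).
        refill = (stored_mib / refill_divisor) * hours_between
        credits = min(initial_credits_mib, credits + refill)
    return fast


print(count_fast_deploys(stored_mib=50.0))          # tiny file system
print(count_fast_deploys(stored_mib=100 * 1024.0))  # 100GiB stored
```

With these made-up numbers the 50MiB file system drains its balance after 20 of 30 deploys, while the 100GiB one refills faster than the deploys spend, so it never slows down. That is the whole reason people pad EFS with filler data.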

------
karimfan
You should try Qumulo:
[https://qumulo.com/evaluate/download/](https://qumulo.com/evaluate/download/)

Fast, scalable and comes with tons of enterprise file features... And you
don't have to get performance by storing more data (aka filling your
cluster with garbage data that you pay for!)

------
Daviey
I recently heard that some people are using Ceph on AWS, as the performance
can be better than _EBS_, let alone EFS.

------
empath75
EFS is such a tire fire that I would not recommend this in production. Usually
when you get to the point where EFS sounds like a good idea, you’ve probably
gone wrong somewhere.

------
khc
"But wait, I rather use S3. No, S3 could neither be an alternative to an NFS
nor a replacement for EFS. S3 is not a file system. This smells like a cloud
lockin’ to me - not really, Pipeline/Kubernetes can use minio to unlock you.
Another post …"

But minio is basically S3, so this doesn't make sense at all.

