Amazon Elastic File System on Kubernetes (banzaicloud.com)
25 points by matyix 10 months ago | 9 comments

EFS is terrible; the latency and throughput issues make it useless for anything involving scientific computing or data processing. I could not disagree more with this recommendation: using EFS in conjunction with GPUs makes zero sense. S3 and GCS, on the other hand, are far better. TensorFlow "natively" supports S3 and GS, i.e. if your path begins with "s3://" or "gs://" it will transparently pull from S3/GS. When used in conjunction with tf.data it allows you to have stateless training/processing. TensorFlow also supports S3/GS for storing trained models.
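To illustrate the parent's point, here is a minimal sketch of scheme-prefixed paths with tf.data, assuming a TensorFlow build with S3/GCS filesystem support and hypothetical bucket names:

```python
import tensorflow as tf

# TensorFlow's filesystem layer dispatches on the path scheme, so the same
# pipeline code reads from S3 or GCS transparently (bucket paths below are
# hypothetical examples, not real resources).
train_files = ["s3://my-bucket/train/shard-0000.tfrecord"]  # or "gs://..."

dataset = (
    tf.data.TFRecordDataset(train_files)
    .shuffle(buffer_size=1024)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # tf.data.experimental.AUTOTUNE on older TF
)

# Trained models can be written straight back to object storage the same way:
# model.save("s3://my-bucket/models/example/1")
```

Credentials come from the usual AWS/GCP environment variables or instance roles, so the training job itself stays stateless.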

Recommending EFS is a great way of signaling your ignorance about I/O.

Like others have mentioned, EFS has its own surprises. You will see a lot of recommendations to store at least 1TiB on EFS to get decent performance (so if your working set is 100MiB, you need to store 999GiB of garbage), meaning you will spend $300/mo for capacity you don't need.
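The $300/mo figure checks out against EFS standard-class pricing (roughly $0.30 per GiB-month in us-east-1 at the time; the exact rate varies by region):

```python
# Back-of-the-envelope cost of padding an EFS filesystem to ~1 TiB.
PRICE_PER_GIB_MONTH = 0.30  # approximate EFS standard pricing, us-east-1

working_set_gib = 0.1   # ~100 MiB you actually need
padded_size_gib = 1000  # ~1 TiB stored just to reach the performance tier

monthly_cost = padded_size_gib * PRICE_PER_GIB_MONTH
print(f"${monthly_cost:.0f}/mo for mostly-garbage capacity")  # → $300/mo
```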

Can you elaborate on why storing more data would fix your issues? I assume it's because it moves you to a different tier, but the claim seems strange to those unfamiliar with EFS.

You are right - the amount of throughput you get is based on how much data you store. What's unintuitive about EFS is that you get an "allotment" of throughput. Let's say you have 100MiB of data; you would get 100MiB of throughput (example numbers). So you read that 100MiB of data really quickly, and then the next time you read it, EFS is as slow as molasses.

So let's say you are using EFS for application deploys. Your application might only be 50MiB, but if you are doing multiple deploys a day, you might find that your deploy time shoots through the roof after 20 deploys. You find out that you have used up your EFS burst credits, and the only way to get more is to store more data on EFS. I believe the first tier is at 100GiB.
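A toy model of the credit mechanics described above (the numbers and refill formula are illustrative, not AWS's exact algorithm; AWS accrues credits at the baseline rate of roughly 50 MiB/s per TiB stored):

```python
# Illustrative simulation of EFS burst-credit drain during repeated deploys.
BASELINE_MIBPS_PER_TIB = 50  # baseline throughput scales with stored data


def baseline_mibps(stored_gib):
    """Baseline throughput in MiB/s for a filesystem of the given size."""
    return BASELINE_MIBPS_PER_TIB * stored_gib / 1024


def simulate_deploys(stored_gib, app_mib, deploys, initial_credits_mib, interval_s):
    """Each deploy reads the whole app from EFS; credits drain by the bytes
    read and refill at the baseline rate between deploys. Returns the deploy
    number at which credits first go negative, or None if they never do."""
    credits = initial_credits_mib
    for i in range(1, deploys + 1):
        credits -= app_mib                                   # burst read
        credits += baseline_mibps(stored_gib) * interval_s   # refill between deploys
        if credits < 0:
            return i
    return None


# A 1 GiB filesystem refills only ~0.05 MiB/s, so a 50 MiB app deployed
# every minute exhausts 500 MiB of credits on the 11th deploy.
print(simulate_deploys(stored_gib=1, app_mib=50, deploys=20,
                       initial_credits_mib=500, interval_s=60))  # → 11
```

Storing more data raises the refill rate, which is exactly why padding the filesystem "fixes" the throttling.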

Throughput is proportional to total data size - 50 MiB/sec per TiB - and burst capacity above this is severely restricted until you have at least 1TiB of data.
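That rule of thumb as arithmetic, to show how small the baseline gets for small filesystems:

```python
# Baseline throughput under EFS bursting mode: ~50 MiB/s per TiB stored.
def baseline_throughput_mibps(stored_gib):
    return 50 * stored_gib / 1024


for gib in (1, 100, 1024):
    print(f"{gib:>5} GiB stored -> {baseline_throughput_mibps(gib):6.2f} MiB/s baseline")
# 1 GiB -> ~0.05 MiB/s, 100 GiB -> ~4.88 MiB/s, 1 TiB -> 50.00 MiB/s
```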

You should try Qumulo: https://qumulo.com/evaluate/download/

Fast, scalable, and comes with tons of enterprise file features... And you don't have to buy performance by storing more data (aka filling your cluster with garbage data that you pay for!)

I recently heard that some people are using Ceph on AWS, as the performance can be better than EBS, let alone EFS.

EFS is such a tire fire that I would not recommend this in production. Usually when you get to the point where EFS sounds like a good idea, you've probably gone wrong somewhere.

"But wait, I rather use S3. No, S3 could neither be an alternative to an NFS nor a replacement for EFS. S3 is not a file system. This smells like a cloud lockin’ to me - not really, Pipeline/Kubernetes can use minio to unlock you. Another post …"

But minio is basically S3, so this doesn't make sense at all
