That probably won't solve all EFS performance issues, but it's a pretty big boost and a nice announcement to come alongside ECS support.
Yes, these containers are supposed to be stateless, but I was tasked with converting an app at my previous job over to using ECS on Fargate and we hit so many issues because of the limits on storage per container instance. We ended up having to tweak the heck out of nginx caching configurations and other processes that would generate any "on disk" files to get around the issues. Having EFS available would have made solving some of those problems so much easier.
I've also been wanting to use ECS on Fargate for running scheduled tasks with large files (50 GB+), but it wasn't really possible given the previous 4 GB limit on storage.
You've got it backwards. NFS-type services help containers be stateless, because they are a separate service accessed through an interface, where all the state is handled by a third party.
Thus, by using an NFS-type service to store your local files, you are free to kill and respawn containers at will, because their data is persisted elsewhere.
You're totally right, I was mixing up stateless and ephemeral. My mistake and thanks for pointing it out!
If you're wondering why you'd ever have to do something like that, the answer is SAP.
We evaluated it for a relatively simple use case, and the performance seemed abysmal, so we didn't select it. I'm hoping that we made a mistake in our evaluation protocol, which would give me an excuse to give it another try.
EFS is a great way to get a lot of iowait on your CPU graphs. Would not recommend it for anything that has to be fast.
There is a whole market of small companies that make high-performance filers that do what many people want, but they also have limits (high cost per byte).
Can you say more about what you did with EBS? It seems like it would be necessary to make some compromises in availability and disaster recovery because any given EBS volume is restricted to the availability zone where it was created.
i) You should configure the OPcache so that it does not revalidate its cache on every request. Cache validation uses stat() in a serial loop on potentially hundreds of files, where each stat() would add O(ms) to the request.
ii) We recommend you store log files locally. NFS does not define an atomic O_APPEND operation, so appends require a file lock to prevent interleaving with appends from other clients. I've seen PHP applications do hundreds of file locks on a log file per request, each adding O(ms) to the total request latency. This is what you'd like to avoid.
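For point i), here's a minimal php.ini sketch of what that looks like. The directive names are real OPcache settings; the 60-second interval is just an illustrative value:

```ini
; Stop OPcache from stat()ing every cached file on every request.
; Safest when deploys replace the host or container; otherwise you
; need an explicit cache reset on deploy.
opcache.validate_timestamps=0

; Alternatively, keep validation but only recheck at most every 60s:
; opcache.validate_timestamps=1
; opcache.revalidate_freq=60
```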
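To make point ii) concrete, here's a rough sketch in Python of what each locked append involves; `append_log` is a hypothetical helper, not something from the comment above. On NFS the lock acquisition is a round-trip to the server's lock manager, so doing this hundreds of times per request adds up fast:

```python
import fcntl


def append_log(path, line):
    """Append a line to a log file under an advisory lock.

    O_APPEND is not atomic across NFS clients, so without the flock
    two writers' appends can interleave mid-line. On NFS, each
    lock/unlock pair costs a network round-trip, which is where the
    O(ms)-per-lock overhead comes from.
    """
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            f.write(line + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Writing to local disk instead (and shipping logs off the host separately) sidesteps the lock entirely, which is the point of the advice above.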
Should you run a database on EFS? No. Can you use it to back media files for a web application that are cached using a CDN, or for data files used for processing or temporary storage? Yes, it shines in those use cases... and it's cheaper than dedicating the time required to maintain your own NFS cluster.
Even Gluster or Ceph is, IMO, not worth the effort unless you (a) know how to run and maintain it, and (b) absolutely need the potential speed up that you can get, assuming a well-configured and well-maintained system.
And then, I've seen way too many people treat it like a traditional file system, run software on it that doesn't expect to find itself on NFS, and wonder why they get corrupted files.
And, really, I tend to avoid the AWS services with "Burst Balances". It's painful to get a system running smoothly, only to have it grind to a halt under load because some burst balance somewhere went to zero. Your mileage may vary, of course.
Those are mostly OK, though: in our case, the really EBS-hungry clients now have volumes of 1024 GB or more, so the burst balance issues don't apply.
My advice: stick to EC2 + EBS; it works.