When I read most AWS product descriptions, I can't tell what real-world situations they are for. I either couldn't say, or could only vaguely say, what the product is in my own words. There seems to be a great scaffolding of assumed knowledge about the AWS ecosystem and distributed computing. Maybe that's intended, but it's intimidating and keeps me in simpler places like Heroku land.
For example, compare the Heroku pricing page and the AWS pricing page for EC2:
The Heroku one is significantly easier to understand, in my opinion. I understand AWS has more options and is therefore harder to summarise, though.
Or, to put it more simply: the intended audience for AWS ad copy is exactly the set of people who are currently trying to compare some specific AWS service to similar services of competitors. AWS is never trying to "create a need" for something nobody was already asking for; they're just trying to serve the needs people already do have better/cheaper/more flexibly.
Though, even if redundancy isn't a factor, it's nice that you can have network file shares without having to run your own dedicated instance in a given cloud. There are plenty of situations where a common networked filesystem makes sense across a few servers for sharing some information without it being physically on all of them... seeding static content, or user-uploaded files, for example. It makes a given solution simpler to implement initially (though other considerations may take hold as a site/application grows).
Azure typically seems to perform better than AWS in terms of storage I/O. Though, if you are disk-constrained in ways you can't reasonably scale horizontally, you may be better off with something using local disks and your own backup strategy on something cheaper (Linode, DigitalOcean, Joyent, etc.).
Outside of the world of startups and young companies that "grew up" on cloud-based solutions, there is a large ecosystem of more traditional enterprises that still have a lot of on-premises computing.
These companies have a lot of lock-in: racks of physical on-site servers, SharePoint-based access control, custom hardware and clusters for everything from large file storage to compliance metadata, and custom software built around this infrastructure.
Of those, one of the biggest lock-in dependencies I've seen is NFS. Not just because NFS is one of the oldest protocols, but because of the nature of the NFS abstraction. Fundamentally, software that assumes a filesystem is shared, globally mounted, and read/write is very hard to adapt to a cloud solution. Many times it requires rewriting the software, or coming up with an NFS shim (such as a FUSE solution) that is so underperforming it blocks usage.
If AWS implements this correctly, this could provide the cost/performance balance to potentially move such a solution completely to AWS. This would eliminate not just large amounts of physical overhead for these companies, but the productivity costs that come with the downtime that inevitably occurs when you don't have good redundancy.
These companies (and the industries they comprise) are trying to find out how best to leverage Amazon. Recently, even more conservative industries, such as law, are becoming more aware of AWS and other cloud-based solutions. Let's hope that solutions like this, which bridge the old with the new, can empower that transition so we can all feel better about how our software is managed.
I had mixed experiences with Gluster a year ago, including lost files, so something rock solid and easy to manage would be a great product.
- NFS (v4).
- Supports petabyte-scale file systems, thousands of concurrent NFS connections.
- Automatically grows/shrinks in size.
- Multi-zone storage and access.
- $0.30 / (Gigabyte * month).
- (Not mentioned) Both Linux and Windows have built-in NFS clients.
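In practice that should mean any stock NFSv4 client can mount it. A hypothetical example (the DNS name format here is my guess, not from the announcement):

    # mount an EFS file system over NFSv4 from an EC2 instance in the VPC
    sudo mkdir -p /mnt/efs
    sudo mount -t nfs4 fs-12345678.efs.us-west-2.amazonaws.com:/ /mnt/efs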
Does anyone use NFS on Windows? Is it reliable enough that I can move servers away from CIFS altogether when I want to access a Linux box from Windows? For instance, if I click a folder's Properties in Windows, will I see the proper Unix perms and metadata and be able to edit them?
CIFS/Samba on Linux has been pretty good, imho, for the past couple of years... though I still get occasional wonky behavior from my NAS box.
"Only Amazon EC2 instances within the Amazon Virtual Private Cloud (Amazon VPC) you specify can directly access your Amazon EFS file systems."
The question (for me) now becomes "where do we go from here?"
Infinite NFS is great, but what I've always wanted is infinite EBS that is fully integrated from file system to SAN. In other words, something that behaves like a local file system (without the gotchas of NFS like a lack of delete on close), but I don't have to snapshot and create new volumes and issue file system expansion commands to grow a volume. I want seamless and automatic growth.
Furthermore, there's so much local SSD just sitting around when using EBS. I want to make full use of local SSD inside of an EC2 instance to do write-back or write-through caching. I could do this in software, but maybe there's an abstraction begging to be made at the service level.
Throw in things like snapshots, and this would make for a fairly powerful solution, and it would certainly remove a lot of operational concerns around growing database nodes and such.
Don't get me wrong, you can pull together a few things and write some automation to do this today. You could use LVM to stitch together many EBS volumes, add in caching middleware (dm-cache, flashcache, etc.), and then automate the addition of volumes and file system growth. However, it's clunky, and there's an opportunity to make this much easier.
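For the curious, a rough sketch of that DIY approach, assuming two EBS volumes attached as /dev/xvdf and /dev/xvdg (device names, volume group name, and mount point invented):

    # stitch two EBS volumes into one logical volume with LVM
    pvcreate /dev/xvdf /dev/xvdg
    vgcreate data /dev/xvdf /dev/xvdg
    lvcreate -l 100%FREE -n vol data
    mkfs.xfs /dev/data/vol
    mount /dev/data/vol /mnt/data

    # later: attach another volume (say /dev/xvdh) and grow online
    pvcreate /dev/xvdh
    vgextend data /dev/xvdh
    lvextend -l +100%FREE /dev/data/vol
    xfs_growfs /mnt/data    # XFS grows while mounted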
I recognize that what I'm describing doesn't serve the same purpose as NFS - for example, EBS isn't mountable in multiple locations at once - but I'd really like to see the "seamless infinite storage" idea applied to EBS.
It's not unlimited, but currently it's 32TB/volume, and you're charged for the storage you use rather than what is provisioned.
It supports encryption at rest.
Wouldn't EBS w/ thin provisioning get you most of this? Just create a massive volume, and you get billed for the space actually used. (and the volume size could also function as a limit on your bill.)
Edit: coincidentally, I just saw this article about XFS which observes the following:
"Over the next five or more years, XFS needs to have better integration with the block devices it sits on top of. Information needs to pass back and forth between XFS and the block device, [says Dave Chinner]. That will allow better support of thin provisioning."
And yes, TRIM is used to mark blocks as free.
Honestly, it's so widespread, I would be surprised if Amazon weren't already using it to over-commit EBS.
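For reference, issuing a TRIM pass from the filesystem side is a one-liner (mount point hypothetical):

    # report unused blocks back to the thinly provisioned block device
    sudo fstrim -v /mnt/data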
Git was not designed for large files. But what github released yesterday primarily serves to promote github's central-server model for git. Moreover, it seems that it could have been better done within the git protocol itself (modify git to do more sparse pulls, and then try to fetch on a checkout when it is missing blobs, rather than erroring immediately).
I suspect AWS chose NFS for expedience; the net effect is positive, but I don't think it would have mattered much anyway.
Github is trying to inject their own server-model into the git protocol, with an extension that is only half thought through; that is a huge step backwards, open-source or not.
I do have some plans to do benchmarks for another use case that's more performance intensive, but it's buried in my backlog right now.
Also, do the read test on a really big file: first from your host, so the file gets cached, then from within the qemu VM where the virtfs/plan9 share is mounted: dd if=bigfile of=/dev/null
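Concretely, something like this (assuming the share is mounted at /mnt/9p in the guest):

    # on the host: read once so the file lands in the page cache
    dd if=bigfile of=/dev/null bs=1M
    # in the qemu guest, over the virtfs/9p mount:
    dd if=/mnt/9p/bigfile of=/dev/null bs=1M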
Please, can anyone confirm it's not only my machines suffering from really shitty read/write performance through virtfs?
No time to investigate.
Compare https://ericvh.github.io/9p-rfc/rfc9p2000.html vs https://tools.ietf.org/html/rfc5661
I'll never consider this to be as reliable as S3, but if I'm going to have a network filesystem I'd rather be dealing with NFS as my abstraction instead of virtualized network block devices.
I still like to "trifurcate" the storage into objects, local disposable, and local volumes. Having durable local volumes still makes sense in a lot of scenarios.
I've personally seen read/write rates exceeding 800 MByte/sec on more or less white-box hardware, at which point it was limited by the underlying storage infrastructure (8Gbit fiber), not the NFS protocol.
Dell has a 2013 white paper (I'm not affiliated with them, fwiw) about their fairly white-box setup that achieved >100,000 IOPS, 2.5 GByte/s sequential write, and 3.5 GByte/s sequential read:
Not sure how it would ever be technically possible for a networked filesystem to get anywhere near directly attached storage.
But, for sure, the typical carrier-grade EMC or Netapp is MUCH slower than a good SAN. I'm talking about petabytes of very small (average maybe 20kB) files with lots of _random_ sync writes and reads. NFS has a lot of other benefits, but it surely is not super high performance in every use case. Regardless of what a theoretical marketing whitepaper has shown in some lab setup.
Someone who thinks that you can put a network protocol around a filesystem without _any_ performance impact is nuts.
BUT if your use case fits NFS, you might as well get very good performance out of it. As always, pick the right technology for your specific case.
I think you might want to use your filesystem more effectively.
My only test case has been a VMware virtual machine, mounting an NFS share from the host so I could work on my local filesystem and execute within the VM. I switched to a filesystem watcher + rsync combo after struggling with poor random read performance. Maybe it was due to bad configuration, but I always thought it would be a poor choice for anything serious.
I've found nfs to be much faster and more reliable than sshfs or smbfs for VMs, using either qemu-kvm on linux or virtualbox on OS X.
If you're using modern servers and clients, it is as fast as you can imagine a cluster of SSD NFS servers to be.
I've used NFS at home and have had NFS file handle problems but IIRC that was only when there were problems like kernel faults or network partitions.
However several of my colleagues at work have many NFS horror stories and are adamant that NFS does not scale well.
Is NFS stability at scale simply a function of your underlying network and infrastructure stability in your opinion?
Ordinarily, even if you unlink a file, the operating system keeps the inode around until the last filehandle referencing it goes away. But an NFS mount cannot know when all filehandles on all networked systems have closed. When you attempt to read from an NFS file handle whose underlying file has been deleted out from under you, BOOM -- `ESTALE`.
The solution is typically to guard against file deletion using read locks... which are extremely annoying to implement on NFS because of portability issues and cache coherency problems.
I'm not sure I'd describe that as a "scaling problem" per se, because it gets bad quickly and stays bad. It's more of a severe limitation on how applications and libraries can design their interaction with the file system.
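A minimal way to reproduce it, assuming two clients share a mount at /mnt/nfs (whether the error shows up immediately depends on attribute caching):

    # client A: open the file and hold the descriptor
    exec 3< /mnt/nfs/shared.dat
    # client B: delete the file out from under client A
    rm /mnt/nfs/shared.dat
    # client A: a later read through the now-stale handle fails
    cat <&3    # cat: -: Stale file handle (ESTALE)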
I think that's limited to v2/v3 and not fully general or reliable.
Hard coding the fsid's on the server side can help if you change your exports a lot.
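e.g. pinning them in /etc/exports so the file handles stay stable across re-exports (paths and network are illustrative):

    /export/home   192.168.1.0/24(rw,sync,fsid=1)
    /export/data   192.168.1.0/24(rw,sync,fsid=2)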
Using the automounter can help, although it does not scale to large numbers of mounts well.
Small files were much worse because they required a server round-trip every time something called stat(), unless you knew that all of the software in use reliably followed Maildir-style practices to avoid contention. That meant that e.g. /var/mail could be mounted with the various attribute-cache values (see acregmin / acdirmin in http://linux.die.net/man/5/nfs) but general-purpose volumes had to be safe and slow.
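For example, something along these lines; the values are illustrative, not a recommendation:

    # relax attribute-cache revalidation for a Maildir-style volume
    mount -t nfs -o acregmin=30,acregmax=120,acdirmin=30,acdirmax=120 \
        mailhost:/var/mail /var/mail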
If you read through the somewhat ponderous NFSv4 docs, there are a number of design decisions which are clearly aimed at making that use-case less painful. I haven't done benchmarks in years but I'd assume it's improved significantly.
Oracle supports this, and they even wrote a user-space NFS client to "get the highest level of performance" (because they thought the kernel NFS implementation sucked).
The important bit is to ensure the NFS client and server implementation handle whatever POSIX features are required by the DB server.
The other interesting note is that they apparently only support NFSv4, which has some welcome improvements: it uses TCP over a single port, avoids the entire portmap/statd/lockd train-wreck, UTF-8 everywhere, etc. One of the more interesting ones is that it has referrals (think HTTP redirect) so one server doesn't have to handle every request from every client and you can do things like load-balancing. Clients are also assumed to be smarter so e.g. local caching is safe because the server can notify you about changes to open files rather than requiring the client to poll.
I can't be the only one that has been woken up at 2am because of an NFS outage.
1) I thought the really high end DBs like to manage their own block storage. Your NFS comment suggests that the database data files were running on an NFS mount, and you had a 10 gig Ethernet connection to the file server.
2) What would you say is the average size of a RAC cluster, in your opinion? Is 8 considered a small cluster in this realm?
3) DBs have stringent requirements when it comes to operations like sync. Can you actually get ACID in an NFS backed DB?
Thanks for satisfying my curiosity :)
XYZ could be NFS, SCSI, MySQL, Rails, KVM, ..., you get the idea. Any technology that has seen wide use has caused someone to be woken up at 2am because of an outage. NFS has been very widely used for a very long time. As a distributed file system developer who once helped design a precursor of pNFS I think NFS has some pretty fundamental problems, but the fact that NFS servers sometimes go down is not one of them. Often that's more to do with the implementation and/or deployment than the protocol, and no functionally similar protocol would do much better under similar circumstances. People get woken up at 2am because of SMB failures too. My brother used to get woken up at 2am because of RFS failures. Nobody gets woken up at 2am because of 9p failures, but if 9p ever grew up enough to be deployed in environments with people on call I'm sure they'd lose sleep too. EBS failures have bitten more than a few people.
Citing the existence of failures, other than proportionally to usage, isn't very convincing. I'd actually be more concerned about the technology on the back end of EFS, not the protocol used on the front.
It works, most of the time, but the problem starts whenever it feels like not working. And you can never be sure when the tantrum will come.
On Linux, using nfs-kernel-server, most of the time you need to restart it if a problem occurs. And I don't like restarts.
A few years ago, slabtop helped me troubleshoot some memory issues; it turned out nfs-kernel-server was leaking. I had to upgrade the kernel.
The shame is, there is nothing like NFS to replace NFS. Easy to deploy, easy for clients, works everywhere.
As the rest of the comments also allude, a lot of the cloud-entrenched world has abandoned NFS, at least in AWS circles.
I'm not one of these people. Rather than relying solely on puppet->all instances to handle multiple deploys, the convenience of an NFS instance was appealing, so essentially I have a relatively small (read: medium) instance that does nothing but manage the deployment filesystems and allow new instances in the security group to connect. I have often thought of abandoning this for doing deploys in a more "modern" way, but I'm still not sure what the benefit would be other than eliminating a minor source of shame.
The analog here is running a MySQL or Postgres database in an instance prior to RDS. RDS provided enough benefit that the minor price difference in rolling your own no longer factored in. A more reliable, fault-tolerant and extensible file system is, like RDS, a huge upgrade. It may not be for everyone, but for some of us it's just another reason why AWS keeps making it hard to even look anywhere else.
The right way to do that (in the AWS world) would be RDS accessible to all instances in the security group. This moves locking control to the application level rather than the file system. For obvious reasons this makes a lot of sense.
There are of course non-ACID / NoSQL solutions for which this might be an acceptable practice, but in general I'd say it's fraught with peril.
This is a step back from the right direction to me.
(1) Not all development is green-field and thus subject to using this week's fashionable data store. A lot of applications already use file systems and rely on their semantics. Ripping up the entire storage layer of a large and complex application would be irresponsible, as it's likely to create more problems than it solves. Look down your nose at "legacy code" all you want, but those users exist and have money to give people who will help them solve their business problems. Often the solutions are as cutting-edge as anything you'd find in more fashionable quarters, even if the facade is old fashioned.
(2) Even for green-field development, file systems are often a better fit than object stores. Nested directories are more than just an organizational scheme, but also support security in a way that object stores don't. What's the equivalent of SELinux for data stored in S3 or Swift? There is none. File system consistency/durability guarantees can be important, as can single-byte writes to the middle of a file, while parsing HTTP headers on every request drops a lot of functionality and performance on the floor. Distributed databases are much more compelling, but the fact remains that the file system model is often the best one even for new applications.
Go ahead and use something else if it suits you. Other people wouldn't have much use for your favorite storage model, and this one suits them perfectly.
Answers following your numbering:
(1) Calling S3 "this week's fashionable data store" is like calling an elephant an interesting microbe. As for the rest of your points, which amount to "please do not innovate, we have had filesystems for 40 years and this is how you store your data": I do not agree. Disclaimer: I was a member of the team that moved amazon.com from NFS-based storage to S3. It was a great success, and it solved many of our problems, including the insane number of issues introduced by running an NFS cluster at that scale. And I would like to emphasize scale, because operational problems quite often grow worse than linearly with your scale.
I know about legacy code, and I'm running several legacy services in production right now. I can tell you one thing: there is a point when it is no longer financially viable to keep rolling with the legacy code. That point varies a lot with your actual use case; banks tend to run "legacy code" while web 2.0 companies tend to innovate and replace systems at a faster pace. I don't see any conflict here. We even built a compatibility layer for the new solution, so it was possible to run your legacy code on the new system with your software untouched.
(2) Nested directories are a logical layer on top of how the data is stored, i.e. a view; you are a distributed FS developer, so I guess you understand that. S3 also supports nested directories, no biggie here.
Security. Well, this is kind of weird, because last time I checked S3 had an extensive security model: http://aws.amazon.com/s3/faqs/#security_anchor
Now the rest of your question can be rephrased as: "I am used to X; why isn't there X with this new thing?" I am not sure how many file system users use SELinux; my educated guess is roughly 1-10%. It is a very complex system that not many companies invest in using. For our use cases the fine-grained ACLs were good enough, so we are using those. File system durability: yes, it is very important, which is why I was kind of shocked by this bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/...
I guess you are right about the overhead of reading and writing, dealing with HTTP headers, etc. If the systems that benefit the most from S3 were single-node systems, it would be silly to use S3 in the first place. We are talking about 1,000-10,000 computers using the same data storage layer. And you can tell me if I am wrong, but if you want to access the same files on those nodes using an FS, you are going to end up in locking hell. This is why modern software that is IO-heavy has moved away from in-place edits toward "lock-free" data access. Look at the implementation of Kafka log files or how Aeron writes files. These are exactly the same semantics as how we use S3. Accident? ;)
I would like to repeat my original question: I don't see a huge market for a distributed FS. I might be wrong, but this is how I see it.
Please don't put words in my mouth like that. It's damn rude. I never said anything that was even close.
"S3 also supports nested directories no biggie here."
Not according to the API documentation I've seen. There are buckets, and there are objects within buckets. Nothing about buckets within buckets. Sure, there are umpteen ways to simulate nested directories using naming conventions recognized by an access library, but there's no standard and thus no compatibility. You also lose some of the benefits of true nested directories, such as combining permissions across different levels of the hierarchy. Also no links (hard or soft), which many people find useful, etc. Your claim here is misleading at best.
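To make the distinction concrete, here is the prefix convention with the AWS CLI (bucket and keys made up); there is no mkdir, and the "directory" exists only as part of the key names:

    # the "path" is simply part of the object key
    aws s3 cp report.pdf s3://my-bucket/projects/2015/q2/report.pdf
    # listing by prefix merely looks like a directory tree
    aws s3 ls s3://my-bucket/projects/2015/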
"last time I checked S3 had an extensive security"
Yes, it has its very own permissions system, fundamentally incompatible with any other and quite clunky to use. That still doesn't answer the question of how you'd do anything like SELinux with it.
"File system durability: yes it is very important, this why I was kind of shocked about this bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/...
Open up your bug list and we can have that conversation. Throwing stones from behind a proprietary wall is despicable.
"you can tell me if I am wrong but if you would like to access the same files on these nodes using a FS than you are going to end up with a locking hell."
You're wrong. Maybe you've only read about distributed file systems (or databases which have to deal with similar problems) from >15 years ago, but things have changed a bit since then. In fact, if you were at Amazon you might have heard of a little thing called Dynamo which was part of that evolution. Modern distributed systems, including distributed file systems, don't have that locking hell. That's just FUD.
"I don't see huge market for a distributed FS."
Might want to tell that to the EFS team. Let me know how that goes. In fact you might be right, but whether there's a market has little to do with your pseudo-technical objections. Many technologies are considered uncool long before they cease being useful.
If they are just static assets, you would do better to put them in an S3 bucket, set appropriate Cache-Control headers and serve them via Cloudfront. This reduces your outbound bandwidth cost to the Internet versus EC2/S3, and yields better performance.
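For instance, something like this with the AWS CLI (bucket name hypothetical):

    # upload assets with a long-lived Cache-Control header for Cloudfront to pass through
    aws s3 cp ./static s3://my-assets-bucket/static --recursive \
        --cache-control "public, max-age=86400"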
EFS first and foremost will take away a huge amount of pain when trying to make filesystem-dependent legacy applications more reliable.
upstream issue: https://www.drupal.org/node/2044509
We use it to store petabytes of large video files, and our system is structured such that no folder ever has more than a handful of files in it (>20 is rare). With properly tuned caching this works fantastically well for our use case, and I would take the simpler code and reduced points of failure over NFS nonsense any day.
That of course doesn't mean s3fs is the solution to every problem; it simply means it's good to have options, and you shouldn't write something off because it "might be slow."
Know your data, know your use case, and know your tools. You can make smart decisions on your own rather than being driven by anecdotal comments on HN.
For getting the data into S3, we found dramatic improvements in using the AWS CLI, as I believe it handles uploads in a multi-threaded way.
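Something as simple as this; the concurrency knob is an aws-cli configuration option, and the value is just an example:

    # parallel multipart uploads out of the box
    aws s3 sync ./data s3://my-bucket/data
    # optionally raise the number of concurrent requests
    aws configure set default.s3.max_concurrent_requests 20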
S3fs turned out to be viable for our use case, storing Magento Enterprise content assets which are then served directly from S3, so the app's upload features rely on s3fs as well as the file checks from the app itself (which are indeed quite slow).
I've always wanted to do it natively, mounting EBS volumes on more than one instance (which is not currently possible) or wishing for a native NFS service like AWS released.
All in all, it is a happy day for me. More options make us more powerful.
The FUSE-based ones that I've tried were riddled with problems and poor error handling. Hangs and truncated files were the rule rather than the exception.
> Q: What data consistency model does Amazon S3 employ?
> Amazon S3 buckets in the US Standard region provide eventual consistency. Amazon S3 buckets in all other regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.
> Finally, note that, for NFS version 3 protocol requests, a subsequent commit request from the NFS client at file close time, or at fsync() time, will force the server to write any previously unwritten data/metadata to the disk, and the server will not reply to the client until this has been completed, as long as sync behavior is followed. If async is used, the commit is essentially a no-op, since the server once again lies to the client, telling the client that the data has been sent to stable storage. This again exposes the client and server to data corruption, since cached data may be discarded on the client due to its belief that the server now has the data maintained in stable storage.
I am not certain how this works in NFSv4, which is what EFS will be. The safe solution is to use the sync option when mounting the NFS volume, at the cost of performance.
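i.e. trading throughput for safety with something like this (server name hypothetical):

    # force synchronous writes so fsync()/close() semantics hold
    mount -t nfs4 -o sync fs-12345678.efs.us-west-2.amazonaws.com:/ /mnt/efs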
If anything, their use of NFSv4 means there are plenty of competitive offerings if you decide that performance, security, or physical access constraints dictate migrating off their service.
If you don't want to manage your own Linux/BSD/etc. NFS infrastructure, Oracle, Netapp, and EMC will all happily sell you a storage appliance that supports it. I don't see much lock-in here.
I can see this being useful for a few cases. For one, I can immediately use it for one of the projects I have where I have multiple worker servers and one of them needs to periodically process a few GBs of data, yet I don't want to give that much storage to every server, and I don't want to make any one server special.
Another use case: you don't know how big your data will grow, yet you want to access it in a random fashion. S3 isn't great for this, but NFS is, and unlimited mounted storage is nice.
Edit: Third use case is logs. You can collect all the logs from all of your servers in one place and access them from any server.
The rest gets updated on a dev server then deployed to the auto-scaling group via an AMI.
And you have to resize your block devices to store more stuff.
Which is why new services like these are always preferable.
Maybe you don't want to, but there is definitely someone out there dealing with these issues.
E.g. during a heat wave (100 F+), a transformer on top of the building (at a previous employer) caught fire. When the dust settled, we found out that the person in charge of it had not upgraded it as our power requirements increased. It was over-taxed and the heat wave put it over the edge.
Infrastructure guy here. Abstraction is to reduce workload; you still need to understand the underlying concepts. Otherwise, you're just the guy who freaks out when their DB is at 100% CPU utilization or hours of replica lag without knowing why.
Until now, you could attach an EBS volume to only one instance.
Or you had to use Gluster/HDFS otherwise.
"Multiple Amazon EC2 instances can access an Amazon EFS file system at the same time, providing a common data source for workloads and applications running on more than one instance."
Or is this a feature you need to enable, and you keep using EBS otherwise?
/home that's on a network drive with an SLA...
concurrent versions of your app sitting next to each other somewhere in $PATH, meaning you can roll forward/back by typing app-$version (or whatever your convention is); see the sketch below
Quickly and efficiently share files between instances.
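A sketch of the roll-forward/back convention mentioned above (paths invented for illustration):

    # versions installed side by side on the shared mount
    ls /mnt/efs/bin
    # myapp-1.2.0  myapp-1.3.0  myapp -> myapp-1.3.0
    # roll back by repointing the symlink that $PATH resolves
    ln -sfn myapp-1.2.0 /mnt/efs/bin/myapp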
It basically means you can treat a node group as a single proper Linux cluster, without having to buy GPFS licenses or indulge in the horror that is glusterfs.
Imagine...if you could put a whole entire clock...on your wrist. Almost everybody has a wrist!
But you're right, I will likely get an iWatch. It will go in my QA toybox with my MacBooks and iPads and iPhones.
Do you remember computers before the Mac? Notebooks before the Macbook? Music players before the iPod? Phones before the iPhone? Tablets before the iPad? And in a month you'll think: watches before the Apple Watch?
Their strength lies in taking existing technologies, rebuilding them with a strong user focus, and then marketing the hell out of them. So much so that many people apparently forget what came before.
Don't get me wrong; those are all very polished products, and they took a lot of technical smarts. They were much more usable to a mainstream consumer audience. But the non-Apple versions of those products were generally fine for non-consumer audiences. And Apple's marketing is masterful; I've never seen a tech company so good at generating hype.
My phone before the iPhone was horrible, though. Partly that was my fault for getting a Razr. The user interface somehow managed it so that my thumb was always in exactly the wrong spot for what I needed to push right then. It was a joyous day when I found someone to give that thing away to. (On the plus side, it woke up two years after I stopped using it, plugging it in, etc., to give me an alarm I had set. Now that is reliability!)
Yes, I remember the iPod nano