When I read most AWS product descriptions, I can't tell what real-world situations they are for. I either couldn't say, or could only vaguely say, what the product is in my own words. There seems to be a great scaffolding of assumed knowledge about the AWS ecosystem and distributed computing. Maybe that's intended, but it's intimidating and keeps me in simpler places like Heroku land.
For example, compare the Heroku pricing page and the AWS pricing page for EC2:
The Heroku one is significantly easier to understand, in my opinion. I understand AWS has more options and is therefore harder to summarise, though.
Or, to put it more simply: the intended audience for AWS ad copy is exactly the set of people who are currently trying to compare some specific AWS service to similar services of competitors. AWS is never trying to "create a need" for something nobody was already asking for; they're just trying to serve the needs people already do have better/cheaper/more flexibly.
Though, even if redundancy isn't a factor, it's nice that you can have network file shares without having to run your own dedicated instance in a given cloud. There are plenty of situations where a common networked filesystem makes sense across a few servers for sharing some information without it being physically on all of them... seeding static content, or user-uploaded files, for example. It makes a given solution simpler to implement initially (though other considerations may take hold as a site/application grows).
Azure typically seems to perform better than AWS in terms of storage I/O. Though, if you are disk-constrained in ways you can't reasonably scale horizontally, you may be better off with something using local disks and your own backup strategy on something cheaper (Linode, DigitalOcean, Joyent, etc.).
Outside of the world of startups and young companies that "grew up" on cloud-based solutions, there is a large ecosystem of more traditional enterprises that still have a lot of on-premises computing.
These companies have a lot of lock-in: racks of physical on-site servers, SharePoint-based access control, custom hardware and clusters for everything from large file storage to compliance metadata, and custom software built around this infrastructure.
Of those, one of the biggest lock-in dependencies I've seen is NFS. Not just because NFS is one of the oldest protocols, but because of the nature of the NFS abstraction. Fundamentally, software that assumes a filesystem is shared, globally mounted, and read/write is very hard to adapt to a cloud solution. Many times it requires rewriting the software, or coming up with an NFS shim (such as a FUSE solution) that is so underperforming it blocks usage.
If AWS implements this correctly, this could provide the cost/performance balance to potentially move such a solution completely to AWS. This would eliminate not just large amounts of physical overhead for these companies, but the productivity costs that come with the downtime that inevitably occurs when you don't have good redundancy.
These companies (and the industries they comprise) are trying to find out how best to leverage Amazon. Recently, even more conservative industries, such as law, are becoming more aware of AWS and other cloud-based solutions. Let's hope that solutions like this, which bridge the old with the new, can empower that transition so we can all feel better about how our software is managed.
I had mixed experiences with Gluster a year ago, including lost files, so something rock solid and easy to manage would be a great product.
- NFS (v4).
- Supports petabyte-scale file systems, thousands of concurrent NFS connections.
- Automatically grows/shrinks in size.
- Multi-zone storage and access.
- $0.30 / (Gigabyte * month).
- (Not mentioned) Both Linux and Windows have built-in NFS clients.
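In practice that should mean any stock NFSv4 client can mount it. A hypothetical example (the DNS name format here is my guess, not from the announcement):

    # mount an EFS file system over NFSv4 from an EC2 instance in the VPC
    sudo mkdir -p /mnt/efs
    sudo mount -t nfs4 fs-12345678.efs.us-west-2.amazonaws.com:/ /mnt/efs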
Does anyone use NFS on Windows? Is it reliable enough that I can move servers away from CIFS altogether when I want to access a Linux box from Windows? For instance, if I click a folder's Properties in Windows, will I see the proper Unix perms and metadata and be able to edit them?
CIFS/Samba on Linux has been pretty good, imho, for the past couple of years... though I still get occasional wonky behavior from my NAS box.
"Only Amazon EC2 instances within the Amazon Virtual Private Cloud (Amazon VPC) you specify can directly access your Amazon EFS file systems."
The question (for me) now becomes "where do we go from here?"
Infinite NFS is great, but what I've always wanted is infinite EBS that is fully integrated from file system to SAN. In other words, something that behaves like a local file system (without the gotchas of NFS like a lack of delete on close), but I don't have to snapshot and create new volumes and issue file system expansion commands to grow a volume. I want seamless and automatic growth.
Furthermore, there's so much local SSD just sitting around when using EBS. I want to make full use of local SSD inside of an EC2 instance to do write-back or write-through caching. I could do this in software, but maybe there's an abstraction begging to be made at the service level.
Throw in things like snapshots, and this would make for a fairly powerful solution, and it would certainly remove a lot of operational concerns around growing database nodes and such.
Don't get me wrong, you can pull together a few things and write some automation to do this today. You could use LVM to stitch together many EBS volumes, add in caching middleware (dm-cache, flashcache, etc.), and then automate the addition of volumes and file system growth. However, it's clunky, and there's an opportunity to make this much easier.
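For the curious, a rough sketch of that DIY approach, assuming two EBS volumes attached as /dev/xvdf and /dev/xvdg (device names, volume group name, and mount point invented):

    # stitch two EBS volumes into one logical volume with LVM
    pvcreate /dev/xvdf /dev/xvdg
    vgcreate data /dev/xvdf /dev/xvdg
    lvcreate -l 100%FREE -n vol data
    mkfs.xfs /dev/data/vol
    mount /dev/data/vol /mnt/data

    # later: attach another volume (say /dev/xvdh) and grow online
    pvcreate /dev/xvdh
    vgextend data /dev/xvdh
    lvextend -l +100%FREE /dev/data/vol
    xfs_growfs /mnt/data    # XFS grows while mounted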
I recognize that what I'm describing doesn't serve the same purpose as NFS - for example, EBS isn't mountable in multiple locations at once - but I'd really like to see the "seamless infinite storage" idea applied to EBS.
It's not unlimited, but currently it's 32TB/volume, and you're charged for the storage you use rather than what is provisioned.
It supports encryption at rest.
Wouldn't EBS w/ thin provisioning get you most of this? Just create a massive volume, and you get billed for the space actually used. (and the volume size could also function as a limit on your bill.)
Edit: coincidentally, I just saw this article about XFS which observes the following:
"Over the next five or more years, XFS needs to have better integration with the block devices it sits on top of. Information needs to pass back and forth between XFS and the block device, [says Dave Chinner]. That will allow better support of thin provisioning."
And yes, TRIM is used to mark blocks as free.
Honestly, it's so widespread, I would be surprised if Amazon weren't already using it to over-commit EBS.
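For reference, issuing a TRIM pass from the filesystem side is a one-liner (mount point hypothetical):

    # report unused blocks back to the thinly provisioned block device
    sudo fstrim -v /mnt/data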
Git was not designed for large files. But what github released yesterday primarily serves to promote github's central-server model for git. Moreover, it seems that it could have been better done within the git protocol itself (modify git to do more sparse pulls, and then try to fetch on a checkout when it is missing blobs, rather than erroring immediately).
I suspect AWS chose NFS for expedience; the net effect is positive, but I don't think it would have mattered much anyway.
Github is trying to inject their own server-model into the git protocol, with an extension that is only half thought through; that is a huge step backwards, open-source or not.
I do have some plans to do benchmarks for another use case that's more performance intensive, but it's buried in my backlog right now.
Also, do the read test on a really big file: first from your host, so the file gets cached, then from within the qemu VM where the virtfs/plan9 share is mounted: dd if=bigfile of=/dev/null
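Concretely, something like this (assuming the share is mounted at /mnt/9p in the guest):

    # on the host: read once so the file lands in the page cache
    dd if=bigfile of=/dev/null bs=1M
    # in the qemu guest, over the virtfs/9p mount:
    dd if=/mnt/9p/bigfile of=/dev/null bs=1M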
Please, can anyone confirm it's not only my machines suffering from really shitty read/write performance through virtfs?
No time to investigate.
Compare https://ericvh.github.io/9p-rfc/rfc9p2000.html vs https://tools.ietf.org/html/rfc5661
I'll never consider this to be as reliable as S3, but if I'm going to have a network filesystem I'd rather be dealing with NFS as my abstraction instead of virtualized network block devices.
I still like to "trifurcate" the storage into objects, local disposable, and local volumes. Having durable local volumes still makes sense in a lot of scenarios.
I've personally seen read/write rates exceeding 800 MByte/sec on more or less white-box hardware, at which point it was limited by the underlying storage infrastructure (8Gbit fiber), not the NFS protocol.
Dell has a 2013 white paper (I'm not affiliated with them, fwiw) about their fairly white-box setup that achieved >100,000 IOPS, 2.5 GByte/s sequential write, and 3.5 GByte/s sequential read:
Not sure how it would ever be technically possible for a networked filesystem to get anywhere near directly attached storage.
But, for sure, the typical carrier-grade EMC or Netapp is MUCH slower than a good SAN. I'm talking about petabytes of very small (average maybe 20kB) files with lots of _random_ sync writes and reads. NFS has a lot of other benefits, but it surely is not super high performance in every use case. Regardless of what a theoretical marketing whitepaper has shown in some lab setup.
Someone who thinks that you can put a network protocol around a filesystem without _any_ performance impact is nuts.
BUT if your use case fits NFS, you might as well get very good performance out of it. As always, pick the right technology for your specific case.
I think you might want to use your filesystem more effectively.
My only test case has been a VMware virtual machine, mounting an NFS share from the host so I could work on my local filesystem and execute within the VM. I switched to a filesystem watcher + rsync combo after struggling with poor random read performance. Maybe it was due to bad configuration, but I always thought it would be a poor choice for anything serious.
I've found nfs to be much faster and more reliable than sshfs or smbfs for VMs, using either qemu-kvm on linux or virtualbox on OS X.
If you're using modern servers and clients, it is as fast as you can imagine a cluster of SSD NFS servers to be.
I've used NFS at home and have had NFS file handle problems but IIRC that was only when there were problems like kernel faults or network partitions.
However several of my colleagues at work have many NFS horror stories and are adamant that NFS does not scale well.
Is NFS stability at scale simply a function of your underlying network and infrastructure stability in your opinion?
Ordinarily, even if you unlink a file, the operating system keeps the inode around until the last filehandle referencing it goes away. But an NFS mount cannot know when all filehandles on all networked systems have closed. When you attempt to read from an NFS file handle whose underlying file has been deleted out from under you, BOOM -- `ESTALE`.
The solution is typically to guard against file deletion using read locks... which are extremely annoying to implement on NFS because of portability issues and cache coherency problems.
I'm not sure I'd describe that as a "scaling problem" per se, because it gets bad quickly and stays bad. It's more of a severe limitation on how applications and libraries can design their interaction with the file system.
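A minimal way to reproduce it, assuming two clients share a mount at /mnt/nfs (whether the error shows up immediately depends on attribute caching):

    # client A: open the file and hold the descriptor
    exec 3< /mnt/nfs/shared.dat
    # client B: delete the file out from under client A
    rm /mnt/nfs/shared.dat
    # client A: a later read through the now-stale handle fails
    cat <&3    # cat: -: Stale file handle (ESTALE)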
I think that's limited to v2/v3 and not fully general or reliable.
Hard coding the fsid's on the server side can help if you change your exports a lot.
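e.g. pinning them in /etc/exports so the file handles stay stable across re-exports (paths and network are illustrative):

    /export/home   192.168.1.0/24(rw,sync,fsid=1)
    /export/data   192.168.1.0/24(rw,sync,fsid=2)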
Using the automounter can help, although it does not scale to large numbers of mounts well.
Small files were much worse because they required a server round-trip every time something called stat(), unless you knew that all of the software in use reliably followed Maildir-style practices to avoid contention. That meant that e.g. /var/mail could be mounted with the various attribute-cache values (see acregmin / acdirmin in http://linux.die.net/man/5/nfs) but general-purpose volumes had to be safe and slow.
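For example, something along these lines; the values are illustrative, not a recommendation:

    # relax attribute-cache revalidation for a Maildir-style volume
    mount -t nfs -o acregmin=30,acregmax=120,acdirmin=30,acdirmax=120 \
        mailhost:/var/mail /var/mail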
If you read through the somewhat ponderous NFSv4 docs, there are a number of design decisions which are clearly aimed at making that use-case less painful. I haven't done benchmarks in years but I'd assume it's improved significantly.
Oracle supports this, and they even wrote a user-space NFS client to "get the highest level of performance" (because they thought the kernel NFS implementation sucked).
The important bit is to ensure the NFS client and server implementation handle whatever POSIX features are required by the DB server.
The other interesting note is that they apparently only support NFSv4, which has some welcome improvements: it uses TCP over a single port, avoids the entire portmap/statd/lockd train-wreck, UTF-8 everywhere, etc. One of the more interesting ones is that it has referrals (think HTTP redirect) so one server doesn't have to handle every request from every client and you can do things like load-balancing. Clients are also assumed to be smarter so e.g. local caching is safe because the server can notify you about changes to open files rather than requiring the client to poll.
I can't be the only one that has been woken up at 2am because of an NFS outage.
1) I thought the really high end DBs like to manage their own block storage. Your NFS comment suggests that the database data files were running on an NFS mount, and you had a 10 gig Ethernet connection to the file server.
2) What would you say is the average size of a RAC cluster, in your opinion? Is 8 considered a small cluster in this realm?
3) DBs have stringent requirements when it comes to operations like sync. Can you actually get ACID in an NFS backed DB?
Thanks for satisfying my curiosity :)
XYZ could be NFS, SCSI, MySQL, Rails, KVM, ..., you get the idea. Any technology that has seen wide use has caused someone to be woken up at 2am because of an outage. NFS has been very widely used for a very long time. As a distributed file system developer who once helped design a precursor of pNFS I think NFS has some pretty fundamental problems, but the fact that NFS servers sometimes go down is not one of them. Often that's more to do with the implementation and/or deployment than the protocol, and no functionally similar protocol would do much better under similar circumstances. People get woken up at 2am because of SMB failures too. My brother used to get woken up at 2am because of RFS failures. Nobody gets woken up at 2am because of 9p failures, but if 9p ever grew up enough to be deployed in environments with people on call I'm sure they'd lose sleep too. EBS failures have bitten more than a few people.
Citing the existence of failures, other than proportionally to usage, isn't very convincing. I'd actually be more concerned about the technology on the back end of EFS, not the protocol used on the front.
It works, most of the time, but the problem starts whenever it feels like not working. And you can never be sure when the tantrum will come.
On Linux, using nfs-kernel-server, most of the time you need to restart it if a problem occurs. And I don't like restarts.
A few years ago, slabtop helped me troubleshoot some memory issues; it turned out nfs-kernel-server was leaking. I had to upgrade the kernel.
The shame is, there is nothing like NFS to replace NFS. Easy to deploy, easy for clients, works everywhere.
As the rest of the comments also allude, a lot of the cloud-entrenched world has abandoned NFS, at least in AWS circles.
I'm not one of these people. Rather than relying solely on puppet->all instances to handle multiple deploys, the convenience of an NFS instance was appealing, so essentially I have a relatively small (read: medium) instance that does nothing but manage the deployment filesystems and allow new instances in the security group to connect. I have often thought of abandoning this for doing deploys in a more "modern" way, but I'm still not sure what the benefit would be other than eliminating a minor source of shame.
The analog here is running a MySQL or Postgres database in an instance prior to RDS. RDS provided enough benefit that the minor price difference in rolling your own no longer factored in. A more reliable, fault-tolerant and extensible file system is, like RDS, a huge upgrade. It may not be for everyone, but for some of us it's just another reason why AWS keeps making it hard to even look anywhere else.
The right way to do that (in the AWS world) would be RDS accessible to all instances in the security group. This moves locking control to the application level rather than the file system. For obvious reasons this makes a lot of sense.
There are of course non-ACID / NoSQL solutions for which this might be an acceptable practice, but in general I'd say it's fraught with peril.
This is a step back from the right direction to me.
(1) Not all development is green-field and thus subject to using this week's fashionable data store. A lot of applications already use file systems and rely on their semantics. Ripping up the entire storage layer of a large and complex application would be irresponsible, as it's likely to create more problems than it solves. Look down your nose at "legacy code" all you want, but those users exist and have money to give people who will help them solve their business problems. Often the solutions are as cutting-edge as anything you'd find in more fashionable quarters, even if the facade is old fashioned.
(2) Even for green-field development, file systems are often a better fit than object stores. Nested directories are more than just an organizational scheme, but also support security in a way that object stores don't. What's the equivalent of SELinux for data stored in S3 or Swift? There is none. File system consistency/durability guarantees can be important, as can single-byte writes to the middle of a file, while parsing HTTP headers on every request drops a lot of functionality and performance on the floor. Distributed databases are much more compelling, but the fact remains that the file system model is often the best one even for new applications.
Go ahead and use something else if it suits you. Other people wouldn't have much use for your favorite storage model, and this one suits them perfectly.
Answers following your numbering:
(1) Calling S3 "this week's fashionable data store" is like calling an elephant an interesting microbe. As for the rest of your points, which amount to "please do not innovate, we have had filesystems for 40 years and this is how you store your data": I do not agree. Disclaimer: I was a member of the team that moved amazon.com from NFS-based storage to S3. It was a great success, and it solved many of our problems, including the insane number of issues introduced by running an NFS cluster at that scale. And I would like to emphasize scale, because operational problems quite often grow worse than linearly with your scale.
I know about legacy code, and I'm running several legacy services in production right now. I can tell you one thing: there is a point when it is no longer financially viable to keep rolling with the legacy code. That point varies a lot with your actual use case; banks tend to run "legacy code" while web 2.0 companies tend to innovate and replace systems at a faster pace. I don't see any conflict here. We even built a compatibility layer for the new solution, so it was possible to run your legacy code on the new system with your software untouched.
(2) Nested directories are a logical layer on top of how the data is stored, i.e. a view; you are a distributed FS developer, so I guess you understand that. S3 also supports nested directories, no biggie here.
Security. Well, this is kind of weird, because last time I checked S3 had an extensive security model: http://aws.amazon.com/s3/faqs/#security_anchor
Now the rest of your question can be rephrased as: "I am used to X; why isn't there X with this new thing?" I am not sure how many file system users use SELinux; my educated guess is roughly 1-10%. It is a very complex system that not many companies invest in using. For our use cases the fine-grained ACLs were good enough, so we are using those. File system durability: yes, it is very important, which is why I was kind of shocked by this bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/...
I guess you are right about the overhead of reading and writing, dealing with HTTP headers, etc. If the systems that benefit the most from S3 were single-node systems, it would be silly to use S3 in the first place. We are talking about 1,000-10,000 computers using the same data storage layer. And you can tell me if I am wrong, but if you want to access the same files on those nodes using an FS, you are going to end up in locking hell. This is why modern software that is IO-heavy has moved away from in-place edits toward "lock-free" data access. Look at the implementation of Kafka log files or how Aeron writes files. These are exactly the same semantics as how we use S3. Accident? ;)
I would like to repeat my original question: I don't see a huge market for a distributed FS. I might be wrong, but this is how I see it.
Please don't put words in my mouth like that. It's damn rude. I never said anything that was even close.
"S3 also supports nested directories no biggie here."
Not according to the API documentation I've seen. There are buckets, and there are objects within buckets. Nothing about buckets within buckets. Sure, there are umpteen ways to simulate nested directories using naming conventions recognized by an access library, but there's no standard and thus no compatibility. You also lose some of the benefits of true nested directories, such as combining permissions across different levels of the hierarchy. Also no links (hard or soft), which many people find useful, etc. Your claim here is misleading at best.
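To make the distinction concrete, here is the prefix convention with the AWS CLI (bucket and keys made up); there is no mkdir, and the "directory" exists only as part of the key names:

    # the "path" is simply part of the object key
    aws s3 cp report.pdf s3://my-bucket/projects/2015/q2/report.pdf
    # listing by prefix merely looks like a directory tree
    aws s3 ls s3://my-bucket/projects/2015/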
"last time I checked S3 had an extensive security"
Yes, it has its very own permissions system, fundamentally incompatible with any other and quite clunky to use. That still doesn't answer the question of how you'd do anything like SELinux with it.
"File system durability: yes it is very important, this why I was kind of shocked about this bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781/...
Open up your bug list and we can have that conversation. Throwing stones from behind a proprietary wall is despicable.
"you can tell me if I am wrong but if you would like to access the same files on these nodes using a FS than you are going to end up with a locking hell."
You're wrong. Maybe you've only read about distributed file systems (or databases which have to deal with similar problems) from >15 years ago, but things have changed a bit since then. In fact, if you were at Amazon you might have heard of a little thing called Dynamo which was part of that evolution. Modern distributed systems, including distributed file systems, don't have that locking hell. That's just FUD.
"I don't see huge market for a distributed FS."
Might want to tell that to the EFS team. Let me know how that goes. In fact you might be right, but whether there's a market has little to do with your pseudo-technical objections. Many technologies are considered uncool long before they cease being useful.
If they are just static assets, you would do better to put them in an S3 bucket, set appropriate Cache-Control headers and serve them via Cloudfront. This reduces your outbound bandwidth cost to the Internet versus EC2/S3, and yields better performance.
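For instance, something like this with the AWS CLI (bucket name hypothetical):

    # upload assets with a long-lived Cache-Control header for Cloudfront to pass through
    aws s3 cp ./static s3://my-assets-bucket/static --recursive \
        --cache-control "public, max-age=86400"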
EFS first and foremost will take away a huge amount of pain when trying to make filesystem-dependent legacy applications more reliable.
upstream issue: https://www.drupal.org/node/2044509
We use it to store petabytes of large video files, and our system is structured such that no folder ever has more than a handful of files in it (>20 is rare). With properly tuned caching this works fantastically well for our use case, and I would take the simpler code and reduced points of failure over NFS nonsense any day.
That of course doesn't mean s3fs is the solution to every problem; it simply means it's good to have options, and you shouldn't write something off because it "might be slow."
Know your data, know your use case, and know your tools. You can make smart decisions on your own rather than being driven by anecdotal comments on HN.
For getting the data into S3, we found dramatic improvements in using the AWS CLI, as I believe it handles uploads in a multi-threaded way.
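Something as simple as this; the concurrency knob is an aws-cli configuration option, and the value is just an example:

    # parallel multipart uploads out of the box
    aws s3 sync ./data s3://my-bucket/data
    # optionally raise the number of concurrent requests
    aws configure set default.s3.max_concurrent_requests 20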
S3fs turned out to be viable for our use case, storing Magento Enterprise content assets which are then served directly from S3, so the app's upload features rely on s3fs as well as the file checks from the app itself (which are indeed quite slow).
I've always wanted to do it natively, mounting EBS volumes on more than one instance (which is not currently possible) or wishing for a native NFS service like AWS released.
All in all, it is a happy day for me. More options make us more powerful.
The FUSE-based ones that I've tried were riddled with problems and poor error handling. Hangs and truncated files were the rule rather than the exception.
> Q: What data consistency model does Amazon S3 employ?
> Amazon S3 buckets in the US Standard region provide eventual consistency. Amazon S3 buckets in all other regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.
> Finally, note that, for NFS version 3 protocol requests, a subsequent commit request from the NFS client at file close time, or at fsync() time, will force the server to write any previously unwritten data/metadata to the disk, and the server will not reply to the client until this has been completed, as long as sync behavior is followed. If async is used, the commit is essentially a no-op, since the server once again lies to the client, telling the client that the data has been sent to stable storage. This again exposes the client and server to data corruption, since cached data may be discarded on the client due to its belief that the server now has the data maintained in stable storage.
I am not certain how this works in NFSv4, which is what EFS will be. The safe solution is to use the sync option when mounting the NFS volume, at the cost of performance.
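i.e. trading throughput for safety with something like this (server name hypothetical):

    # force synchronous writes so fsync()/close() semantics hold
    mount -t nfs4 -o sync fs-12345678.efs.us-west-2.amazonaws.com:/ /mnt/efs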
If anything, their use of NFSv4 means there are plenty of competitive offerings if you decide that performance, security, or physical access constraints dictate migrating off their service.
If you don't want to manage your own Linux/BSD/etc. NFS infrastructure, Oracle, Netapp, and EMC will all happily sell you a storage appliance that supports it. I don't see much lock-in here.
I can see this being useful for a few cases. For one, I can immediately use it for one of the projects I have where I have multiple worker servers and one of them needs to periodically process a few GBs of data, yet I don't want to give that much storage to every server, and I don't want to make any one server special.
Another use case: you don't know how big your data will grow, yet you want to access it in a random fashion. S3 isn't great for this, but NFS is, and unlimited mounted storage is nice.
Edit: Third use case is logs. You can collect all the logs from all of your servers in one place and access them from any server.
The rest gets updated on a dev server then deployed to the auto-scaling group via an AMI.
And you have to resize your block devices to store more stuff.
Which is why new services like these are always preferable.
Maybe you don't want to, but there is definitely someone out there dealing with these issues.
E.g. during a heat wave (100 F+), a transformer on top of the building (at a previous employer) caught fire. When the dust settled, we found out that the person in charge of it had not upgraded it as our power requirements increased. It was over-taxed and the heat wave put it over the edge.
Infrastructure guy here. Abstraction is to reduce workload; you still need to understand the underlying concepts. Otherwise, you're just the guy who freaks out when their DB is at 100% CPU utilization or hours of replica lag without knowing why.
Until now, you could attach an EBS volume to only one instance.
Or you had to use Gluster/HDFS otherwise.
"Multiple Amazon EC2 instances can access an Amazon EFS file system at the same time, providing a common data source for workloads and applications running on more than one instance."
Or is this a feature you need to enable, and you keep using EBS otherwise?
/home that's on a network drive with an SLA...
concurrent versions of your app sitting next to each other somewhere in $PATH, meaning you can roll forward/back by typing app-$version (or whatever your convention is); see the sketch below
Quickly and efficiently share files between instances.
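A sketch of the roll-forward/back convention mentioned above (paths invented for illustration):

    # versions installed side by side on the shared mount
    ls /mnt/efs/bin
    # myapp-1.2.0  myapp-1.3.0  myapp -> myapp-1.3.0
    # roll back by repointing the symlink that $PATH resolves
    ln -sfn myapp-1.2.0 /mnt/efs/bin/myapp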
It basically means you can treat a node group as a single proper Linux cluster, without having to buy GPFS licenses or indulge in the horror that is glusterfs.
Imagine...if you could put a whole entire clock...on your wrist. Almost everybody has a wrist!
But you're right, I will likely get an iWatch. It will go in my QA toybox with my MacBooks and iPads and iPhones.
Do you remember computers before the Mac? Notebooks before the Macbook? Music players before the iPod? Phones before the iPhone? Tablets before the iPad? And in a month you'll think: watches before the Apple Watch?
Their strength lies in taking existing technologies, rebuilding them with a strong user focus, and then marketing the hell out of them. So much so that many people apparently forget what came before.
Don't get me wrong; those are all very polished products, and they took a lot of technical smarts. They were much more usable to a mainstream consumer audience. But the non-Apple versions of those products were generally fine for non-consumer audiences. And Apple's marketing is masterful; I've never seen a tech company so good at generating hype.
My phone before the iPhone was horrible, though. Partly that was my fault for getting a Razr. The user interface somehow managed it so that my thumb was always in exactly the wrong spot for what I needed to push right then. It was a joyous day when I found someone to give that thing away to. (On the plus side, it woke up two years after I stopped using it, plugging it in, etc., to give me an alarm I had set. Now that is reliability!)
Yes, I remember the iPod nano