It's an object storage engine (think S3, but it's open source and you can put it in your own data center) that's excellent at storing unstructured data.
It's completely deployable and usable without any other OpenStack projects.
There's S3 API compatibility for it. It supports globally distributed clusters. It supports multiple storage policies that can be either replicated or use erasure coding. It's designed for very high availability, very high durability, and high aggregate throughput.
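To make the replicated-vs-erasure-coded distinction concrete, here's a toy sketch of the core erasure-coding idea: k data chunks plus a parity chunk let you lose any one chunk and rebuild it, at far less storage overhead than full replicas. (This simple XOR parity is only an illustration; Swift's EC policies use real Reed-Solomon codes via liberasurecode, not this.)

```python
def xor_chunks(chunks):
    # XOR equal-length chunks together byte by byte.
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

data = [b"abcd", b"efgh", b"ijkl"]   # k = 3 data chunks
parity = xor_chunks(data)            # 1 parity chunk (k+1 total stored)

# Simulate losing chunk 1; rebuild it from the survivors plus parity:
rebuilt = xor_chunks([data[0], data[2], parity])
assert rebuilt == data[1]
```

With 3-way replication you'd store 3x the data; this k=3, m=1 scheme stores ~1.33x while surviving any single-chunk loss.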
One of my favorite features is being able to create sharable, expiring signed URLs to any object in the cluster.
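For the curious, the signing scheme behind those expiring URLs is simple: Swift's tempurl middleware HMAC-signs the request method, an expiry timestamp, and the object path with a per-account key. A rough sketch (the account key and object path below are placeholders, and real deployments may use SHA-256 rather than SHA-1):

```python
import hmac
from hashlib import sha1
from time import time

def make_temp_url(path, key, method="GET", ttl=3600):
    # tempurl signs "METHOD\nexpires\npath" with the account's
    # temp-url key; anyone holding the resulting URL can fetch the
    # object until the timestamp passes.
    expires = int(time()) + ttl
    body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), body.encode(), sha1).hexdigest()
    return f"{path}?temp_url_sig={sig}&temp_url_expires={expires}"

# Hypothetical account/container/object and key:
url = make_temp_url("/v1/AUTH_demo/photos/cat.jpg", "secret-key")
```

You hand out the URL instead of credentials; the cluster verifies the signature and rejects the request once `temp_url_expires` is in the past.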
Some of the common uses for Swift include storing user-generated content (e.g. images, videos, game saves), static web assets, movies, scientific data sets, backups, document sharing, VM and container images, etc.
Vagrant All-In-One setup:
Come say hi!
- #openstack-swift on freenode IRC (I'm notmyname)
Where's your Dockerfile and "migrate from S3 to OpenStack Swift in 20 minutes or less" tutorial?
We're looking at minio/riak and now you as alternatives to S3.
I use IPFS. IPFS is great for sharing multi-gigabyte files between machines in a cluster, bit-torrent style. In my case it's a couple hundred Amazon spot instances that come and go very fast and need to get the data ASAP to start some calculation, the same data for all nodes.
- Ceph: Very flexible. Supports many different kinds of replication. Has high overhead compared to local disk (on the order of ~50%) and was (for me) prone to hard-to-diagnose issues. Can be annoying to set up if you're not doing it on a supported Linux distro with ceph-deploy. It looks like Bluestore (a new on-disk format for data) will significantly improve performance, but Bluestore is extremely RAM hungry.
- GlusterFS: Much faster than Ceph but less flexible. Has odd requirements about "bricks" being the same size. Much less RAM hungry than Ceph.
- A bunch of smaller ones I can't recall. Mostly discarded because they performed badly or lacked replication options (I really wanted erasure coding).
In the end I'm simply sharding my data manually. It's not as scalable but it's much more performant.
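Manual sharding like this usually comes down to a stable hash of the key picking a storage host. A minimal sketch, with hypothetical host names (note md5 rather than Python's built-in `hash()`, since the latter is salted per process and wouldn't give stable placement across machines):

```python
import hashlib

SHARDS = ["store-0", "store-1", "store-2", "store-3"]  # hypothetical hosts

def shard_for(key: str) -> str:
    # Stable hash so the same key always maps to the same shard.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]
```

The catch, and why this is "not as scalable": changing the shard count remaps most keys, so growing the cluster means migrating data (consistent hashing mitigates this at the cost of complexity).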
edit - this is distributed block storage -- if you just need object storage, perhaps something else is in order.
I actually found it better than Gluster in terms of robustness and performance. It's got support for multiple masters, failover and a nice dashboard.
One of those projects that should be much better known than it is, given how few open source distributed storage solutions there are. The MFS devs are good but maybe lack the marketing savvy, or are perhaps just happy where they are.
I think in production I've used NFS, DRBD, GlusterFS and OpenStack. Each has its pros and cons, and without a precise set of constraints it's hard to know how to usefully answer any question of the form "Which would you recommend? Why would you choose this?"
Distributed storage tends to be required either because you want redundancy and availability, or because your "stuff" is too large for a single box to host. But with a vague question it could mean "How do you back up boxes?" or something entirely different. (For example, "distributed storage" could end up mapping to a pair of MySQL hosts, or a replicated PSQL database.)
- Personal files, stuff I cannot afford to lose (photos, documents, etc.) - Full archive on S3, full archive on a home server, 4 clients with partial copies.
- Big data stuff I can afford to lose (VM images, media files, etc.) - Around 6 TB, each file has two copies split between 5 hard drives on a home server and Hubic.
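Spreading two copies of each file across five drives needs placement logic that deterministically picks two distinct disks per file. A small sketch using rendezvous (highest-random-weight) hashing, with hypothetical drive labels; this isn't the poster's actual tooling, just one way to do it:

```python
import hashlib

DRIVES = ["d1", "d2", "d3", "d4", "d5"]  # hypothetical drive labels

def place(path: str, copies: int = 2):
    # Rank every drive by a hash of (drive, path) and keep the top N.
    # Each file gets a stable set of distinct drives, and removing one
    # drive only remaps the files that lived on it.
    ranked = sorted(
        DRIVES,
        key=lambda d: hashlib.sha256(f"{d}:{path}".encode()).hexdigest(),
    )
    return ranked[:copies]
```

Because the ranking depends only on the path and drive names, you can always recompute where a file's two copies should live without a central index.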