
AWS S3 open source alternative written in Go - krishnasrinivas
https://minio.io
======
Ixiaus
Or, run _Riak_ with their S3 compatibility layer. Riak is extremely stable and
the work Basho has done to make a truly robust distributed database is
significant.

[http://docs.basho.com/riak/cs/2.1.1/](http://docs.basho.com/riak/cs/2.1.1/)

~~~
viraptor
Other alternatives:

ceph -
[http://docs.ceph.com/docs/master/radosgw/s3/](http://docs.ceph.com/docs/master/radosgw/s3/)

swift -
[https://wiki.openstack.org/wiki/Swift/APIFeatureComparison#A...](https://wiki.openstack.org/wiki/Swift/APIFeatureComparison#Amazon_S3_REST_API_Compatability)

~~~
dc2447
CEPH is a volume service, not an object storage service.

SWIFT is indeed analogous to S3.

~~~
viraptor
Come on, I literally linked to a website describing "CEPH OBJECT GATEWAY S3
API".

------
davidu
Theory here is that people will build apps that talk to S3. But sometimes
those apps might need to run inside the perimeter and can't talk to the cloud.
So rather than rewrite an app to talk to a new internal datastore, you just
point it at a locally hosted Minio and you're up and running.

Smart.

~~~
notyourwork
What kind of situations do you see this becoming a factor in? 5 or 10 years
ago this was an issue with early cloud adopters. Nowadays cloud providers
are ramping up their DCs to be compliant and allow companies/government
entities with strict policies to still onboard.

It's a good strategy, but not one that I see being exercised frequently enough.

~~~
extrapickles
The software I work on is targeted at customers who generally have really
spotty internet connections (e.g., they are all in the less forgiving parts of
the ocean, or the middle of nowhere if on land). This pretty much mandates
using software like this to build out your app, as you can't rely on internet
connectivity.

There pretty much isn't anything you can do to improve their internet
connections, as cables to remote places are always getting dug up, with
week-plus repair times, so you need something that can run locally for long
periods. Ships have a different problem: very slow speeds that effectively
mean you can only transmit the absolute minimum off the ship when it's out at
sea (when they are at port they typically have normal internet connections to
bulk-dump data over).

------
hhandoko
I switched from Fake S3 [1] to Minio for local development. Fast and
lightweight, good experience so far :)

Easy to set up with Vagrant, and linking/sharing the Minio shared folder to
the host makes it quite convenient to quickly check the files without going to
the UI [2].

[1] - [https://github.com/jubos/fake-s3](https://github.com/jubos/fake-s3)

[2] - It stores the files as-is in the local filesystem (files in folders,
unchanged), as opposed to having it 'wrapped' like Fake S3 does.

------
krishnasrinivas
Minio will always be 100% free software / open source. We have no plans to add
any proprietary extensions or hold back on features for paying customers only.
-- Minio Team

~~~
cyphar
Then why not make the license AGPLv3-or-later, to avoid other people creating
proprietary forks? I get that it's not a common occurrence within the Golang
world, but nothing will change unless more Golang projects start making their
code copylefted.

~~~
y4m4b4
GNU AGPL is an ideal license for free software projects. We are strong
supporters of the GNU project. We chose the Apache License for Minio purely
for adoption reasons. Most of our users build proprietary software around
Minio, and their legal counsel has a default NO policy towards GNU licenses.
Besides, the FSF has also approved Apache License v2 as a free software
license.

Proprietary forks are OK with us. It will be too expensive to maintain
branches of their own and catch up with the upstream.

~~~
cookiengineer
> It will be too expensive to maintain branches of their own and catch up with
> the upstream.

Haha, you guys are awesome! You've totally figured it out. Stay awesome!

------
bjoerns
After evaluating a couple of options mentioned in the other comments here, we
recently replaced our in-house-built S3 clone with Minio for the on-prem
version of our app. Very robust and stable.

~~~
matt_wulfeck
Keep in mind that there are plenty of object stores that are robust and stable
until you put 1 billion keys in them.

~~~
bjoerns
That's a very good point - but for what we do (on-premise version control for
Excel where each workbook version represents one object) we won't be getting
even close to that number. But yes, agreed, it entirely depends on your use
case.

~~~
matt_wulfeck
If you don't mind me asking, why not use Amazon S3? It's cheap and --
importantly -- somebody else is on-call for its uptime.

~~~
abakker
I'm going to guess that it is part of the "on-premises superstition", where
companies feel that the stuff they own is somehow more secure. Obviously, this
is not often true in practice, but 5 years of research/consulting has taught
me that they feel this way all the same.

Occasionally, there are laws that also mandate certain controls that cloud
providers in general did not have. That is also becoming rarer as time goes
on.

~~~
brianwawok
There really are pros to on-prem.

There are cons, don't get me wrong, but to claim that AWS is somehow the
end-all, be-all of hosting choices is demonstrably wrong.

For example: you want to develop a financial exchange with a 100-microsecond
average response time, peaks of 10 Gbit of traffic, and five nines of uptime.
Do you host that on AWS? I wouldn't.

Another example: if I were a medium-plus-sized company (say 20+ employees), I
would want my source control 100% on-prem (excluding backups). Internet
connections are too flaky, and GitHub gets DDoSed too often. I could not
stake my entire business on GitHub.

~~~
breakingcups
I'll give you another pro: customization. Our on-prem Jira instance can have
add-ons and changes that aren't allowed in the cloud-hosted version.

~~~
killbrad
That's the Atlassian SaaS offering. You could still run it in any cloud
yourself and get all the customizations you want.

~~~
takeda
But that voids the argument stated initially, i.e. that no one needs to
maintain it or be on call for it.

------
fizzbatter
Does this have the ability to mirror to an encrypted remote? I'm looking for
something like this for a simple home storage server, but emphasis on being
able to replicate to something like B2 Storage for cheap backup.

Currently Infinit.sh has my attention the most, but it's quite young still.

 _edit_ :
[https://news.ycombinator.com/item?id=12125344](https://news.ycombinator.com/item?id=12125344)
This thread seems to be talking about what I want. With that said, I'm not yet
sure if `mc mirror` supports Backblaze, as that (given its price point) is my
prime need.

~~~
rsync
Current opinion is that "borg" is the holy grail of backup schemes ... it
takes attic, which fixed all of the duplicity shortcomings, and improved on
that ... [1]

We[2][3] tend to agree with that.

One reason it might not work for you is that we are an order of magnitude more
expensive than B2, so perhaps that's a better bet for you. On the other hand,
$7.20 per year for our smallest borg account is almost as close to zero as
your B2 minimum order would be, so ... who knows.

One upside of choosing our service is that you can choose your location (US,
Zurich, HK, etc.)

[1] [https://www.stavros.io/posts/holy-grail-backups/](https://www.stavros.io/posts/holy-grail-backups/)

[2] rsync.net

[3]
[http://www.rsync.net/products/attic.html](http://www.rsync.net/products/attic.html)

~~~
RubyPinch
from [3]

> If you're not sure what this means, our product is Not For You.

Please don't do that, it's childish and unimpressive.

~~~
corobo
There's no support for that service. It makes sense to ward off, right in the
headline, people who might need support.

~~~
rsync
Just to be clear, there is no support for the deeply discounted borg/attic
accounts at rsync.net.

Regular rsync.net accounts have full, unlimited support provided by a US-based
engineer. As in, an honest to god unix engineer. Sometimes, but rarely, me.

------
frugalmail
The canonical open source alternative to S3
[https://wiki.openstack.org/wiki/Swift](https://wiki.openstack.org/wiki/Swift)

~~~
hansjorg
Riak CS is another one:

[https://github.com/basho/riak_cs](https://github.com/basho/riak_cs)

~~~
ranman
Ran this in production and dealt with a lot of issues. I would caution people
against its use in anything critical or customer-facing.

~~~
hashin
Could you please elaborate? What were the issues you were facing?

~~~
shinydevops
As another user with nothing but negative experiences with Riak-CS in
production, I thought I'd take a stab here. We had a 12-node cluster with
~10TB per node, fwiw. In no particular order:

- The restart times of the Riak process ranged from 10 minutes to 3+ hours,
during which time the cluster was basically useless. Not a single suggestion
from support sped up this process.

- Every single night from 0800 to 0900 UTC, the cluster would grind to a halt
(as measured by canaries measuring upload/download cycle times). This
continued even after we migrated all customer data and traffic off of the
cluster.

- Riak-CS ships with garbage collection disabled despite it being a critical
feature. I inherited a cluster that had been run for some months without gc
enabled. Turning it on caused the cluster to catastrophically fail. Basho
Support, over a period of close to a year, was unable to find a single
solution that would get our cluster back to health. If our cluster were a
house on a show like Hoarders, the garbage in it would be considered load
bearing.

- We attempted to upgrade our way out of our un-garbage-collect-able mess,
but the transfer crashed. Every. Single. Time.

- Even had transfers worked, all of the bloated manifests have to be copied
in their entirety, so you can't gc the incoming data on the new cluster.

- Even while babying the cluster, it would become unusable at least once a
month, requiring a restart of all nodes. The slowest node took 3+ hours to
start, followed by another 3+ hours of transferring data. This was 6+ hours of
system downtime every month.

- During these monthly episodes, we attempted to engage with support and try
to debug the processes (we were a team of seasoned Erlang developers). We
could attach Observer and/or use the REPL to grab stats, but not a single
support resource was able or willing to engage.

- For giggles, once we had migrated all users off of the cluster, we
attempted to let gc run. It never completed. Not once. We let this go on for a
few months before nuking the entire cluster.

Now, I absolutely realize that we got ourselves into that mess by running the
cluster without gc for an extended period. But in the grand scheme of things,
this cluster wasn't storing a very large amount of data -- tens of TB spread
over tens of millions of objects. Having the cluster get into a state where gc
can never run and where this causes snowballing instability is unacceptable.

We switched to Ceph. We've never looked back.

------
cdnsteve
Practical use case:

- Spin up a bunch of droplets on DigitalOcean, because I want reliability,
etc.

- What's the best way to share drive space across these to create a single
Minio storage volume, so if one DO node goes away I don't lose my stuff?

~~~
krishnasrinivas
We are working on distributed Minio:
[https://github.com/minio/minio/tree/distributed](https://github.com/minio/minio/tree/distributed)

The Minio available today for production use can export a single disk, or
aggregate multiple disks on the same machine using erasure coding.

If you want backup, you can use the github.com/minio/mc tool to mirror; more
help here:
[https://docs.minio.io/docs/minio-client-complete-guide#mirro...](https://docs.minio.io/docs/minio-client-complete-guide#mirror)

~~~
killbrad
I think this should be made clear on your site. I spent a good amount of time
trying to figure out how to actually get this to be distributed, but the
answer is: you don't. So it's only like S3 in interface, not in durability or
availability.

------
bryanlarsen
Minio works great for dev & test deployments. It's dead simple to set up:
just a single executable. Hopefully it doesn't lose that simplicity as it
grows up and gains features.

~~~
tbrock
It's a Go binary; that's just how they work.

------
Keyframe
Sorry for two posts (the other one was unrelated). If anyone has experience
with this I have a few questions regarding a particular use case.

How does something like this behave with really large files, e.g. video files
in the 100s of gigabytes? I'm asking because if one could set up a resilient
online (online as in available) storage with fat pipes like this, it could be
used as a platform to build a centralized video hub for editing. How much
sense it would make over a filesystem is another question, though.

~~~
klodolph
I think these days we should by default think of storing blobs of data (like
video files) in storage systems like S3 or the alternatives, and that ordinary
filesystems should be thought of as a special case where you want to attach
storage to an individual computer.

Edit: I'm going to elaborate, because people are calling me naïve. Full
disclosure: I work at a cloud provider on a storage team.

For most people and applications, you simply don't get good value for your
money by using filesystems and hard drives directly. We've tried to make
things more reliable and durable with backup policies, RAID, and ZFS but the
fact is all of these things come with operational and capital expenditures
that compare unfavorably with common cloud storage options. There are some
good technical reasons why cloud storage is better: basically technologies
like RAID and ZFS are attempts to make each layer of your storage stack
completely durable and available, but this approach is not competitive with
the way cloud storage is typically implemented, which is to build a reliable
distributed service on top of cheap hardware. Consider RAID 1, for example.
This gives you N+1 redundancy at the drive level for an individual computer.
This worked in the 1990s but drives are bigger and RAID failure modes suck
with larger drives—it's worrying how common it is to see errors when
rebuilding a degraded RAID array, and at N+1 that means that your data is lost
from that computer. Essentially, with modern drive sizes (4+ TB seems pretty
common these days) a RAID 1 array should always be considered N+0 instead of
N+1.

Cloud storage is implemented much more intelligently. If you have distributed
storage, you can simply spread files across computers in different DCs and use
error correction codes to increase the redundancy. You can get more nines of
durability and availability for less money this way. You end up with something
like 33% overhead on disk space instead of 300% overhead, and you're also off
the hook for a big chunk of your capacity planning and various other
operational expenditures.
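The overhead comparison above can be sketched with a quick back-of-the-envelope calculation. The 3x replication and 12-data/4-parity shard counts here are illustrative assumptions, not any particular provider's real layout:

```go
package main

import "fmt"

// rawRatio returns how many bytes of raw storage are consumed per byte
// of logical data, given the total and data shard counts of a scheme.
func rawRatio(totalShards, dataShards float64) float64 {
	return totalShards / dataShards
}

func main() {
	// 3-way replication: every byte is stored three times.
	fmt.Printf("replication (3x): %.2fx raw storage (%.0f%% extra)\n",
		rawRatio(3, 1), (rawRatio(3, 1)-1)*100)

	// 12+4 erasure coding: survives the loss of any 4 shards while
	// storing only a third more than the logical data size.
	fmt.Printf("erasure (12+4):   %.2fx raw storage (%.0f%% extra)\n",
		rawRatio(16, 12), (rawRatio(16, 12)-1)*100)
}
```

The point being that parity shards buy redundancy far more cheaply than whole extra copies do.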

These days I would consider starting from "this file is in cloud storage, and
we have a local cache" rather than "this file is in local storage, but we have
a cloud backup". That's really all I'm saying.

It also won't _always_ be competitive. Sometimes cloud storage is more
expensive than regular filesystems, depending on how you're using it. If
you're a big company you can sometimes amortize the costs of doing it yourself
better. That's all I mean by "default"—I'm going to put my data in cloud
storage unless I have a compelling reason to store it some other way.

~~~
mi100hael
That's awfully naive, especially for tasks like video editing that are
significantly impacted by disk read/write speeds. Even a NAS on a gigabit
network is going to be roughly 6x slower than a standard internal SATA III
spinning disk.

~~~
klodolph
I said "by default", the implication being that you'd do something else if
your application needs it. But it's much easier from an operational
perspective to start with a reliable system (replicated, networked storage)
and cache locally for speed than to try to make local filesystems reliable
and durable.
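As a toy sketch of that "remote store is the source of truth, local disk is a cache" layering (the `fetchRemote` stub stands in for a real S3/Minio GET; all names here are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a read-through cache over a (stubbed) remote object store.
type cache struct {
	mu    sync.Mutex
	local map[string][]byte
}

// fetchRemote pretends to GET an object from the remote store.
func fetchRemote(key string) []byte {
	return []byte("object:" + key)
}

// get serves from the local cache when possible, and falls back to the
// remote store on a miss, populating the cache for future reads.
func (c *cache) get(key string) []byte {
	c.mu.Lock()
	defer c.mu.Unlock()
	if b, ok := c.local[key]; ok {
		return b // cache hit: fast local read
	}
	b := fetchRemote(key) // cache miss: go to the object store
	c.local[key] = b
	return b
}

func main() {
	c := &cache{local: map[string][]byte{}}
	fmt.Println(string(c.get("video.mp4"))) // miss, fetched remotely
	fmt.Println(string(c.get("video.mp4"))) // hit, served locally
}
```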

------
zx2c4
Their CLI client is called `mc`. This is an unfortunate conflict with the
venerable Midnight Commander.

------
andrewchambers
I love the website. I'm a lone developer who doesn't know any HTML; how would
I go about getting such a nice design for my own projects? (Or how much would
it cost?)

~~~
zbuttram
Wappalyzer ([https://wappalyzer.com/](https://wappalyzer.com/)) tells me
they're using Bootstrap ([http://getbootstrap.com/](http://getbootstrap.com/))
(probably customized a bit). HTML isn't very difficult (just another markup
language) and if you're not inclined toward design (I am also not) there are a
plethora of CSS frameworks to choose from (like Bootstrap) that will get you
up and running with something not completely ugly. Personally I like Bulma
([http://bulma.io/](http://bulma.io/)) right now which showed up (I think as a
Show HN) on here a while back. Currently using it for a project and I'm
enjoying it.

~~~
andrewchambers
Really, my design sense isn't great. Given time I can hack together something
with Bootstrap, but I do think I lack the designer training, and probably the
instincts.

~~~
zbuttram
Same here. I recommend making friends with some designers or looking around at
pre-customized versions of Bootstrap. I also spent some time looking for this:
[http://jgthms.com/web-design-in-4-minutes/](http://jgthms.com/web-design-in-4-minutes/)
One of my favorite sites for this type of conversation.

------
jedisct1
Or run LeoFS [http://leo-project.net/leofs/](http://leo-project.net/leofs/)

------
Keyframe
Unrelated question. What's the point of the fullscreen button on those
terminal session players (or whatever they are) if it doesn't stretch the
playback to fullscreen? You only get a same-sized screen with black around it.
It's not even centered on the screen.

~~~
eknkc
I guess it is [https://asciinema.org](https://asciinema.org) but their samples
have centered full screen. Maybe a CSS issue here.

I'm not sure about the point either. Maybe if you embedded a small player it
would be zoomed out and fullscreen would show the native style.

~~~
jdc0589
all my brain sees in the domain name is "ascii enema"

------
nulagrithom
Is this just meant to emulate S3 for the sake of dev/test environments?
Without clustering/HA I don't really see the point of using this over the
plain old file system. Or am I missing something?

~~~
krishnasrinivas
Absolutely, our focus currently is on multi-server Minio, which is being
actively developed on the "distributed" branch:
[https://github.com/minio/minio/tree/distributed](https://github.com/minio/minio/tree/distributed)

Our current stable version can export a single disk or multiple disks (using
erasure coding, providing protection against disk failures). As it is very
easy to get started with (single binary, thanks to Go), people find it
attractive for dev/test environments.

To replicate for HA (even for the single-server version), use the "mc mirror
--watch SOURCE TARGET" command to pair them up. If you have multiple drives
(JBOD), you can eliminate RAID or ZFS and use Minio's erasure code to pool
them up. The distributed version is also in dev/testing at the moment. It
should be out in a month.

------
olalonde
Previous discussion:
[https://news.ycombinator.com/item?id=12122998](https://news.ycombinator.com/item?id=12122998)

------
helper
How easy is it to embed this into go tests? Right now I use goamz/s3test for
that, but it has a lot of limitations.

~~~
y4m4b4
Quite easy actually. You can look at

[https://github.com/restic/restic/blob/master/run_integration...](https://github.com/restic/restic/blob/master/run_integration_tests.go)

~~~
helper
I don't want to run it in an external process, I want to run it in a
goroutine.

~~~
y4m4b4
For that you can just do

```
package main

import minio "github.com/minio/minio/cmd"

func main() {
	// Run the Minio server in a goroutine.
	go minio.Main()

	// ... do your stuff ...
}
```

------
scoopr
So, I can use midnight commander as the client? ;) (half joking, half serious)

------
unboxed_type
Why is it so important what language it is written in? :-)

------
LoSboccacc
couldn't find at a glance wheter it has the same read after write issue of s3,
or in general what the consistency is.

also, failure and backup modes.

~~~
kparthas
Minio server provides read-after-write consistency. For fault tolerance
(protection against failed disks), you could deploy a Minio erasure code
setup.* Ref: [https://docs.minio.io/docs/minio-erasure-code-quickstart-gui...](https://docs.minio.io/docs/minio-erasure-code-quickstart-guide)

* Minio erasure code setup also provides protection against "bit-rot".
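The bit-rot protection boils down to keeping a checksum next to each block and verifying it on every read, falling back to parity reconstruction when they disagree. A minimal sketch of the idea (not Minio's actual implementation):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// block pairs stored data with the checksum computed at write time.
type block struct {
	data []byte
	sum  [32]byte
}

// write stores data alongside its SHA-256 checksum.
func write(data []byte) block {
	return block{data: data, sum: sha256.Sum256(data)}
}

// read recomputes the checksum and fails if it no longer matches,
// i.e. the data silently changed on disk.
func read(b block) ([]byte, error) {
	if sha256.Sum256(b.data) != b.sum {
		return nil, fmt.Errorf("bit-rot detected: checksum mismatch")
	}
	return b.data, nil
}

func main() {
	b := write([]byte("object data"))
	if _, err := read(b); err == nil {
		fmt.Println("clean read ok")
	}

	b.data[0] ^= 0xFF // simulate silent on-disk corruption
	if _, err := read(b); err != nil {
		fmt.Println(err)
	}
}
```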

------
muminoff
Do you guys have plans for a multi-tenancy feature?

~~~
koolhead17
Absolutely, we are working on it. Please visit our "distributed" branch
[https://github.com/minio/minio/tree/distributed](https://github.com/minio/minio/tree/distributed)

------
anonymous7777
OK, tired of people bragging about "Go". It underperforms many GC-based
languages that are out there.

~~~
RubyPinch
Generally, if you comment less about Go, you end up in fewer discussions
about it

------
beastman82
written in Go - Does this matter?

~~~
mrweasel
Yes and no. If you're in the market for an S3 clone but want to be able to
add features, fix bugs, or hack on it in some way, it's nice to know which
language it's being developed in.

As you can tell from the other comments, there are plenty of alternatives to
pick from, and if you're going to dive into the code yourself, the language
may be a deciding factor.

~~~
unboxed_type
It is important, because you will not find any Go developers on the market, so
if you are serious about using it then think twice ;)

