
SeaweedFS – A simple and highly scalable distributed file system - daenney
https://github.com/chrislusf/seaweedfs
======
chrislusf
I am Chris, the author. BTW: this is not a new project. It has been around
since 2011. Many people have tested it and contributed to it. It may not be
the tool for everyone, but I guess many companies will need a key-to-file
store.

There is also a "Filer" component.

This is a side project. Any VC is welcome to invest in it.

------
lovelearning
I was wondering why so many distributed file systems, all claiming high
performance, availability, and scalability, are being implemented from
scratch when mature ones like Gluster and Ceph have been around for years.
Why start new ones instead of improving existing ones? My guess is that new
infrastructure software like SeaweedFS and LeoFS is created mainly to
implement it in newer languages like Go, Rust, and Erlang. I'm not saying
it's a bad thing, just that it's one possible reason so many alternatives
are being implemented from scratch.

~~~
jdcarter
I'm reading between the lines here, and I'd love for Chris Lu (the SeaweedFS
author) to chime in and correct me, but it _seems_ like Chris read
Facebook's Haystack paper and decided to implement it himself in Go. Kind of
like how Riak started as "let's take the Dynamo paper and implement it in
Erlang."

The README mentions that Seaweed is much easier to set up than Ceph. The more
apt comparison would be the RADOS component of Ceph, since both Seaweed and
RADOS are key/value oriented. I've now set up both, and Seaweed is for sure
easier to get going.

I have a fair bit of experience with RADOS (all quite positive) but I really
do appreciate the ease of use Seaweed presents. I've only been using it for an
hour or so, just doing benchmarks and walking through fault scenarios, and so
far Seaweed looks quite good!

FWIW, I don't think this should be called a filesystem at all. Seaweed isn't
a filesystem that forsakes some aspects of POSIX; it's nothing like a
filesystem. The more accurate term would be "distributed object store."

~~~
jusob
You are correct, it is not really a file system. It is used via the HTTP API
(upload, download, etc.).

------
jusob
I've been using SeaweedFS (previously WeedFS) in production for a couple of
years. I needed to share files on a LAN, including on small VPSes, so
distributed file systems (like GlusterFS) that require a kernel module did
not work for me.

I store small files. Random access times with WeedFS are very consistent and
fast. If you use multiple volume servers, each file ID is unique across all
of them; if you try to access an ID on the wrong server, you get redirected
to the right volume, which does add latency.
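
For reference, the write path is just two HTTP calls: ask the master to
assign a file ID, then send the bytes to the volume server it names. Here is
a minimal Go sketch of what my client does, assuming a master on
localhost:9333 with default ports; the /dir/assign endpoint and its JSON
fields are from the docs, the rest is just illustrative:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "mime/multipart"
        "net/http"
    )

    // Shape of the /dir/assign response.
    type assignResult struct {
        Fid       string `json:"fid"`
        URL       string `json:"url"`
        PublicURL string `json:"publicUrl"`
    }

    func main() {
        // Step 1: ask the master for a file ID and a volume server.
        resp, err := http.Get("http://localhost:9333/dir/assign")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        var a assignResult
        if err := json.NewDecoder(resp.Body).Decode(&a); err != nil {
            panic(err)
        }

        // Step 2: POST the bytes directly to that volume server.
        var buf bytes.Buffer
        w := multipart.NewWriter(&buf)
        part, _ := w.CreateFormFile("file", "hello.json")
        part.Write([]byte(`{"hello":"seaweed"}`))
        w.Close()

        target := fmt.Sprintf("http://%s/%s", a.URL, a.Fid)
        if _, err := http.Post(target, w.FormDataContentType(), &buf); err != nil {
            panic(err)
        }

        // Reads (and the redirect I mentioned) are just GET http://<server>/<fid>.
        fmt.Println("stored as", target)
    }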

My biggest complaint is about replication. You can define rules: replicate
on the same rack or a different rack, how many copies, etc. But if SeaweedFS
fails to replicate a file, it does not store it at all. I would prefer best-
effort replication rather than a guarantee. This is a big issue when master
synchronization fails for some reason and only one volume is "visible" for
some period of time.
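
For context, those rules are expressed as a three-digit replication code on
the master (copies in other data centers / other racks / the same rack).
From the docs, roughly:

    000  no replication
    001  one extra copy on the same rack
    010  one extra copy on a different rack in the same data center
    100  one extra copy in a different data center
    200  two extra copies in two other data centers
    110  one copy on a different rack plus one in a different data center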

~~~
jdcarter
Can you tell us more about your production environment? Like how much data,
average IOPS, and uptime requirements? I've been poking around and I can't
find any real info on who's using this or how. The project's interesting and
I'd love to hear more.

~~~
jusob
I use it to store many small files (JSON and images). My requirements were:

1. Predictable read time
2. Read/write access over the LAN
3. A server easily installable on a VPS
4. A simple API

The main downsides I found so far:

1. Master splits are very hard to recover from. They happen rarely, but when
one does, be ready to restart the masters many times in random order. The
latest version seems to have fixed this.
2. There is no easy way to list the contents of a volume; you need to use
the command line on the volume files (see the note after this list).
3. Cleanup (compacting the volume files) is hard; I feel that the
GarbageThreshold is not working well.
4. It is hard to resize the volumes (i.e. set lower limits) after they have
been started.
5. Replication, as I mentioned in another comment.
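
(The note for point 2: the closest I found was taking the volume offline and
dumping it with the export tool, something like the line below; the flags
are from memory, so double-check against `weed export -h`.)

    weed export -dir=/data -volumeId=5 -o=volume5.tar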

Overall it works fine for me. The potential alternative (not fully tested)
was Cassandra; the 64MB limit per object (when I checked) is fine with me.

------
tw04
Call me skeptical:

>When testing read performance on SeaweedFS, it basically becomes performance
test your hard drive's random read speed. Hard Drive usually get
100MB/s~200MB/s.

There are no modern hard drives that get 100-200MB/sec on _RANDOM_ reads.
Random ends up being an exercise in IOPS, not throughput, and there's
absolutely no way you're pulling enough IOPS out of a SAS or SATA hard drive
to get anywhere NEAR 100-200MB/sec on random workloads. I get the "it's the
software, dummy" thing everyone is so hyped about, but if you know _THAT_
little about the underlying hardware and how it functions, you shouldn't be
writing filesystems.
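
Back-of-envelope, assuming a 7,200 RPM drive that does on the order of
100-200 random IOPS:

    throughput ~= IOPS * request size
    150 IOPS * 4 KB  ~= 0.6 MB/s
    150 IOPS * 1 MB  ~= 150 MB/s

You only approach 100-200MB/sec on "random" reads when each read is large
enough to amortize the seek, at which point the workload is mostly
sequential anyway.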

Also - what's the protection mechanism for files? If there is none, I
literally have no idea what the purpose of this FS is. Under what
circumstances would I need a giant slow data storage tier that has no
redundancy to speak of?

~~~
chrislusf
Sorry if my wording is misleading. I am only saying that the SeaweedFS code
does not try to do anything unnecessary, so you can extract most of the
performance juice out of your hard drive or SSD.

------
tyingq
Distributed KV blob database seems more apt. I don't see anything here
resembling a filesystem.

------
vog
How does this compare with the already popular distributed file system IPFS?

~~~
sabujp
This isn't truly distributed in that sense:

        "Instead of managing chunks, SeaweedFS manages data
        volumes in the master server. Each data volume is size
        32GB, and can hold a lot of files. And each storage node
        can have many data volumes. So the master node only needs
        to store the metadata about the volumes, which is fairly
        small amount of data and is generally stable."
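
In other words, the master is basically just the volume-to-location map;
clients resolve a file ID's volume and then talk to the volume server
directly. A minimal Go sketch, assuming a master on localhost:9333 (the
/dir/lookup endpoint is in the docs):

    package main

    import (
        "io"
        "net/http"
        "os"
    )

    func main() {
        // For a fid like "3,01637037d6", the volume ID is the part
        // before the comma. The master answers with the volume
        // server(s) currently holding that volume.
        resp, err := http.Get("http://localhost:9333/dir/lookup?volumeId=3")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        // Prints something like {"locations":[{"url":"127.0.0.1:8080",...}]}
        io.Copy(os.Stdout, resp.Body)
    }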

~~~
the_duke
So there's no replication?

~~~
chrislusf
There is. I am not following your reasoning.

~~~
sabujp
I think he's asking whether there's replication for the data stored on the
master in case the master goes down, i.e. can there be multiple masters?

~~~
jusob
You can (and should) have multiple masters, but each volume has a primary
master. I did see the other masters lose the volume state information after
the primary master went down and was taken out of the cluster.
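
(For reference: you run the masters as peers of one another and they elect a
leader via Raft. From memory the invocation looks something like the line
below, so check `weed master -h` for the exact flag.)

    weed master -port=9333 -peers=10.0.0.1:9333,10.0.0.2:9333,10.0.0.3:9333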

~~~
chrislusf
Good observation. I will make the recovery much faster with gRPC calls, which
can detect broken network much faster than http heartbeats.

------
stuckagain
Seems to gloss over the details. Who grooms the replication when a volume
replica disappears? What happens when a client is writing a replicated file
and crashes in the middle?

~~~
chrislusf
If one volume replica disappears, the whole volume set becomes read-only.
There is no magic to auto-correct things, which I personally think is where
problems start.

If the client crashes, the writes would fail.

------
jwatte
Can you compare it to MogileFS, which seems very similar in design
assumptions? (Also, Mogile has been around for > 10 years!)

~~~
jusob
I've just looked at it. The concepts seem to be the same, but SeaweedFS is
much simpler: you have only two services, master and volume, and both speak
HTTP only. MogileFS seems to offer more control and more tools (rebalancing,
for example).

