(SpiderOak / Nimbus.io cofounder here)

Note that this is just an announcement and invite site to show the pricing at $0.06/GB. Nimbus.io will have public git repositories, "developed collaboratively in the open" before we ever charge money to use the service. (And this is a wholly different project than the SpiderOak backup/sync software.)

FYI, you can see the git repos for the prototype we built of this a while back, when we called it our storage "DIY API": https://spideroak.com/diy/ Note that the code and the rest of the information on that page are way out of date, since it was an early design and prototype.

I'm not sure erasure coding vs. replication is a simple change for other distributed storage projects. It affects the whole architecture. We researched pretty heavily before building. If it had been simple to modify any of the alternatives, this project wouldn't exist. I'm more than happy to be proven wrong though!

* Edited for pricing info.

"I'm not sure erasure coding vs. replication is a simple change for other distributed storage projects."

It depends on a few factors: how modular the architecture is overall, whether the existing replication is synchronous or asynchronous, etc. I'm working on the GlusterFS replication code right now in another window (OK, I should be but I'm typing here). I can assure you that it would be possible to replace replication with erasure coding just by replacing that one module, without perturbing the rest of the architecture. I've also been through the tabled code and I think it would be possible there too. I suspect the same would be true for Elliptics, but probably not Swift. Can't tell for Luwak; that would require more thought than I can afford to put into it right now.
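To make the "replace that one module" idea concrete, here is a rough sketch of what a pluggable redundancy layer could look like. None of this is GlusterFS, tabled, or Elliptics code: `RedundancyScheme`, `Replicate`, `XorParity`, and `Store` are hypothetical names made up for illustration, and single XOR parity is only a toy stand-in for a real Reed-Solomon or IDA codec. The point is just that the store talks to an encode/decode interface and does not care which scheme sits behind it.

```python
from typing import Dict, List, Optional, Protocol


class RedundancyScheme(Protocol):
    """Hypothetical interface: the only thing the store knows about."""
    def encode(self, data: bytes) -> List[bytes]: ...
    def decode(self, fragments: List[Optional[bytes]]) -> bytes: ...


class Replicate:
    """Full replication: every fragment is a complete copy."""
    def __init__(self, copies: int) -> None:
        self.copies = copies

    def encode(self, data: bytes) -> List[bytes]:
        return [data] * self.copies

    def decode(self, fragments: List[Optional[bytes]]) -> bytes:
        for frag in fragments:
            if frag is not None:
                return frag  # any surviving replica is enough
        raise ValueError("all replicas lost")


class XorParity:
    """Toy erasure code: k data fragments plus one XOR parity fragment.
    Tolerates the loss of any single fragment."""
    def __init__(self, k: int) -> None:
        self.k = k

    @staticmethod
    def _xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(self, data: bytes) -> List[bytes]:
        # Prefix the length so zero padding can be stripped on decode.
        framed = len(data).to_bytes(8, "big") + data
        size = -(-len(framed) // self.k)  # ceiling division
        framed = framed.ljust(size * self.k, b"\0")
        parts = [framed[i * size:(i + 1) * size] for i in range(self.k)]
        parity = bytes(size)
        for part in parts:
            parity = self._xor(parity, part)
        return parts + [parity]

    def decode(self, fragments: List[Optional[bytes]]) -> bytes:
        frags = list(fragments)
        missing = [i for i, f in enumerate(frags) if f is None]
        if len(missing) > 1:
            raise ValueError("single parity cannot recover two losses")
        if missing:
            survivors = [f for f in frags if f is not None]
            rebuilt = bytes(len(survivors[0]))
            for frag in survivors:
                rebuilt = self._xor(rebuilt, frag)
            frags[missing[0]] = rebuilt
        framed = b"".join(frags[:self.k])
        length = int.from_bytes(framed[:8], "big")
        return framed[8:8 + length]


class Store:
    """The rest of the 'architecture': unaware of which scheme is plugged in."""
    def __init__(self, scheme: RedundancyScheme) -> None:
        self.scheme = scheme
        self.nodes: Dict[str, List[Optional[bytes]]] = {}

    def put(self, key: str, data: bytes) -> None:
        self.nodes[key] = list(self.scheme.encode(data))

    def lose(self, key: str, index: int) -> None:
        self.nodes[key][index] = None  # simulate one failed node

    def get(self, key: str) -> bytes:
        return self.scheme.decode(self.nodes[key])
```

Swapping `Store(Replicate(3))` for `Store(XorParity(4))` changes nothing else in the caller. Whether a real system is this clean depends on the factors above, e.g. whether replication is synchronous and how much the surrounding code assumes that every node holds a full copy.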

This is something we've actively considered for GlusterFS/HekaFS, and might still do some day - though it's more likely to be on the IDA/AONT-RS side than RS/EC. The downside is that, while these approaches do offer better storage utilization, they also consume more bandwidth. Also, queuing effects can turn a bandwidth issue into a latency issue. This is especially the case for read-dominated workloads, where you just can't beat the latency of reading exactly the bytes you need from one replica. For these reasons I don't think either full replication or redundant-encoding schemes will ever entirely displace the other. Each project must prioritize which to implement first, but that doesn't mean those that have implemented replication first are precluded from offering other options as alternatives. It's really not an architectural limitation in most cases. It's just timing.
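The utilization vs. bandwidth trade-off is easy to put rough numbers on. The sketch below is a back-of-envelope model, not a measurement of any real system: the class names and the (10, 4) parameters are my own illustrative choices, and it assumes a non-systematic code (as with IDA/AONT-RS) where any read or repair has to fetch k fragments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replication:
    copies: int

    def storage_overhead(self) -> float:
        # Raw bytes stored per byte of user data.
        return float(self.copies)

    def nodes_per_read(self) -> int:
        # Read exactly the bytes you need from one replica.
        return 1

    def repair_read_bytes(self, lost_bytes: int) -> int:
        # Rebuilding a lost replica means copying it from one survivor.
        return lost_bytes


@dataclass(frozen=True)
class ErasureCode:
    k: int  # fragments needed to reconstruct
    m: int  # extra fragments for redundancy

    def storage_overhead(self) -> float:
        return (self.k + self.m) / self.k

    def nodes_per_read(self) -> int:
        # Assumption: non-systematic code, so k fragments must be fetched.
        return self.k

    def repair_read_bytes(self, lost_bytes: int) -> int:
        # Rebuilding one lost fragment needs k surviving fragments
        # of the same size.
        return self.k * lost_bytes


if __name__ == "__main__":
    for scheme in (Replication(3), ErasureCode(10, 4)):
        print(
            scheme,
            "overhead:", scheme.storage_overhead(),
            "nodes/read:", scheme.nodes_per_read(),
            "repair read for 1 MiB lost:", scheme.repair_read_bytes(1 << 20),
        )
```

Under these assumptions, 3-way replication stores 3 bytes per user byte but reads from one node, while a (10, 4) code stores 1.4 bytes per user byte but touches 10 nodes per read and reads 10x the lost data during repair. More nodes per request also means more chances to wait behind a slow queue, which is where the bandwidth issue turns into a latency issue.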
