Swift's code is available on GitHub (http://github.com/openstack/swift), and devs and users are almost always around in #openstack on Freenode. There is a wealth of info out there, and more for anyone who asks.
I'm all for encouraging many people to solve large-scale storage problems. However, as others have pointed out, Nimbus is claiming to be open without providing much detail.
Thanks for your interest! Nimbus.io will have public git repositories, "developed in the open" before we ever charge money to use the service. We just haven't posted the links yet. :)
We admire OpenStack and Ceph as great examples of open source S3 alternatives. Also, Riak+Luwak isn't protocol-level compatible with S3, but it offers similar capabilities and a truly elegant design.
Nimbus.io takes a different approach from the options above in that it focuses on space efficiency, using parity instead of replication, allowing it to store a little more than twice as much data on the same hardware. It's a tradeoff of cost vs. latency. For long-term archival storage, throughput matters greatly, but latency much less. That's why the price is $0.06/GB.
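To make the "twice as much data" claim concrete, here is a rough back-of-the-envelope comparison of 3x replication against a hypothetical 10+2 parity (erasure-coded) layout. The 10+2 split is an assumption for illustration, not Nimbus.io's actual scheme:

```python
# Rough storage-overhead comparison: 3x replication vs. a hypothetical
# 10+2 parity (erasure-coded) scheme. The 10+2 split is an assumption
# for illustration, not Nimbus.io's actual layout.

def usable_fraction_replication(copies: int) -> float:
    """Fraction of raw disk holding unique data under N-way replication."""
    return 1.0 / copies

def usable_fraction_parity(data_segments: int, parity_segments: int) -> float:
    """Fraction of raw disk holding unique data under k+m erasure coding."""
    return data_segments / (data_segments + parity_segments)

rep = usable_fraction_replication(3)    # 1/3 of raw capacity is unique data
par = usable_fraction_parity(10, 2)     # 10/12 of raw capacity is unique data

print(f"3x replication:  {rep:.0%} unique data per raw GB")
print(f"10+2 parity:     {par:.0%} unique data per raw GB")
print(f"parity fits {par / rep:.1f}x as much data on the same hardware")
```

Under these assumed parameters the parity layout holds 2.5x as much unique data, which lines up with "a little more than twice as much." The cost is that a read must touch many segments, which is where the added latency comes from.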
You are absolutely right that these things are greatly dependent on the use case. I'm happy to see other people trying to solve these problems too.
Can you describe your API? Do you have your own? Are you reimplementing the S3 API? REST-ful? xmlrpc? How do you handle authentication and authorization?
Just as an example, they store object listings in SQLite databases that are replicated file-by-file between nodes for HA. So once you had too many objects in one container, write performance would sink like a stone (assuming the database was never corrupted, etc.).
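A minimal model of that bottleneck, with an illustrative schema (not Swift's actual one): every object PUT becomes a write transaction against a single SQLite file, so all writers to one container serialize on it.

```python
import os
import sqlite3
import tempfile

# One SQLite file modeling a container's listing (illustrative schema,
# not Swift's actual one). All PUTs to this container funnel into it.
path = os.path.join(tempfile.mkdtemp(), "container.db")
db = sqlite3.connect(path)
db.execute(
    "CREATE TABLE objects (name TEXT PRIMARY KEY, size INTEGER, etag TEXT)"
)

def record_put(name: str, size: int, etag: str) -> None:
    # Each PUT is one write transaction on the same file, so concurrent
    # writers to the container serialize here; replication then ships
    # the whole (ever-growing) file between nodes.
    with db:
        db.execute(
            "INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
            (name, size, etag),
        )

record_put("photos/cat.jpg", 48213, "d41d8cd9")
print(db.execute("SELECT COUNT(*) FROM objects").fetchone()[0])  # prints 1
```

As the container grows, both the write lock contention and the cost of replicating that one file grow with it.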
I'm all for people working in this space though. A monoculture is rarely good for anybody.
There is a workaround for the issue you describe today: use many containers. However, there are two ways to solve the issue for good. One (the simplest) is to have dedicated hardware for the account and container servers, and provide that hardware with plenty of IOPS. Our testing has shown sustained 400 puts/sec on a billion-item container with this kind of deployment. The other solution is to change the code to automatically shard the container (transparently to the client) as it gets big. This is something we (the Swift devs) are working on. I hope that it will be done in the next several months, but, of course, a complex feature like this is hard to fit to a predetermined timeline.
You're going to shard a SQLite database into a series of objects to deal with "large" containers?
There are tricky problems to solve, of course. How do listings work? Will shards ever be collected? What are the performance tradeoffs? How does replication handle shard conflicts?
These issues will be worked out, and it should eliminate the write bottleneck in large containers. (Note that reads are/were never affected by this issue.)
This implementation of container sharding is something that is being evaluated. It may or may not ever make it into swift itself.
Why don't you guys use a proper distributed database to handle container mappings/etc?