
Swift is....... woeful.

Just as an example, they store object listings in SQLite databases that are replicated between nodes at the file level for HA. So once you have too many files in one container, performance sinks like a stone. And that's assuming the database never gets corrupted, etc...

I'm all for people working in this space though. A monoculture is rarely good for anybody.




"Woeful" seems a little harsh :-)

There is a current workaround for the issue you describe: use many containers. However, there are 2 ways to solve the issue for good. One (the simplest) is to have dedicated hardware for the account and container servers, and provide that hardware with plenty of IOPS. Our testing has shown sustained 400 puts/sec on a billion item container with this kind of deployment. The other solution is to change the code to automatically shard the container (transparent to the client) as it gets big. This is something we (the swift devs) are working on. I hope that it will be done in the next several months, but, of course, a complex feature like this is hard to fit to a predetermined timeline.
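For anyone wondering what "use many containers" can look like on the client side, here is a minimal sketch. It assumes the application is free to pick the container for each object; `container_for`, the `photos` base name, and the count of 64 are all hypothetical and not part of Swift's API.

```python
import hashlib


def container_for(object_name, base="photos", num_containers=64):
    """Map an object name to one of num_containers container names.

    Hypothetical helper: hashing spreads objects evenly, so no single
    container listing database has to absorb every write.
    """
    digest = hashlib.sha256(object_name.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_containers
    return f"{base}_{bucket:03d}"


if __name__ == "__main__":
    for name in ("cat.jpg", "dog.jpg", "2011/06/holiday.png"):
        print(name, "->", container_for(name))
```

The tradeoff is that a full listing now means querying every container and merging the results on the client, and changing `num_containers` later remaps existing objects.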


:(

You're going to shard a SQLite database into a series of objects to deal with "large" containers?


The idea is to limit each "shard" to some configurable number of objects, say, 1 million. As the container grows, the db can be split in two and each of the two new pieces can grow. The original container entity keeps an index listing of what each of its "child shards" holds, i.e. the start and end markers.
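As a rough illustration of the split-and-index idea (a toy in-memory model, not the design under evaluation and not Swift code), the sketch below keeps sorted lists in place of SQLite files. The class name, the tiny `SHARD_LIMIT`, and the choice to split at the midpoint are all assumptions made for the example.

```python
import bisect

SHARD_LIMIT = 4  # hypothetical; the comment above suggests something like 1 million


class ShardedContainer:
    """Toy model: a root index over sorted, range-partitioned shards."""

    def __init__(self, limit=SHARD_LIMIT):
        self.limit = limit
        # Shard i holds names in the range [lowers[i], lowers[i + 1]).
        # The first lower bound is "", which sorts before every other name.
        self.lowers = [""]
        self.shards = [[]]

    def _shard_for(self, name):
        # Rightmost shard whose lower bound is <= name.
        return bisect.bisect_right(self.lowers, name) - 1

    def put(self, name):
        i = self._shard_for(name)
        shard = self.shards[i]
        pos = bisect.bisect_left(shard, name)
        if pos < len(shard) and shard[pos] == name:
            return  # already listed
        shard.insert(pos, name)
        if len(shard) > self.limit:
            self._split(i)

    def _split(self, i):
        # Split an over-full shard in two and record the new start marker.
        shard = self.shards[i]
        mid = len(shard) // 2
        left, right = shard[:mid], shard[mid:]
        self.shards[i] = left
        self.shards.insert(i + 1, right)
        self.lowers.insert(i + 1, right[0])

    def listing(self):
        # Shards are ordered by range, so concatenation is a sorted listing.
        return [name for shard in self.shards for name in shard]

    def index(self):
        # (start marker, end marker, count) per shard; None means unbounded.
        uppers = self.lowers[1:] + [None]
        return [(lo, hi, len(s))
                for lo, hi, s in zip(self.lowers, uppers, self.shards)]


if __name__ == "__main__":
    c = ShardedContainer()
    for n in range(10):
        c.put(f"obj{n:02d}")
    for row in c.index():
        print(row)
```

Because a write only touches the one shard whose range covers the name, writes to different ranges no longer contend on a single database. The sketch ignores the hard parts raised in this thread: replication, conflict handling, and merging shards that shrink.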

There are tricky problems to solve, of course. How do listings work? Will shards ever be collected? What are the performance tradeoffs? How does replication handle shard conflicts?

These issues will be worked out, and it should eliminate the write bottleneck in large containers. (Note that reads are/were never affected by this issue.)

This implementation of container sharding is something that is being evaluated. It may or may not ever make it into swift itself.


But why SQLite? And why file-based?

Why don't you guys use a proper distributed database to handle container mappings/etc?



