Could you comment more generally on what advantages Rust offered and where your team would like to see improvement? Are there portions of the Dropbox codebase where you'd love to use Rust but can't until <feature> RFCs/lands/hits stable? Are there portions where the decision to use Rust caused complications or problems?
> The article makes a brief mention of Go causing issues with RAM usage. Was this due to large heap usage, or was it a problem of GC pressure/throughput/latency? If the former, what were some of the core problems that could not be further optimized in Go?
The reasons for using Rust were many, but memory was one of them.
Primarily, for this particular project, the heap size is the issue. One of the games in this project is optimizing how little memory and compute you can use to manage 1GB (or 1PB) of data. We utilize lots of tricks like perfect hash tables, extensive bit-packing, etc. Lots of odd, custom, inline and cache-friendly data structures. We also keep lots of things on the stack when we can to take pressure off the VM system. We do some lockfree object pooling stuff for big byte vectors, which are common allocations in a block storage system.
It's much easier to do these particular kinds of optimizations using C++ or Rust.
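To give a flavor of the bit-packing idea, here's a toy Rust sketch (the field widths and names are invented for illustration, not our actual layouts):

```rust
// Toy example: a block reference packed into a single u64 word
// (40-bit offset + 20-bit length) instead of two separate fields.
#[derive(Copy, Clone)]
struct PackedBlockRef(u64);

impl PackedBlockRef {
    const LEN_BITS: u32 = 20;
    const LEN_MASK: u64 = (1 << Self::LEN_BITS) - 1;

    fn new(offset: u64, len: u64) -> Self {
        // Callers must stay within the packed widths.
        debug_assert!(offset < (1 << 40) && len <= Self::LEN_MASK);
        PackedBlockRef((offset << Self::LEN_BITS) | len)
    }

    fn offset(self) -> u64 {
        self.0 >> Self::LEN_BITS
    }

    fn len(self) -> u64 {
        self.0 & Self::LEN_MASK
    }
}
```

Millions of these in a table cost exactly 8 bytes each, with no pointer chasing, which is the kind of layout control that's painful to get in a GC'd language.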
In addition to basic memory reasons, saving a bit of CPU was a useful secondary goal, and that goal has been achieved. The project also has a fair amount of FFI work with various C libraries, and a kernel component. Rust makes it very easy and zero-cost to work closely with those libraries/environments.
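As a rough illustration of the FFI side (the C function here is made up, not one of the actual libraries):

```rust
use std::os::raw::{c_int, c_uchar};

extern "C" {
    // Hypothetical C signature:
    //   int compress_block(const unsigned char *src, size_t src_len,
    //                      unsigned char *dst, size_t *dst_len);
    fn compress_block(
        src: *const c_uchar,
        src_len: usize,
        dst: *mut c_uchar,
        dst_len: *mut usize,
    ) -> c_int;
}

fn compress(input: &[u8], output: &mut Vec<u8>) -> Result<(), c_int> {
    let mut out_len = output.capacity();
    // Zero-cost in the sense that the C library gets raw pointers into
    // Rust-owned buffers; no copying or marshalling layer in between.
    let rc = unsafe {
        compress_block(input.as_ptr(), input.len(), output.as_mut_ptr(), &mut out_len)
    };
    if rc == 0 {
        // Safety: we trust the C side to report how many bytes it wrote.
        unsafe { output.set_len(out_len) };
        Ok(())
    } else {
        Err(rc)
    }
}
```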
For this project, pause times were not an issue. This isn't a particularly latency-sensitive service. We do have some other services where latency does matter, though, and we're considering Rust for those in the future.
> Could you comment more generally on what advantages Rust offered and where your team would like to see improvement?
The advantages of Rust are many. Really powerful abstractions, no null, no segfaults, no leaks, yet C-like performance and control over memory, and you can use that whole C/C++ bag of optimization tricks.
On the improvements side, we're in close contact with the Rust core team--they visit the office regularly and keep tabs on what we're doing. So no, we don't have a ton of things we need. They've been really great about helping us out when those things have sprung up.
Our big ask right now is the same as everyone else's--improve compile times!
> Are there portions where the decision to use Rust caused complications or problems?
Well, Dropbox is mostly a golang shop on the backend, so Rust is a pretty different animal from what everyone was used to. We also have a huge number of good libraries in golang that our small team had to create minimal equivalents for in Rust. So the biggest challenge in using Rust at Dropbox has been that we were the first project! We had a lot to do just to get started...
The other complication is that there is a ton of good stuff that we want to use that's still being debated by the Rust team, and therefore marked unstable. As each release goes on, they stabilize these APIs, but it's sometimes a pain working around useful APIs that are marked unstable just because the dust hasn't settled yet within the core team. Having said that, we totally understand that they're being thoughtful about all this, because backwards compatibility implies a very serious long-term commitment to these decisions.
> On the improvements side, we're in close contact with the Rust core team
I've read before (somewhere, I think) that Dropbox effectively maintains a large internal "standard library" rather than relying on external open source efforts. How much does Magic Pocket rely on Rust's standard library and the crates.io ecosystem? Could you elaborate on how you ended up going in whichever direction you chose with regards to third-party open source code?
So, we use quite a few crates for the things where it makes no sense to specialize in Dropbox-specific ways.
>Are you going to open source anything?
>Probably. We have an in-house futures-based I/O framework. We're going to collaborate with the Rust team and Carl Lerche to see if there's something there we can clean up and contribute.
We also tweak jemalloc pretty heavily for our workload.
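For the curious, here's a minimal sketch of what wiring in a tunable jemalloc can look like in Rust; the tikv-jemallocator crate is just one way to do it and isn't necessarily what we use (Rust binaries actually linked jemalloc by default at the time):

```rust
// Assumes a Cargo dependency on tikv-jemallocator (an illustrative choice).
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // Workload-specific tuning usually happens at launch rather than in
    // code, e.g.:
    //   MALLOC_CONF="background_thread:true,dirty_decay_ms:0" ./server
    // (real options in recent jemalloc; the right knobs and values
    // depend entirely on the workload)
    let buf: Vec<u8> = Vec::with_capacity(1 << 20); // served by jemalloc
    drop(buf);
}
```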
1. Go vs Rust at Dropbox's scale and requirements.
2. Maintaining availability whilst moving from AWS to Diskotech and Magic Pocket.
3. Internals of Magic Pocket (filesystem, storage engine, availability guarantees, scaling up and scaling out, compression details, load balancing, cloning, etc.)
4. Improvements in perf, stability, security, and cost.
Will most likely start with the following:
1. Overall architecture and internals.
2. Verification and protection mechanisms, how we keep data safe.
3. How we manage a large amount of hardware (hundreds of thousands of disks) with a small team, how we automatically deal with alerts etc.
4. A deep-dive into the Diskotech architecture and how we changed our storage layer to support SMR disks.
Hopefully these will be of a sufficient level of detail. We certainly won't be getting into any details on cost but we're pretty open from a technical perspective.
(Lemme just take a moment to say how great Backblaze's tech blog posts are btw.)
Here's a request: how did you migrate all that data? Are you going to be open-sourcing any of the tools that you built up? We'd be interested in hearing about that process. A lot of folks using B2 right now are trying to do this exact thing (or making copies if not actual migrations). Cheers! ;-)
One of the ways tech companies go wrong is by putting the tech first and the users second. I think Dropbox's early competitive advantage was putting the user experience first. Everybody else had weird, complex systems; you folks just had a magic folder that was always correct. User focus like that is easier to pull off when it's just a few people. But once you break the customer problem up into enough pieces that hundreds of engineers can get involved, sometimes the tail starts wagging the dog and technically-oriented decisions start harming the user experience.
How did you folks avoid that?
You must have left a large hole in AWS' revenue stream. I assume their fees for transferring your data out of their system helped cover that short-term, but 500PB of data is a lot of storage capacity to sell.
Your other answers indicate your separation from AWS was very amicable, much more amicable than I would have expected. Either "it's just business" or they have plans to backfill the hole you left.
So 500PB is ~6% of that... Although I'm sure Amazon would have liked to keep Dropbox as a customer it's probably a pretty small percentage of their revenue. Also, if you assume that AWS is growing at ~50% year over year then 6% of S3 is only about one or two months worth of growth.
I think it was this talk but I'm not sure: https://www.youtube.com/watch?v=3HDQsW_r1DM
But now that you've moved away from AWS, have you expanded your security team to make up for the fact that you now need folks to cover all your infrastructure security needs too? Have you made the hires you needed to deal with low-level security issues in hardware and the kernel? How have you all been dealing with this so far?
I ask, because I help people make sense of their security issues and my clients explicitly ask me about your company. So keeping track of what you all are doing there is pretty relevant!
Feel free to refer others to this thread if you think it'd be helpful.
Cool stuff, really interesting read!
The data placement isn't RAID. We encode aggregated extents of data in "volumes" that are placed on a random set of storage nodes (with sufficient physical diversity and various other constraints). Each storage node might hold a few thousand volumes, but the placement for each volume is independent of the others on the disk. If one disk fails we can thus reconstruct those volumes from hundreds of other disks simultaneously, unlike in RAID where you'd be limited in IOPS and network bandwidth to a fixed set of disks in the RAID array.
That probably sounded confusing but it's a good topic for a blog post once we get around to it.
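In the meantime, a rough sketch of the placement idea (all names invented; the real placer also randomizes and weighs things like free space):

```rust
use std::collections::HashSet;

struct Node {
    id: u32,
    rack: u32, // stand-in for the failure domains we actually consider
}

/// Pick `want` storage nodes for a volume such that no two share a rack.
/// Returns None if there aren't enough physically diverse nodes.
fn place_volume(nodes: &[Node], want: usize) -> Option<Vec<u32>> {
    let mut used_racks = HashSet::new();
    let mut chosen = Vec::new();
    for n in nodes {
        // insert() returns false if we've already used this rack.
        if used_racks.insert(n.rack) {
            chosen.push(n.id);
            if chosen.len() == want {
                return Some(chosen);
            }
        }
    }
    None
}
```

Because each volume independently picks its own set of nodes, a dead disk's volumes end up spread across hundreds of peers, and reconstruction can read from all of them in parallel.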
It only covers a few use cases, doesn't go into too much detail, and shows some example code, but it seems like using TLA+ has been beneficial for them.
I'm surprised at this. I would have expected the opposite: using AWS for the application and metadata but managing your own storage infrastructure. Much like how Netflix runs on AWS but streams video through its self-managed CDN. Can you shed some light on why it was designed this way?
Our previous version of the storage layer ran on top of XFS, which was probably a bad choice since we ran into a few XFS bugs along the way, though nothing serious.
On a high level it's variable-sized blocks packed into 1GB extents, which are then aggregated into volumes and erasure coded across a set of disks on different machines/racks/rows/etc. We also replicate cross-country in addition to this. Live writes are written into non-erasure-coded volumes and encoded in the background.
The volume metadata on the disks contains enough information to be self-describing (as a safety precaution), but we also have a two-level index that maps blocks to volumes and volumes to disks.
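Conceptually, that index looks something like this (types and names invented for illustration):

```rust
use std::collections::HashMap;

type BlockHash = [u8; 32];
type VolumeId = u64;
type DiskId = u32;

/// Two-level index: blocks -> volumes, volumes -> disks.
struct Index {
    block_to_volume: HashMap<BlockHash, VolumeId>,
    volume_to_disks: HashMap<VolumeId, Vec<DiskId>>,
}

impl Index {
    /// Find the disks that may hold a given block, or None if unknown.
    fn locate(&self, block: &BlockHash) -> Option<&[DiskId]> {
        let volume = self.block_to_volume.get(block)?;
        self.volume_to_disks.get(volume).map(|disks| disks.as_slice())
    }
}
```

Since the on-disk volume metadata is self-describing, an index like this can be rebuilt by scanning the disks if it's ever lost or suspect.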
More info to come later.
Also, do you mimic the eventually consistent behaviour of AWS, or do you offer a stronger form of consistency?