
Inside the Magic Pocket - hepha1979
https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pocket/
======
james_cowling
Hey folks, this is James from Dropbox. As with the last post
(https://news.ycombinator.com/item?id=11282948)
a few of us will check in from time to time to see if there are any questions.

Apologies for taking so long to get this blog post out! We'll follow up with a
few more posts going into more technical details.

~~~
Pyxl101
Nice engineering work! This is a really interesting case study.

Has the investment in bespoke storage technology and data centers paid off
already, given the real costs to build and set it up and the opportunity costs
for other uses of engineering time? How long did it take to pay off through
reduced costs, or when do you expect it will in the future? It would be
interesting to learn at what point it becomes worthwhile for a large player
in a space like file storage to switch from commodity cloud solutions
or proprietary on-premise solutions to custom ones -- as a basis for
extrapolating to other problems.

The general implication behind my question is: a custom solution can provide
lower cost when viewed in isolation on a per-unit basis, but the total cost
to the business must also account for the engineering work to develop,
troubleshoot, deploy, and migrate to the storage system; the time needed to
lease and build data centers; the long-term operation of facilities and
software stacks; and the hiring of people to do all of that specifically. It
must also account for the things one could have done but cannot, due to the
decision to focus on a given problem (e.g., if the investment purely lowers
cost, that might be a tradeoff against another project that adds
differentiating features. I know your blog post also mentioned performance,
so it's not either-or). To make a good decision one must also weigh the
expected value of the alternatives, e.g., the current and predicted
feature sets of existing cloud or on-premise storage technologies, their
expected prices over time, and their expected regional availability. There's
also the complication of owning substantially more assets rather than paying
operational expenses to cloud providers or other third parties.

I'm just curious to learn more about how you think about navigating the
decision space. Of course, there's significant strategic value to "owning
one's own destiny", but it also seems to have required significant investment
and attention from the Dropbox engineering team for perhaps a few years.

Any general advice regarding how big your use-case needs to be before this
kind of engineering is worthwhile? How do you analyze the situation to
identify when that's the case -- how do you identify the inflection point with
confidence? Was it your prototyping that helped you determine that you could
execute on this approach?

~~~
jhartmann
I believe a team of one or two talented individuals can build something that
scales well and that will save a HUGE amount of money once you get into any
significant portion of a petabyte of data. One or two months of S3's list
price can easily pay for the cost of hardware. You can easily go with
whitebox hardware from, say, Supermicro, and colocation is SOOOO much less
expensive. If you're not storing a significant portion of a petabyte, it's
probably going to be cheaper, development-wise, to just use S3. You also
don't need to build something as complex as their system; they have to scale
to exabytes. Unless you plan to hit that scale immediately or already have
that much data, you can start with something smaller and less complex.
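As a sanity check on the arithmetic above, here's a toy Rust calculation with assumed round numbers (the ~$0.03/GB-month S3 list price and ~$0.05/GB hardware cost are illustrative circa-2016 ballparks, not quotes):

```rust
// Back-of-envelope: how many months of S3 list-price spend equal the
// one-time cost of whitebox hardware? All figures are assumptions.
fn main() {
    let gb = 1_000_000.0_f64; // 1 PB expressed in (decimal) GB

    let s3_per_gb_month = 0.03; // assumed S3 standard list price, USD/GB-month
    let s3_monthly = gb * s3_per_gb_month;

    let hw_per_gb = 0.05; // assumed whitebox hardware cost, USD per GB, one-time
    let hw_capex = gb * hw_per_gb;

    let payback_months = hw_capex / s3_monthly;
    println!("S3: ${:.0}/month; hardware capex: ${:.0}", s3_monthly, hw_capex);
    println!("hardware pays for itself in ~{:.1} months of S3 spend", payback_months);
}
```

With these assumptions the hardware pays for itself in under two months of S3 spend, which is the kind of gap the comment describes; real numbers depend on replication factor, colo fees, and staffing.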

~~~
james_cowling
Just a caveat I'll add: you have to work very hard to build a system with
durability and availability guarantees equivalent to or better than those of
the big cloud storage providers. It's typically not possible to be
significantly cheaper while keeping those guarantees unless you're at large
scale with a well-established supply chain organization, etc.

Of course if you don't need 11+ nines of durability or multi-datacenter
redundancy, or if you have lower access requirements, then you can exploit
those reduced requirements to your advantage.
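To see why those guarantees are hard, here's a deliberately naive durability model in Rust (my own toy sketch, not Dropbox's actual analysis): independent disk failures at an assumed annual failure rate, with a lost replica re-replicated within a fixed repair window.

```rust
// Toy model: annual probability of losing one block under n-way replication.
// Assumes independent failures at annual failure rate `afr`, and that a lost
// replica is rebuilt within `repair_days`. Real systems face correlated
// failures, so this is an optimistic lower bound.
fn annual_loss_probability(afr: f64, replicas: u32, repair_days: f64) -> f64 {
    // Chance a specific disk fails during one repair window.
    let p_window = afr * repair_days / 365.0;
    // The first failure can hit any replica; the block is lost only if
    // every remaining replica also fails before the repair completes.
    (replicas as f64) * afr * p_window.powi(replicas as i32 - 1)
}

fn main() {
    for &r in &[2u32, 3, 4] {
        let p = annual_loss_probability(0.04, r, 1.0); // 4% AFR, 1-day repair
        println!("{}x replication: ~{:e} loss probability per block-year", r, p);
    }
}
```

Even this optimistic model only reaches roughly nine nines per block-year at 3x replication, and with billions of blocks (plus the correlated failures it ignores) aggregate durability is much worse -- which is why 11+ nines takes real work.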

~~~
james_cowling
To be clear, we do need those durability and availability guarantees :)

We just benefit from our very-large scale and close alignment between hardware
and software to tailor our infrastructure to our workloads.

------
akhilcacharya
What are some of the cool advantages of rebuilding Magic Pocket with Rust
(outside of just memory usage)?

~~~
jamwt
(We actually only rebuilt two components in Rust -- most of Magic Pocket is
still Go.)

Memory usage/control, easy FFI with C, safety/correctness of code. Those were
the big advantages for our projects.

Edit: deterministic destruction of resources is pretty great too. Yay RAII.

Edit 2: So is the ability to use the "standard" C tool suite for performance,
memory, debugging, etc. valgrind, jemalloc profiling, gdb/lldb, perf. All that
stuff.
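For readers unfamiliar with the RAII point: in Rust, `Drop` runs at a predictable point (end of scope) rather than at a garbage collector's discretion. A minimal illustration with a made-up resource type:

```rust
// RAII / deterministic destruction: Drop::drop runs exactly when the value
// goes out of scope. `DiskLease` is a hypothetical resource for illustration.
struct DiskLease {
    id: u32,
}

impl Drop for DiskLease {
    fn drop(&mut self) {
        // Runs deterministically at end of scope, not "eventually".
        println!("releasing disk lease {}", self.id);
    }
}

fn main() {
    {
        let _lease = DiskLease { id: 7 };
        println!("holding lease");
    } // _lease dropped exactly here
    println!("lease is already released by this line");
}
```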

------
squiguy7
What were some of the funnest engineering challenges Dropbox faced when
deciding on architectural implementations or how to get a system that could
store exabytes?

------
amelius
Since this is a distributed system, did you consider using Erlang for the
implementation?

~~~
jamwt
Nope, honestly. Go is Dropbox's primary infrastructure programming language.
There's an extensive internal library/culture/toolset around Go, so it made
the most sense to use that.

------
kordless
> immutable block storage system

There are higher-order problems that result in that insight.

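For context, the quoted phrase means that Magic Pocket stores blocks keyed by their content, so a block is never modified in place. A toy content-addressed sketch in Rust (my illustration, not the actual design; a real system would use a cryptographic hash rather than `DefaultHasher`):

```rust
// A toy immutable, content-addressed block store: the key is a hash of the
// block's bytes, so stored content can never be overwritten in place, and
// identical content deduplicates for free.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

struct BlockStore {
    blocks: HashMap<u64, Vec<u8>>,
}

impl BlockStore {
    fn new() -> Self {
        BlockStore { blocks: HashMap::new() }
    }

    // Store a block; the returned key is derived from the content itself.
    fn put(&mut self, data: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        data.hash(&mut h);
        let key = h.finish();
        // Re-putting the same content is a no-op: blocks are immutable.
        self.blocks.entry(key).or_insert_with(|| data.to_vec());
        key
    }

    fn get(&self, key: u64) -> Option<&[u8]> {
        self.blocks.get(&key).map(|v| v.as_slice())
    }
}

fn main() {
    let mut store = BlockStore::new();
    let k = store.put(b"hello block");
    assert_eq!(store.get(k), Some(&b"hello block"[..]));
    // Same content, same key.
    assert_eq!(store.put(b"hello block"), k);
}
```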
