Lessons from Dropbox - One Million Files Saved Every 15 minutes (2011) (highscalability.com)
44 points by ghosh on July 31, 2013 | 16 comments



> Use Python

It may be a good fit for Dropbox, but that doesn't mean it's going to be a good fit for your company. You're probably best off using whatever suits the task and is familiar to your team. If your programmers are all, 'let's use C', and they've got a good argument that it suits a particular problem domain (and you're not somehow tied to another language by the need to work with what another company is doing), then it may be a good idea to use C. Same with Lisp or Python or....

There are going to be trade-offs there. The really major ones to me are what library support is available, what sort of community of programmers you're buying into when you choose that language, and how the semantics let you fit it into a particular production/design paradigm. So, it's not quite free choice. But those are discussions that ought to happen if you're going to sit down and choose a language.


I've always been quite impressed with how fast, low footprint, and solid Dropbox is given that it's in Python.

The problem with VM languages (Java, Python, Ruby, all of them) is that the VM itself and its libraries constitute an enormous dependency. Either you use the natively installed one and manage compatibility with it, or you ship it with your product. The latter offends my personal aversion to bloat... I hate shipping something where there's 100X as much data in the dependencies as in the product itself. "Here's my 2MB app and its 120MB of VM bloat..."

Python has some decent tooling in this area, so that helps. But VM language authors take note: this is a big problem that needs to be solved.

Also why I find Google Go interesting: it's compiled as far as I know, and requires few dependencies to ship. (A few libs? That can probably be static if you want?)


Go produces a statically linked compiled binary.


I think the broader point is to use a single language across as many of your systems as possible, rather than just "Python is the answer to everything."

Obviously there are benefits and drawbacks to such an idea. The benefit is that every programmer you hire can be a Python developer. That should make meetings fairly simple. You need to worry less about ensuring you have an adequate number of x developers to maintain the system which relies on language x.

Obviously the drawback is that Python, like any other language, isn't the best choice for everything. To get round such inefficiencies Dropbox chose to buy larger machines.


'Just buy bigger machines' - Uhh, I mean, that works, but that isn't really a solution, is it?


> Architecture is doing the right thing when growth can be handled by adding more of the same stuff. You want to be able to scale by throwing money at a problem which means throwing more boxes at a problem as you need them. If your architecture can do that, then you’re golden.

Source: http://highscalability.com/blog/2013/4/15/scaling-pinterest-...


But that's not the first port of call. For instance, I've worked with a few companies that were perplexed about why their site was slow when they had invested heavily in great servers. It turned out the code was terrible.


I agree and I've seen that; however, the point of the article is that ideally you have rock-solid code, and then the only sane way to scale is to add more of the same resources, because the codebase is already optimized.

The reason I posted the quote was to answer your remark that it isn't a solution - it is close to an ideal solution actually, but not applicable everywhere and at any time (e.g. with sloppy/non-optimized code, as you pointed out).

Hope this makes sense.


If your product is profitable then being able to scale the server side by buying bigger/more servers (aka throwing money at the problem) can be a great approach. It means no new code to write and, more importantly, no new code to test. There can be much bigger gains from updating your software, but the costs are much higher and the turnaround time is generally longer (for major updates).

This is not a valid approach on the client side though as you don't control the hardware.



Anyone care to elaborate on: "Poll - Polling 30 Million Clients All Over the World Doesn't Scale"

I'm rather interested in defining an HTTP notification structure.
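
For a rough sense of why polling at that scale hurts, here's some back-of-envelope arithmetic. The 30 million client figure comes from the article heading quoted above; the polling intervals are purely my own assumptions for illustration, not anything Dropbox has stated.

```python
def polling_requests_per_second(clients: int, interval_seconds: float) -> float:
    """Average request rate when every client polls once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return clients / interval_seconds


CLIENTS = 30_000_000  # from the article heading quoted above

# Hypothetical polling intervals (assumptions, not Dropbox's real settings).
for interval in (30, 60, 300):
    rate = polling_requests_per_second(CLIENTS, interval)
    print(f"poll every {interval:>3}s -> {rate:,.0f} requests/s")
```

Even at a lazy 5 minute interval that averages 100,000 requests per second, nearly all of which would come back with "nothing changed". That is presumably the motivation for having the server notify clients instead.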


This article is 2 years old


Why does it matter?


It's very likely Dropbox has changed some things since then, having grown to the point where many problems have likely gone from "just buy more hardware" to "let's make this more efficient" in solution-space.


Because I would like to know about scalability issues?

Perhaps in these two years their userbase and the number of files saved every 15 minutes have grown by, say, 5x and 50x, revealing problems that were not visible in 2011 and leading them to modify the system from what is described in the article?


[2011]



