Dropbox traffic infrastructure: Edge network (dropbox.com)
277 points by el_duderino 5 months ago | 85 comments

If you have trouble connecting to github.com, how are you going to "git clone https://github.com/[xxx]"?

Because it's possible that you can't clone from the https endpoint but you can from the git endpoint, which helps them (and you) debug the cause.

Yeah, but then the last step is to "copy & paste the results" to .. https://github.com/contact

I'm assuming it also logs some debugging information with regards to your git client's behavior, though. So you could still be able to get to that contact form.

Yes, presumably being unable to clone is an important data point.

My point is chances are you can't visit https://github.com/contact to paste your debugging result, if you're "Having trouble connecting to github.com".

Ha good catch

pretty cool.

quick question, why not just make

- https://github.debug or

- https://debug.github.com

The main principle here is to reuse as little infra as possible: different DNS provider, different hosting, different cert authority, etc. So debug.github.com is probably worse than github-debug.com (github.debug should be fine though, and even slightly better than github-debug.com).

You could use two independent dns providers for the same domain (i call it inverse split horizon). I am a proponent of dns service discovery (using NAPTR and SRV records) so that by going to e.g. www.github.com, the dns provides a list of servers, and the final server would be your debug site (kind of like how there are MX records with different priorities, except naptr/srv is service agnostic).
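To make the SRV idea above concrete, here is a minimal Python sketch of how a client could order SRV records: lowest priority value first, then a weighted random pick within each priority group, so a debug host with the highest priority number is only tried last. It is a simplification of the selection rules in RFC 2782, not a full implementation, and the hostnames, ports and weights are made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is tried first, like MX preference
    weight: int    # relative share of traffic within one priority
    port: int
    target: str


def order_srv(records, rng=None):
    """Return the records in the order a client should try them."""
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            total = sum(r.weight for r in group)
            if total == 0:
                # All remaining weights are zero: pick uniformly.
                pick = rng.choice(group)
            else:
                pick = rng.choices(group, weights=[r.weight for r in group])[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


# Hypothetical records: two normal frontends plus a last-resort debug site.
records = [
    SrvRecord(10, 60, 443, "a.example.net"),
    SrvRecord(10, 40, 443, "b.example.net"),
    SrvRecord(20, 0, 443, "debug.example.org"),
]
for record in order_srv(records):
    print(record.priority, record.target)
```

The two priority-10 hosts come out in a random, weight-biased order; the priority-20 debug host is always last.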

Hi, I assume you're the writer or at least work at Dropbox.

Just want to let you know (as I mentioned somewhere): this blog link doesn't load at all if I click from here (i.e. with a Referer of news.ycombinator.com).

thank you! i understand now.

Why would that be better?

To isolate possible issues with gtld, dns + glue records, registrar, domain blocking (corporate firewalls, restricted/compromised user dns, malware) that affects .com (the github.debug that is, debug.github.com is not a good solution for the above reasons)

Good find, I like this pattern. I could see a niche startup or open source solution providing this as a service, similar to statuspage (acquired by atlassian). It could also help keep the layout consistent, for user familiarity.

Is this open-source, how can I make one for my company?

Read the headers from the request to your service. Pull browser information from navigator (in a browser try: console.log(navigator)).

Should be most of the info.
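A rough sketch of the server half of that, using only the Python standard library: an endpoint that echoes back the request headers and client address as JSON so a user can copy and paste them. The handler name, the `collect_debug_info` helper and the port are assumptions for illustration; this is not how GitHub's debug page is built, and the browser-side `navigator` data would still have to be collected in JavaScript.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_debug_info(headers, client_ip):
    """Gather the request details a user could paste into a support ticket."""
    return {
        "client_ip": client_ip,
        "headers": {name: value for name, value in headers.items()},
    }


class DebugHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        info = collect_debug_info(self.headers, self.client_address[0])
        body = json.dumps(info, indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Run the debug endpoint until interrupted."""
    HTTPServer((host, port), DebugHandler).serve_forever()
```

Calling `serve()` and opening http://127.0.0.1:8000/ in a browser should show the User-Agent, Accept-Language and other headers the browser sent.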

> DNS TTL is a lie. Even though we have TTL of one minute for www.dropbox.com, it still takes 15 minutes to drain 90% of traffic, and it may take a full hour to drain 95% of traffic.

Interesting, a while back both Google and AWS engineers replied to an HN thread[1] saying a TTL of 30 seconds or so works pretty responsibly and can be trusted and used. Seems to be some disagreement on this.

[1] https://news.ycombinator.com/item?id=17553043

It really depends who your clients are. If they are servers, 30s can work okay. If they are end users, caching happens all over. It's a huge PITA, especially since you'll run into podunk ISPs that have their own custom caching setup, but you have a customer with a shop there. Not that I'm still bitter.

It really depends on the clients, here is an excerpt from the article:

> Here we also need to mention the myriad embedded devices using Dropbox API that range from video cameras to smart fridges which have a tendency of resolving DNS addresses only during power-on.

> which have a tendency of resolving DNS addresses only during power-on.


DNS TTL is a PITA for us at Netflix as well. We use Zuul to proxy cross-region for TTL "non-responders".

It would be funny if part of what they are seeing is because the Dropbox client doesn't try reconnects or re-resolving very often.

There's a lot of caching done by e.g. Comcast that does not always honor the TTL.
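A toy model of why that matters, in Python. Time is passed in explicitly so it runs instantly. The 900-second floor is a made-up number for illustration, not a measurement of any real ISP, and `CachingResolver` is a hypothetical class, not a real resolver API.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    address: str
    expires_at: float


class CachingResolver:
    """Toy DNS cache; min_ttl models a cache that refuses to honor short TTLs."""

    def __init__(self, min_ttl=0.0):
        self.min_ttl = min_ttl
        self.cache = {}

    def resolve(self, name, now, authoritative_lookup):
        entry = self.cache.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.address  # served from cache, authoritative TTL or not
        address, ttl = authoritative_lookup(name)
        effective_ttl = max(ttl, self.min_ttl)  # the floor silently wins
        self.cache[name] = CacheEntry(address, now + effective_ttl)
        return address


# Hypothetical zone with a 60 second TTL, using documentation addresses.
zone = {"www.example.com": ("192.0.2.1", 60)}
lookup = lambda name: zone[name]

honest = CachingResolver()
stubborn = CachingResolver(min_ttl=900)
honest.resolve("www.example.com", 0, lookup)
stubborn.resolve("www.example.com", 0, lookup)

zone["www.example.com"] = ("192.0.2.2", 60)  # operator drains the old address
print(honest.resolve("www.example.com", 61, lookup))    # new address
print(stubborn.resolve("www.example.com", 61, lookup))  # still the old one
```

Clients behind the stubborn cache keep hitting the drained address until the floor expires, regardless of the one-minute TTL the operator published.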

How are TCP connections established while using anycast?

I was reading https://serverfault.com/questions/279482/what-is-the-differe... to understand what exactly is unicast, anycast, etc. It seems like TCP works with unicast.

With GeoDNS, users' requests will be resolved to a single IP address with which they can establish a TCP connection.

How will connection establishment (TCP specifically) work with anycast, where requests (packets?) can be routed to different machines? Is there some other network protocol to use with anycast?

I was going through: https://serverfault.com/questions/616412/how-does-anycast-wo... , not sure it is the 2018 way of doing these things.

Sorry for dumb questions, network newbie here.

Anycast route election is done by the connecting router (typically the client's ISP), and it works over BGP like normal route advertising - it's transparent to your applications and they only get directed to one endpoint destination (typically the one with the lowest latency, congestion, or cost). Once the router gets a request from a client for a destination, it does the same thing it usually does and checks the possible routes and chooses the "best" based on the rules defined by the owner of the router. The difference here is that not all of those proposed routes make it to the same physical destination. There is some stickiness to make sure you're arriving at the same destination for the duration of that connection.

Really it's like asking your favorite maps app where McDonalds is - it'll know to give you results close to you, since it knows there are many McDonalds. It would make sense for you to choose the closest, but it's not enforced by McDonalds. Similarly, if I do the same and we live in different cities, we'll get different results. The end result is that we both got to different physical places by specifying the same thing, and it's up to McDonalds to make sure the menu is the same :)

Read more on anycast. Anycast lets different computers advertise the same IP address from multiple places. When your computer tries to talk to that IP address (over whatever protocol you chose) the network sends that traffic to the “closest” computer advertising that IP address.

TCP builds a reliable stream on top of unreliable IP packets. Since which computer is “closest” rarely changes, your stream will keep being sent to the same server. When that server stops advertising the IP, then the data will get sent to a different server which will say “I don’t know what you are talking about, Reset your stream” and the tcp connection will close, basically.
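The two explanations above can be put into a small Python simulation. It is heavily simplified: a single numeric "cost" stands in for whatever policy the router's owner configured, and the server names, client address and packet flags are placeholders rather than real BGP or TCP.

```python
class Server:
    """One machine advertising the shared anycast IP."""

    def __init__(self, name):
        self.name = name
        self.connections = set()  # connections this server saw a SYN for

    def receive(self, conn_id, flag):
        if flag == "SYN":
            self.connections.add(conn_id)
            return "SYN-ACK"
        if conn_id in self.connections:
            return "ACK"
        return "RST"  # no state for this stream: tell the client to reset


def best_route(routes):
    """Pick the advertising server with the lowest cost, as the router would."""
    return min(routes, key=routes.get)


def send(routes, conn_id, flag):
    server = best_route(routes)
    return server.name, server.receive(conn_id, flag)


ams, fra = Server("ams"), Server("fra")
routes = {ams: 10, fra: 20}       # both advertise the same IP
conn = ("198.51.100.7", 50000)    # hypothetical client address and port

print(send(routes, conn, "SYN"))   # ('ams', 'SYN-ACK')
print(send(routes, conn, "DATA"))  # ('ams', 'ACK'), same route, same server
del routes[ams]                    # ams stops advertising the IP
print(send(routes, conn, "DATA"))  # ('fra', 'RST'), fra never saw the SYN
```

As long as the best route is stable, every packet of the stream lands on the server holding the connection state, which is why TCP over anycast works in practice.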

I wonder which definition of "Edge" is going to win out because right now it's being used interchangeably to mean either: 1) CDN or 2) On-premise machines/IoT.

Are these really two competing definitions or just manifestations of the same concept? What is considered an edge device depends on the boundaries of whatever network is under discussion.

I've never heard it used with that second definition. It's pretty consistently used to refer to running close to the user, as opposed to having a big data center which most of your users aren't near.

The second definition could be a confusion of ownership — i.e. are you paying a CDN to do higher-level service or running the services yourself?


Wikipedia's definition primarily focuses on the local/IoT version as distinctly different from utilizing a CDN.

I wouldn't define a CDN as edge computing either, since computation is not the purpose of a CDN.

https://en.wikipedia.org/wiki/Content_delivery_network contains

> Most CDN providers will provide their services over a varying, defined, set of PoPs [...]. These sets of PoPs can be called "edges", "edge nodes" or "edge networks" as they would be the closest edge of CDN assets to the end user"

There's also https://en.wikipedia.org/wiki/Edge_device which uses yet another idea of what The Edge is (in this case routers to a bigger network)

So "Edge" is the new "Cloud". Got it.

I work on an Edge Platform team as part of an Edge Foundation that manages both external CDN and internal Tier 1 WAF/Ingress systems. We do Edge computing at both CDN and Tier 1 layers via tenant plugins running Lua/Go. We also have an SDN team building Tier 2 solutions, so basically systems operating at the edge of each layer of the HTTP stack.

And let me guess, the web admin dashboard only works in ... IE6? <smirk>

This article is about neither of those things, though. https://en.m.wikipedia.org/wiki/Edge_device is the term that's relevant to this article. It's about the network edge itself.

Those aren't completely distinct. We've started calling the edge everything between app servers and user devices. The boundary is "where your users are in control", but the edge itself is pretty fat.

It's the "edge" of what you control as the service provider.

It's OK, you can call your on-premises machine/IoT "Fog" computing - thank Cisco. :)

This page doesn't load here (totally blank). Other blog posts of Dropbox work fine.

I've now noticed that if you click from here (i.e. with a Referer of news.ycombinator.com), it won't load.

No idea why though.

Hm. Yea, the content for the main page is requested and loaded, but none of the supporting content (CSS, images, etc.) is requested.

Copying the URL into another tab loaded the whole thing.

I don't really have time to dig into it further...

I'm surprised there is no mention of security or privacy. I'd have thought that's one of the reasons for controlling your own edge network.

If I had to guess the people who wrote this are just assuming tls on top.

That's an interesting implication. If they distribute the same cert so widely geographically, any host country could technically request it for "lawful intercepts".

You don't have to keep keys on boxes in random countries if you use a TLS oracle [1]. Another option is deploying the keys onto an HSM and pointing your frontends at that.

1. Here is CloudFlare's implementation (and pats themselves on the back for "inventing" it): https://blog.cloudflare.com/keyless-ssl-the-nitty-gritty-tec...
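A toy Python sketch of the oracle idea: the edge box holds only the public key and forwards the value to be signed to a key server that keeps the private exponent. This uses textbook RSA with tiny numbers purely to show the split of responsibilities; it is insecure, the class names are invented, and it is not CloudFlare's protocol or a real TLS handshake.

```python
import hashlib

# Toy RSA key (n = 61 * 53). Far too small for real use.
N, E = 3233, 17
D = 2753  # private exponent; only the key server ever holds this


class KeyServer:
    """Lives in a trusted location and performs private-key operations."""

    def __init__(self, n, d):
        self._n, self._d = n, d

    def sign(self, digest_int):
        return pow(digest_int, self._d, self._n)


class EdgeServer:
    """Terminates connections in a remote PoP; has no private key."""

    def __init__(self, n, e, oracle):
        self.n, self.e, self.oracle = n, e, oracle

    def sign_handshake(self, transcript):
        digest = hashlib.sha256(transcript).digest()
        m = int.from_bytes(digest, "big") % self.n
        return m, self.oracle.sign(m)  # remote call in a real deployment


def verify(n, e, m, signature):
    """What a client does with the public key from the certificate."""
    return pow(signature, e, n) == m


edge = EdgeServer(N, E, KeyServer(N, D))
m, sig = edge.sign_handshake(b"client-random || server-random")
print(verify(N, E, m, sig))  # True
```

Seizing the edge box yields no key material, only the ability to ask the oracle for signatures while the box is still trusted by the key server.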

A host country can request a lawful intercept regardless of whatever technical conditions exist on your network.

Dropbox doesn't offer privacy. Anything done on the service is visible to them and whoever else convinces them to hand over the data.

They're exactly the type of cloud service that shouldn't be used by businesses or privacy-conscious individuals.

Security's also questionable. They had an incident where one could log into any account a long time ago. More recently they were presenting a fake admin dialog to syphon the admin password on macOS and perform some admin tasks on the machine.

Their security is very good nowadays. I suggest reading their security whitepaper and their blog posts on various security topics:



Disclaimer: I used to work there on the Security Engineering team.

Thanks for the post! Typo at gradient decent (descent) mention.

Ouch, that was embarrassing -- thanks for spotting =) Sorry about that -- ESL and stuff. Editors did a hell of a job fixing our English, but errors still slipped in. We'll be fixing grammar (and probably add link to the presentation PDF) later this week.

Looks like TLS 1.3 adoption is growing across service providers...

Great post. Thanks!

Who would have thought a simple FTP-on-some-server would have been so complicated. Thanks for the post!

I'm eagerly waiting for the blog post about how bazel and torrents are used.

Who describes Dropbox as the same as FTP?

Cynical HN commenters from a decade ago. It's pretty hilarious in retrospect.

Oh, right, I'd forgotten about https://news.ycombinator.com/item?id=9224 It's definitely comical in retrospect for not even understanding the state of the technology at the time.

I know the comment is today taken as the height of HN negativity, but to me the comment seems very reasonable.

- Back then there were FTP clients that automatically kept server and client in sync, which is the main feature of Dropbox. Dropbox adds a website, but Windows Explorer already supports FTP natively. Of course easily creating shared links turned out to be a major thing, but I don't think we can blame people for not predicting that (especially since public folders are a feature of FTP servers, so it's not a new feature, just a lot more convenience). And of course Dropbox makes all that convenient and approachable, but that's easily overlooked by the technical user.

- The comment points out that contrary to the headline Dropbox will not replace USB drives. And here we are, a decade later, and Dropbox indeed didn't replace USB drives.

Of course in hindsight it's clear that Dropbox was a great idea with great execution, but that wasn't obvious at the time at all.

I was just thinking that having used various FTP/SFTP-as-a-filesystem, not to mention NFS and SMB, over a decade or so before Dropbox arrived made the sales pitch immediately obvious: do you want everything to be slow and unreliable, with frequent jank even on fast networks, or not?

Well, SMB worked a decade ago and still does. Main difference is that it works better on local networks.

“Works” in the sense that the experience is acceptable but anyone who's used it knows that while things have gotten a little better over the years there are still a wide range of programs which handle latency by blocking. If you use a network home directory, you just get used to Outlook, Word, etc. sporadically hanging for a few seconds before the UI paints, etc.

That's going to be worse as a function of latency and packet loss so it's far more tolerable in an enterprise environment using wired networks with tons of bandwidth and, at least theoretically, a professional support team. Over WiFi or consumer-grade internet (i.e. probably a strong majority of Dropbox's customers) the gap in experience is going to be more substantial.

I am not sure if SMB is used over the Internet by "customers".

It's heavily used over VPNs. I have seen SMB over the internet by [ill-advised] companies but the various waves of exploits have probably put an end to that.

The main point was just that something like SMB or NFS is not a good fit for a network which is not extremely fast and highly reliable because too many programs do blocking I/O. Dropbox works really well in that situation because it's asynchronous and that advantage was huge when they came out because everything was even worse back in 2007.

I wrote "customers", not in the sense of "companies". With the latter I get paid to wait if I have to use SMB.

To be fair: FTP makes it pretty easy to see if a file is uploaded. With Dropbox you always have to double check ...

How frequently do you find their UI badging to be inaccurate? I’m not sure I ever have caught it reporting the wrong state.

I'm also curious how torrents are used and why.

I would guess that they do something similar to Facebook.

Ars covered what Facebook do about 6 years ago, and even then I think they'd been using it for a few years:


I believe Twitter does something similar too.

Twitter did use BitTorrent for deploying, in a project called murder: https://blog.twitter.com/engineering/en_us/a/2010/murder-fas...

Not sure they're still using this

MSN Hotmail and Messenger used torrents to distribute binaries in 2006. It's not new nor novel.

And I'm sure someone did it before they did.

What binaries did MSN Hotmail have? It was the earlier web mail incarnation of outlook.com?

The Hotmail front-end ran as an ISAPI filter inside IIS.

Messenger had lots of standalone binaries for different functions: CS, SB, DP, etc.

BitTorrent is used in Dropbox, Twitter, MSN, etc to distribute binaries throughout the server platform, not to end users.

This has to be HN at its worst. Reducing a complicated file sharing and collaboration tool to an insecure and highly technical protocol.

Dropbox: I can upload a file super easily and share a simple & secure link with someone who just has a web browser.

FTP: I can upload a file to an FTP server I've either configured on my server or rented online. I'll then provide an FTP URL to a friend with instructions on how they should log in and what FTP client they should use on their chosen device.

EDIT: This could be sarcasm; if I didn't pick up on it then feel free to downvote me to hell.

EDIT 2: Thanks to the comments, this is sarcasm. I messed up. Sorry rakoo.

This is a parody. From the original Dropbox announcement on HN:


Unfortunately, similar parodies are posted as a reaction to many Dropbox-related posts, so gets a bit repetitive.

It does indeed, but I feel like it's an important part of HN (some people arrive here all the time) and retrospecting about it is something all engineers should do, so I felt the need to point at it again.

For completeness' sake and closure on the story, here's the same account 11 years later reflecting on himself: https://news.ycombinator.com/item?id=16661824

It's probably in reference to an HN comment on Dropbox's original announcement post that said the product was a glorified version of rsync. To be fair to that commenter, he congratulated the company in its IPO post.

Would be great to know how exactly they store all the customer data on this edge network. Is it encrypted with a customer-specific key? If yes, when and how do they decrypt it?

They don't store customer data at the Edge at all (only network optimizations for data going from user to the data center).

Probably not. You can reset your password without losing your content.

Why doesn't dropbox just use AWS?

They were using AWS. They have moved off of it within the last couple years because they now have the scale where it makes financial sense to build their own infrastructure and also to provide a better, faster service. They've had improved read/write + sync speeds since switching over to their own infrastructure. Having those checkboxes in a table showing that you have the fastest cloud storage works really well in B2B, which has been a big focus for them recently.

They did, but I imagine it was really expensive vs building their own stuff at that scale. Plus they probably don't want to be reliant on a competitor in many ways.

Early edge network was indeed prototyped in AWS: a simple setup with ELB in tcp mode, nginx and proxy_protocol.

Once we got all the data from that experiment (performance, cost, and flexibility included), we decided to start building our own PoPs.

"Just" using AWS isn't going to solve any of the problems this article describes.
