Migrating From AWS to FB (instagram-engineering.tumblr.com)
303 points by dctrwatson on June 26, 2014 | 91 comments

> The main blocker to this easy migration was that Facebook’s private IP space conflicts with that of EC2

IPv6 adoption could not happen soon enough.

An EC2 rep has told me on more than one occasion that they have no plans to support ipv6 because the demand for it simply isn't there.

That's because people don't realize how useful it is in network ops because they're not used to it. This thread is an example where a glaring advantage of ipv6 wasn't immediately obvious.

Anybody who ever tried the VPC+ElasticIP+VPN braindeadness once should immediately file a feature request for ipv6; it's just that they probably don't think of it.

I think it has more to do with the fact that there are a standard series of private IP blocks.

Right, but if everyone used IPv6 there would be no need to use non-routeable private IPs for anything, you could just use non-conflicting IPv6 addresses and not route them.

That's basically what a ULA is for though - see RFC 4193. If I'm not big enough to acquire my own allocations from ARIN, I'd prefer not to have to renumber every piece of equipment (even those not externally accessible) when changing ISPs.

Even if you did use non-routeable private IPs, IPv6 provides enough different private address ranges - more than there are individual addresses in IPv4 - that it's unlikely they'd conflict.
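
As a rough check on the claim above, here is a small Python sketch. It assumes the RFC 4193 layout mentioned earlier in the thread (the fd00::/8 range with a 40-bit randomly chosen Global ID, giving one /48 per site). The function name and the birthday-style approximation are illustrative choices, not something from the thread.

```python
import math

IPV4_ADDRESSES = 2 ** 32      # every possible IPv4 address
ULA_SITE_PREFIXES = 2 ** 40   # distinct /48 prefixes under fd00::/8 (40-bit Global ID)


def collision_probability(sites: int, space: int = ULA_SITE_PREFIXES) -> float:
    """Approximate chance that at least two of `sites` random prefixes collide.

    Uses the standard birthday approximation 1 - exp(-n(n-1) / 2N).
    """
    exponent = -sites * (sites - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    print("ULA /48 prefixes per IPv4 address:", ULA_SITE_PREFIXES // IPV4_ADDRESSES)
    print("Collision chance when merging 2 sites:", collision_probability(2))
    print("Collision chance among 1000 sites:", collision_probability(1000))
```

The point of the sketch is only the order of magnitude: with randomly generated prefixes, two networks that later need to be joined are very unlikely to clash.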

I kind of like having standard private subnets. My router is always, or sometimes, and so is my friend's, my parent's, and my grandparent's.

After spending some time as a contractor doing systems work for a few companies I've stopped ever assigning an internal network to,, or

The number of times I found myself attempting to VPN into a client's network, only to find it conflicted either with my home network, or whatever coffee shop I was sitting in, was ridiculous. Depending on how many hosts you need to run on your network, there are huge numbers of possible subnets you could use for an internal network - do yourself a favour and keep off the ones set up by default on every router sold.

This is also important advice if you're thinking about setting up VPN access to your home network: Do not pick the most common/default subnets, i.e.,, etc. Picking a somewhat-random subnet as suggested would mitigate the problem and it's what I did for my home network.
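
A minimal sketch of how one might check a candidate subnet against commonly shipped defaults, using Python's standard `ipaddress` module. The list of "default" subnets below is an assumption made for the example, not the set of addresses named in the comments above, and `conflicts_with_defaults` is a hypothetical helper name.

```python
import ipaddress

# Illustrative list of subnets that consumer routers often ship with.
# These are assumptions for the example, not values taken from the thread.
COMMON_DEFAULTS = [
    ipaddress.ip_network("192.168.0.0/24"),
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("10.0.0.0/24"),
]


def conflicts_with_defaults(candidate: str) -> bool:
    """Return True if `candidate` overlaps any subnet in COMMON_DEFAULTS."""
    net = ipaddress.ip_network(candidate)
    return any(net.overlaps(default) for default in COMMON_DEFAULTS)


if __name__ == "__main__":
    for cidr in ("192.168.1.0/24", "10.87.42.0/24"):
        print(cidr, "conflicts" if conflicts_with_defaults(cidr) else "looks safe")
```

Note that a very large block such as 10.0.0.0/8 also counts as a conflict here, because it contains one of the listed defaults.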

It creates a lot of pain when you need to connect these networks to each other, though. Admittedly, that's probably not a problem for parents, friends, etc.

At my Old Job I demanded we keep a "registry" of the RFC1918 address space we allocated to Customers. We never allocated Customers in overlapping address spaces. It made VPN connectivity to Customer A while on-site with Customer B much easier. It also helped out in one case where one Customer acquired another.
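
The "registry" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, method names, and customer labels are hypothetical, and a real registry would need persistence and a way to release blocks.

```python
import ipaddress
from typing import List, Tuple

# The three private blocks defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.IPv4Network("10.0.0.0/8"),
    ipaddress.IPv4Network("172.16.0.0/12"),
    ipaddress.IPv4Network("192.168.0.0/16"),
]


class AllocationRegistry:
    """Records which private block each customer holds and rejects overlaps."""

    def __init__(self) -> None:
        self._allocations: List[Tuple[str, ipaddress.IPv4Network]] = []

    def allocate(self, customer: str, cidr: str) -> ipaddress.IPv4Network:
        net = ipaddress.IPv4Network(cidr)
        # Only hand out space that lies inside RFC 1918.
        if not any(net.subnet_of(block) for block in RFC1918_BLOCKS):
            raise ValueError(f"{net} is not inside RFC 1918 space")
        # Refuse anything that overlaps an existing allocation.
        for owner, existing in self._allocations:
            if net.overlaps(existing):
                raise ValueError(f"{net} overlaps {existing} held by {owner}")
        self._allocations.append((customer, net))
        return net


if __name__ == "__main__":
    registry = AllocationRegistry()
    registry.allocate("customer-a", "10.20.0.0/16")
    registry.allocate("customer-b", "10.21.0.0/16")
    try:
        registry.allocate("customer-c", "10.20.5.0/24")
    except ValueError as err:
        print("rejected:", err)
```

Because no two customers ever share address space, a VPN to one customer can stay up while working on another customer's network, and a merger between two customers needs no renumbering.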

You can have that in IPv6 too, with link-local addresses (fe80::/64). In fact, it's becoming quite common to assign fe80::1 to a subnet's default gateway.

Which is great until you and your friend want to play games with each other and need to set up a VPN.

The range of non-conflicting IPv4 private addresses is not really small. Everyone just happens to use the same two or three blocks, the easiest to remember.

Actually, when you're working at Facebook scale, the range is quite small. One of the reasons Facebook started moving to IPv6 was that they were running out of RFC1918 addresses.

Service-net (https://github.com/mesosphere/service-net/) is a proof-of-concept for service routing and discovery with IPv6 and DNS. The underlying mechanism is similar -- iptables is used for routing and load-balancing -- but DNS and IPv6 tunneling are integrated as well.

One downside of IPv6 on AWS is that the IPv6 tunneling protocol -- protocol number 41 -- is only available in VPCs. The EC2 classic network allows only 3 IP protocols -- TCP, UDP and ICMP -- to pass through it.

But I can recall IPv4 addresses en masse in my head. I can't memorize IPv6 addresses easily. Plus, I'd like to keep pockets of private IPs that are never accessible as routable targets.

Can you explain why I wouldn't want to do this, or why I should improve my understanding of IPv6?

IPv6 addresses don't have to be long or publicly routable. ULA addresses (fd00::/8) are the IPv6 analogue to RFC1918 and you could theoretically use ULA addresses as short as fd00::1, fd00::2, fd00::3, and so on. Of course if you do this you run the risk of colliding with other people, so you're encouraged to randomly generate the next 40 bits after fd, which leaves you with addresses like fd32:5e26:381d::1. That's longer than IPv4 addresses, but it's a pretty fair tradeoff to get a globally unique address.
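
Generating such a prefix can be sketched as follows. This is a simplified illustration: RFC 4193 suggests its own procedure for deriving the 40 bits, whereas this sketch simply draws them from Python's `secrets` module. The function name `random_ula_prefix` is hypothetical.

```python
import ipaddress
import secrets


def random_ula_prefix() -> ipaddress.IPv6Network:
    """Return a random /48 under fd00::/8: the byte 0xfd followed by 40 random bits."""
    global_id = secrets.randbits(40)
    # 0xfd occupies the top 8 of 128 bits; the 40-bit Global ID follows it.
    prefix_int = (0xFD << 120) | (global_id << 80)
    return ipaddress.IPv6Network((prefix_int, 48))


if __name__ == "__main__":
    prefix = random_ula_prefix()
    print("site prefix:", prefix)
    print("first host: ", prefix[1])  # e.g. an address of the form fdxx:xxxx:xxxx::1
```

The remaining 80 bits after the /48 are free for subnets and hosts, which is why short host parts like `::1` are still possible.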

Even non-ULA IPv6 addresses need not be long. 2600:3c00:e000:6c::1 is the address of my server over at Linode, and I don't find that bad at all.

There's no real need to memorise IP addresses, that's what DNS was made for. If your servers are on the internet at large then they probably have DNS already, and if it's a local network then most operating systems will now automatically work out where machines on the .local domain are (I'll be honest, I don't fully understand how that works).

I've had only a very limited exposure to IPv6, but it seemed to me that the slogan "DNS solves it for you!" doesn't really pan out. It solves it if you're on a well-set-up network and have your DNS up and running happily, but with the ad-hoc networks my [limited] experience has seen, it hasn't been trivial. Essentially, it means you have to run an interpreter service (the DNS) to understand the network - one more bit of software to configure and troubleshoot... though to be fair, IPv4 was also quite confusing when I first started playing with it.

Won't mdns / zeroconf / avahi / however it's called this week work for ad-hoc networks? It surely does the trick in LAN.

It won't work properly across routers, at least not out of the box (tried that when configuring Tinc VPN), but maybe this would be a good direction?

The word you probably wanted to use was “Zeroconf”.

“Zeroconf”¹ is a name for the sum of two interacting standards, namely “mDNS”² and “DNS-SD”³. Avahi⁴ is a free software implementation (for Linux and BSD) for a service where programs can register Zeroconf services (name & port number) and have Avahi announce them on the network. The other major implementation of a daemon of this kind is from Apple, and it is called “Bonjour”⁵.

This often gets confused, so, again: Zeroconf = standard. mDNS and DNS-SD = component standards. Avahi = A specific free software implementation. Bonjour = A specific proprietary implementation.

1) http://zeroconf.org/

2) http://www.multicastdns.org/

3) http://www.dns-sd.org/

4) http://avahi.org/

5) https://www.apple.com/support/bonjour/

I think it is a safe assumption that Sam knows what the DNS was made for.

Why does it matter if an address is public or not? Whether the address is publicly routable has nothing to do with whether the host is accessible... Or don't you have firewalls?

You can also bind services to selected interfaces only. That's what I tell to people complaining about publicly routable addresses.
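
A minimal sketch of binding a service to one specific address, using Python's standard `socket` module. The helper name `listen_on` and the loopback address in the usage are assumptions for illustration; a real service would bind to whichever internal address it should be reachable on.

```python
import socket


def listen_on(address: str, port: int = 0) -> socket.socket:
    """Open a TCP listener bound to one specific local address only.

    Port 0 asks the operating system to pick any free port.
    """
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Binding to a single address (instead of 0.0.0.0 or ::) means
    # connections arriving on any other interface are never accepted.
    sock.bind((address, port))
    sock.listen()
    return sock


if __name__ == "__main__":
    listener = listen_on("127.0.0.1")
    print("listening on", listener.getsockname())
    listener.close()
```

A service bound this way is unreachable through the host's other addresses, whether or not those addresses are publicly routable.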

You can always create multiple routing instances (namespaces) for overlapping IPv4 blocks with a VRF-based strategy. This would create separate FIBs and IMHO is a cleaner approach than mangling packets with iptables.

Facebook supports ipv6, but maybe it is external only? I know Google is ipv6 internally too. Although as EC2 has no ipv6 maybe that was the blocking issue.

Facebook makes extensive use of IPv6 internally. As you suggested, EC2's lack of support is the issue.

Edit: here are the slides from Facebook's presentation to the IPv6 World Congress in March about their internal IPv6 use. If IPv6 interests you, they're definitely worth a read: http://www.internetsociety.org/deploy360/wp-content/uploads/...

Thumbs up for the interesting read. It's amazing how Amazon doesn't support IPv6 yet, though

You can reach Facebook over IPv6 at https://www.v6.facebook.com

Interestingly enough, last week when they had their 45 minute outage worldwide with some kind of routing problem, it was still up and running on that address

Facebook has AAAA records on the main https://facebook.com domain, so you don't need the v6 part anymore.

And the IPv6 address ends face:b00c:0:1 - hat tip to them.

Agree in principle, but it's unlikely for Amazon to ever force customers to allocate addresses from their own IPv6 block to begin with. Like many other startups, Instagram started with the bare-minimum setup outside of VPC, and scaled from there, not even knowing the benefits of VPC before it was too late for a simple cutover.

Amazon doesn't have to force customers to use their own IPv6 blocks. EC2 instances, whether inside or outside VPC, would be assigned unique IPv6 addresses from Amazon's address space (which would be extensive). If this were the norm when Instagram started out, it would have been just as easy to use, and there would have been no clash of addresses when migrating to Facebook.

Are you saying that Amazon would divide up its own IPv6 address space and provide a subnet for each customer? That's really the only way it would work for our situation. I'm not sold that Amazon would actually have any motivation to go through the trouble to provide this as it would probably only impact 1% of their customers at best. The most rational path for them is to just assign addresses from a big pool and not bother with all the fancy subnetting unless the customer asks for it (VPC).

Yes, Amazon should allocate a static /64 to each customer, possibly even more on (free) request. That would also make for very easy firewalling rules where you can whitelist connections from your own instances with just one simple firewall rule.
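
To illustrate the per-customer /64 idea, here is a short Python sketch. The provider block is hypothetical (it uses the 2001:db8::/32 documentation range), and the function names are made up for the example; it says nothing about how Amazon actually allocates addresses.

```python
import ipaddress
from itertools import islice
from typing import List

# Hypothetical provider block carved out of the IPv6 documentation prefix.
PROVIDER_BLOCK = ipaddress.ip_network("2001:db8::/48")


def customer_subnets(count: int) -> List[ipaddress.IPv6Network]:
    """Return the first `count` non-overlapping /64 subnets of the provider block."""
    return list(islice(PROVIDER_BLOCK.subnets(new_prefix=64), count))


def is_allowed(source: str, customer_net: ipaddress.IPv6Network) -> bool:
    """A single-rule whitelist: accept traffic only from the customer's own /64."""
    return ipaddress.ip_address(source) in customer_net


if __name__ == "__main__":
    nets = customer_subnets(3)
    for index, net in enumerate(nets):
        print(f"customer {index}: {net}")
    print(is_allowed("2001:db8:0:1::42", nets[1]))
```

One membership test against the customer's /64 replaces a per-instance list of addresses, which is the "one simple firewall rule" described above.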

>The most rational path for them is to just assign addresses from a big pool and not bother with the all the fancy subnetting.

That's what he's suggesting, though (I think). Except because it's IPv6, Amazon's big pool of addresses would never conflict with Facebook's big pool of addresses.

One thing that may not have come across in the post is that one of the reasons we moved into VPC first is that Direct Connect is basically just a big dumb pipe to AWS without VPC in place. Without VPC, there's no way to advertise routes for just your instances or to ensure that only traffic to/from your instances goes across the Direct Connect.


> Are you saying that Amazon would divide up it's own IPv6 address space and provide a subnet for each customer?

That's what Linode does. Everyone gets a /64 to use as they please.

At AWS's scale, 1% of customers is quite a lot of people and quite a lot of money.

I wonder why Instagram wasn't using VPC in the first place. I've been using AWS for a startup for a few years now and I had our instances running in VPC from about the second month onward.

It's been one of the best architecture decisions I've ever made. At this point we only use one public IP address. (If direct access to a machine is needed then you can connect via VPN running on the one bastion host with the public IP address, and this gives your machine access to the local IP addresses of instances running inside the VPC.)

All the machines in our cluster are protected inside local VPC address space, with the access by the external world being ELB to expose public service endpoints like the API and website. I can't think of any good reason why you wouldn't be using VPC in the first place. Having public IP addresses for private machines sounds like a recipe for disaster if you ever accidentally miss a port in your security rules.

Mike from IG here. VPC was barely a thing when we got on AWS (2010) and at the time not the default. I would definitely have done VPC from day 1 in hindsight, though.

Hindsight is 20/20.

I think you guys did an exceptional job tackling a really difficult problem. I've been in the same position, migrating from EC2 to datacenters, and we determined that EC2 -> VPC -> datacenters is really the only way. Neti solves it surprisingly well.

Going forward, hope that acquired companies opened their AWS accounts late enough that Amazon forced them to use VPC.

We're small, comparatively - 20-30 servers max - and we need to get in to VPC for a new cluster that requires static internal IPs. (Reboot an EC2 Classic instance and you may get a different 10.x address.)

In any case, the migration is daunting even at our size, although our devops team size is 1. I do wish they had VPC when we started.

You could also just attach EIPs and use those, right?

In an incredibly late reply - EIPs are public-facing, I need internal IPs for fastest possible LAN routing.

If you assume they had no pressing need for any VPC-specific functionality, you can get similar security by locking your security groups down to only ELB for public service ports and having one instance in another security group with ssh/vpn allowed (to specific IPs) as a jump box/vpn. Spending weeks of multiple teams' engineering time to move to VPC without a pressing need would seem to me to make little business sense.

Agreed. This is the route I use and it works fine. I can see how it could quickly get out of hand with a lot of security groups, and I would love some sort of security group inheritance, but for -100 instances, it is not that hard to restrict public access to the ELB.

Has Facebook ever been public about the tools they use for deploying new machines onto bare metal with Chef? My company faces similar problems, albeit at a much smaller scale, but still...I'm wondering what they have in place of a tool such as http://theforeman.org (which is very coupled to Puppet).

I think they just said this https://www.youtube.com/watch?v=SYZ2GzYAw_Q

I think this is the biggest takeaway from the article:

Plan to change just the bare minimum needed to support the new environment, and avoid the temptation of “while we’re here.”

Good engineering is knowing how to act with surgical precision when necessary. This is what allows a craft like programming to operate in the confines of a business.

It's usually the stateful stuff that proves challenging in big datacenter moves, but I don't see any mention of data copying, replication, or moving. How did you guys tackle the problems of keeping data in sync and doing a clean cutover?

Is neti open-source?

Managing iptables across datacenters and nodes would be a fun project to do with something like Serf (http://www.serfdom.io/)

At the Velocity talk today, he said they're going to make it open source.

why not just create a vpn between the nodes with another private IP space and send your data through that?

"This task looked incredibly daunting on the face of it; we were running many thousands of instances in EC2, with new ones spinning up every day. In order to minimize downtime and operational complexity, it was essential that instances running in both EC2 and VPC seemed as if they were part of the same network. AWS does not provide a way of sharing security groups nor bridging private EC2 and VPC networks. The only way to communicate between the two private networks is to use the public address space."

That is essentially what Neti does, except instead of static mappings, it's dynamic and software configurable (which is pretty much the only way to go when your entire environment is virtual and the underlying network equipment is out of your control).

Using a VPN would still be an option. Why write essentially your own VPN (neti) instead of using an existing VPN solution? VPC is not the only VPN you can use on EC2.

I believe Neti was a better solution at their scale (thousands of VMs, a dynamic production environment, etc).

So I guess it was a bit of hyperbole. A straightforward solution to a straightforward problem.

Perhaps that would become a bottleneck at Instagram's scale?

I believe it becomes a bottleneck at the scale of like, 5 servers.

This is a tough problem; Neti is a heck of a lot better than tons of VPN connections everywhere.

you could probably do hundreds of servers, but you would want to die.

I got an error page inside the Instagram mobile app a few days ago and was surprised to see Facebook server chrome around the error message.

I'm impressed at how fast they got this migration done, considering the massive scale they operate at.

I'm wondering if they got nailed by out-migration charges and how much that was. I assume a bunch of their images were in S3. Amazon charges a pretty penny to take things out.

I'm confused by this. S3 GET requests are the cheapest request type, and getting the images out would just cost you the bandwidth involved.

Maybe you're mixing things up with Glacier?

GET requests are cheap, but I was thinking of bandwidth costs to get things out of S3 entirely and do a complete outmigration. But the prices there have come down quite a bit since I last checked.

Bandwidth costs of a one-off transfer out would be a lot less than they were already paying to serve those images out of S3 to the public.

They could have used the export service:


Any reason not to choose docker over lxc? Is it because fb data centers are already lxc friendly?

The existing Facebook deployment system supports running deployments within an LXC (and setting up cgroups, &c.) and was written well before docker was available.

Some background:

* http://www.slideshare.net/dotCloud/tupperware-containerized-...

Contributors? What contributors? People from within the company or open source contributors?

and now Instagram can share ALL of its data with the US gov too.

And those "numerous integration points" are?

Mike from IG here. Some early wins are integrations with spam fighting systems, logging infrastructure, and FB's Hive infrastructure.

These engineering feats are truly impressive and worth writing about.

And yet every time I read about this kind of stuff I think, how glad am I that we are building a DISTRIBUTED social network and will never have to solve problems on this massive scale! We won't have to move millions of other people's photos here or there if everything is distributed from day 1. People will be able to move their own stuff easily wherever they want.

"Facebook’s private IP space conflicts with that of EC2"

^ That wouldn't've happened in GCE (i.e., they should have been acquired by Google).

Are you sure? This document suggests GCE instances use the 10.x.x.x address space (just as AWS instances in EC2 Classic do):


""" Although Compute Engine doesn't allow creating an instance with a user-defined local IP address, you can use a combination of routes and an instance's ‑‑can_ip_forward ability to add local IP address as a network static address which then maps to your desired virtual machine instance.

For example, if you want to assign specifically as a network address to a virtual machine instance, you can create a static network route that sends traffic from to your instance, even if the instance's network address assigned by Compute Engine doesn't match your desired network address. """

Meaning they could have avoided conflicts using this mechanism.

Maintaining thousands of forwarding/routing configs sounds just as nasty as implementing Neti.

At any rate, Instagram's been around since 2010 and GCE didn't exist until June 2012 (and wasn't generally available until this past December).

It is amazing how a post like this could reach the front page of hacker news just because it comes from Instagram rather than for its technical relevance.

They mentioned Neti but didn't dig into details other than "a dynamic iptables manipulation daemon, written in Python, and backed by ZooKeeper." and they mentioned the ip blocker which is an issue on almost every migration.

Also taking into consideration that they didn't write a post in the past 10 months, I am sure that they can do it better.

Called it, five years ago: http://www.web2expo.com/webexny2009/public/schedule/detail/9...

Run in multiple clouds from day one. Take the pain. It gives you flexibility. Basic vendor management 101.

Maybe it's worth the pain if every startup became the size of Instagram and got bought by Facebook, but that's only true for a tiny, tiny fraction of a % of all startups.

Yes - truth be told, I am not sure if there's much to take away from this article if you're not really aiming for something huge. And even then, maybe you should spend time on what'll get you traction rather than trying to design an architecture for what you hope to become. Instagram is very much an outlier.

> Run in multiple clouds from day one. Take the pain. It gives you flexibility. Basic vendor management 101.

While "taking the pain" may yield flexibility in the long run, the most important thing in the short run is making sure that you are building something that people want, listening to users, and iterating the tech side of things as quickly as possible. I suspect that most devs have enough trouble dealing with a single cloud provider and that trying to work with multiple would cause a significant decrease in iteration speed. I think that approach would kill most startups because of the technical overhead incurred.

This seems like exactly the sort of advice that is great in retrospect once you reach scale but is effectively useless until you hit that point. There are far more important things for a startup to worry about (product market fit, retention, stability) than how to make the tech side of an acquisition easy.

This article mentions nothing about latency.

We're undertaking a similar project and the latency is almost negligible. In fact the latency is lower bridging between classic and vpc in the same availability zone than between two classic availability zones.

That's not multiple clouds, though.

They mentioned AWS direct connect which helps to reduce latency.
