Migrating from AWS to AWS (instagram-engineering.tumblr.com)
206 points by mikeyk on Oct 23, 2014 | 15 comments

We just migrated over to VPC as well, and came across a really weird "bug".

We auto-scale EC2, and every so often a newly launched server couldn't connect to memcache (ElastiCache). Note that when you migrate over to VPC you have to migrate everything: launch new ElastiCache servers in VPC, EC2 servers, RDS servers, etc.

Back to the bug. I'd ssh into the EC2 server, and when I telnetted to memcache, it wouldn't connect. I'd terminate the EC2 server, and a new server would come up and connect fine. I made a post in the AWS forums and got zero responses. We then bought into AWS support and I submitted a ticket.

The problem: I launched my ElastiCache servers in the same subnet as my EC2 servers. Apparently the ElastiCache servers by default remember servers in the same subnet by MAC address. Since we were cycling EC2 servers, eventually we'd get one with the same MAC address but a new internal IP address, and I'm no networking guy, but apparently this caused a routing problem.

Solution: create a new subnet and launch all the ElastiCache servers in that subnet. I did that, and it fixed the problem. The AWS support rep said that if the ElastiCache servers are launched in their own subnet, it will force them to go by IP instead of MAC address.

Anyway, hope this helps someone out ;)

I will make sure that this gets to the ElastiCache team! Please feel free to send more info my way; my email address can be found in my profile.

This sounds like a stale ARP cache, and it can be "fixed" with arping (forcing ARP replies without a who-has request).

Yup, EC2 instances should be issuing a gratuitous ARP on startup. I've seen the same thing on subnets with a lot of DHCP churn due to constantly rebooting embedded devices.
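
For anyone wondering what a gratuitous ARP actually looks like on the wire, here is a rough Python sketch that only builds the frame bytes (it does not send anything, which would need a raw socket and root). The MAC and IP in the usage example are made up, and zeroing the target hardware address is one common convention, not the only one.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for (mac, ip)."""
    src_mac = bytes.fromhex(mac.replace(":", ""))
    src_ip = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP)
    eth_header = broadcast + src_mac + struct.pack("!H", 0x0806)

    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4),
    # hlen=6, plen=4, oper=1 (request)
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # Gratuitous form: sender IP and target IP are the same address.
    # The target hardware address is zeroed here (an assumption; some
    # implementations fill it differently).
    arp_body = src_mac + src_ip + b"\x00" * 6 + src_ip

    return eth_header + arp_header + arp_body


if __name__ == "__main__":
    # Hypothetical MAC and private IP, for illustration only.
    frame = build_gratuitous_arp("02:00:00:aa:bb:cc", "10.0.1.25")
    print(len(frame), frame.hex())
```

Neighbours that see this broadcast can refresh their cached IP-to-MAC entry for the sender without having asked for it, which is why it helps with stale caches.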

Pretty much textbook symptoms of ARP caching.

> eventually we'd get one with the same Mac address but new internal IP address

This sounds like an EC2 bug, not an ElastiCache bug. EC2 uses "locally administered addresses" for MACs (as it should), but the administrator assigning MAC addresses is responsible for ensuring that addresses are not reused within a collision domain.
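
As a small aside, a "locally administered" MAC is one with the second-least-significant bit of the first octet set. A minimal Python check, assuming colon-separated hex notation (the sample addresses are invented):

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the U/L bit (0x02 of the first octet) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    for mac in ("02:00:00:aa:bb:cc", "00:1a:2b:3c:4d:5e"):
        print(mac, is_locally_administered(mac))
```

With the bit set, uniqueness is no longer guaranteed by a vendor prefix, so whoever hands out the addresses has to avoid reuse on the same segment.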

We also migrated a while back (opbeat.com). While we run a smaller setup, I imagine that might be the case for most readers. We run a pretty standard setup with ELB, Postgres (master/replica), webservers and job processing servers.

This recipe details what we did (as I recall it):

  1) Prerequisites: Running at least two of everything in
     separate AZs and expertise (or courage) to fail over
     to a replica DB.
  2) Boot up instances of everything in the VPC
  3) Set up a new ELB inside the VPC, add the web servers inside to the ELB
  4) Make sure your instances inside the VPC can talk to
     whichever services they need outside (the replica database
     and web servers need to reach the master outside, plus
     memcached). Use `telnet` to make absolutely sure :)
  5) Make sure web and job servers can reach the
     replica db and memcached inside
  6) Test out the new VPC ELB from outside
  7) Switch DNS over to the new VPC ELB, wait for it to propagate
  8) Do a failover from your master to the replica inside.
  9) Same for memcached
  10) Shut down everything in EC2 classic.
  11) Drinks
EDIT: formatting
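
The `telnet` checks in steps 4 and 5 can also be scripted. This is only a sketch in Python: the hostnames are placeholders, and the ports are just the Postgres and memcached defaults, so adjust to whatever your setup actually uses.

```python
import socket
from typing import Dict, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_endpoints(
    endpoints: Dict[str, Tuple[str, int]], timeout: float = 3.0
) -> Dict[str, bool]:
    """Check every named endpoint and report which ones are reachable."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }


if __name__ == "__main__":
    # Hypothetical hostnames; 5432 and 11211 are the default
    # Postgres and memcached ports.
    results = check_endpoints({
        "master-db": ("master-db.example.internal", 5432),
        "replica-db": ("replica-db.example.internal", 5432),
        "memcached": ("memcached.example.internal", 11211),
    })
    for name, ok in results.items():
        print(f"{name}: {'reachable' if ok else 'UNREACHABLE'}")
```

Run it from an instance inside the VPC and again from one outside, since the point of those steps is that both directions work before you switch DNS.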

I'm used to AWS pricing but I'm still a little fuzzy on how AWS VPC pricing works.

It is $0.05 per VPN Connection-hour. But in this context what is a "VPN connection?" Do you literally just set up your cloud for "free" (aside from paying for instances, etc) and then only pay $0.05 for every hour you spend connected to the private cloud externally?

Does the VPC have any external visibility aside from the VPN connections? And if it does, what is stopping you just setting up your own VPN server and bypassing the $0.05/hour rate?

VPC is free (and actually, new AWS accounts launch instances by default into a default VPC Amazon sets up for you).

The $0.05 per VPN Connection-hour is only if you want to connect your own network to your VPC. http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VP...

> Does the VPC have any external visibility aside from the VPN connections? And if it does, what is stopping you just setting up your own VPN server and bypassing the $0.05/hour rate?

Some do that, but the ~$30/month you're saving is likely eaten up by the cost of setting up and managing it and the software VPN instance you have to run.
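
For a rough sense of the monthly figure, here is the arithmetic in Python, assuming the connection stays up around the clock and ignoring data transfer charges. The only input taken from the thread is the $0.05 per connection-hour rate.

```python
RATE_CENTS_PER_HOUR = 5  # $0.05 per VPN connection-hour, as quoted above


def monthly_cost_cents(hours_per_day: int = 24, days: int = 30) -> int:
    """Cost in cents of keeping one VPN connection up for the given period."""
    return RATE_CENTS_PER_HOUR * hours_per_day * days


cents = monthly_cost_cents()
print(f"${cents // 100}.{cents % 100:02d} per 30-day month")  # $36.00 per 30-day month
```

So the hardware VPN lands in the $30 to $40 range per month, the same ballpark as the figure above.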

One would assume the VPN also has a higher network throughput than any instance under $30.

VPC instance pricing is the same as EC2 pricing. The VPN pricing is if you want to set up a VPN between your office (or some other location) and your VPC.

This has to be pretty high up the list of "Most creative uses of Zookeeper". I'm really impressed with the ingenuity involved with building such a glorious hack.

The details of the migration are juicy and it's probably a good idea to do it anyway, but I can't stop thinking that the initial stumbling block (clashing of private IP addresses) wouldn't have happened with IPv6.
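
To make the "clashing private IP addresses" point concrete, this Python sketch checks whether two IPv4 ranges overlap. The CIDR blocks are invented examples, not the ranges anyone in the article or this thread actually used.

```python
import ipaddress


def ranges_clash(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two IPv4 networks share any addresses."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


if __name__ == "__main__":
    # Hypothetical ranges: a large private block and a VPC carved from it.
    print(ranges_clash("10.0.0.0/8", "10.1.0.0/16"))     # True
    print(ranges_clash("10.1.0.0/16", "172.16.0.0/16"))  # False
```

When two networks you need to join both draw from the same private block, routing between them gets ambiguous, which is the stumbling block being referred to.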

Is Instagram still using a modified version of Django 1.1 (if I recall, that was the version they were on), or do they stay on the newest version now?


Dogfoodin' it to the extreme
