We just migrated over to VPC as well, and came across a really weird "bug".
We auto-scale EC2, and every so often a newly launched server couldn't connect to memcached (ElastiCache). Note that when you migrate over to VPC you have to migrate everything -- launch new ElastiCache servers in the VPC, new EC2 servers, new RDS servers, etc.
Back to the bug: I'd SSH into the EC2 server, and when I telnetted to memcached it wouldn't connect. I'd terminate the EC2 server, a new one would come up, and it could connect fine. I made a post on the AWS forums and got zero responses. We then bought into AWS support and I submitted a ticket.
The problem: I had launched my ElastiCache servers in the same subnet as my EC2 servers. Apparently the ElastiCache servers by default remember servers in the same subnet by MAC address. Since we were cycling EC2 servers, eventually we'd get one with the same MAC address but a new internal IP address, and (I'm no networking guy, but) apparently this caused a routing problem.
Solution: create a new subnet and launch all the ElastiCache servers in that subnet. I did that, and it fixed the problem. The AWS support rep said that if the ElastiCache servers are launched in their own subnet, they're forced to go by IP instead of MAC address.
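For anyone scripting the same fix, here's a rough boto3 sketch of what that looks like (the VPC ID, CIDR, names, and node type below are made-up placeholders; the same thing works from the console):

```python
import boto3

ec2 = boto3.client("ec2")
elasticache = boto3.client("elasticache")

# Placeholder IDs/CIDRs -- substitute your own VPC and an unused address block.
VPC_ID = "vpc-00000000"
CACHE_CIDR = "10.0.2.0/24"   # not shared with the auto-scaled EC2 subnet

# 1. A dedicated subnet just for the cache nodes.
subnet = ec2.create_subnet(VpcId=VPC_ID, CidrBlock=CACHE_CIDR)["Subnet"]

# 2. An ElastiCache subnet group containing only that subnet.
elasticache.create_cache_subnet_group(
    CacheSubnetGroupName="cache-only",
    CacheSubnetGroupDescription="ElastiCache nodes isolated from app servers",
    SubnetIds=[subnet["SubnetId"]],
)

# 3. Launch the cluster into that subnet group.
elasticache.create_cache_cluster(
    CacheClusterId="app-cache",
    Engine="memcached",
    CacheNodeType="cache.m3.medium",
    NumCacheNodes=1,
    CacheSubnetGroupName="cache-only",
)
```

The key part is just that the cache subnet group doesn't share a subnet with the instances being cycled by auto-scaling.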
Yup, EC2 instances should be issuing a gratuitous ARP on startup. I've seen the same thing on subnets with a lot of DHCP churn due to constantly rebooting embedded devices.
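To make that concrete: a gratuitous ARP is just an unsolicited ARP reply in which a host announces its own IP-to-MAC binding to the broadcast address, so neighbors refresh their caches instead of holding stale entries. A rough scapy illustration (the interface, IP, and MAC are made up; it needs root, and it's only meant to show what the packet looks like, not what Amazon's stack actually sends):

```python
from scapy.all import ARP, Ether, sendp

IFACE = "eth0"                   # placeholder interface name
MY_IP = "10.0.0.42"              # placeholder private IP
MY_MAC = "02:aa:bb:cc:dd:ee"     # placeholder locally administered MAC

# Gratuitous ARP: an "is-at" reply (op=2) where sender IP == target IP,
# broadcast to the whole segment so every neighbor updates its cache.
garp = Ether(src=MY_MAC, dst="ff:ff:ff:ff:ff:ff") / ARP(
    op=2,
    hwsrc=MY_MAC,
    psrc=MY_IP,
    hwdst="ff:ff:ff:ff:ff:ff",
    pdst=MY_IP,
)
sendp(garp, iface=IFACE, verbose=False)
```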
> eventually we'd get one with the same MAC address but a new internal IP address
This sounds like an EC2 bug, not an ElastiCache bug. EC2 uses "locally administered addresses" for MACs (as it should), but the administrator assigning MAC addresses is responsible for ensuring that addresses are not reused within a collision domain.
We also migrated a while back (opbeat.com).
We run a smaller setup, but I imagine that's the case for most readers. It's a pretty standard stack: an ELB, Postgres (master/replica), web servers, and job processing servers.
This recipe details what we did (as I recall it):
1) Prerequisites: running at least two of everything in separate AZs, and the expertise (or courage) to fail over to a replica DB.
2) Boot up instances of everything in the VPC.
3) Set up a new ELB inside the VPC and add the web servers inside it to that ELB.
4) Make sure your instances inside the VPC can talk to whichever services they need outside (the replica database and the web servers need to reach the master outside, plus memcached). Use `telnet` to make absolutely sure :) -- there's a small connectivity-check sketch after this list.
5) Make sure the web and job servers can reach the replica DB and memcached inside.
6) Test out the new VPC ELB from outside.
7) Switch DNS over to the new VPC ELB and wait for it to propagate.
8) Do a failover from your master to the replica inside.
9) Same for memcached.
10) Shut down everything in EC2 Classic.
11) Drinks
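The `telnet` checks in step 4 are the part that bites, so here's a tiny stand-in that runs them in bulk; the hostnames and ports below are placeholders for whatever your own endpoints are:

```python
import socket

# Placeholder endpoints -- replace with your own hosts/ports.
ENDPOINTS = [
    ("master-db.example.internal", 5432),   # Postgres master (still outside the VPC)
    ("replica-db.example.internal", 5432),  # Postgres replica (inside)
    ("cache.example.internal", 11211),      # memcached
]

for host, port in ENDPOINTS:
    try:
        # Same thing telnet tells you: can we complete a TCP handshake?
        with socket.create_connection((host, port), timeout=3):
            print(f"OK    {host}:{port}")
    except OSError as exc:
        print(f"FAIL  {host}:{port} ({exc})")
```

Run it from a box inside the VPC and from one outside, before touching DNS.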
I'm used to AWS pricing but I'm still a little fuzzy on how AWS VPC pricing works.
It is $0.05 per VPN Connection-hour. But in this context, what is a "VPN connection"? Do you literally just set up your cloud for "free" (aside from paying for instances, etc.) and then only pay $0.05 for every hour you spend connected to the private cloud externally?
Does the VPC have any external visibility aside from the VPN connections? And if it does, what is stopping you just setting up your own VPN server and bypassing the $0.05/hour rate?
> Does the VPC have any external visibility aside from the VPN connections? And if it does, what is stopping you just setting up your own VPN server and bypassing the $0.05/hour rate?
Some do that, but the ~$30-40/month you're saving (an always-on connection runs $0.05/hour × ~730 hours ≈ $36.50/month) is likely eaten up by the cost of setting up and managing it yourself, plus the software VPN instance you have to run.
VPC instance pricing is the same as EC2 pricing. The VPN pricing is if you want to set up a VPN between your office (or some other location) and your VPC.
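If it helps make the per-hour item concrete, a "VPN connection" is an actual resource you create against the VPC. Here's a rough boto3 sketch (the IDs, IP, and ASN are made-up placeholders); as I understand it, the hourly charge applies while the connection created by the last call is provisioned, whether or not traffic flows over it:

```python
import boto3

ec2 = boto3.client("ec2")

OFFICE_IP = "203.0.113.10"   # placeholder: your office router's public IP
VPC_ID = "vpc-00000000"      # placeholder VPC

# Your side of the tunnel: a "customer gateway" describing the office router.
cgw = ec2.create_customer_gateway(Type="ipsec.1", PublicIp=OFFICE_IP, BgpAsn=65000)

# AWS's side: a virtual private gateway attached to the VPC.
vgw = ec2.create_vpn_gateway(Type="ipsec.1")
ec2.attach_vpn_gateway(VpnGatewayId=vgw["VpnGateway"]["VpnGatewayId"], VpcId=VPC_ID)

# The billable "VPN connection" is the IPsec link between the two.
vpn = ec2.create_vpn_connection(
    Type="ipsec.1",
    CustomerGatewayId=cgw["CustomerGateway"]["CustomerGatewayId"],
    VpnGatewayId=vgw["VpnGateway"]["VpnGatewayId"],
)
print(vpn["VpnConnection"]["VpnConnectionId"])
```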
This has to be pretty high up the list of "Most creative uses of Zookeeper". I'm really impressed with the ingenuity involved with building such a glorious hack.
The details of the migration are juicy and it's probably a good idea to do it anyway, but I can't stop thinking that the initial stumbling block (clashing of private IP addresses) wouldn't have happened with IPv6.