Hacker News
Scaling serverless Postgres: How we implement autoscaling (neon.tech)
24 points by nikita on March 29, 2023 | 9 comments



Very cool! Was there anything additional you had to do in the network stack to preserve active TCP connections? Being able to do a live migration without dropping connections is fantastic!


There is a second network interface attached to each pod, and those interfaces are connected to a vxlan-based overlay network. So during the migration the VM can keep the same IP address (ARP will take care of traffic finding its destination). That is the simplest approach, but it has its own downsides when the overlay network grows big, and it is painful to set up with some cloud CNIs.
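As a rough sketch of the approach described above (interface names, the VNI, and all addresses are made up for illustration), the vxlan overlay side can be set up with plain iproute2 on each node:

```shell
# Create a vxlan interface riding on the node's physical NIC
# (VNI 42 and the addresses are arbitrary examples); give it an
# address in the overlay subnet and bring it up.
ip link add vxlan0 type vxlan id 42 dev eth0 dstport 4789
ip addr add 10.77.0.2/16 dev vxlan0
ip link set vxlan0 up

# Point the all-zeros FDB entry at each remote node so broadcast/unknown
# traffic -- including the ARP that re-locates a migrated VM -- is
# flooded across the overlay.
bridge fdb append 00:00:00:00:00:00 dev vxlan0 dst 192.0.2.10
```

With every node reachable over the overlay, the VM's IP stays valid after migration and a gratuitous ARP from the new host redirects traffic to it.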

A few other options are:

* use some SDN instead of vxlan

* use QUIC instead of TCP in the internal network -- with QUIC we can change endpoint IP addresses on a live connection
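The QUIC option works because QUIC demultiplexes packets by connection ID rather than by the TCP-style (source IP, source port, destination IP, destination port) tuple. A toy illustration of that one idea (not a QUIC implementation; all names here are made up):

```python
# Toy model of QUIC's connection-ID routing: a server-side table that
# keys sessions on a connection ID instead of the peer's address, so a
# peer whose IP changes mid-connection keeps the same session.

class Session:
    def __init__(self, conn_id: bytes):
        self.conn_id = conn_id
        self.packets = []  # (source_address, payload) pairs received

class ConnectionTable:
    """Demultiplexes datagrams by connection ID, not source address."""
    def __init__(self):
        self._by_cid = {}

    def receive(self, src_addr, conn_id: bytes, payload: bytes) -> Session:
        # A TCP-style table keyed on src_addr would treat a migrated VM
        # (new src_addr) as a brand-new peer and drop the old state.
        # Keying on conn_id keeps the existing session alive.
        session = self._by_cid.setdefault(conn_id, Session(conn_id))
        session.packets.append((src_addr, payload))
        return session

table = ConnectionTable()
cid = b"\x01\x02\x03\x04"
before = table.receive(("10.0.1.5", 5000), cid, b"hello")  # pre-migration IP
after = table.receive(("10.0.2.9", 5000), cid, b"world")   # post-migration IP
# 'before' and 'after' are the same Session: the connection survived the move.
```

With TCP, the second datagram would belong to a different connection; here both land in the same session, which is the property that makes live migration transparent.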


Very cool! Thanks :)

QUIC seems like a solid simplification for sure.


Why use Kubernetes if you’re in control of the cloud environment? What does it bring to the table? Why not firecracker?


In a lot of ways, basing this on k8s gives us flexibility and independence, and with k8s there's much less friction in providing computes with high locality relative to applications and users' code.

It's also the case that by staying with k8s we can take advantage of existing operational tooling, experience, and work, and can focus our development time on the important parts of this problem: runtime scaling, scheduling, and virtual machine management, not on cloud provider APIs and management.

In short, k8s gives us options that we like for the future, it's shortening the development cycle, and only getting in our way a below-average amount. At the same time--for the most part--we're building this with reasonable abstractions that would let us reuse our existing work if k8s becomes more trouble than it's worth.


Firecracker doesn’t support live migration. There is a newer project called Cloud Hypervisor that showed a lot of promise, but we struggled to make it work and reverted to QEMU.
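For reference, a bare-bones QEMU live migration between two hosts looks roughly like this (hostnames, ports, machine options, and the shared disk path are made up; the disk must already be reachable from both hosts, e.g. on shared storage):

```shell
# On the destination host: start an identical QEMU instance that does
# not boot, but instead waits for incoming migration state on port 4444.
qemu-system-x86_64 -machine q35 -m 4096 -enable-kvm \
    -drive file=/shared/vm.qcow2,format=qcow2 \
    -incoming tcp:0.0.0.0:4444

# On the source host, in the QEMU monitor: stream the running VM's
# memory and device state to the destination. '-d' detaches so the
# monitor stays usable; poll progress with 'info migrate'.
(qemu) migrate -d tcp:dest-host:4444
(qemu) info migrate
```

Once the final dirty pages are copied, the source VM pauses briefly and the destination resumes it, which is the window the networking tricks above have to paper over.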

As for k8s, it's an ongoing debate internally whether the complexity is worth the benefit. It helps us provision nodes, but we have to fight it quite a bit too. It's unclear whether we'll keep it long term.


I’m CEO at Neon. Happy to answer questions.


How does the network switch occur when you live migrate the VM? I see some mention of multus-cni, but I wasn't aware of anything in Kubernetes that allowed you to migrate a network interface across nodes.


Take a look at this issue: https://github.com/neondatabase/autoscaling/issues/104. There is an image there that explains it.
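For context on the multus-cni part of the question: Multus attaches extra interfaces to a pod via a NetworkAttachmentDefinition plus a pod annotation. A minimal sketch (the bridge name, IPAM choice, and object names are made up, not Neon's actual config):

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vm-overlay
spec:
  # Inline CNI config: bridge the pod's secondary interface onto a
  # node-local bridge that is itself plugged into the vxlan overlay.
  config: '{
    "cniVersion": "0.3.1",
    "type": "bridge",
    "bridge": "br-overlay",
    "ipam": { "type": "static" }
  }'
---
apiVersion: v1
kind: Pod
metadata:
  name: compute-vm
  annotations:
    # Ask Multus to attach the secondary interface defined above.
    k8s.v1.cni.cncf.io/networks: vm-overlay
spec:
  containers:
    - name: vm-runner
      image: example/vm-runner
```

The interface itself isn't migrated across nodes; the destination pod gets its own attachment on the same overlay, and the VM's IP moves with the VM.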



