Writing scalable backends in UDP (mas-bandwidth.com)
40 points by gafferongames on April 9, 2024 | hide | past | favorite | 78 comments



> While I'm confident that an experienced senior engineer could find a solution over a weekend,

What they mean to say is ".. an experienced senior engineer *with my exact skill set* could ..."

Senior engineers vary wildly in their skills. Would you expect a senior engineer with experience with (to pick a similar set) Ansible, Rust and AWS to implement this over the weekend using Golang, Terraform and GCP? Obviously not and, just as obviously, you don't need such a wide difference to quickly bleed from "over a weekend" to "over a week".

These sorts of bizarrely complex and specific code challenges would at least serve to greatly reduce the number of resumes you have to review. So that is one upside. Unfortunately, what that selects for is probably not as obvious as it would seem: many of their candidates would spend that full week on the problem, and I wouldn't apply for fear of a workplace culture that values their employees' time that little.


> These sorts of bizarrely complex and specific code challenges would at least serve to greatly reduce the number of resumes you have to review. So that is one upside. Unfortunately, what that selects for is probably not as obvious as it would seem: many of their candidates would spend that full week on the problem, and I wouldn't apply for fear of a workplace culture that values their employees' time that little.

It also adversely selects for something else.

In general it doesn't matter what specific technologies someone has used in the past, if they do the same thing. When you hire someone, their first few weeks/months are going to be less productive because they have to learn your internal systems, and even if you use AWS and they've used AWS, that doesn't mean it will be the same subset of AWS. Whereas by the time they've been there a year they'll have learned the relevant systems, regardless of whether it's cosmetically different than the one they used before.

Selecting for the exact set of technologies your company currently uses is going to filter out attractive candidates. Worse, it selects for a monoculture. The last thing you want is to spend two years building something in Golang that exists in the Rust standard library because none of your engineers are aware of that.

And it's simple enough to ask the same question without naming specific technologies.


If somebody applied, and had strong opinions about a particular stack or approach, I would offer that they could choose to use their existing stack to solve the problem, if they wish.


Do you review their submissions? Do you use the problem to probe the applicant's thinking? Your blog's focus on the details of the task makes it sound like you're more interested in using the task as a filter than anything else, but surely there's more to it.


The problem is usually solved in tandem with myself or another full-time engineer. We offer hints and use it to watch the candidate's thinking. The best candidates usually surprise us with their solution.

Do we use it as a filter? Only to the extent that it filters out candidates who don't want to put in the work, or who can't actually code.


Perhaps it also filters out candidates who want to rewrite everything in rust.


That's pretty funny. ISTM that now that you have HN commentary you're thinking a bit more expressly about what you're really trying to achieve with this interview question.


To be clear, this is actually a historical interview question that we've used successfully in the past.

We're no longer hiring for this specific role, but I figured that by writing a blog post to share the solution, people who for one reason or another find themselves needing to implement a highly scalable UDP backend would now hit this article and at least have some pointers about SO_REUSEPORT, increasing socket send and receive buffer sizes, ulimits and so on.

btw. This question actually comes out of a core part of the Network Next backend, which was problematic for us and a thorn in our side in various ways over a period of 5 years. It took us, as engineers not familiar with everything needed to solve this question, an extremely long time to correctly solve it and scale up past 25M+ clients. As such it feels like information worth sharing, after all is said and done.


> It took us, as engineers not familiar with everything needed to solve this question, an extremely long time to correctly solve it and scale up past 25M+ clients.

But you want a senior applicant to solve it in a weekend. What you'll get is people who are already somewhat familiar with these issues to apply. Chances are such people are indeed senior and can think their way out of a paper bag. In fact, you'll get very valuable people if you get them at all. I hope you understand just how valuable.


You seem really hung up on this as an interview question. I repeat that it's a historical question that has been really successful for us, and we scale it up and down according to the experience level of the candidate.

This is honestly a toy problem for people with the direct experience that we are looking for, and there are a bunch of levels of implementation according to the level of the candidate that result in a pass. We've used this successfully to hire candidates who don't even know golang, and to filter out engineers who bluster and just wave hands and say "this is easy just XYZ...". OK cool, show me.

Like all good interview questions it has an infinite number of solutions and it's primarily used to determine if the candidate is "smart and gets things done" AND as a test to filter out candidates who talk the talk but can't walk the walk.

YMMV. It's being shared now primarily to help people who for some reason or another need to code backends in UDP; once the solution is published, at least there will be something that pops up in the Google search results and gives them some hints.

Jesus. Next time I won't even bother trying to help people :)


The reaction you're getting is that the question seemed abusive. I also had that sense, but I was also very curious about your question, so we explored it, perhaps not unlike how we might have in an actual interview. Interviews are two-way streets. It seems to me that you're a bit defensive about your interview question! Now what? :)

Mind you, I do like this sort of interview question as a question not so much as a fetch-a-rock exercise.

The thing is that when you ask this sort of question you can very easily explore the candidate's ability to explore the solution space and show off what they know, but there's an end time, while with a task it can easily become a time sink, especially if the candidate chooses to go above and beyond and... fail to make the cut because in that time you picked a different candidate.


At this point you and others in this thread are simply arguing against a strawman of your own creation. Can't really see any value in continuing this discussion.


Besides, if you're interviewing senior engineers their CVs/portfolios probably speak to their ability to implement, and what you're really after is their ability to reason. Here the key elements of the problem are: the need to scale horizontally and the need to cost out the result -- everything about Golang and Terraform is just implementation details.

TFA seems much more interested in fetch-a-rock as a filter than the specific problem as a way to gauge how the applicant thinks.


You might be underestimating how much thinking is required on the part of the applicant to fetch-a-rock at 60M requests per-second.


In your setup the applicant is asked to build a proxy, and that proxy should scale horizontally just fine. When you get to C10M you start thinking about unikernel-style or user-land networking stack style solutions, but that's not going to be a weekend project, and the conversation will get interesting. If you really expect a weekend project here then the answer to scaling the proxy is bound to be: horizontally (meaning: not that interesting).


I would argue that both vertical AND horizontal scaling are required, because the winning solution in the end at the absolute highest level of this question, is one that gets the work done for significantly less $$$ than my golang implementation, in whatever language, stack, cloud or bare metal you deem appropriate.


The vertical component is about efficiency. It's very hard to get past a certain point with a general purpose OS and Golang. You'll need NCPU threads/processes and no context switching, so you're definitely talking about:

  - unikernel or user-land NIC driver and network stack

  - async/await or continuation passing style to reduce the per-client/request memory footprint

I'm not applying though, but I'm saying that "Golang + super-efficient" is definitely not something I'd assign as an interview weekend project -- that's something to be discussed in detail during the interview, and I'd expect no more than a horizontally scalable solution over a weekend. Getting the vertical efficiency tuning done will take a lot of time and cannot be expected to be a weekend or week project by an unpaid applicant, but you can expect the applicant to tell you how they'd go about it.


Some people when tasked with a simple problem, just love to turn it into an overly complex one.


What I mean to say is that we're hiring Golang engineers.

I mean, is that OK? To have a technology stack that you are hiring for (Google Cloud, Golang, Terraform etc), and to look for candidates with experience with that stack, or who would consider learning that stack?

ps. This interview question was designed to also be a great question to learn that stack with, but I would not expect a senior programmer would be able to do so in one weekend -- hence I gave programmers as much time as they wanted, in many cases, several months to play around with the stack and see if they enjoyed it in their spare time. Some engineers even asked for some budget to play around with it in Google Cloud and I provided it.

Or should we just code each service in Rust or Elixir or Node or F# or whatever language stack a candidate prefers that day?


It's not the tech they are using but the size of the project. This project requires simple tasks, but spread over multiple technologies such that most developers won't have experience with all (or even many) of them. The key thing here is the time. Having just done a ton of interviewing, I can tell you that almost every code challenge has some form of this issue, so you learn to just multiply the time they say by 3-4 (as a rough average). Thus the ones that say 'a few hours', which means a day or two at most, are feasible, but the ones that say 'a day or two' or 'a weekend' are not.

Also remember candidates are usually not only applying to your job. They will easily have 3-4 of these going in parallel, and with each taking up that much time it's untenable.


That's a completely fair point


> I mean, is that OK? To have a technology stack that you are hiring for

You're aware that a senior engineer will take a trivial amount of time to get up to speed in a programming language paradigm they are familiar with.

And, an entry level engineer will have the same learning curve--their core source of inexperience is not the programming language.

So, no, the technology stack would be a very low signal for me if I were hiring.


That's great, we're trying to filter out engineers who can't learn new things quickly, and who lie about their experience level and the sort of work they can do. People do this a lot, unfortunately.

So... we actually ask them to demonstrate to us that they can learn new stacks quickly. That's the point of the question. If somebody doesn't know Golang or the stack directly, that's awesome. They can learn it quickly and show us.


> That's great, we're trying to filter out engineers who can't learn new things quickly, and who lie about their experience level and the sort of work they can do. People do this a lot, unfortunately.

I feel for you. It does happen a lot. A portfolio matters. References matter. But it's really nice that you give unknowns a chance too, and to do that you really do need a filter.

And it's a fairly trivial filter, since you're asking the applicant to build a proxy.

I don't think you'll get too many applicants who don't already know the stack though because... think of it from their perspective: if someone is applying who knows that stack already, then they can beat the others to the punch, so taking 2x as long to learn the stack is likely to be a waste of time. I'm not sure how to make this "fairer" so that you really do measure learning time rather than comfort level.

Your filter is likely better at finding experienced Golang devs than experienced devs who can learn Golang quickly. In the short term this is probably very good for you, but the latter are probably better in the long run because people who learn quickly and well are more valuable than people who know a tool well.


Honestly this entire article was a red flag, and the fact that that's a question they expect everyone to go home and build out for an INTERVIEW is a massive MASSIVE red flag. Like, are you kidding me, you want a prospective employee to basically give you a solution for one of the main challenges of your company lol.


Strong no hire


I've been working in that domain for a long time - including being one of the main architects for HTTP/3+QUIC in a public CDN offering. And I'll agree with everyone that this is a very niche question, and a great answer seems out of reach for most "senior engineers".

Translating a UDP packet into an HTTP request and back is reasonably easy. Yes, maybe one can do that in a coding interview with some pseudo code. But scaling it and making it reliable is yet another dimension.

Any candidate would need to understand that a single UDP socket itself would probably already be a bottleneck for just running this on a single machine, and figure out how to increase concurrency there. That's not easy - the number of engineers who know how SO_REUSEPORT works, and when it doesn't work, is low.

After that you start to dig into how you can actually spread load between hosts. Would an answer like "I hope my cloud provider solves this for me" be good enough? Probably not. If it actually is, do candidates have to understand both the cloud provider's native APIs and Terraform (mentioned in the blog post)? That seems pretty unnecessary; Terraform is just one out of the myriad of tools which can be used to configure cloud services. Not everyone will have used it before. Or would it even expect candidates to do a long writeup about the pros and cons of client-side load balancing?

Are applicants required to talk about upstream connection pooling? Describe and implement a CDN like multi-tier architecture?

Last but far from least is that the requested architecture is very easy to misuse for denial of service and amplification attacks. Just being able to describe how to monitor and mitigate that is already a very very hard task, that very few specialists have worked on so far.

It's very fuzzy what would be good enough if this is a "homework task". At least in a synchronous interview the interviewer could give feedback that they are satisfied. So I think in a synchronous interview the question might be ok - but there will probably just be time to either talk about coding or about system architecture.


This is clearly a simple proxy that can scale horizontally. That should get you the task completed. The SO_REUSEPORT/epoll/io_uring stuff is definitely a point of research (and for TCP too, not just UDP), but it's doable (here it helps if the senior eng. applicant can read Linux kernel code and is resourceful). If you're going to be exceeding a 10Gb NIC's bandwidth you'll have to talk about using multiple IPs, DNS tricks, client smarts, etc. And all of this assumes that the HTTP backend can go at least as fast as the UDP proxy, which... is a big assumption, because it's much harder to get an application to perform as fast as a proxy, and TFA is already asking a lot of a proxy.


Generally I've provided this as a homework task, with the ability to email me and ask specific questions to help guide the candidate over a period of whatever time the candidate wants.

There are definitely degrees of correctness and completeness and depending on the candidate experience and level, certain solutions are acceptable. For example, a totally naive implementation in golang that doesn't quite hit the scalability requirements would be a good conversation starter and would pass a mid level or junior candidate.

A senior or above "badass" candidate would be expected to hit the scalability requirements.

An incredible candidate would teach us something new about this problem that we don't already know.


> While I'm confident that an experienced senior engineer could find a solution over a weekend

Over a weekend? Either I'm misunderstanding it (are you supposed to write your own NAT-pinning solution? TURN?) or this shouldn't take more than a couple of hours.


As posed, this problem is a toy one and fairly easy.

1) The arrival rate is constant, vs the more realistic Poisson distribution

2) The response fits in a single packet

3) Requests are a consistent size.

I don't think most companies are going to have the resources to take advantage of QUIC and HTTP/3 for a while.

While head of line blocking is theoretically gone, most programming is based around ordered delivery.

I see lots of projects doing single streams to minimize complexity, but that severely reduces large asset transfer.

I am sure the FAANG crew will gain a lot of advantages, but tooling needs to be there for the rest of us.

Obviously some will take advantage of it, but it is a shift in responsibility to the application and dealing with window scaling and out of order delivery is expensive.

IMHO, UDP is simply a target because legacy systems would make a new option that was superior difficult, if not impossible, to implement.

As I have supported voip and streaming over the public Internet, I admittedly have a jaded view.

But I am not looking forward to supporting the quirks of every client when it was easy to hand that off to the OS.

Hopefully I am wrong, and SWEs will be good at dealing with out-of-order delivery and random failures, and standards will offload the cost of implementing what TCP gave us for free.

But I don't think people realize how easy reliable, ordered delivery makes programming.


Cool. Show me


I actually received something similar as a take-home challenge years ago, so it may not be as "out there" as you think it is =) (at least if you work in the telecom/networking space)


The specific challenge here is:

"You are tasked with creating a client/server application in Golang that runs in Google Cloud. The client in this application must communicate with the server over UDP.

Each client sends 100 requests per-second. Each request is 100 bytes long. The server processes each request it receives and forwards it over HTTP to the backend.

The backend processes the request, and returns a response containing the FNV1a 64 bit hash of the request data. The server returns the response it receives from the backend down to the client over UDP.

Implement the client, server and backend in Golang and Terraform such that it scales to more than 1M clients. Provide estimates of the cost to run the solution each month at a steady load of 1M clients, as well as some options you would recommend as next steps to reduce the cost.

IMPORTANT: When you load test this system, make sure you are communicating between the client, server and backend using internal addresses to avoid egress bandwidth charges."


Ok so if NAT punch-through is not needed I guess the only tricky thing is connection pooling on the backend side? I sort of agree with the other poster - that if you had done something like this before it's trivial and if you haven't, it's a bit hard to research on your own.


100%. There is a class of programmer for whom this would be a toy problem. There is another class of programmer who may be young or not directly experienced in this space, but who is very intelligent and an excellent problem solver and is willing to work hard. The question was designed to find both types of applicants.


We used a similar challenge which tasked a candidate to build a VPN over UDP (without the complex / annoying bits like the TUN interface). If you ignore encryption it’s rather trivial. I guess in this problem the challenge is mostly the backend connection, otherwise you could possibly do this on a single VM though Golang wouldn’t be my first choice for that, the GC becomes quite annoying when processing streams at high frequency.


Would you accept https://github.com/WireGuard/wireguard-go/blob/master/tun/ne... as a solution? =)

Edit: also i personally find TUN very convenient.


The UDP side of this doesn’t sound very hard. Most challenging part is probably the outbound HTTP requests (if I understood the problem correctly?).


The UDP part is hard because load balancers generally don't deal with UDP. And you can't trigger lambdas via UDP (I'd have to check this, because I've never tried).

Sending the http request is easy, stuff incoming requests into an sqs queue from UDP side then have a lambda eating queue entries and sending them to the client. Because it's udp you don't need to keep endpoint state.

If api gateway doesn't do UDP you could network load balance the udp to a ecs autoscaling group with your udp listener in it. You'd have to figure out what metrics you need to trigger autoscaling. You could also just do route53 dns routing and have some oversized instances as your handler; it depends on your budget + how much time you have.


Well actually NAT on the client side might make this more complicated.


Let's just say that there are some challenges on both the UDP side, and on the HTTP forwarding side.


In the NAT case just keep a list of the active UDP receivers and client IPs/source ports (which you need anyway). Then when the backend processor needs to send a response, just send it to the receiver in question to send back to the client, assuming their NAT works normally.

I'd probably stick them in redis, since it's fast and seems to be able to handle some ridiculous number of simultaneous clients. They don't need to be super-persistent since the map can be rebuilt next time a packet comes in.

Assuming the UDP packets come in on a timely basis you should be able to connect to the client relatively well. NAT tunnels expire in a minute or two generally, so as long as processing doesn't take more than a few seconds the NAT mapping should be fine. I'd configure the client to send every few seconds, just so if we fail over we can still talk (with latency = those seconds). I'd have to know more about the time envelope that's required tho.

Shouldn't take more than 2 weeks to implement/test in node. Learning go might add a week or two.


NAT makes this a bit more complicated because you can't use a load balancer with NAT for outgoing packets since AFAIK you can't send stuff to the LB and tell it what source port to use.

IPv6 makes it easier and harder. I'm not sure what happens if the ipv6 address isn't reachable due to firewall ingress rules, but I'm pretty sure you can't poke a hole in the NAT that doesn't exist. I would like to think that most home users prevent ipv6 ingress, but I have no way to test that.

It'll be interesting to look at the solution and see how they handled that.


You are absolutely correct.


Not related to the discussion, but I wanted to say that I came across your username and it rang a bell. I've read and used the information from one of your posts (https://www.gafferongames.com/post/fix_your_timestep/) and found it very well written and useful!


Thank you sir! I've stopped writing on gafferongames.com and will be writing on mas-bandwidth.com from now on, focusing more on scalable backend techniques fused with multiplayer games. I wrote a post about XDP for games last week as well: https://mas-bandwidth.com/xdp-for-game-programmers/


I’m more curious about the effectiveness of this question as an interviewer. I’d have to imagine you get a fairly strong signal to noise ratio as whoever takes the time to get a working solution must be pretty qualified for your role. That said, you are probably leaving a ton of talent on the table by asking them to do (relatively) so much work.


It's definitely a pretty serious filter, and it's designed to be. The cost of hiring somebody who just cannot be effective with our stack is so high, that it's important that we test for somebody being able to learn our stack, and really wanting to work at Network Next. This question did a very good job of that.


I would also mention that there is a type of programmer, that we definitely do want to hire, that finds a challenge like this really easy, might even bang something out in a few hours in a different stack, and show us something amazing. We were open to that as well.


As an example for that sort of programmer, consider an engineer at the caliber and experience level of Marek @ Cloudflare, who would absolutely eat a question like this for breakfast...

https://blog.cloudflare.com/author/marek-majkowski

This question is designed to help find those people, in addition to people who are really willing to learn the stack with no experience and who are excellent problem solvers.


One essentially pushes the rate limiting token bucket and session management up the ISO/OSI stack.

Things like iodine tunnels can trip IDS/smart-routers/firewalls to flag hardware for malformed traffic.

Seen people try this before, and it ends up rather predictably unreliable/slow.


The restriction to Go is downright self-defeating.

There are better languages that allow much higher performance ceiling for this kind of task.

Also seconding notes regarding encryption and the use of QUIC.


Go isn't the problem. The issues I have seen is that you are I/O bound by the kernel.

One company did Pion WebRTC + https://www.dpdk.org/ and is now doing 4x throughput. They tried switching to Rust (and still doing userspace networking) and it didn't make a big difference.


Which is why last-mile solutions often rely on userspace networking, for which low overhead FFI and tight control with unsafe primitives is required (you are unlikely to need the latter with Rust, as long as the data access patterns play nicely with static load partitioning and context switch minimization).


I agree, although it's actually quite easy to do it in Golang once you have the right approach. I asked the question in Golang because this was an interview question for Network Next, and the Network Next backend is implemented in Golang.


I would not touch Go, thanks.

Either Rust or C# would be much easier in terms of vertical scaling strategy to reduce replica count and increase individual pod utilization - both languages offer better tools for scaling on individual many-core instances, reaching much higher throughput.

Edit: (though in wider context naturally the switch away from existing stack in a company would never make sense as long as it is a compiled language with sufficiently good average performance)


You're more than welcome to demonstrate a solution in your language of choice


If that's the case given interview scenario, then I withdraw my complaints :)


Unless you are doing heavy parsing of the payload and payloads are large, you will be bound by I/O with basically any language (maybe with the exception of Python) and will have 0 issues supporting 1M clients even on old hardware


Cool. Show me an implementation in Python


Why is this idea so out there? Most streaming used to happen over UDP. It might still. Anytime you have a use-case where latency trumps reliability (and security), you should use UDP.

That brings me to another point: HTTPS isn’t preferred because it’s the defacto standard now, it’s the secure bit, the “S” in HTTPS, which makes it a better option than using plain HTTP over TCP.

Going completely bespoke means you'll need to implement your own protocol that provides some level of security and redundancy over UDP communication.


> latency trumps reliability

Let's say I don't care about the initial handshake latency, have disabled nagle and have no packet loss. Is there still a measurable latency difference between UDP and TCP? It would appear that there is (based on yours and other comments) - but I don't know why.


No, assuming an already established connection, TCP is just as fast if there's no packet loss, except for the tiny amount of time that the additional TCP stack logic takes on the CPU.

See eg here: https://cloud.google.com/blog/products/networking/using-netp...


You may have induced latency driven by window sizing/scaling/acknowledgement RTT as well.


If TCP says "you can't send more data yet because of congestion control" (like hitting the congestion window limit) then I guess you can categorise that as latency, depends on your viewpoint.

RTT of acknowledgements shouldn't normally enter into it, no? You can send while previously unacknowledged data is in flight (as long as congestion control state allows).


Primarily traffic cost, CDN services were much more important during the early days... and punching through NAT came with its own liabilities.


I forgot why we were doing this again hahaha. What's the point of this exercise? I mean it's fun, but I don't need another job right now. Is it a puzzle?


It's a puzzle. Getting to the solution requires some pretty creative thinking. Nobody in the comments has posted the solution yet (or come close to it).


I don't think anyone who can do it has enough free time to do it.

It'd be fun, but I'd rather be figuring out how to make box joints on my table saw. I do want to see how the solution code handles IPv6 client ingress issues, though.


I agree. I'd rather be gardening, but my version of the solution is almost done :)


I mean this is cool, but where is the solution/answer/approach?

Where is the info about encryption, for example - very few libraries support DTLS or any UDP-compatible encryption at all.


A complete worked solution with source code and terraform scripts will be published April 16th (one week from today).

This gives everybody on Hacker News time to come up with their own solution, if they want to try.

When I asked this as an interview question, I gave candidates permission to implement this without encryption (although asking about encryption is definitely +1).

Note that the response size is significantly smaller than the request size (intentional), so you don't have to worry about being used in UDP amplification attacks. (Being concerned and asking about UDP amplification is another +1)



Now that QUIC is widely supported this is probably a lot easier than in the days of yore, though 100m rps is no joke


How would quic help here? It's relatively much overhead compared to this "one packet in, one packet out" scenario.


Answer: HTTP/3


Cool. Show me



