Gossip Glomers: Fly.io Distributed Systems Challenges (fly.io)
309 points by yla92 9 months ago | 77 comments

Love it. Thanks for putting this together! The actual challenges here [0].

Though I'm curious: are these different from the chapters in the Maelstrom documentation [1]? There seems to be a bit of overlap anyway.

[0] https://fly.io/dist-sys/

[1] https://github.com/jepsen-io/maelstrom#documentation

Thanks, Phil! Good question. Yes, there is some overlap with the Maelstrom docs, but most of the challenges either have different goals or are new (Kafka, Totally-Available Transactions). These challenges are also aimed at people who are familiar with distributed systems and want to test their ability, whereas the Maelstrom docs are meant to walk folks through the issue.

We also contributed a Go implementation of a Maelstrom client library and we were able to fix a handful of bugs in Maelstrom during the process. Kyle uses Maelstrom for teaching classes as well, so the work we did on these challenges was also contributed back into the repo.

The fun part about these challenges is there isn't one right way to do them. Many of the later challenges have several approaches and they each have their own trade-offs. Some of the challenge goals are performance targets so you can continue to optimize your implementation to get better numbers.

I dig this, a lot. Seems like there's an intentional selection bias here in attracting people that love distributed systems, and I think that's cool.

As an aside, I think a conversational version of these challenges works well as a middle ground for anyone who thinks the take-home coding approach may be too heavy-handed or not time-efficient. That's how I do it in my funnel (sharding a database from scratch) and it only takes about an hour of face-to-face time. Most of the time you don't even need a whiteboard.

Thanks! These challenges aren't part of our hiring process. We do an at-home code challenge and then we do a short "workday" on our Slack instead. We wrote up some details about the process a while back: https://fly.io/docs/hiring/hiring/

Our workday is very much conversational -- it's hard to have whiteboards in Slack! We're fully distributed so Slack workdays are a good way to gauge how comfortable people are in that form of collaboration.

Reading a stack of MIT PhD dissertations may be a good Friday night

I'm happy if I can read through a single MIT PhD dissertation in a stack of days!

First I've heard of Maelstrom. Very cool.

me too, always learn something new every day. thanks HN!

Another aside: after going through the first two exercises I got to thinking: is there anything out there that's like "Caddy For Distributed Systems"?

By which I mean: is there a framework (ideally for Go) where I could "just" implement my client-facing API and something like your Maelstrom package's `Node.Handle` function, but all the rest, like node discovery, updates, etc., is handled for me?

Or put another way: I wish there was a turn key solution to releasing an API around something like etcd or Consul, but I'm having trouble finding anything like that.

Let's say I actually wanted to deploy my own distributed version of the grow-only counter challenge. It seems like there's a lot of non-trivial operations and such I need to get right _in addition_ to my message handling logic, and that feels like a gap that, if filled, would allow a lot more developers to get into distributed systems development, much like how Caddy has made it orders of magnitude easier for developers to implement custom reverse proxy logic.
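For readers who haven't met the grow-only counter mentioned above, here is a minimal sketch of the message handling logic alone, written as a state-based G-Counter CRDT. This is one common way to build such a counter and not necessarily the approach the challenge intends; the `GCounter` type and its method names are made up for illustration, and none of the operational parts (discovery, transport, persistence) are covered.

```go
package main

import "fmt"

// GCounter is a hypothetical grow-only counter: each node increments only
// its own slot, and replicas converge by taking the per-node maximum.
type GCounter struct {
	id     string
	counts map[string]uint64
}

// NewGCounter creates a counter owned by the node with the given ID.
func NewGCounter(id string) *GCounter {
	return &GCounter{id: id, counts: map[string]uint64{id: 0}}
}

// Add increments this node's own slot by delta.
func (g *GCounter) Add(delta uint64) { g.counts[g.id] += delta }

// Value returns the sum of every node's slot.
func (g *GCounter) Value() uint64 {
	var total uint64
	for _, c := range g.counts {
		total += c
	}
	return total
}

// Merge folds in another replica's state. Taking the maximum per slot makes
// the operation safe to repeat and to apply in any order.
func (g *GCounter) Merge(other *GCounter) {
	for node, c := range other.counts {
		if c > g.counts[node] {
			g.counts[node] = c
		}
	}
}

func main() {
	a, b := NewGCounter("n1"), NewGCounter("n2")
	a.Add(3)
	b.Add(4)
	a.Merge(b)
	b.Merge(a)
	a.Merge(b) // a duplicate delivery changes nothing
	fmt.Println(a.Value(), b.Value()) // prints "7 7"
}
```

Everything the comment calls "non-trivial operations" sits outside this sketch: deciding when and to whom each node sends its state is exactly the part a framework would have to provide.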

It isn’t based around Consul or etcd, but the Encore folks https://encore.dev/docs have been building a framework that makes working with microservices much more straightforward.

I haven’t used it much, but the tool is able to generate all the code for services that want to speak to one another. This to me is enough service discovery.

Dockerfile would be a really good idea.

Followed instructions to letter, got : main.go:7:5: no required module provides package github.com/jepson-io/maelstrom/demo/go $GOPATH is correctly set, I can see the package there, I've spent half an hour looking through incorrect internet answers and frankly I have better things to do than debug build scripts. Actually, no I don't. But I still normally get paid.

I tried reproducing the error and I think I know what happened. There's an instruction to run "go mod tidy" in the output of Go's "go mod init" command, but that could be clearer in the instructions on our site. I'll update that. Thanks for letting me know.

I pushed up the "go mod tidy" instruction so hopefully that helps other folks that run across that issue in the future. Thanks again for catching that.

As for Docker, that's probably a good idea for bundling Maelstrom, but it'd be tricky getting the Go dev environment bundled up in Docker as it utilizes several different build caches to make it fast on rebuilds. Mapping those onto Docker volumes or utilizing Docker's cache is going to be a different can of worms that probably wouldn't be any more enjoyable.

Total aside but thank you for discussing the term “sequentially consistent” that’s the exact thing I was searching for just a couple weeks ago but didn’t know what to call it (I was searching for monotonically consistent).

Gonna have to give Maelstrom a look over tomorrow!

The json from the first echo example is not actual json, there is a trailing comma that shouldn't be there :p

Ah, you solved the first challenge! Noticing my typos. :)

I fixed up and validated all the JSON. The fixed version will be up on the web site in a couple minutes. Thanks for letting me know.
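For anyone curious why the trailing comma reported above matters: strict JSON parsers reject it. A minimal sketch using Go's standard `encoding/json` package; the payload strings and the `isValidJSON` helper are made up for illustration and are not copied from the challenge page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isValidJSON reports whether s is syntactically valid JSON.
func isValidJSON(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	// Illustrative payloads only; the second differs by the trailing comma.
	fixed := `{"type": "echo", "msg_id": 1, "echo": "hi"}`
	withComma := `{"type": "echo", "msg_id": 1, "echo": "hi",}`

	fmt.Println(isValidJSON(fixed))     // prints "true"
	fmt.Println(isValidJSON(withComma)) // prints "false"
}
```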

I think the reply JSON for Challenge #2 should say "msg_id", not "id"?

".. represented by two separate yet equally important groups ...". Cue the Law and Order intro in my head. I like it.

Awesome stuff!

Just to share another course I know of in the same area: the open source training courses on distributed databases and distributed systems by PingCAP.


So this is some new elaborate hiring challenge?

But then when I open the linked jobs page the company seems to have exactly one open position. Is it that hard to hire one infrastructure engineer that you need to do...all this?

No. We contracted Kyle to help us build a hiring challenge, and he and Ben overdelivered dramatically, so we turned it into a public thing. They're just programming challenges.

If you're not interested in programming distributed systems, this won't be interesting to you.

hehe I'm literally in the middle of your actual hiring challenge with "Fix the Glitch" 6pn networking. But I've hit a wall with it. Maybe I'll switch over to these :)

I completed that exercise along with a Nomad one w ELK stack. And submitted it. And never got a response ever besides we will look at it ASAP. (3 months ago.). It was billed as a "2 hour" exercise, but to even get close to that you would need to be completely fluent with Fly, Nomad, ELK, Wireguard, Linux networking stack. I thought it was a great take home but you would at least expect a response for completing it successfully.

I don't have a better response for this than that we got a huge flood of candidates for that role, are doing our best to keep up, and have done an imperfect job. We're a small company. You're more than welcome to reach out directly to find out what happened with your application (it's entirely possible that what happened is that the ticket got buried somewhere, and that we'd have been excited to hire you had we not screwed something up on our end).

As for the scope of the challenge: that's deliberate, and we're up front about it. If the challenge is going to take you many hours to complete, and doing that work isn't something that lights you up, then it's the wrong role for you. There are candidates that breeze through it, and there are candidates who wrestle with it for a couple evenings just because they're nerdsniped and enjoy doing that stuff. Both those kinds of candidates are super interesting to us! But we're very comfortable with the idea that the job (in this case: infra engineering) isn't a perfect fit for everybody.

I really liked the take home. It is just that if you have someone doing this, you are having that person commit hours of their time to interviewing without the same investment on your end. So if you do not have the means or time to process these submissions then it is not fair to have someone spend their time on it.

I think your hiring process is great otherwise. And yes if someone isn't very familiar with every intersection of stacks there it will take them more time. That is fine too. It shows the person can learn on the spot.

I agree. We came to that conclusion about platform just a week or so ago and have paused hiring because we can't meet the commitment we've made about responsiveness. We haven't paused infra; instead, we've hired someone to help keep up with it.

I can't mitigate any bad experience you've had. I fucking hate getting ghosted, and I get a little nauseous when I think about us having ghosted people. But ghost people we have! All I can say is that it's not deliberate; it's just been a lot to keep up with.

For everyone else: my sense is that the majority of people in our process have gotten timely (if not as fast as I'd like) responses from us, but if it's been weeks and you haven't heard back from me, you can hit me up directly. We don't ghost people "communicatively"; that's just not how we convey decisions.

> it's just been a lot to keep up with.

Has it really?

Shall we check your HackerNews comments history and guestimate how much time you've spent commenting on randomness while ghosting people who've spent hours doing work for you?

Essentially, you've paused hiring for all roles at Fly including these 2 roles on your jobs page https://fly.io/jobs/ ?

We've paused platform hiring; we have plenty of slots for it, but we don't currently have the bandwidth to run hiring for it (and we're also metabolizing a bunch of the hires we currently made).

I'm not aware of us pausing any other roles; if they're not listed, it's likely we don't currently have openings for them.

That's even worse than my experience https://news.ycombinator.com/item?id=34523227

I'm pretty skeptical of these hiring challenges at this point. It seems that they often require more time than the traditional model and provide worse outcomes for the candidates. For me it was particularly frustrating because I can typically write off the traditional model as irrational but I was drawn to fly.io because they make it seem so much better than that.

If I were looking for a job, I would trade the usual stupid 1h leetcode interview for this take-home in an eyeblink.

This one is much more time-consuming _up front_, but at least it gives you a sense of the job to be done _and_ doing the exercise lets you think through whether you really want that job after all.

Well in that 1 hour leetcode session at least the interviewer is wasting the same amount of time as you. In this case you can spend all weekend working on the exercise and never even get a response.

This is an adversarial perspective of interviewing that does not get you anywhere because indeed, your time is always more precious than any company’s time.

By 1h leetcode, I meant the whole yak shaving of classical interviewing, which actually costs candidates many hours (in interview time and prep time).

A take home without actual feedback is a complete waste of everyone’s time. Who cares if you get a taste for the job in the process. You spend a weekend, the company spends 10 minutes.

That's an important knock against processes that do both interviews and WSTs ("take homes"). We don't: we exclusively use WSTs. We calibrate the amount of time our WSTs should take against the amount of time a typical interview process takes. There's a lot of stuff we can't do --- like expansive programming tests, or, for that matter, sharing our answer rubrics --- that we wish we could, precisely because we're fitting the whole process into a tight time budget.

We don't time our WSTs; it adds cortisol to the process that defeats some of the purpose of simulating what actually working here is like. So, it is the case that people can end up blowing way past our expected time budget working on things. We're not going to stop people from doing that; it would be hard to do, and we're also excited to get candidates who teach themselves stuff as they go. We're up front about this.

From my vantage point, this process is strictly better than conventional interviews. It demands the same amount of time from candidates, but allows the candidates to pick where and when to spend that time.

There is a lot of understandable enmity built up against "take homes" because firms have added them to their existing interview loop, so they become just another hurdle candidates have to clear before running through the same interviews they always have.

With respect to feedback: at some point in our process, we do start giving feedback. It has not been my experience that it is as appreciated by candidates as people on message boards seem to think it is. We've gotten to the point where we try to do a decent job of explaining what the best submissions have in common, and leave it at that. This kind of feedback is more than I'd ever come to expect from conventional interviews, so I'm at peace with it.

Finally, and to again be candid: we got a huge flood of candidates for several roles, and while for the most part we kept up and gave timely responses to people who submitted challenge responses, there have definitely been ticket mishaps that caused people not to get responses --- in other words, we've ghosted people. I fucking hated getting ghosted when I was applying for jobs and am mortified by the fact that we did it too. What I can say there is: we've never done that deliberately, and when we've caught that happening, we've gotten detailed feedback out quickly. So if the premise of your comment is: "submitting a code challenge and then never hearing anything back at all, even so much as a pass/fail, is bullshit", then yes, I agree, it's total bullshit. Not OK at all.

(With respect to the coding challenges we're talking about today: the public ones aren't a part of our hiring process at all! They're just there for people who are interested in them. We do have a related challenge Kyle and Ben worked on, as a post-L3 leveler. It's not a small challenge. But: any candidate that gets that challenge already knows they're getting an offer from us, so we're comfortable making it ambitious).

No, the premise of my comment is that faceless project-based assignments without applicant-specific feedback are a totally one-sided way of interviewing that completely caters to the company's values (e.g. scalability, leanness) and not the candidate's. Who spends 8h on an assignment, gets a generic "sorry", and does not wonder why? In a face-to-face interview I can at least go back in my memory and try to figure out what the cues were. There is none of that in a generic, fully automated screening process.

Nothing about our process is "fully automated" --- as people who've gotten weeks-delayed responses to inquiries from me can attest, we've barely managed to automate email.

hehe yes the Nomad one w ELK stack took me longer than 2 hours for sure but was a lot of fun! I'll bet the fly people are just swamped. I loved the language in their email though. I've been on so many 1 hour live coding interviews where I can't shine like I want to shine; this approach really made me go yes! A new way to interview.

We are swamped. We kept up as long as we could, and now we're making some changes:

1. We have someone running infra now that also owns infra hiring, instead of me owning it. Infra is an ultra-important role here and we're still hiring for it.

2. I'm still on the hook for platform hiring, and we've paused it for a couple months (if you submitted a challenge, we're still moving forward! but we're not taking new applications).

I know people on HN don't know these details, but knowing it myself makes it especially goofy to see people calling these challenges an elaborate scheme to recruit people for platform development (which is where these challenges apply).

It is also marketing (by virtue of the fact we are talking about it!)

Looks awesome, can't wait to give these a try.

A few weeks ago, someone at Fly threatened to disable a customer's account over Twitter (Alex Graveley, the creator of GitHub Copilot) because he complained about a minor-version breaking API change. Did anything ever come of that? I like Fly as a company, but overall the impression I get from their employees (even in this thread) is just...highly defensive and immature.

Edit: Here is the link to the thread. Alex's original tweets were deleted. https://twitter.com/alexgraveley/status/1619117645932150784

It's hard to pass judgement when the first couple of tweets are deleted, but this didn't read to me as Fly.io kicking anyone off the platform. Here is what my interpretation was of this interaction:

> Customer: Your service is bad and I had a bad experience

> Vendor: Okay

> C: I've given you multiple chances but you're still not up to my standards

> V: (assuming C was done with the product) Sorry to hear that, but here is a refund

Am I missing something here?

    (1) @mrkurt: ... you can go ahead and find yourself another vendor.
    (2) @alexgraveley: [I'm] Trying! [to find another vendor]
Kurt thinks Alex is choosing to leave due to (2):

    @mrkurt: ok good luck [switching vendors by your own choice]. I'll refund all your payments.
Alex thinks Kurt was the one telling him to go due to (1):

    @alexgraveley: I'm sorry, you're kicking me off your platform ... ?
Typical human stuff. Maybe this is a great opportunity for an LLM extension - have it give people the benefit of the doubt before you respond nastily to them, by checking your response.

I've tried asking ChatGPT a few times to explain responses people have made to me on Twitter that I didn't rate highly but which received likes. It wasn't particularly insightful, but its unbiased perspective had a surprisingly calming effect.

EDIT: I fed all but the last line of this conversation into ChatGPT, then asked it to evaluate 'my' proposed reply (the final line) with the following prompt: "What is the presupposition behind my response? Can you point out any potential mistakes I've made in my assumptions?" It correctly identified the ambiguity in the statements, and made a helpful suggestion on how to proceed.

I don't see it in as clearly bad a light as babelfish does, but nor do I see it as positively as you do.

My rubric:

- CEOs shouldn't publicly call their customers dicks

- if a customer thinks the CEO is threatening to kick them off the platform and asks for clarification about it, they should get a clear response (Graveley's last tweet)

Both of these points are about professionalism of the vendor.

Perhaps the Fly CEO realized Twitter wasn't the right place to have this conversation and answered Graveley's question in a separate channel.

>A few weeks ago, someone at Fly threatened to disable a customer's account over Twitter (Alex Graveley, the creator of Github Copilot) because he complained about a minor-version breaking API change. Did anything ever come of that?

I hadn't heard of that, but I think that's an inaccurate characterization based on what I'm seeing of the conversation:

>Alex Graveley: [two deleted tweets]

>Kurt Mackey, Fly CEO: Wow, harsh. It's fair to be upset and an API change. But if you're going to be a dick on twitter you can go ahead and find yourself another vendor.

>Alex: Trying! It's not like this is an isolated event - gave you all a pass the first 4 times.

>Kurt: ok good luck. I'll refund all your payments.

>Alex: I'm sorry, you're kicking me off your platform because I was mean to the company on twitter?

Kurt was telling him that his behavior was unacceptable and that he didn't want to act as Alex's vendor if he didn't correct it. I think it's fine for companies to fire customers that are rude or abusive.

I can understand if Kurt said, "oh yeah? Well I just nuked your account with zero notice!" But it seems like Kurt was giving Alex the opportunity to exit the platform gracefully.

Also, I'd be a little more sympathetic if it was like Andy Jassy crushing a random company's business on AWS because they complained on Twitter. But this is a principal engineer at Microsoft presumably shitting on one of Microsoft's underdog cloud competitors because he didn't like his experience using Fly on a pet project, so it's not like there's a huge power imbalance here.

Thank you for noting he is the CEO, I didn't realize that in my initial comment.

While I saw the two now-deleted tweets originally, I don't recall what they said now, but I do remember feeling like it wasn't anything serious enough to warrant the "find yourself another vendor" comment.

If Kurt is willing to (gracefully, as you said) crush someone's side project over a rude tweet, why should Fly be trusted with a potential business? I don't use TinyPilot but I am a big fan of your blog. Would you trust running any servers required for TinyPilot on Fly.io after that exchange, running the risk that if Kurt doesn't like one of your tweets, he (gracefully) will ask you to move to a new platform?

> If Kurt is willing to (gracefully, as you said) crush someone's side project over a rude tweet

I didn't read this at all. I read it as "we will refund you", not "we will kick you off". I know that fly.io frequently refunds payments for customers, if they feel the bill was unexpectedly large.

> running the risk that if Kurt doesn't like one of your tweets, he (gracefully) will ask you to move to a new platform?

I read this as "if you don't like us, leave us", not "I want you to leave us because I didn't like what you said". The first one is a bit more fair, because it's just a statement that if you don't like the way we do things, there are alternative providers, though some may say it's not the most graceful way to talk to an annoyed customer (picture "apologies for the breaking change, we will try to warn/communicate/avoid such changes in the future"). The second one is far more brazen.

Regarding your first point - the first thing Kurt said was "if you're going to be a dick on twitter you can go ahead and find yourself another vendor". I don't believe there were any billing issues as a part of this exchange, though Kurt's offer to refund was kind.

And regarding your second point, I read the above quote the same way Alex did, as being kicked off the platform ("go ahead and leave", not "you're free to leave").

>If Kurt is willing to (gracefully, as you said) crush someone's side project over a rude tweet, why should Fly be trusted with a potential business?

To clarify, when I said "gracefully," I meant "exit within a grace period," as opposed to "delete an account with zero notice."

And I don't read the exchange as Kurt crushing anyone's project as much as telling them to go elsewhere on their own if they won't meet Fly's expectations for professional conduct.

But I do think the fact that it's a side project makes it a more minor issue. It would be higher-stakes if someone had built their business' entire infrastructure on Fly and then Kurt asked them to leave. I'm assuming a principal engineer at Microsoft isn't putting anything serious on Fly, so it seems like the stakes were just a couple hours of Alex's time to migrate.

>Would you trust running any servers required for TinyPilot on Fly.io after that exchange, running the risk that if Kurt doesn't like one of your tweets, he (gracefully) will ask you to move to a new platform?

TinyPilot's critical servers actually do run on Fly.

The exchange doesn't make me nervous because I don't harass my critical vendors, especially not in public. I have vendors that I critique, but I try to do it professionally and respectfully. I think if I behave unprofessionally toward my vendors, then they're within their rights to drop me as a client.

Again, I'm assuming that what Alex deleted was something inappropriate. It sounds like your memory was that Alex said something benign. If Alex truly was critiquing Fly respectfully and Kurt responded by asking him to leave the platform, that would give me pause, but I'm not seeing evidence that was the case.

Just looked at the thread you linked. I wouldn't say I read that as a threat - more like he's saying "if you don't like it you can leave".

Not that that sounds great, but I can't see the original tweets that kurt replied to because they've been deleted, so it's kinda hard to judge


I don't know why you're being a total dick about something that's actually pretty neat.

We hire people to write distributed systems, and for those people, distributed systems is extremely relevant. We also hire people to do low-level Linux kernel stuff, and for those people, it's less relevant. I'm having trouble figuring out what it is you think you've found to dunk on here.


Probably for the same reason that practically every other team that builds distributed systems engages Kyle. But I can't speak for them, I guess.


(This comment was rewritten after I replied to it; it previously expressed contempt for the idea that a company that supposedly works on distributed systems would retain any of the services of Kyle Kingsbury.)


Fly.io uses their blog to educate people. They have some of the highest quality writing out there and talk about unique topics many people can't talk about with authority (SQLite, distributed systems, etc.).

This is amazing and I highly hope they keep doing what they do. I think it would be better if more companies did the same (sharing their knowledge).



And netlify.

This whole subthread is gross. We're asked by the guidelines not to have discussions like this. Can we please not?

Please don't post insinuations about astroturfing, shilling, bots, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.


I believe the previous poster and I were attempting to show that accusations of astroturfing are gross and can be hypocritical.

You can't do that by repeating the same behavior! You fix this problem with the "flag" button.

1. Not really a recruiting thing.

2. You'd get disciplined at Fly.io for non-ironically using the word "thought leadership".

We're a company full of message board nerds working on something super fun. We're going to write about it. Best get used to it!

How is this not a recruiting thing? It's promoting your supposedly superior hiring process.

Well, for starters, we've paused hiring for a few months on the role it pertains to?

Once again: this started as a hiring challenge, and Ben and Kyle overperformed and made it something more interesting. I guess we could have waited a few months and run this at a time when we could metabolize more platform candidates, but, once again: this isn't a recruiting thing.

If you're not hiring, maybe you should remove this call to action implying you are at the bottom of the post:

> We reserved this last challenge for evaluating our staff engineers at Fly.io. So if you think you'd be up to the challenge, we'd love to talk to you.

We're happy to talk. But the first thing you'll hear from us is "we've paused platform hiring for a couple months".

Looks fun and interesting. I will say your recruiting call at the end made some of the devs where I work swear off your service... I think there have been enough scams in this vein that it rubs certain people in a bad way.

Sorry, what? A company blog post that links to their jobs page means you won't use a service? Do you just not use any service that advertises? How do you get Internet access?

I'm not that sensitive and have loved recruiting code games/challenges put out by other companies (like Scribd's AI recruiting game way back...) I was just noting that when I shared this link with my extended team, there was a pretty negative reaction because of how this type of thing has been used in the past by unscrupulous actors.

Our "recruiting call at the end" of what? I'm sorry, I don't understand. This post isn't advertising a new service; it's literally just a series of blog posts.

I am not passing judgment, but it’s literally the last sentence of the link. Personally, I don’t see any problems.

Sounds cool, keep up the great work

Is the discipline shots of malort?

Not even I would come up with a disciplinary scheme requiring employees to ingest poison.

hey John, long time! Sorry that the post came off as thought leadership. We thought it'd be fun to build something like Cryptopals but for distributed systems. Building projects like this helps give awareness to Fly.io but I'd like to think it helps the community at large as well. DigitalOcean did a great job of this in the past as well.

This could be said about alllll the inbound marketing pieces that end up on HN. That being said, I appreciate what Fly is doing and enjoy seeing their content and problem solving.

This is rude for pretty obvious reasons.

I can assure you that YC (in the form of HN admins) isn't doing anything to promote these posts. Like it or not, they're just popular with the community.


Or stability...

He's right you know, just make sure that you don't become insolvent in the next 3 months before you raise more money to fund another year of still losing more money.
