Hacker News

One of my biggest UX nits with Fly (I have no excuses, I have all the access I need to go fix this myself) is that we "kernel panic" when your entrypoint command fails. Of course, our kernel is not really panicking --- we've just run out of things for our `init` to manage, so it exits, and when pid 1 exits, so does the kernel. But you get the terrifying stack dump.

We can clean this up, so that you get a clearer, simpler error ("your entrypoint exited, here's the exit code, there's nothing else for this VM to do so it's exiting, have a nice day"), and it's been on the docket for months. We'll get it done!

We could conceivably add a flag for our `init` to hang around waiting for you to SSH in after your entrypoint exits. But that's clunky and complicated. Usually, you want your kernel to exit when your entrypoint fails, so that your service restarts! What you should do instead is push a container that has enough process supervision to hang around itself. Here's a doc:

https://fly.io/docs/app-guides/multiple-processes/
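The core of that approach can be sketched as a tiny restart loop that serves as pid 1 (a hedged sketch, not Fly's actual `init` and not what the linked doc prescribes; `supervise` and `MAX_RESTARTS` are names made up for illustration):

```shell
# Minimal restart-loop sketch: keep pid 1 busy by restarting the app when
# it exits, so the VM doesn't shut down (and "panic") on the first crash.
# MAX_RESTARTS bounds the loop so a hopelessly broken entrypoint
# eventually gives up instead of crash-looping forever.
supervise() {
  tries=0
  while [ "$tries" -lt "${MAX_RESTARTS:-5}" ]; do
    "$@"    # run the real entrypoint in the foreground
    echo "entrypoint exited with status $?; restarting" >&2
    tries=$((tries + 1))
  done
  return 1  # out of restarts; let init (and the VM) exit
}

# Usage as a container entrypoint, e.g.:
#   supervise /app/server --port 8080
```

A real deployment would more likely use one of the supervisors the doc covers; the point of the loop is just that pid 1 always has a child to manage.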




That all makes sense. I think context matters. Yes, when I have a service that's been up and running for some time, if that service fails, I want my service to restart.

But when I'm trying to get a service running for the first time and I'm not sure that I have the right command in the entrypoint, the right arguments to that command, the right supporting files in place, the right libraries installed, the right file permissions, …, well, then I don't want things to just blindly restart; I want a handle and some information so I can figure out why it isn't working.

ETA: I recognize that your link to docs about running a supervisor addresses this problem. For me this raises some interesting questions. Like, I understand why Ben would implement `litestream exec`, but maybe it would be better to steer users to a proper supervisor? Separately, what if it's the supervisor that's failing? Now I'm back to seeing kernel panics and not having error messages or a shell.


Usually, when I'm debugging a container, I start with a `tail -f /dev/null` entrypoint, or something like that, and then just shell into it to run the real entrypoint to see if it's working.
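Concretely, that workflow looks something like the following sketch (the image name `myapp` and the path `/app/start.sh` are placeholders):

```shell
# The Docker version of the trick, as comments (image/paths hypothetical):
#
#   docker run -d --name dbg --entrypoint tail myapp -f /dev/null
#   docker exec -it dbg sh
#   # inside the container: run the real entrypoint by hand, e.g.
#   #   /app/start.sh
#   docker rm -f dbg
#
# Why it works: `tail -f /dev/null` never exits, so pid 1 stays alive and
# the container sticks around for `exec`. Demonstrating just that part:
tail -f /dev/null &
keepalive=$!
sleep 1
kill -0 "$keepalive" && echo "no-op entrypoint still running"
kill "$keepalive"
```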


That's probably closer to sysadmin magic than to the average developer's way of thinking when debugging.


> We could conceivably add a flag for our `init` to hang around waiting for you to SSH in after your entrypoint exits. But that's clunky and complicated

This is a common thing in CI platforms, and the way they usually expose this is "run tests with SSH enabled", and they keep it open for 30 minutes/2 hours/whatever until a session closes.

So if I have some app failing, being able to run `fly restart --ssh-debug`, have it first just sit around waiting for the app to boot, and then drop into ssh would be a very helpful piece of UX. The main thing is cleanup, but y'all charge for compute! You can be pretty loosey-goosey on that one, honestly.
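Short of platform support, the container-side version of this pattern is a wrapper entrypoint that holds the machine open only when the command fails (a sketch under assumptions; `hold_on_failure` and `DEBUG_WINDOW` are made-up names, not a Fly feature):

```shell
# Run the real entrypoint; on failure, keep pid 1 alive for a debug
# window so someone can SSH in, then propagate the original exit code.
hold_on_failure() {
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "entrypoint failed ($status); holding ${DEBUG_WINDOW:-1800}s for SSH" >&2
    sleep "${DEBUG_WINDOW:-1800}"
  fi
  return "$status"
}

# e.g. as the container entrypoint:
#   hold_on_failure /app/server --port 8080
```

Exiting with the original status afterward preserves the normal "crash means restart" behavior once the debug window closes.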



