
The timing of this is pretty awesome for me. I’m building a product that requires fairly heavy, scheduled background services. Originally, I had built these services intermingled with my client and API, but that was not ideal from a development or deployment standpoint. Plus, we had rolled our own monitoring, which was itself a PITA to maintain. With Inngest, I moved all of our background services and processing to a separate sub-repo, and we can develop, deploy, and monitor entirely independently from the rest of our product, which has really sped things up. Love it. Would recommend it for anything event-based!
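
For context, what Inngest replaced for us is basically a cron-style function living next to the rest of the code. A minimal sketch, assuming the shape of their TypeScript SDK (the app/function IDs and schedule here are made up):

    import { Inngest } from "inngest";

    // Illustrative IDs -- use your own app/function names.
    const inngest = new Inngest({ id: "my-app" });

    export const nightlyCleanup = inngest.createFunction(
      { id: "nightly-cleanup" },
      { cron: "0 2 * * *" }, // runs every night at 02:00
      async ({ step }) => {
        // Each step is retried by the platform, so there's no
        // hand-rolled retry or monitoring code living here.
        await step.run("expire-stale-sessions", async () => {
          // ... your actual background work
        });
      }
    );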

What sealed the deal for me was that the few times I ran into issues, often due to my own mistakes, their support was nearly real-time and worked with me to either solve the problem or dig in on their end to see where the issue was. Honestly, more than anything, the support gives me confidence to fully commit to this and use it across all my production apps.

Anyway, great stuff all, you’ve built something awesome here.




> we had rolled our own monitoring which was itself a PITA to maintain

Thanks! What type of monitoring were you looking for? We have some basic metrics now, but we know we need to improve this. What metrics, alerting, and observability features are important to you?


Not the original commenter, but I manage a similar system:

1. Wait timings for jobs.

2. Run timings for jobs.

3. Timeout occurrences, and the stdout/stderr logs of those runs.

4. Retry metrics, and if there is a retry limit, then metrics on jobs that were abandoned.

One thing that is easy to overlook is giving users the ability to define a specific “urgency” for their jobs, which would allow for different alerting thresholds on things like running time or waiting. (See the sketch below for roughly what I mean.)
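
To make the timings concrete, here's roughly how I'd derive them. The `Job` shape and `metrics` client are hypothetical stand-ins for whatever queue and telemetry stack you actually run:

    // Illustrative only -- stand-in telemetry client.
    const metrics = {
      histogram: (name: string, ms: number, tags: Record<string, string>) =>
        console.log("histogram", name, ms, tags),
      increment: (name: string, tags: Record<string, string>) =>
        console.log("count", name, tags),
    };

    interface Job {
      id: string;
      urgency: "low" | "normal" | "high"; // drives per-urgency alert thresholds
      enqueuedAt: number;                 // ms epoch, set at enqueue time
      timeoutMs: number;
    }

    async function instrumentedRun(job: Job, work: () => Promise<void>) {
      const startedAt = Date.now();
      // 1. wait timing: how long the job sat in the queue
      metrics.histogram("job.wait_ms", startedAt - job.enqueuedAt, { urgency: job.urgency });

      // Timeout via a race (timer left uncleared for brevity).
      const timeout = new Promise<never>((_, reject) =>
        setTimeout(() => reject(new Error("timeout")), job.timeoutMs)
      );
      try {
        await Promise.race([work(), timeout]);
        // 2. run timing: how long the job actually took
        metrics.histogram("job.run_ms", Date.now() - startedAt, { urgency: job.urgency });
      } catch (err) {
        // 3. timeout occurrences (capture stdout/stderr separately per run)
        const kind = (err as Error).message === "timeout" ? "job.timeout" : "job.failure";
        metrics.increment(kind, { urgency: job.urgency });
        throw err; // 4. rethrow so the queue's retry/abandon accounting kicks in
      }
    }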


This is great - we do capture all logs for each run, including any retries, so you can see errors as well as successes. We have all of these other metrics internally, but we need to expose them to our users!

Observability is especially key for background work since it's not always tied to a specific user action; you need a trail to understand issues.

> One thing that is easy to overlook is giving users the ability to define a specific “urgency” for their jobs which would allow for different alerting thresholds on things like running time or waiting.

We are adding prioritization for functions soon, so this is helpful for thinking about telemetry for jobs of different priority/urgency.

re: timeouts - managing timeouts usually means managing dead-letter queues. Our goal is to remove the need to think about DLQs at all by building metrics and smarter retry/replay logic right into the Inngest platform.


Sorry, but DLQs make it easier to do those alerts where a human needs to look at something ASAP. I'm not sure they can be gotten rid of, but maybe you'd call them something else.
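
For what it's worth, the classic version of that alert is just "page someone if the DLQ is non-empty." A minimal sketch against AWS SQS (the queue URL and alerting hook are stand-ins for your own setup):

    import { SQSClient, GetQueueAttributesCommand } from "@aws-sdk/client-sqs";

    const sqs = new SQSClient({});
    const DLQ_URL = process.env.DLQ_URL!; // your dead-letter queue URL

    async function checkDlq(): Promise<void> {
      const { Attributes } = await sqs.send(
        new GetQueueAttributesCommand({
          QueueUrl: DLQ_URL,
          AttributeNames: ["ApproximateNumberOfMessages"],
        })
      );
      const depth = Number(Attributes?.ApproximateNumberOfMessages ?? 0);
      if (depth > 0) {
        // Anything landing here has exhausted its retries: a human should look.
        // Swap this for PagerDuty/Opsgenie/etc. in real life.
        console.error(`DLQ has ${depth} message(s) needing attention`);
      }
    }

    setInterval(checkDlq, 60_000); // poll every minute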


Inngest engineer here!

Agreed that alerting is important! We alert on job failures, plus we integrate with observability tools like Sentry.

For DLQs, you're right that they have value. We aren't killing DLQs so much as rethinking them with better ergonomics. Instead of a dumping ground for unacked messages, we're developing a “replay” feature that lets you retry jobs that failed over a given period of time. The replay runs those failures in a separate queue, which can be cancelled at any time, and the replay itself can be retried if there's still a problem.
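
To make the semantics concrete, here's a purely illustrative sketch - none of these names are a real API, since the feature is still being designed. Failures from a time window get re-run as a batch that you can cancel at any point, and a failed replay just leaves runs eligible for the next one rather than parking them in a DLQ:

    // Purely hypothetical names -- this only illustrates the described semantics.
    interface FailedRun {
      id: string;
      payload: unknown;
      failedAt: Date;
    }

    function replayFailures(
      failures: FailedRun[],
      window: { from: Date; to: Date },
      rerun: (payload: unknown) => Promise<void>,
    ) {
      let cancelled = false;

      const done = (async () => {
        for (const f of failures) {
          if (cancelled) return; // the whole batch is cancellable at any time
          if (f.failedAt < window.from || f.failedAt > window.to) continue;
          // Re-runs happen on the replay's own queue; a failure here simply
          // leaves the run eligible for the next replay, not a DLQ.
          await rerun(f.payload).catch(() => {});
        }
      })();

      return { cancel: () => { cancelled = true; }, done };
    }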



