Launch HN: Patchwork (YC S24) – Automatically add structured logs to your code

btown · 2024-08-29T16:21:36 1724948496

This looks really, really cool. In the Python world, we can rely on tools like Sentry to be able to say "on any exception or logger.warning, crawl the stack and show me all the variables at every level." And we've also set up Honeycomb to record log entries as OpenTelemetry spans, which lets us see logs in context of database queries etc., and do structured queries about when and how certain things occur. Which means that we can just instruct our team to "err on the side of adding simple log statements, even if it's just a static string without variables; we can dig for those reactively."

But in a codebase where far more of the complexity is in native code and doesn't cross network boundaries, I imagine folks feel like they're flying blind without those tools that we Python devs can take for granted. So Patchwork is desperately needed. I'd be very curious to see how your larger observability platform integrates with the broader world of OpenTelemetry, where the native code might be part of a broader distributed system. It's cool to see this space moving forward so quickly!

benjaminfh · 2024-08-29T16:44:14 1724949854

A big part of our journey is trying to learn the different ways people generate, collect and then retrieve context, so we really appreciate you sharing how you're doing it - thanks! Am I right in understanding that you have a two-part approach: Sentry with simply messages and all variables up the call stack + Honeycomb/OT spans. What do you collect with the spans?

Regarding the platform, transparently, it's early days for us and we're focusing on vanilla structured logs (generating better ones and then later storing them efficiently in something like ClickHouse), rather than tracing. Are there particular things you'd expect to see from a platform like this?

benjaminfh · 2024-08-29T21:17:55 1724966275

Going to double tap my own reply here and come back to the OTel peice. Do you find there's a lot of effort required in adding attributes to spans through your code? A lot of people we speak to who are all-in on tracing are (we think) getting by with minimal/auto span implementation, without attributes. We're trying to keep digging and learning about how this looks at different orgs / figure out if this approach holds as systems get more complex. Would our workflow as applied to adding attributes and messages to spans be useful to you?

carlsverre · 2024-08-29T19:32:52 1724959972

Really cool idea! You may want to update the explainer video to use a different example of incorrect logger usage though. The current example (https://youtu.be/ObIepiXfVx0?si=-lumV_0ShKLpNl_h&t=203) claims that the use of Println is incorrect and logger.Info should be used instead. However, that would break the command in question as the command is outputting the current configuration as YAML. By using logger.Info, the additional logging prefix would make the user's life more difficult when they presumably output the command into a file or attempt to copy and paste the output. I'm sure that over time Patchwork will learn how to filter out these false-positives, but perhaps it's not the best thing to show off in the demo.

benjaminfh · 2024-08-29T19:40:15 1724960415

Thanks for catching this and providing the feedback. We're iterating very hard to make sure the fixes are indeed always fixes!

I won't waste the chance to engage a logging connoisseur – could you see yourself using this if the hit rate was 99%?

carlsverre · 2024-08-29T20:19:31 1724962771

No problem. :)

Possibly yes! If I had a project running in production with a team this seems very valuable as another layer of defense. With these kinds of scanning tools however, they are quickly ignored if they tend to emit noise. If at all possible (from a cost perspective) I'd focus on launching a free version for open source projects complete with a badge to add to the readme. Similar to other Linting/CI projects. Might help gain a bunch of traction in the short term, and if it's free folks may be likely to add it and leave it even if the accuracy isn't perfect which may be nice marketing for y'all.

carlsverre · 2024-08-29T20:21:32 1724962892

Also... can you expand this to comments? Doesn't seem too different from logs. And then add in the fairly obvious auto-PR logic with the ability to respond to PR comments and now you're eliminating maintenance burden. Caveat: I dunno if anyone else is doing this, but this seems like a promising step.

benjaminfh · 2024-08-29T20:31:51 1724963511

We're thinking alike. While we're iterating with customers, we're also thinking about how we can use it to contribute beneficially to projects (and in doing so prove it works reliably). One thing we are keeping a list of is popular OSS repos with notoriously spammy logs - if you know any, please let us know! We are planning to start with some of our YC batch mates' OSS projects once we get the quality right. I won't say exactly when we started showing the "fixes" but it was _very_ recently. Ensuring accuracy on identifying exceptions to the rules (aka the appropriate statements to touch in the first place) is the first thing we are perfecting.

We could expand to comments. The code maintenance direction is a possibility but the reason we get out of bed right now is to make a worthy contribution to logging -> debugging -> SRE sleep :)

devneelpatel · 2024-08-30T11:39:42 1725017982

OneUptime.com already does it with the copilot feature. It also fixes exceptions, add structured logs, optimizes functions / spans that take a long time to complete and integrates with OpenTelemetry natively.

It's also 100% open-source.

benjaminfh · 2024-08-30T15:07:04 1725030424

Thanks for pointing this one out, we weren't aware it could do that. Do you use those features yourself? We think it's an important enough problem that there's room for multiple people to be solving it, but very curious to hear if you think they have.

On the OSS piece, we are not ignoring this. It's very early days for us so we are figuring out how best to balance our limited resources and, in the future, make a thoughtful contribution to the community.

(Edit: just clocked that you _are_ OneUptime.com, so some of my questions won't make total sense. We'd love to understand how your users have responded to those capabilities, if you're willing to share)

DistractionRect · 2024-08-30T01:11:21 1724980281

Congrats on the launch! I'm probably just missing it as I only know enough python to be dangerous, but where can I find your MoA implementation in the source code?

The write up [0] sounds really useful, and says it's open source, but for the life of me I can't find it.

[0] https://www.patched.codes/blog/patched-moa-optimizing-infere...

benjaminfh · 2024-08-30T15:09:33 1725030573

The folk at Patched are doing great work - they are our YC batch mates! Alas, we cannot take credit for that write-up, sadly, nor answer the question re their source code. We are getpatchwork.io / Patchwork Technologies.

Thanks for the congrats :)

aviperl · 2024-08-29T18:14:31 1724955271

On the topic of structured logs, can anyone point me towards where I might learn more about what people have learned over time?

I'm new to the world of querying through my logs, but I can already see a benefit to logging with JSON...

This is what I've defined as a solution for a need of mine for the moment: https://gist.github.com/avi-perl/b173fdc30219155eb9ee4bb3a21...

benjaminfh · 2024-08-29T18:53:35 1724957615

Structured/JSON saves a tonne of time building regex parsers. The regex parsing at query-time is also pretty expensive. This is where Splunk excels - dealing with the noise with powerful querying. ClickHouse is also very performant at this, we hear. It's an expensive task though (computationally and cost wise)

I thought this was well put together from Better Stack: https://betterstack.com/community/guides/logging/logging-bes...

Charity, CTO of Honeycomb has strong views (which we enjoy a lot): https://charity.wtf/tag/observability-2-0/ - they come at it from a tracing/OT angle which is Honeycomb's forte, but we agree a lot on the intended outcomes - actionable (not spammy/noisy) + make it easy to gather the variable/state context in the context of a single event.

ucarion · 2024-08-29T16:43:55 1724949835

This is really cool. Is the "standards" thing basically some sort of LLM prompt? Like, do I just say in words "use slog, discourage logrus"?

benjaminfh · 2024-08-29T16:49:10 1724950150

Right now it's where we specify which rules to apply to a repo when analysing and improving, plus any user-specific tweaks which we do splice in to prompts (still working on exactly how to balance freedom + necessary rails here). The rules themselves are sets of logic in their own right – essentially a mix of static code analysis to carefully prepare context for an LLM to reason over. This makes me think the UI for the standards page probably needs to be more binary with respect to the rules (then freeform for user prefs) - would that make more sense in your opinion?

ucarion · 2024-08-29T17:05:45 1724951145

I don’t really have much an opinion on the what, I think I just know how I want this to feel. My preference is whatever works, and then making it obvious how it works. Mystery meat config files feel like the opposite, but I totally acknowledge that’s where you have to start when building this.

benjaminfh · 2024-08-29T17:17:55 1724951875

That's very fair feedback, we appreciate it!

spimmy · 2024-08-30T03:31:20 1724988680

Excited to see this!! Very much on board with this sort of approach to the future -- wide, structured logs, a balance between machines doing the heavy lifting + humans adding intent to the signal. Also VERY excited to see honeycomb customers already popping up in the comments .. i can't wait to see what these tools can do together.

benjaminfh · 2024-08-30T15:11:32 1725030692

Thanks for the support, it means a lot to us! It would be a dream to see our products working together in the near future. Here's to a high-context future of debugging :)

mrgoldenbrown · 2024-08-30T17:04:07 1725037447

Am I missing where they explain what languages/logging frameworks they support? Their blog page is somewhat broken on Firefox (Android) so maybe I just can't see something? If it's only for one language shouldn't the headline include that important detail

benjaminfh · 2024-08-30T17:54:31 1725040471

Apologies, for all the thought put in to balancing conciseness with completeness, that was a bad miss. Right now we are focusing on Go and Java. We're flexible on logging libraries - if you have strong prefs here, we'd love to get that signal.

Moving forward we will be able to support Python, Ruby, TypeScript, JavaScript, Scala, and Kotlin as well.

I'll check the page issue on firefox. Thanks for the flag.

brap · 2024-08-29T18:36:12 1724956572

Your landing page sucks, I don’t care about your team or your marketing text, I want to see the code!

benjaminfh · 2024-08-29T19:01:41 1724958101

Understood. The landing page so far is taking a back seat relative to making the product better for our early customers, which I'm sure you can appreciate. We hope to open source components of the product. In the mean time are there specific parts of the technical approach that you'd like us to see better described?

elwell · 2024-08-29T22:50:23 1724971823

If the landing page doesn't lead me to eventually try the product, then this is a bad tradeoff. I would listen to this feedback more seriously.

anushnaidu · 2024-08-29T16:38:38 1724949518

It looks awesome..

Does it checks the Log Levels and whether it's appropriate or not

benjaminfh · 2024-08-29T16:53:37 1724950417

Great catch - we didn't include that rule in the demo but it's in the works. To share our thinking a little more: analysing the contents of the statement is easiest (but still quite hard) so we're nailing that down. Analysing the level requires a bit more careful understanding of what the method/code around it is attempting and where this logging statement sits (harder still). Soon we hope to nail adding statements to fresh code (where we'd have to understand where to put them, what the level should be, and then the contents).

willks · 2024-08-30T09:54:16 1725011656

This looks really cool, this would be very valuable for us a small team. Do you have plans for pricing levels?

benjaminfh · 2024-08-30T15:14:56 1725030896

Thanks! Yeah, we're finding a lot of interest from small teams who want to move fast without accumulating tech debt / creating a negative trend between customer count an reliability as they start to find traction themselves.

Want to drop us an email (founders@getpatchwork.io)? Right now we do our best to make pricing work for our customers because the real value for us in this early stage is your feedback. (Cliche but true :) )

fnord123 · 2024-08-29T21:15:47 1724966147

> Refactoring the logs in existing code bases today is a manual slog.

Nice.

benjaminfh · 2024-08-29T22:29:40 1724970580

Glad to hear that resonates. Goes without saying that we get the best reception with people who've literately done manual surges on this. Speaking from experience?

fnord123 · 2024-08-29T23:41:57 1724974917

> Refactoring the logs in existing code bases today is a manual slog.

slog is shorthand for structured logs[1][2][3][4]. So refactoring the logs [without a tool] is manually converting them to a slog. You have a great pun on your hands.

> Speaking from experience?

I have a notebook laying around with a loose plan on how to get the most compression possible out of a log file. It would use ML to figure out what the log strings would be - but I think your idea of scanning the executable is even more clever. Anyway, once it knows the log strings you can probably stuff it into off the shelf column storage for great compression rates. And if the compression rates are good enough then searches should be much much faster as well.

Who do you reckon are the more important existing competitors? Grafana labs or Sonarqube?

[1] https://github.com/slog-rs/slog

[2] https://github.com/gookit/slog

[3] https://github.com/kala13x/slog

[4] https://pkg.go.dev/log/slog

benjaminfh · 2024-08-30T18:16:20 1725041780

Ahah. Totally missed that, I will admit. I gratefully accept the pun and will be using it even more going forward.

If you're willing to chat on the technical details of that notebook, we'd love to. Of course, if you're hoping to build it one day and don't feel comfortable sharing, understand. If yes, it's founders@getpatchwork.io :)

We are definitely eyeing up what can be done when you control the log strings and the rest of the payload. And along the lines of what you say, our first step would be to see how much ClickHouse could squeeze that, and then see what other clever compression could be added in advance.

Anyone crushing it in the code analysis and refactoring space is a challenger. I think for now our sense is that the full-blown agentic SWE tools have bitten off more than they can chew and aren't viewed as credible just yet. However there are people out there taking a focused, use-case-specific approach (like us) who are building impressive things. komment.ai is one that springs to mind. SonarQube looks interesting - thanks for flagging.

In terms of logging stack players, we're hoping some could be friend rather than foe, at least to begin with. We thought ClickHouse might see unstructured logs as an unlock for their customers / GTM motion. However, they have invested a lot in their query-time materialisation tech, which they said their log storage customers love. Expensive, in practice, I suspect. Grafana actually pinged me yesterday.

andrewshadura · 2024-08-29T19:46:44 1724960804

Sorry, but the name's already taken: Patchwork is an online patch review tool.

benjaminfh · 2024-08-29T19:57:59 1724961479

Thanks for flagging that one. It has turned out to be a very busy namespace. We're focusing on making sure the product sticks first... then figuring out if the name needs changing is a good problem to have! :D