I use the built-in pprof flame graphs all the time, and since each of the goroutine pools has a different stack trace, I can tell them apart. What does this package improve on? Wall time instead of CPU time? It isn't immediately obvious to me what the extra info is.
The main difference is that you get a timeline (flame chart) rather than a flame graph. This allows you to understand the order in which operations take place. You also get wall time (instead of CPU time), so you can debug off-CPU performance bottlenecks (e.g. database calls) without the need for additional instrumentation. Last but not least, you get everything broken down per goroutine, so you can understand which operations are executed concurrently vs sequentially.
The Go CPU profiler is great for reducing CPU utilization. But unless you're CPU-bound, it's not very useful for improving latency. fgtrace is trying to help with that.
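To make that concrete, here's a small illustration (the `slowQuery` function and the sleep standing in for a database round trip are made up for the example, not from fgtrace):

```go
package main

import (
	"fmt"
	"time"
)

// slowQuery stands in for an off-CPU wait such as a database round trip.
// A CPU profile attributes almost nothing to it because the goroutine is
// blocked rather than executing; a wallclock timeline shows the full 200ms.
func slowQuery() {
	time.Sleep(200 * time.Millisecond)
}

func main() {
	start := time.Now()
	slowQuery()
	fmt.Println("request latency:", time.Since(start)) // ~200ms, almost none of it on-CPU
}
```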
> fgtrace may cause noticeable stop-the-world pauses in your applications.
Huh, I wonder if this is a temporary limitation or an issue with the approach. In my experience, if you're doing profiling you're probably better off with something lighter weight that you can get more honest numbers from.
Edit: reading closer, it looks like the Go team had similar concerns. I wonder if this can capture how long a goroutine was unmounted for.
Capturing a consistent snapshot of all goroutines requires stopping the world. However, this can be very quick as the GC relies on the same mechanism.
The bigger problem is capturing the stack traces for all goroutines. Rhys added a patch to Go 1.19 [1] that mostly moves this work outside of the critical STW section, which greatly reduces the overhead. Unfortunately this improvement only applies to the official goroutine profiling APIs, and those do not provide details such as goroutine ids. This means fgtrace has to use runtime.Stack() which returns the stack traces as text (yikes) and isn't optimized like the other goroutine profiling APIs.
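For illustration (this isn't fgtrace's actual code), capturing all goroutine stacks via runtime.Stack looks roughly like this, and the result is one big text blob that then has to be parsed:

```go
package main

import (
	"fmt"
	"runtime"
)

// captureAllStacks dumps the stacks of all goroutines as text. Passing
// all=true stops the world for the duration of the call, and the buffer
// has to be grown until the whole dump fits.
func captureAllStacks() []byte {
	buf := make([]byte, 1<<20)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	dump := captureAllStacks()
	// The output starts with lines like "goroutine 18 [chan receive]: ...",
	// which is where the goroutine ids come from, at the cost of parsing text.
	fmt.Printf("%s", dump)
}
```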
There are various ways the implementation details of fgtrace and the Go runtime could be improved for this use case (wallclock timeline views), and I'm hoping to work on contributions in the coming months.
The proposal[0] mentioned in the README has some good insight from rsc.
He notes the performance & scalability issues already noted here by other commenters.
> Probably the right thing to do is figure out more of a trace like the current trace profiles but perhaps less low level.
This is the key takeaway for me.
I think there's room for tracing support somewhere in-between runtime/trace and full blown distributed traces (e.g., OpenTelemetry[1]) - so I'm hopeful this effort may evolve into a good solution in that space.
From a usability point of view, my biggest gripe right now with the Go tracer is that its viewer is... painful. It uses the trace viewer that's built into Chrome, which Chrome itself is moving away from.
I'd hacked around a bit recently to try and get the existing Go traces into Perfetto[2], with some success. As I recall, I couldn't get user traces functioning.
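For context, the "user traces" here are the task/region annotations from the standard runtime/trace package; a minimal example of emitting them (the file name and the task/region names are just placeholders):

```go
package main

import (
	"context"
	"os"
	"runtime/trace"
)

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		panic(err)
	}
	defer trace.Stop()

	// User annotations: a task groups related work, regions mark sub-steps.
	ctx, task := trace.NewTask(context.Background(), "handleRequest")
	defer task.End()

	region := trace.StartRegion(ctx, "decode")
	// ... work ...
	region.End()

	trace.Log(ctx, "status", "ok")
}
```

The resulting trace.out is in the binary format discussed below, and `go tool trace trace.out` is the stock way to view it.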
The `go tool trace` server has an API to output compatible JSON, but it's limited in what it outputs. Unfortunately, the trace file itself is in a custom binary format. All the tools for manipulating it are in `internal/` folders, making them unavailable for import, so creating new tools for working with the traces is quite burdensome.
I'd debated copying the code out into a new project and starting to hack on it, but at that point, I'd reached the end of my willingness to invest time. Perhaps I should open an issue or message the mailing list to see what the maintainers think the future of runtime tracing looks like.
> He notes the performance & scalability issues already noted here by other commenters.
Go 1.19 has made some improvements in this regard [1]. But yes, profiling all goroutines does not scale to programs that use more than perhaps 10k goroutines, which isn't entirely uncommon. To overcome this, the goroutine profile API would need to be extended to allow profiling a subset of goroutines. pprof labels could be used to specify which goroutines should be profiled.
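To sketch the labeling part that exists today (the subset-profiling API itself is hypothetical; only the labels below are real runtime/pprof functionality):

```go
package main

import (
	"context"
	"runtime/pprof"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	// Labels set via pprof.Do are attached to the current goroutine and
	// inherited by goroutines started inside the callback. An extended
	// goroutine profile API could use such labels to select which
	// goroutines to include.
	pprof.Do(context.Background(), pprof.Labels("pool", "uploader"), func(ctx context.Context) {
		for i := 0; i < 4; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				// ... worker logic ...
			}()
		}
	})

	wg.Wait()
}
```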
> Probably the right thing to do is figure out more of a trace like the current trace profiles but perhaps less low level.
Yeah, in the long run the tracer, perhaps in combination with the CPU profiler [2], also offers a great way of capturing this data. But right now it's too much of a firehose, so it probably needs some way of selecting a subset of goroutines to trace as well. Additionally, the unwinding of stack traces is a major bottleneck, so frame pointer unwinding or something similar may be needed to make it faster.
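A rough sketch of that combination, assuming I remember correctly that since Go 1.19 CPU profile samples taken while the tracer runs are also recorded into the execution trace (the file names are arbitrary):

```go
package main

import (
	"os"
	"runtime/pprof"
	"runtime/trace"
)

func main() {
	tf, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer tf.Close()
	if err := trace.Start(tf); err != nil {
		panic(err)
	}
	defer trace.Stop()

	// While both are active, samples from the CPU profiler are additionally
	// written into the execution trace, adding on-CPU detail to the timeline.
	pf, err := os.Create("cpu.pprof")
	if err != nil {
		panic(err)
	}
	defer pf.Close()
	if err := pprof.StartCPUProfile(pf); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// ... workload to observe ...
}
```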
I've heard some stuff about future plans for the tracer that would help with the custom binary format problem, so hopefully this will improve in the future.
Anyway, I mostly see fgtrace as a "Do Things that Don't Scale" [3] kind of project. If people like the value it can provide, it will likely motivate myself and others to figure out how to build a version of it that is safe for production usage :).
Sorry for the confusion :). I'm the author of both tools and was also considering building the new functionality into fgprof, since the data capturing approach is very similar. Anyway, if you found fgprof useful, I think fgtrace could be even more useful in similar situations :)
I've also posted a few more comments in this Twitter thread: https://twitter.com/felixge/status/1571850160358965249