Show HN: Perforator – cluster-wide profiling tool for large data centers (github.com/yandex)
78 points by BigRedEye 9 days ago | 13 comments
Hey HN! We are happy to share Perforator – our internal cluster-wide profiler with great support for native languages and a built-in AutoFDO pipeline to simplify sPGO builds. Perforator lets you profile most binaries without recompiling them or adjusting the build process. We use it at Yandex to profile every pod in a large cluster at a modest sampling rate (99 Hz), collecting petabytes of profiles every day.

There's a blog post about it at https://medium.com/yandex/yandexs-high-performance-profiler-....

Inspired by Google-Wide Profiling, we started continuous profiling years ago with simple tools like poormansprofiler.org. With the rise of eBPF, we came up with a simple and elegant solution providing detailed profiles without noticeable overhead. Pretty wild when you can see the guts of your production binaries in a flamegraph without them even noticing.

Some technical details:

- Our main contribution is infrastructure for continuous PGO using AutoFDO. Google and Meta have done tremendous work on building PGO infrastructure, and we built the last missing piece of the puzzle to make this work well at scale.

- Native binaries are profiled through eh_frame analysis; interpreted and JIT-compiled languages are profiled through perf-<pid>.map files or hardcoded structure offsets.

- We render profiles in multiple ways; the most common is a fast FlameGraph implementation that renders 1M frames in 100ms.

- We provide Helm charts to easily deploy Perforator on your k8s cluster.

- You can use Perforator in standalone mode as a replacement for perf record.
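For anyone unfamiliar with the perf map files mentioned in the list above: JIT runtimes conventionally write them to /tmp/perf-<pid>.map, one symbol per line, as a hex start address, a hex size, and the symbol name. Below is a minimal Python sketch of resolving a sampled address against such a file. It is not Perforator's code; the function names and the sample map contents are made up for illustration.

```python
import bisect

def parse_perf_map(text):
    """Parse perf map lines of the form '<start-hex> <size-hex> <symbol name>'."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the symbol name may itself contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size = int(parts[0], 16), int(parts[1], 16)
        entries.append((start, size, parts[2].strip()))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Return the name of the symbol covering addr, or None if nothing matches."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = entries[i]
    return name if addr < start + size else None

# Hypothetical map contents, as a JIT runtime might emit them.
SAMPLE_MAP = """\
7f0000001000 40 interp_loop
7f0000001040 120 jit_compiled: foo(int, int)
"""

entries = parse_perf_map(SAMPLE_MAP)
print(resolve(entries, 0x7F0000001050))
```

A real agent would also have to cope with the file growing while the process runs and with JIT code being freed and its addresses reused, which this sketch ignores.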

I'd love to answer your questions about the tool!






I just learned about poormansprofiler (https://poormansprofiler.org/): it's brilliant in its simplicity.

If I'm understanding correctly, this collects LBR data through hardware support and uses it for PGO/AutoFDO, right?

(These are older comments that we merged from https://news.ycombinator.com/item?id=42888185, in case anyone was confused by the timestamps)

Yes, although we are also studying CSSPGO, which uses a mixed approach (LBR plus software-sampled stacks).

I'm familiar with the paper, but it doesn't improve the situation in terms of LBR availability on cloud providers, does it?

Yes, existing limitations apply. Without hardware LBR support, we cannot provide sPGO profiles. However, the basic profiling should work fine.

Blog is packed with information, thanks!

Isn't it nearly impossible to tell from stack traces alone that function foo() is burning CPU cycles because it is memory-bound? The real cause could lie somewhere else entirely, not in that particular function, e.g. multiple other threads creating contention on the memory bus.

If so, doesn't that make the profile a somewhat questionable input for PGO?


It depends on the event that was sampled to generate the profiles. For example, if you sample instructions by collecting a stack trace every N instructions, you won't actually see foo() burning the CPU. However, if you look at CPU cycles, foo() will be very noticeable. Internally, we use sPGO profiles from sampling CPU cycles, not instructions.
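To make the cycles-vs-instructions point concrete, here is a small back-of-the-envelope sketch in Python. The two functions and their numbers are invented: foo() retires few instructions but stalls a lot (high cycles per instruction), while bar() retires many instructions cheaply. The code only computes the expected share of samples per function under each sampling event.

```python
# Hypothetical workload: name -> (instructions retired, average cycles per instruction).
workload = {
    "foo": (1_000_000, 20.0),  # memory-bound: few instructions, many stall cycles
    "bar": (9_000_000, 0.5),   # compute-bound: many instructions, high IPC
}

def sample_shares(workload, event):
    """Expected fraction of samples per function when sampling every N `event`s."""
    if event == "instructions":
        weights = {name: instr for name, (instr, cpi) in workload.items()}
    elif event == "cycles":
        weights = {name: instr * cpi for name, (instr, cpi) in workload.items()}
    else:
        raise ValueError(f"unknown event: {event}")
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

by_instructions = sample_shares(workload, "instructions")
by_cycles = sample_shares(workload, "cycles")
print(by_instructions)
print(by_cycles)
```

With these made-up numbers, foo() accounts for 10% of instruction samples but over 80% of cycle samples. Either way, the profile shows where the cycles go, not why they are spent there.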

Right, perhaps I was a bit too vague. What I was trying to say is that merely sampling CPU cycles cannot tell us that foo() was burning CPU because it was memory-bound. And that, in turn, may not be an artifact of foo()'s implementation at all, but of other threads in the application saturating the memory bus.

Or is my doubt misplaced?


I'm curious about the differences from Pyroscope. https://github.com/grafana/pyroscope

Great question! Perforator does indeed look similar to Pyroscope. However, we think the closest existing solutions are https://parca.dev, the closed-source Google-Wide Profiling, and, on the agent side, the beautiful OpenTelemetry eBPF profiler. The main technical differences from Pyroscope that we see are:

- Pyroscope's Java support is superior as of now because Pyroscope offloads it to the amazing async-profiler.

- Pyroscope expects native binaries to be compiled with frame pointers: https://grafana.com/docs/pyroscope/latest/configure-client/g.... This is often not the case, and that's the problem we've tried to solve with Perforator. Perforator uses .eh_frame, which is nearly universal and does not impose additional requirements on compiled binaries.

- Pyroscope symbolizes using symtab: https://grafana.com/docs/pyroscope/latest/configure-client/g.... We use DWARF/GSYM to get stacks that are as correct and detailed as possible (we benchmark our stacks against stacks from gdb).

- Pyroscope symbolizes profiles on the agent, while Perforator symbolizes profiles offline, greatly reducing symbolization costs and agent overhead. It seems Pyroscope is heading toward the same architecture we use: https://github.com/grafana/pyroscope/pull/3799.

- Perforator can be (and should be!) run as a standalone replacement for perf record.

- Perforator supports sPGO profiles.
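As a rough illustration of the offline symbolization point in the list above, here is a Python sketch. It is not Perforator's implementation, and every name and data value in it is hypothetical. It assumes the agent ships raw frames as (build ID, offset) pairs and the backend resolves each unique frame once before folding stacks into the semicolon-separated form that flamegraph tools commonly consume.

```python
from collections import Counter

# Hypothetical raw samples as an agent might ship them:
# each stack is a tuple of (build_id, offset) frames, root first.
raw_samples = [
    (("abc123", 0x1010), ("abc123", 0x2040)),
    (("abc123", 0x1010), ("abc123", 0x2040)),
    (("abc123", 0x1010), ("abc123", 0x3000)),
]

# Hypothetical stand-in for a debug-info lookup (DWARF/GSYM in a real system).
DEBUG_INFO = {
    ("abc123", 0x1010): "main",
    ("abc123", 0x2040): "foo",
    ("abc123", 0x3000): "bar",
}

def symbolize_offline(samples, debug_info):
    """Fold raw stacks into 'a;b;c' -> count, resolving each unique frame once."""
    cache = {}
    lookups = 0
    folded = Counter()
    for stack in samples:
        names = []
        for frame in stack:
            if frame not in cache:
                lookups += 1  # counts the expensive debug-info queries
                cache[frame] = debug_info.get(frame, "??")
            names.append(cache[frame])
        folded[";".join(names)] += 1
    return folded, lookups

folded, lookups = symbolize_offline(raw_samples, DEBUG_INFO)
print(dict(folded), lookups)
```

In this toy input there are six frames in total but only three distinct ones, so the expensive lookup runs three times. Deduplicating across a whole cluster is one plausible way doing this off the agent could pay off, though that is my guess rather than something stated in the post.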

In summary, we try to implement native profiling almost perfectly. It's worth noting that Pyroscope is a mature, well-established product that integrates excellently with the Grafana ecosystem. We have just focused on different things: our focus has been on optimizing native code profiling and making it as accurate and low-overhead as possible.


Any plans for Grafana integration? It would be great to be able to correlate performance metrics with other app indicators.



    You don't have to wait for later, here's a new eliminator
    Ask your local weapon trader for the superperforator


