
Can anyone from McSema comment on why it's moved from paid solutions for CFG recovery? It's amazing that an open source project depending on other open source projects for its main function has to rely on either IDA Pro or Binary Ninja to function. The support for radare is abysmal and not even merged.



> The support for radare is abysmal and not even merged

McSema takes a CFG as input, so really someone just has to wire up radare to emit that CFG and it will just work. I don't see an active PR to merge that in for radare.
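For illustration only, here is a rough Python sketch of what that "wire up" step might look like. It assumes basic-block records shaped roughly like the JSON that radare2's `afbj` command is understood to print (an `addr`, plus optional `jump` and `fail` targets); check that against your radare2 version. The output is a plain dict, not McSema's actual CFG format, which this sketch does not attempt to reproduce. The function name `blocks_to_cfg` and the sample data are hypothetical.

```python
def blocks_to_cfg(blocks):
    """Map each basic-block address to a sorted list of successor addresses.

    `blocks` is a list of dicts with an "addr" key and optional "jump"
    (taken branch) and "fail" (fall-through) keys. This shape is an
    assumption modelled on radare2's `afbj` output.
    """
    cfg = {}
    for block in blocks:
        successors = set()
        for key in ("jump", "fail"):
            target = block.get(key)
            if target is not None:
                successors.add(target)
        cfg[block["addr"]] = sorted(successors)
    return cfg


# Hypothetical sample: a conditional branch that rejoins at 0x1010.
sample_blocks = [
    {"addr": 0x1000, "size": 8, "jump": 0x1010, "fail": 0x1008},
    {"addr": 0x1008, "size": 8, "jump": 0x1010},
    {"addr": 0x1010, "size": 4},
]
cfg = blocks_to_cfg(sample_blocks)
```

The real work would be in everything this leaves out: serializing to the format McSema expects, and deciding what to do when the recovered blocks are wrong or incomplete.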

Control flow analysis is a problem I love working on, but it has a large share of thorny issues that need to be solved with either deep theoretical thought or an ocean of tiny hacks. Tools like IDA already have that ocean of hacks. When I was working on making the control flow analysis better, collaborators (and customers) would reply with annoyance that this was a problem whose solution was already approximated by other tools, so why not use them?

If I could find someone to pay me to work on open source control flow analysis for binaries, and I had the time, I'd do it. Mysteriously, the people who can pay are happy just paying a little extra to use the CFA results that IDA already gives them, and having their money spent on new work that isn't re-inventing what IDA and Binja can do. Maybe I should set up a Patreon.


> Can anyone from McSema comment on why it's moved from paid solutions for CFG recovery?

Plainly, because radare's CFG recovery is not reliable enough for us to use. It's not a production quality tool when compared to IDA or Binary Ninja. CFG recovery is an extraordinarily complex problem and doing it right requires a team with commercial support behind them. Altruism simply does not get you far enough for the grind of developing effective CFG recovery.

Similarly, McSema itself is not free either. We're being paid by the US Government. It requires a team of people with decades of experience in programming languages and compilers working together for years. We're lucky to be in this position, but it could have easily gone the other way. Not every project out there has a sponsor willing to make it free.

The teams behind IDA and Binary Ninja don't have some other, enormous business producing profits to support them (say, advertising). In order to spend the person-decades of time required to build these tools that our entire community relies on for our jobs, they need an income stream to work on them.

> The support for radare is ... not even merged.

I'm not aware of any pull requests for radare support. You're welcome to submit one and we'll consider it. However, we'll expect the author to continue supporting it as McSema grows since we can't use it for our projects (it's not reliable enough).

I will note that we have an open PR for DynInst-based CFG recovery but we're still working with the author to review and approve it. https://github.com/trailofbits/mcsema/pull/386


My bad, I thought it was a PR, but it's just an issue where 2 people had actually started work on it: https://github.com/trailofbits/mcsema/issues/220

I understand it's a difficult issue but even with a basic CFG recovery, wouldn't it still be useful to lift code?


Would you mind describing some of the CFG recovery problems IDA can solve that open source tools can’t? This sounds quite interesting.


IDA has a ton of hacks in it that aren't exactly common or public knowledge. There are hundreds of anti-debugger/disassembler instruction sequences that will crash, stall, or corrupt analysis tools. Even identifying all the instructions that can be used for control flow, which flags they alter and how, for multiple architectures, is a royal man-hour grunt-work marathon. And as we see with the latest round of speculative load exploits, a lot of branches cannot even be predicted appropriately, because they could be using exploits relying on undefined behavior and side channels. So the whole issue of perfect control flow analysis is more a mechanical one than a purely abstraction-based one. When OllyDbg was first built it was the same way. Oleg put thousands of hours into whack-a-moling every issue. Most people like elegant endeavors based on some core principles. In static analysis of complex ISAs, the core principle is documenting 10000 species of moths, not finding that they all have wings.
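To make the "moths" point a bit more concrete, here is a toy sketch of recursive-descent CFG recovery in Python. The instruction set is entirely made up (`jz`, `jmp`, `jmp_reg`, `ret`) and the function name `recover` is hypothetical; real tools deal with real encodings, which is where the grunt work lives. The sketch only shows the easy core, plus the first place it falls over: an indirect jump whose target is not visible statically.

```python
# Toy program: address -> (mnemonic, operand). The ISA is invented for this sketch.
PROGRAM = {
    0: ("cmp", None),
    1: ("jz", 4),          # conditional: jumps to 4 or falls through to 2
    2: ("add", None),
    3: ("jmp", 5),         # unconditional direct jump
    4: ("jmp_reg", None),  # indirect jump: target only known at run time
    5: ("ret", None),
    6: ("add", None),      # only reachable through the indirect jump at 4
    7: ("ret", None),
}


def recover(program, entry):
    """Follow every statically known edge starting from `entry`.

    Returns (reached addresses, edges, addresses of unresolved indirect jumps).
    """
    reached, edges, unresolved = set(), set(), set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in reached or addr not in program:
            continue
        reached.add(addr)
        op, target = program[addr]
        if op == "ret":
            successors = []
        elif op == "jmp":
            successors = [target]
        elif op == "jz":
            successors = [target, addr + 1]
        elif op == "jmp_reg":
            # No static target: record it and give up on this path.
            unresolved.add(addr)
            successors = []
        else:
            successors = [addr + 1]
        for succ in successors:
            edges.add((addr, succ))
            worklist.append(succ)
    return reached, edges, unresolved


reached, edges, unresolved = recover(PROGRAM, 0)
```

Here addresses 6 and 7 are never reached, because the only way in is through the indirect jump at 4. Every heuristic for resolving that kind of jump (jump tables, compiler idioms, per-architecture quirks) is one more species of moth.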



