
Searching statically-linked vulnerable library functions in executable code - matt_d
https://googleprojectzero.blogspot.com/2018/12/searching-statically-linked-vulnerable.html
======
TACIXAT
It is totally awesome that function similarity is seeing the light of day.
Google bought Zynamics way back when and their work has evolved a lot since
then.

This sort of work has big implications in signature generation for malware
samples, for clustering families of samples as well as finding common
functions to generate detections on. You couldn't necessarily throw this into
a detection engine, because we don't have a fast (dedicated) function recovery
tool for binaries, but you can absolutely use it to generate byte based
detection from a seed of a few samples.

Rather than having hash based signatures you could generate signatures that
cover many samples (and likely new ones) in bulk. Normally a good signature
like that requires manual effort from an analyst; this is a step toward
machines doing it. As well, a central authoritative name database that could
say "this is Petya" could force some sane naming convention on the industry
(every AV wouldn't just be like "it's Zbot lol").

This stuff can even aid manual reverse engineering. You could build a function
naming database that uses this. Maybe a new engine for Talos FIRST. [1] Then
if you opened up a file without debug symbols this could match it to known
functions and really speed up reverse engineering efforts.
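
A minimal sketch of what such a lookup might look like, assuming each known function has already been reduced to a similarity-preserving integer hash. The names `best_name` and `max_distance`, the threshold of 10 bits, and the example hashes are all made up for illustration; they do not come from the article or from FIRST.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def best_name(query: int, known: dict, max_distance: int = 10):
    """Return (name, distance) for the closest known hash, or None.

    `known` maps hash -> function name. This is a plain linear scan;
    a real database would want an index instead.
    """
    best = None
    for known_hash, name in known.items():
        distance = hamming(query, known_hash)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (name, distance)
    return best


# Hypothetical database entries, not real function hashes.
known_functions = {0xFFFF: "inflate_fast", 0x0000: "memcpy"}
match = best_name(0xFFFE, known_functions)
```

Anything further away than `max_distance` is reported as unknown rather than guessed at, which matters if the names are going to be trusted during reversing.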

I look forward to reading it in more detail tomorrow. Thanks to Halvar for
putting this out.

[1]
[https://www.talosintelligence.com/first](https://www.talosintelligence.com/first)

------
tomjakubowski
> An efficient implementation of a hash function (based on SimHashing) which
> calculates a 128-bit hash from disassembled functions - and which preserves
> similarity (e.g. “distance” of two functions can be calculated by simply
> calculating the hamming distance between two hashes - which translates to
> two XOR and two POPCNT instructions on x64).

Huh! This obviously desirable property of a good hash function for this
application is one of the classic _undesirable_ properties of a good hash
function for cryptography. I don't think that I was previously familiar with
this sort of hash function, very cool.
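
To make the "two XOR and two POPCNT" remark concrete, here is a rough Python sketch that treats a 128-bit hash as two 64-bit halves, the way an x64 implementation would. The function names are my own, and `bin(...).count("1")` stands in for the POPCNT instruction.

```python
MASK64 = (1 << 64) - 1


def popcount64(x: int) -> int:
    """Count set bits in a 64-bit value (stand-in for POPCNT)."""
    return bin(x & MASK64).count("1")


def hamming128(a: int, b: int) -> int:
    """Hamming distance between two 128-bit hashes held as Python ints."""
    a_lo, a_hi = a & MASK64, (a >> 64) & MASK64
    b_lo, b_hi = b & MASK64, (b >> 64) & MASK64
    # One XOR and one popcount per 64-bit half.
    return popcount64(a_lo ^ b_lo) + popcount64(a_hi ^ b_hi)
```

A distance of 0 means identical hashes and 128 means every bit differs; similar functions are expected to land near the low end of that range.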

Are space-filling curves related to these hash functions?

~~~
viraptor
Another hash with a similar property uses close pronunciation as distance:
[https://en.m.wikipedia.org/wiki/Soundex](https://en.m.wikipedia.org/wiki/Soundex)

~~~
simcop2387
Soundex however is only designed for English names. Take a look at Metaphone
or Double-Metaphone for a more generic version of the same idea.

[https://en.wikipedia.org/wiki/Metaphone](https://en.wikipedia.org/wiki/Metaphone)

------
devereaux
The zlib vulnerability was a huge wakeup call. Years ago I used to be in favor
of statically linked, to limit dependency creep. Disk space is cheap so large
binaries were never a concern.

My favorite approach was single repository compiling optimized binaries with
the specific versions of the libs wanted (in case of weird regression) and
pushing these static binaries to the rest of the network.

After the zlib debacle, no more: I only use that approach for very specific
mission critical tools, where I do not trust ansible or even linux
distributions.

The sqlite 0 day may have reignited the same fears in those too young to
remember grepping various zlib signatures on binaries - not just yours (your
central repository can easily push a new version to your network) but the other
tools you don't necessarily control.
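
For those who missed it, the grepping looked roughly like the sketch below. It relies on the copyright banners that zlib compiles into its deflate and inflate code (for example " deflate 1.2.3 Copyright 1995-2005 Jean-loup Gailly "), and assumes they have not been stripped or altered. The regex and function names are my own, and a hit only tells you a zlib version string is present, not that the vulnerable code path is reachable.

```python
import re

# Matches the banner strings embedded by zlib's deflate/inflate sources.
# The pattern is an assumption about their general shape, not an official format.
ZLIB_BANNER = re.compile(rb" (deflate|inflate) ([0-9][0-9A-Za-z.\-]*) Copyright")


def find_zlib_versions(blob: bytes) -> set:
    """Return a set of (component, version) pairs found in raw bytes."""
    return {
        (m.group(1).decode("ascii"), m.group(2).decode("ascii"))
        for m in ZLIB_BANNER.finditer(blob)
    }


def scan_file(path: str) -> set:
    """Read a binary from disk and report any zlib banners inside it."""
    with open(path, "rb") as handle:
        return find_zlib_versions(handle.read())
```

This is exactly the kind of check that breaks once strings are removed, which is why matching on the code itself is more interesting.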

~~~
flukus
Unfortunately outside of linux distros static linking seems to be becoming
more and more the new norm, especially if you include what is effectively
"soft" static linking of bundling all dependencies, which suffer from the same
problems. Rust is only suitable for static linking, .net core statically links
and most .net projects will have all sorts of out of date dependencies, npm
bundles huge dependency chains that may not be upgraded. Even in the linux
distro world there is movement towards tools like docker where dependencies
will often not be patched.

Even the idea of stable versions of libraries with security patches seems to
be a dying one.

~~~
whyever
> Rust is only suitable for static linking

This is incorrect, Rust supports dynamic linking.

[https://doc.rust-lang.org/reference/linkage.html](https://doc.rust-lang.org/reference/linkage.html)

~~~
flukus
It supports it terribly. Since there's no stable ABI you have to use the same
compiler version. That or be limited by the C ABI and give up most of the
safety features.

~~~
elteto
This isn’t unique to Rust, C++ has the same issues when it comes to using
shared libraries. You either carefully control compiling and shared library
versions or you drop down to C.

I don’t know any native language that can do this in a better way.

------
nineteen999
This is probably a good thing, given the recent trend in languages like Golang
and Rust to statically link binaries by default.

~~~
devereaux
I do not understand the hate towards statically linked binaries. It has its
place for mission critical tools.

~~~
viraptor
Very specific tools, sure. If you have the time, people, and skills to keep
your own list of baked-in dependencies and monitor it for issues - great. But
for a random person downloading something from the internet, it's a dangerous
thing to default to.

~~~
devereaux
I agree. It is a double-edged sword. It requires skill and proper analysis.

I should have been more specific. I explained my reason below on
[https://news.ycombinator.com/item?id=18713442](https://news.ycombinator.com/item?id=18713442)

~~~
ex_amazon_sde
Even FAANG companies rely on distributions and other companies to spot
vulnerabilities, rebuild libraries, test and validate them.

Even if the company rebuilds everything, there's a huge benefit in knowing
that you are using a well tested release instead of a less popular one or an
internal fork.

Also: security teams hate big security patches.

------
senderista
Nit: simhash approximates cosine distance, not Jaccard distance (that would be
minhash).
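
For reference, a toy SimHash over weighted features, to show where the cosine flavour comes from: every feature votes +weight or -weight on each output bit, and the sign of the total decides the bit. This is only a sketch of the general technique; the feature strings, the weights, and the use of BLAKE2b as the per-feature hash are my assumptions, not what the article's implementation does.

```python
import hashlib

BITS = 128  # same width as the hash described in the post


def feature_hash(feature: str) -> int:
    """Map one feature to a 128-bit integer (BLAKE2b chosen arbitrarily)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=16).digest()
    return int.from_bytes(digest, "big")


def simhash(weighted_features) -> int:
    """Combine (feature, weight) pairs into a single 128-bit SimHash."""
    totals = [0.0] * BITS
    for feature, weight in weighted_features:
        h = feature_hash(feature)
        for i in range(BITS):
            # Each feature votes for or against bit i, scaled by its weight.
            totals[i] += weight if (h >> i) & 1 else -weight
    result = 0
    for i, total in enumerate(totals):
        if total > 0:
            result |= 1 << i
    return result


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


# Hypothetical mnemonic-sequence features for two near-identical functions.
f1 = [("push mov sub", 1.0), ("mov call test", 1.0), ("jz mov add", 1.0)]
f2 = [("push mov sub", 1.0), ("mov call test", 1.0), ("jnz mov add", 1.0)]
distance = hamming(simhash(f1), simhash(f2))
```

Because the output depends on weighted sums rather than on set membership, changing a feature's weight can flip bits even when the set of features stays the same, which is the difference from a Jaccard-style minhash.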

------
motohagiography
The treatment of their ROC curve is precisely the problem I encountered in
product for a closely related tech.

This is a great way to do vulnerability research, surprisingly good malware
detection, but a less good way to provide an assurance service.

Great post, and a good step towards the general problem of code reputation.

------
a-dub
I always wondered if AV databases were just minhashing/lsh on binaries...
guess not!

~~~
loeg
Hah, no. AV databases are just real dumb tries of whole-file MD5s, or a
similar level of sophistication (i.e., not very). Usually combined with an
unnecessarily privileged and unsandboxed parser for arbitrary weird file
formats.

~~~
a-dub
Looking for vulny binary code in terms of jaccard similarity is interesting.

------
gammateam
“0ld days” lol I like it kinda

Marriott breach: someone executed an 0ld day.

------
RobLach
Very cool.

------
Jacoe
What am I reading haha.

~~~
puzzle
The article makes more sense if you consider that Google is notorious for
statically linking (almost) all its production binaries, something that the Go
toolchain supported early on for related reasons.

~~~
CaliforniaKarl
I’m guessing it’s also related to the recent SQLite zero-day. SQLite is
public-domain, and so it’s not unusual for projects to simply include the code
directly in their source distributions, which will either build it in or will
provide a `./configure` option to use an already-built form (this is used by
distro packagers). When the “build it in” option is used, then it’s statically
linked.

~~~
tptacek
I think Thomas Dullien has been working on this stuff for a very long time;
his company, Zynamics, which Google acquired ages ago, was the author of
BinDiff.

