
Android Malware Detection Using Machine Learning - lainon
https://arxiv.org/abs/1703.10926
======
wimagguc
It's less about the actual malware detection aspects and more about whether
you can use an emulator to collect data from malicious apps. For background,
those apps tend to do the Volkswagen-trick where they detect their environment
first and are only active on real Android devices.

This white paper too finds that extracting the dynamic features (the data that
would feed the ML algorithm) is more reliable on a device than on an emulator.
Not surprising and not much to do with the machine learning aspects.

~~~
wolfos
Hey, I'm from the same research group as the authors above. At the same
conference, we also had a paper on Android malware detection using deep
learning. (Our malware detection results are ok-ish, but we know how to make
them much better, and will publish an updated paper soon).

paper -
[http://dl.acm.org/citation.cfm?id=3029823](http://dl.acm.org/citation.cfm?id=3029823)
code - [https://github.com/niallmcl/Deep-Android-Malware-Detection](https://github.com/niallmcl/Deep-Android-Malware-Detection)

~~~
wimagguc
What's the minimum sample size you've used successfully? I'd imagine that
finding APKs, disassembling them and identifying the malware yourself for the
training datasets are rather resource-intensive tasks, so it's a difficult
balancing act?

~~~
wolfos
We did an experiment where we vary the number of training samples and measure
validation accuracy (Fig. 3). Basically, the system keeps getting better as
you give it more data. (For good real-world performance, I'd say tens of
thousands of training samples would be needed.)
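
To make that kind of experiment concrete, here is a minimal sketch of a learning curve: train on progressively larger subsets and score each model on the same held-out validation set. Everything in it is an assumption for illustration. The data is synthetic, the classifier is a simple nearest-centroid model rather than the deep network from the paper, and names such as `make_samples` and `learning_curve` are hypothetical.

```python
import random

NUM_FEATURES = 8  # assumed size of the synthetic feature vector


def make_samples(n, rng):
    """Synthetic stand-in for app features: label 1 = malicious, 0 = benign."""
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label else -1.0  # two overlapping Gaussian clusters
        features = [rng.gauss(centre, 2.0) for _ in range(NUM_FEATURES)]
        samples.append((features, label))
    return samples


def train_centroids(train):
    """Nearest-centroid model: the mean feature vector of each class."""
    sums = {0: [0.0] * NUM_FEATURES, 1: [0.0] * NUM_FEATURES}
    counts = {0: 0, 1: 0}
    for features, label in train:
        counts[label] += 1
        for i, value in enumerate(features):
            sums[label][i] += value
    # max(..., 1) avoids dividing by zero if a class is absent from a tiny subset
    return {c: [s / max(counts[c], 1) for s in sums[c]] for c in sums}


def predict(centroids, features):
    """Return the class whose centroid is closest in squared distance."""
    def sq_dist(centre):
        return sum((x - m) ** 2 for x, m in zip(features, centre))
    return min(centroids, key=lambda c: sq_dist(centroids[c]))


def learning_curve(sizes, seed=0):
    """Return (training size, validation accuracy) pairs for each size."""
    rng = random.Random(seed)
    validation = make_samples(2000, rng)   # fixed held-out set
    pool = make_samples(max(sizes), rng)   # training pool to subsample from
    results = []
    for n in sizes:
        centroids = train_centroids(pool[:n])
        correct = sum(predict(centroids, f) == y for f, y in validation)
        results.append((n, correct / len(validation)))
    return results


if __name__ == "__main__":
    for n, acc in learning_curve([10, 100, 1000, 5000]):
        print(f"{n:>5} training samples: validation accuracy {acc:.3f}")
```

On a toy problem like this the curve flattens quickly. The point is only the shape of the experiment: a fixed validation set and a growing training subset.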

Initially, we were using an off-the-shelf dataset donated by an anti-virus
company. Later we did some experiments using a much larger dataset collected
by our colleagues at ASU. Although I was not involved with collecting that
dataset, as far as I know the Android APKs were scraped from various online
stores and checked for malware using virustotal.com.
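
As a rough illustration of turning multi-scanner results into training labels, the sketch below applies a detection-count threshold. It is an assumption throughout: the thread does not say how the labels were derived, the `engine_verdicts` format and the `min_detections` value are hypothetical, and no real VirusTotal API is called.

```python
def label_apk(engine_verdicts, min_detections=5):
    """Label one APK from per-scanner verdicts.

    engine_verdicts: hypothetical dict mapping a scanner name to True
    if that scanner flagged the APK. min_detections is an arbitrary
    example threshold, not a value taken from the thread or the paper.
    """
    detections = sum(1 for flagged in engine_verdicts.values() if flagged)
    if detections >= min_detections:
        return "malware"
    if detections == 0:
        return "benign"
    # A few isolated detections are ambiguous; such samples could be
    # left out of the training set rather than guessed at.
    return "uncertain"


if __name__ == "__main__":
    report = {"scanner_a": True, "scanner_b": False, "scanner_c": False}
    print(label_apk(report))
```

Dropping the "uncertain" samples trades dataset size for cleaner labels, which is one side of the balancing act raised above.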

------
pawadu
I don't think ML is the solution here. At least not right now.

Google needs to create a better system to rate developers. As it is now, a
normal person has ZERO chance of telling apart a big reputable corporation
and a scammer on the Play Store. This has led to crazy things like this:

[https://www.reddit.com/r/Android/comments/6dzw22/why_are_app...](https://www.reddit.com/r/Android/comments/6dzw22/why_are_apps_like_these_allowed_on_play_store_i/)

So Google has a bigger problem, and it's Google itself. Once they clean up
their own mess we may benefit from ML malware detection, but until then it's
just a whac-a-mole game.

~~~
tyingq
There are also several apps called "Gallery" that take advantage of the fact
that Android renamed Gallery to Photos.

People think it's missing, and download one of these ad-injecting crapware
things.

Google seems unaware.

~~~
pawadu
> Google seems unaware

At this point I would say they deliberately allow this to happen.

There is a community of volunteers regularly reporting this sort of stuff. How
on earth could Google be unaware after years of this?

[https://www.reddit.com/r/BadApps/](https://www.reddit.com/r/BadApps/)

------
chriswarbo
There's some nice work on ML detection of Android malware at
[http://groups.inf.ed.ac.uk/security/appguarden/Publications....](http://groups.inf.ed.ac.uk/security/appguarden/Publications.html)

