
Researchers easily trick Cylance's AI Antivirus to think Malware is 'Goodware' - hhs
https://www.vice.com/en_us/article/9kxp83/researchers-easily-trick-cylances-ai-based-antivirus-into-thinking-malware-is-goodware
======
some_random
The actual research: [https://skylightcyber.com/2019/07/18/cylance-i-kill-you/](https://skylightcyber.com/2019/07/18/cylance-i-kill-you/)

Actual summary: Cylance has a model that uses strings as a feature, and by
taking all the strings from the whitelisted files and catting them onto the
malware sample, one can subvert the ensemble model completely.
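
A toy sketch of why that works, assuming a purely additive bag-of-strings
score. The weights, string names, and zero threshold below are all made up for
illustration; this is not Cylance's actual model, only the general failure
mode of a scorer in which benign-looking strings can outvote malicious ones.

```python
import re

# Hypothetical per-string weights a trained model might have learned:
# positive pushes toward "malicious", negative toward "benign".
WEIGHTS = {
    "CreateRemoteThread": 2.0,
    "VirtualAllocEx": 1.5,
    "Copyright Example Game Studio": -3.0,
    "example_game_engine.dll": -2.5,
}


def extract_strings(blob: bytes, min_len: int = 4) -> set:
    """Return runs of printable ASCII, roughly like the Unix `strings` tool."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {m.decode("ascii") for m in re.findall(pattern, blob)}


def score(blob: bytes) -> float:
    """Sum the weights of all known strings; above 0 means 'malicious' here."""
    return sum(WEIGHTS.get(s, 0.0) for s in extract_strings(blob))


# A stand-in "malware" blob: two suspicious API names plus some filler bytes.
malware = b"\x00CreateRemoteThread\x00VirtualAllocEx\x00\x90\x90"

# Append strings lifted from a whitelisted program; the original bytes are untouched.
benign_strings = [b"Copyright Example Game Studio", b"example_game_engine.dll"]
padded = malware + b"\x00" + b"\x00".join(benign_strings)

print(score(malware))  # positive: flagged
print(score(padded))   # negative: the appended strings outweigh the rest
```

Nothing about the sample's behavior changes; only the feature vector does.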

~~~
jgalentine007
Which is something a novice computer user might try when 'hacking'. Really
Cylance?

~~~
ma2rten
Some early spam filters could also be tricked using the same technique. I
think PG might have written about this.

~~~
nsajko
[http://www.paulgraham.com/antispam.html](http://www.paulgraham.com/antispam.html)

[http://www.paulgraham.com/sofar.html](http://www.paulgraham.com/sofar.html)

> More Good Tokens

------
AlexandrB
Whenever I’ve used AV I have found it to be worse than useless. The only
detections are false positives, it uses tons of system resources, and it breaks
commonly used apps.

I generally agree with Tavis Ormandy[1] that AV products often just increase
the attack surface. Especially for applications that already have a high
security focus (e.g. browsers). I’m somewhat surprised to see so many AV
believers in the HN comments section.

[1] [http://blog.cmpxchg8b.com/2016/03/security-software-certific...](http://blog.cmpxchg8b.com/2016/03/security-software-certification.html?m=1)

~~~
self_awareness
Neither you nor Tavis is taking into account non-technical people. Nobody
forces you to use AV -- if you feel you have the skill to defend yourself, do
yourself a favor (and us), and remove AV from your boxes. But leave
non-technical people alone.

~~~
michaelmrose
Herein lies the challenge. The user who is so ill trained that AV is a great
idea actually needs an environment locked down sufficiently that he can't ruin
his own day, not a tool to analyze in real time whether he is doing so right
now.

Those more skilled would be better off just using their wits.

The set between is empty.

~~~
self_awareness
You mean the user should not have access to computers? Because that is what
you're implying. If the user can send and receive emails, this user is
susceptible to attack.

------
Hitton
That's what you get when you hype Machine Learning as AI. But you can't really
blame the "AI antivirus" much more than regular ones; they can be fooled
just as easily. Antivirus generally can't protect you against anything other
than old malware; with new malware you are always on your own. That said,
making old malware undetectable again takes more effort against regular
antiviruses.

~~~
jonathanstrange
Modern antivirus programs tend to have some heavy heuristic real-time virus
detection, too. The problem is if you set the heuristic detection levels very
high, then the number of false positives skyrockets.
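
A small sketch of that trade-off, using made-up heuristic scores (0 = looks
clean, 1 = looks very suspicious). The numbers are purely illustrative and not
measurements from any product; a "higher detection level" corresponds to a
lower flagging threshold here.

```python
# Hypothetical heuristic scores for a handful of files.
benign_scores = [0.05, 0.10, 0.20, 0.35, 0.55, 0.60]
malware_scores = [0.40, 0.65, 0.80, 0.90]


def rates(threshold):
    """Flag every file scoring at or above the threshold.

    Returns (benign files wrongly flagged, malware files caught).
    """
    false_positives = sum(s >= threshold for s in benign_scores)
    detected = sum(s >= threshold for s in malware_scores)
    return false_positives, detected


# Lowering the threshold = making the heuristics more aggressive.
for t in (0.7, 0.5, 0.3):
    fp, tp = rates(t)
    print(f"threshold={t}: {tp}/{len(malware_scores)} malware caught, "
          f"{fp}/{len(benign_scores)} benign files flagged")
```

With these numbers, catching the last malware sample costs three false
positives out of six benign files.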

In some environments the best solution is to use a whitelisting approach and
there are several solutions for Windows on the market. Unfortunately, I don't
know of any easy to use whitelisting solution for Linux.

My personal experience with one of these solutions on my gaming PC was that
for normal end-users whitelisting is too much work for everyday use, and at
the same time I was never really confident that the mechanisms the program
used to block non-whitelisted software couldn't be fooled easily. Those
mechanisms are proprietary and even seemingly reputable companies make claims
on their web pages that look like snake-oil to me. If someone talks about
"military grade protection", then you can count me a skeptic. In the end, the
success of such software mostly depends on whether the malware author tests
against the particular vendor or not, so there is some high level of security
by obscurity involved that might actually help against most attacks,
especially if you're using some obscure vendor as I did. Probably not worth
the effort for everyday use, though.

------
Trias11
I literally make customers disable Cylance on their servers or we don't
guarantee server operation.

Too many buggy conflicts.

~~~
jm4
How does that usually work out? Personally, I would throw out a vendor who
asked me to disable my AV. I don't mind whitelisting a directory or making
certain adjustments, but it's hard to take them seriously if they can't work
with AV.

Cylance isn't that bad. I ran it for 3 years. The false positives were
annoying, but it also stopped a lot of nasty stuff that our traditional AV
wasn't detecting. I'm in an environment where there's very little appetite for
risk and highly standardized endpoints so it generally worked out.

We ran it alongside Bit9 (now Carbon Black). They were both catching the same
things and CB has more features. Once CB got to the point where it could be
the only AV we ditched Cylance.

~~~
gregmac
What type of "nasty stuff" did it stop? Are you talking servers or
workstations here? How did that nasty stuff get far enough in that it had to
be "stopped" by AV?

(I don't mean this to sound snarky, I'm genuinely interested in what type of
real-world infections are happening and being detected. My experience is
limited to internet-facing endpoints that have no security or are horribly
outdated -- which is typically a bigger failure of IT to begin with -- and
non-technical end-users running random executables from e-mails or shady
websites)

~~~
jm4
Mostly crap that was targeted at users via email. Weird droppers, infected
documents, PDFs loaded with some exploit. Lots of stuff that doesn't show up
on Virus Total. We see a lot of stuff that attacks unpatched software and
there's quite a bit that gets through the email gateway. We see frequent spear
phishing scenarios where it's not uncommon for the attacker to email back and
forth for a while under some pretext before delivering a malicious payload.
Several situations where a party we are communicating with has been breached
and the attacker will use them as a jumping off point to get to us - basically
a business email compromise scenario where they jump into the middle of a
conversation or document transaction with a malicious payload. We patch too,
block a lot of connections at the firewall and have other layers, but it's
nice when the endpoint protection stops it.

------
eps
And to "trick" it into thinking something is malware when it's not you just
need to pack it with UPX.

~~~
jonathanstrange
It's a shame. People have even stopped using certain niche programming
languages because their non-standard compilers create executables that are
often flagged by antivirus software.

~~~
userbinator
Basically everything from the demoscene...

AVs have also turned into moral police with their detection of
cracks/keygens/"potentially unwanted software". Detecting worms, ransomware,
and the like is one thing, but I think going beyond that crosses a line.

~~~
gruez
That might work for technical users like yourself, but non-technical people
need an opinionated detection engine. If the computer has some sort of
software that hijacks search queries, should that be removed? Chances are the
user didn't want the software there, but you can't know for sure. What about a
RAT? Maybe they wanted it there to monitor their kids.

~~~
squeaky-clean
I don't think this is what they were referring to. I've had an anti-virus nuke
the directory for a pirated and cracked game. All because there was a
keygen.exe detected in the directory.

------
tptacek
I'm a little baffled by how this could be news. It's an antivirus system.
Someone bypassed it. That's what people do with antivirus systems. Was there a
widespread belief that Cylance had somehow cracked the code on reliable
antivirus?

~~~
oeoeo00
Is this news to nerds? No. The effectiveness of virus software has been
measured over and over and holes are always found, cause that’s software.

It’s a useful talking point nonetheless. Naive managers can feel pressured to
make purchase decisions around these things.

Pointing to data and these stories has been helpful to me in getting time to
truly vet our choices.

Nothing hurts credibility like saddling the company with a service contract
that provides fuck all nothing.

~~~
buboard
just a heads up, i think your comments are autodead now

------
_underfl0w_
The featured article links to an article from Cylance that _does_ actually
claim their model could've _theoretically_ detected and flagged malware before
its creation.

_"...before the cybercriminals set up the crypto-system, the payment details
of the campaign, the C2 infrastructure and before anything else was readied,
our model was fully able to predict and prevent that campaign’s malware."_

They claim that a 2015 version of their product _could have_ detected malware
that was written in 2016. This conjecture seems plausible but on closer
inspection seems to be... speculative. Especially if something like this could
undermine it.

~~~
swinglock
Old school antiviruses also have heuristics so they could make the same claim.

It wouldn't be true in practice because malware authors would just test their
malware against common antiviruses and tweak it before shipping so that the
heuristics don't pick it up.

Just like these researchers did against their "AI". AI really just meaning
"generated heuristics", doesn't it?

If it becomes a problem for malware authors they will make their own "AI"
obfuscation generators soon if they haven't already.

They still have the advantage since the "AI" antivirus runs locally, so they
can just run tests against it until it doesn't detect, without having to send
a large amount of malware samples to the defenders.

------
unsignedint
I have used their home offering, Cylance Smart Antivirus, and I essentially
concluded that machine learning alone is just not enough to detect malware.
It's nice that the signature is small and works even when you do not have the
latest signature, but there are a lot of false positives. (And it generally
identifies a threat just by its class, not by its specific identity, so
there's no way to assess its impact.)

This is combined with Cylance's attitude of treating games as their own class
of malware (they would tell their users "just add the file to the exclusion
list", and there's no way to exclude files by class -- mind you, this is a
"home" product).

I've tried Sophos Home, which also features machine learning based detection
(on their paid premium version), but they use it to supplement the signature /
behavior based detections -- which I feel is a more modest approach. Most
"known" threats get caught by the signature engine and reported by their
identity ("EOF97/EicarDrp-A"); if not, it'll identify the threat by its class
("ML/PE-A").

Oh, also, EOF97/EicarDrp-A is actually an EICAR test file embedded in a PDF
file; I think this type of file is where Cylance's approach would struggle. (I
don't think Cylance's engine even looks at anything other than executables,
anyway.)

~~~
jm4
You're in for a bad time trying to run Cylance or something similar at home. I
generally have a positive opinion of Cylance having run it for 3 years. I
think where those types of products shine is in larger environments where the
end user is your biggest risk, you have layers of security and it's worthwhile
to invest the effort in getting your AV config right.

I would never put that thing on my home machine. I know what I'm doing and I
know what's on my own machine. My usage patterns at home are much different
than at work so I'm not worried about phishing emails, fileless malware, PDF
and .doc exploits or whatever. I'm more concerned about my webcam, known
malware, casual drive by stuff, some basic parental controls and something low
maintenance that will stay out of the way. I keep my machine patched,
installed BitDefender and called it a day.

~~~
unsignedint
Well, I'm talking about the specific version of Cylance PROTECT intended for
homes, which I believe uses a similar engine sans memory protection and other
controls in their management dashboard. I guess it's bad execution on their
end, too, not necessarily the engine itself. I guess their engine is more
appropriate for a corporate IT environment, where what goes onto the machines
gets more vetting.

The reason I looked at Cylance was partly curiosity and partly that I do
manage machines beyond my own use, so "end user is your biggest risk" actually
applies to me as far as malware vectors go.

In any case, I'm not really sure how Cylance is trying to position their home
offering; it seems to allow very little control over its configuration, while
the protection is inadequate. I haven't used their enterprise version, but I'm
assuming it gives you a lot more configuration options...

------
threrw334113
Looks like a BOW classifier is the 'AI' here.

~~~
noir-york
Indeed. I wonder if it's some version of Paul Graham's Bayesian classifier he
had posted many many years ago...

------
xorcist
Heuristics based antivirus has been around since the 90s. Vendors have pushed
it significantly more during the past five years or so. Supposedly, it has
become much better. The promise has always been to detect future malware.

However there has always been a fundamental problem with them. Malware authors
have access to them too. They won't release anything that's detected by the
tools they care about. Heuristics get improved, but the end user ends up with
a patch cycle anyway.

That doesn't mean they are useless, just oversold. Machine learning antivirus
sounds like more of the same.

------
ape4
Shouldn't it be "bonware"? Since "mal" is French.

~~~
secant
It's a portmanteau of "malicious software", so I guess goodware is fine.

~~~
hef19898
Isn't portmanteau French as well?

~~~
antihero
Portmantotally

------
higherkinded
The advertising claim about this product being able to detect malware two
years before it's even written is discernibly turgid. Had a good chuckle at
that.

When will companies selling ML stop making these statements in attempts to
wow people? Like, the claim is ridiculous. How did they even come up with
this, and moreover, what's the possible basis for such a statement? How do
they verify that and how would they prove it to anyone asking? It's hugely
apparent that it's not even physically possible to pull that move off.

~~~
higherkinded
So the approach described essentially defeats the purpose of antivirus
software as it's known. Malware so happens to be (usually) embedded in the
legit software. If that real-world fact is the way to defeat their model, is
it of any use? You get the binary off the spoofed page, boom, you're pwned
instantly and that piece of software is totally worthless at its primary goal
despite being advertised as a killer product.

------
djflutt3rshy
"That is, until they discovered that Cylance also had whitelisted certain
families of executable files to avoid triggering false positives on legitimate
software."

Surprised to see this way down in the article. No wonder that when they
manually whitelisted false positives, researchers could just append code from
those to malware and it'd rank it harmless. Isn't the proper answer to not
have manual whitelisting of entire programs, but to train it better to exclude
them?

------
dmitrygr
Rice's theorem is pretty clear on this. You PROVABLY CANNOT write a general
procedure that inspects arbitrary code and decides any nontrivial property of
its behavior.

"Is this code evil?" is a VERY nontrivial (bordering on philosophical)
question

Adding buzzwords like "AI" into the mix doesn't affect this in any way
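
For reference, a standard statement of Rice's theorem. The notation
($\varphi_e$ for the partial function computed by program $e$, $\mathcal{C}$
for the property) is chosen here for illustration and is not from the comment
above.

```latex
Let $\varphi_e$ denote the partial computable function computed by program $e$,
and let $\mathcal{C}$ be a set of partial computable functions. If the property
is nontrivial, i.e.
\[
  \emptyset \;\neq\; \{\, e : \varphi_e \in \mathcal{C} \,\} \;\neq\; \mathbb{N},
\]
then the index set $\{\, e : \varphi_e \in \mathcal{C} \,\}$ is undecidable.
```

Note that the theorem is about properties of a program's behavior, decided
exactly and for all programs. It does not rule out detectors that are right on
some programs and accept false positives or false negatives on others, which is
the regime every antivirus operates in.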

------
sourabhkt
and what most of the companies try to sell as AI is not anywhere close.

~~~
mruts
I mean, is deep learning AI? Is word2vec AI? I would say yes, but I guess some
would say no.

But it doesn’t really matter because no one has a good definition of AI.
Perhaps the best one would be passing the Turing test, but that has problems
as well.

------
sqldba
The idea that some brogrammers can come along and bang out a disruptor in this
area - which has a few decades of extremely complex learning, patents, and
optimisation - just because they can cobble together some ML...

It'd be sad if it wasn't so naive.

~~~
solveit
Many ML successes look exactly like that.

~~~
Barrin92
that's mostly because the bar is so unbelievably low that "it works 95% of the
time" is good enough to please consumers. In security or anything that
requires engineering rigor where catching the exception is exactly what
matters ML is virtually useless, or even worse actively harmful.

~~~
jointpdf
That’s not true, there’s an entire subfield of stats/ML dedicated to anomaly
detection.

~~~
Barrin92
But even outliers in ML systems are pretty much always outliers of first
order, that is to say they're outliers in a predictable way, they're often
significantly different from their environment.

The sort of outliers that concern security problems are pretty much always
idiosyncratic by design because the people that create them know how easy it
is to create adversarial examples for machines.

There's a human ingenuity to genuine edge cases that ML is ill suited to
figure out because ML by design draws conclusions from patterns. My prediction
is that we'll very soon see the same problem in fields like autonomous
driving. Every time we see ML attack complex human domains, the "last 2%" seem
intractable.

