
Fwaf – Machine Learning Driven Web Application Firewall - Faizann20
http://fsecurify.com/fwaf-machine-learning-driven-web-application-firewall/
======
rfoo
It seems like what it actually "learned" is no better than banning some
keywords, for example:

    
    
      >>> p=lambda x:lgs.predict(vectorizer.transform([x]))
      >>> p("/product.php?name=etc")
      array([1])
      >>> p("/login.php?name=rfoo&pass=hehe")
      array([1])
      >>> p("/download.php?file=/root/.bashrc")
      array([0])
      >>> p("/example/test/q=" + lorem + "<script>alert(1)</script>") # len(lorem) = 4488
      array([0])
    

(FYI 1 means malicious and 0 means clean)
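
For context, a minimal sketch of how `vectorizer` and `lgs` above might have
been built, assuming the TF-IDF plus logistic regression setup described in
the post (file names and parameters here are illustrative, not necessarily
the repo's exact code):

    # Illustrative reconstruction of the training step assumed by the REPL
    # session above; file names and parameters are guesses, not the repo's code.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    good = open("goodqueries.txt").read().splitlines()   # clean queries, label 0
    bad = open("badqueries.txt").read().splitlines()     # malicious queries, label 1
    queries = good + bad
    labels = [0] * len(good) + [1] * len(bad)

    vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 3))
    X = vectorizer.fit_transform(queries)

    lgs = LogisticRegression()
    lgs.fit(X, labels)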

~~~
Faizann20
If you have a look at the data, there is a wide variety of malicious commands
that it can detect.

------
elchief
Doesn't look like he did any cross-validation, hence the high accuracy. Always
keep a hold-out set to test against.
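
A minimal sketch of what that hold-out evaluation could look like with
sklearn (`X`, `labels`, and `lgs` are assumed from a training setup like the
one sketched earlier in the thread):

    # Keep 20% of the data aside and report accuracy only on queries the
    # model never saw during training; `stratify` preserves the class ratio.
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=42)

    lgs.fit(X_train, y_train)
    print(lgs.score(X_test, y_test))  # accuracy on the unseen hold-out set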

------
Faizann20
In case someone is having difficulty with the link, here is an alternative:

[https://web.archive.org/web/20170514081124/http://fsecurify....](https://web.archive.org/web/20170514081124/http://fsecurify.com/fwaf-machine-learning-driven-web-application-firewall/)

Apologies for the inconvenience.

Thanks

------
halflings
Like others have said, you might be overfitting your training data here: your
model is just memorising the examples you give it and would fail if somebody
slightly varied a payload (inserting some whitespace or something).

Another thing to keep in mind is that an accuracy of 99% doesn't mean much in
an unbalanced problem like yours (many more clean queries than malicious
ones).

What you should show instead is precision (of the ones labeled malicious, how
many are actually malicious?) and recall (out of the malicious queries in the
dataset, how many did your model label as malicious?)
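
A rough sketch of how those two numbers could be computed with sklearn,
assuming a hold-out split (`X_test`, `y_test`) and a fitted model `lgs` as
above:

    # Precision and recall for the malicious class (label 1); far more
    # informative than raw accuracy when clean queries dominate the data.
    from sklearn.metrics import precision_score, recall_score

    predictions = lgs.predict(X_test)
    print("precision:", precision_score(y_test, predictions))  # flagged malicious that really are
    print("recall:   ", recall_score(y_test, predictions))     # truly malicious that got flagged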

~~~
halflings
Try to also do cross-validation (check sklearn's stratified K-Folds) to be
more confident you're not just overfitting.
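
For example (a sketch, with `X` and `labels` assumed from the earlier
training snippet):

    # Stratified 5-fold cross-validation: every fold keeps the same
    # clean/malicious ratio as the full dataset.
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = cross_val_score(LogisticRegression(), X, labels, cv=skf, scoring="f1")
    print(scores.mean(), scores.std())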

------
bsg75
The similarities in name _and_ logo to F-Secure are a little bothersome.

~~~
Faizann20
They have a straight F :p

------
based2
[http://yararules.com/2017/04/06/yara-rules-strings-statistic...](http://yararules.com/2017/04/06/yara-rules-strings-statistical-study/)

------
bllguo
Fun, will look into this! Wonder if anyone can point to other datasets?

------
Faizann20
The website is a bit slow, but the page does load. If you are facing any
problems, please wait a minute and the page will load.

~~~
proyb2
All I got is a blank page.

~~~
Faizann20
Here you go: [https://github.com/faizann24/Fwaf-Machine-Learning-driven-We...](https://github.com/faizann24/Fwaf-Machine-Learning-driven-Web-Application-Firewall)

------
mcboman
Why use trigrams (n = 3)?

~~~
godmodus
Trigrams are a known sweet spot; it's a practical heuristic. I have no proofs
to link, but I wrote a Wikipedia semantic parser and witnessed similar
results in practice.

Going above 3 adds very, very little precision while guzzling space. Using 2
shows a drop in precision.

It probably has to do with how human-built systems are built (we make them in
our image, and we seem to have a thing for 3s: (subject, verb, object) ->
(origin, data, destination), etc.)

Addendum: if you know what PCA is, I'd wager that added n-gram dimensions
share a linear dependency with the lower dimensions - a statistical
resemblance (high covar(A, B)) - and so add very little to the data's
variability once you start adding dimensions above 3.
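
One rough way to check that trade-off empirically (a sketch; `queries` and
`labels` are assumed from a training setup like the one earlier in the
thread):

    # Compare vocabulary size and cross-validated score for character
    # n-grams up to length 2, 3 and 4.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    for n in (2, 3, 4):
        vec = TfidfVectorizer(analyzer="char", ngram_range=(1, n))
        X_n = vec.fit_transform(queries)
        score = cross_val_score(LogisticRegression(), X_n, labels, cv=5).mean()
        print(n, X_n.shape[1], round(score, 4))  # n, feature count, mean accuracy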

~~~
godmodus
Why the downvotes without a correction?

------
godmodus
Woo, saving this. I'm planning a similar project once I'm done with my studies.

