
Hack day report: Using Amazon Machine Learning to predict trolling - kilimchoi
https://www.theguardian.com/info/developer-blog/2015/jul/17/hack-day-report-using-amazon-machine-learning-to-predict-trolling
======
justonepost
Is AML using a deep learning architecture? I'm sure there are plenty of
phrases that, when used alongside or near other phrases, tend to provoke
trolling. Given a big enough training set and a deep enough architecture, I
can imagine it would work well. If your training set isn't big enough, you
might want to use a thesaurus to canonicalize words. Removing filler (stop)
words might help as well.

For example, the phrase "Trump was bombastic in his political speech" might
become something like "Trump grandiose political speech" or even just
"political speech".

I wonder if this sort of feature extraction has been done elsewhere and could
be re-used for this problem.

edit: looks good: [http://blog.mafr.de/2012/04/15/scikit-learn-feature-
extracti...](http://blog.mafr.de/2012/04/15/scikit-learn-feature-extractio/)
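
The canonicalization idea above (drop filler words, map synonyms to a single
word) can be sketched in a few lines of Python. This is only an illustration:
the `STOP_WORDS` set and `SYNONYMS` map are made-up stand-ins for a real
stop-word list and thesaurus, and `canonicalize` is a hypothetical helper name.

```python
import re

# Hypothetical, hand-picked lists purely for illustration; a real system
# would use a proper stop-word list and a thesaurus.
STOP_WORDS = {"was", "in", "his", "the", "a", "an", "of"}
SYNONYMS = {"bombastic": "grandiose", "pompous": "grandiose"}

def canonicalize(text):
    """Lowercase, drop stop words, and map synonyms to one canonical word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]

print(canonicalize("Trump was bombastic in his political speech"))
# → ['trump', 'grandiose', 'political', 'speech']
```

The resulting tokens could then be fed to whatever bag-of-words feature
extractor the model expects.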

~~~
zorrb
No. Their documentation says they use "Industry Standard Logistic
Regression", which just means logistic regression. It can be very useful,
but it is also about the most basic model you could use.
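
For reference, here is a minimal sketch of logistic regression over word
counts, trained with plain stochastic gradient descent. It is not AML's
implementation; the toy comments, labels, and hyperparameters are invented
for illustration only.

```python
import math
from collections import Counter

def featurize(text):
    """Bag-of-words features: word -> count."""
    return Counter(text.lower().split())

def predict(weights, bias, features):
    """Probability that the comment is labeled 1 (troll)."""
    z = bias + sum(weights.get(w, 0.0) * c for w, c in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            error = predict(weights, bias, feats) - label
            bias -= lr * error
            for w, c in feats.items():
                weights[w] = weights.get(w, 0.0) - lr * error * c
    return weights, bias

# Made-up toy data: 1 = troll, 0 = not troll.
TOY = [
    ("you are an idiot", 1),
    ("what an idiot take", 1),
    ("thanks for the thoughtful article", 0),
    ("interesting article thanks", 0),
]
weights, bias = train(TOY)
print(predict(weights, bias, featurize("idiot")))
print(predict(weights, bias, featurize("thanks article")))
```

The model is just one weight per word plus a bias, which is why the choice
of features matters so much for a model this simple.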

------
binarysolo
I need to look at the git repo to make better sense of what AML is doing and
why their numbers are poor, but offhand: how you structure the ML problem
(feature extraction, etc.) is most of the battle and largely determines how
well it will do. Black-boxing all of that makes it really non-trivial to come
up with a generalized solution for everything, because it still counts on the
human to frame the problem.

(Will comment more intelligently once I read the codebase itself.)

