
Predict the future with Machine Learning - majikarp
https://www.zeroequalsfalse.press/2017/08/10/ml/
======
wepple
For anyone who still sees ML as a bit of a magic black box, I can't recommend
this book highly enough: [https://www.amazon.com/Make-Your-Own-Neural-
Network-ebook/dp...](https://www.amazon.com/Make-Your-Own-Neural-Network-
ebook/dp/B01EER4Z4G) it does a fantastic job of breaking down the concepts
into incredibly straightforward ideas.

In fact, the linked article does (IMO) a poor job of introducing NNs,
displaying a large equation without much context - possibly the last thing
anyone actually needs when trying to grok the basic principles behind a NN.
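To make the point concrete, here's roughly the kind of thing that book builds
up to, stripped to its bare bones: a single sigmoid neuron learning logical OR
by gradient descent (a toy sketch of my own, not taken from the book or the
article):

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training data for logical OR: inputs -> target
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

# One neuron: two weights and a bias, starting near zero
w = [random.uniform(-0.5, 0.5) for _ in range(2)]
b = 0.0
lr = 1.0  # learning rate

for epoch in range(2000):
    for x, t in data:
        y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)  # forward pass
        err = y - t                                 # how wrong were we?
        grad = err * y * (1 - y)                    # backward pass
        w[0] -= lr * grad * x[0]                    # nudge weights against
        w[1] -= lr * grad * x[1]                    # the error gradient
        b -= lr * grad

for x, t in data:
    y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
    print(x, round(y, 2), "target", t)
```

That's the whole loop - forward pass, error, gradient, nudge - repeated until
the weights settle. Everything else in a real NN is layering and scaling of
this same idea.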

------
tinkerdol
We shouldn't ever confuse machine learning with predicting the future -- just
because you've never encountered a black swan in the wild doesn't mean they
don't exist.

That being said, the article otherwise seems like a great introduction. Not
sure why they chose that title.

~~~
ben_w
Black Swans are the error rate of your predictions (the real error, rather
than your prediction of your error rate), not proof that prediction is always
doomed.

After all, if Black Swans were common enough to make prediction a fool's
errand most of the time, the bird of that name would never have led to the
book of that name, _because everyone would be predicting their failure to
predict things_.

~~~
sgt101
I think that a Black Swan is when a new factor appears in your domain. In
science we are conditioned from the start to create fair tests in controlled
experiments. Control is fundamental to experimentation - and statistics are
designed to handle experimental data.

In the real world there are often no controls, and complex systems can be
driven by an attractor for a very long time before one morning they are not,
and every rule that you have is useless (often worse than useless).

Sources of error are not equal; "Black Swan error" is unusual in that, over
time, it may be more important than any other source of error in your domain
- the strange attractor that drove the creation of your classifier over the
last 20 years may _never_ recapture your function, and if that's the case
your classifier will be literally the most wrong thing you could have!

~~~
ben_w
That's certainly one type of Black Swan. Taleb's example of that sort of thing
is a turkey predicting it will be fed (because that is what happened every
other day of its life) but which is actually slaughtered.

However, it is not the only type. There is also the stock market, which
demonstrates major unpredictability every few years, but which can also be
approximated the same way _between_ each of the Black Swans. (And they keep
being Black Swans because the gap between them is large enough for people to
convince themselves that "This time it's different, this time n̵o̵b̵o̵d̵y̵
̵w̵i̵l̵l̵ ̵h̵a̵v̵e̵ ̵t̵o̵ ̵b̵e̵ ̵n̵a̵i̵l̵e̵d̵ ̵t̵o̵ ̵a̵n̵y̵t̵h̵i̵n̵g̵ growth
will be eternal!")

Edit:

Point is, it generalises as how wrong you are in your predictions, and the
closer your estimate of your error rate is to your actual error rate, the
better your model is.

------
hprotagonist
_Fundamentally it is Software that works like our brain.._

Stopped here. Moving on.

~~~
shincert
Why? Is it really that bad of an analogy for an absolute beginner?

~~~
TimPC
It's not terrible for an absolute beginner, but it's fairly harmful overall.
People tend to use this analogy to conflate narrow AI with general AI and to
argue for regulatory capture based on claims completely outside the evidence.
The real brain is sparsely connected and has multiple activation networks
that reuse nodes. We also have the ability to learn from single examples of
things we've never seen before, so it seems unlikely that our brain operates
exclusively on derivatives of an error function or other data-fitting
techniques. Humans still seem more unreasonably effective than deep learning
at many tasks, and this despite having a harder problem (humans face more
tasks with unlabelled data, as far as I can tell).

------
hal9000xp
I would like to read mathematical foundations of machine learning written for
those who are bad at calculus but good at discrete mathematics and algorithms.

For example, I'm learning algorithms, participating in contests, and am quite
comfortable with combinatorics and discrete probability theory, but I'm an
absolute zero at calculus. I would like to read a machine learning math
introduction which is friendly to my "discrete" brain.

~~~
thousandpounds
I am in the same boat. I get the feeling that most Calculus books are just a
compilation of tips and tricks. So I am suggesting you invest time into
learning real analysis proper. Right now I am learning from [1]. It follows
Rudin closely and as opposed to many other analysis books meant to "better
explain" stuff, it goes deep into the trenches and actually tackles the
subject.

[1] [https://www.amazon.com/Real-Analysis-Lifesaver-Understand-
Pr...](https://www.amazon.com/Real-Analysis-Lifesaver-Understand-
Princeton/dp/0691172935/ref=sr_1_8?ie=UTF8&qid=1502546377&sr=8-8&keywords=real+analysis)

I think time invested into studying real analysis pays off because then you
can later study measure theory, functional analysis and more advanced
probability to deal with curse of dimensionality and whatnot.

edit: I started studying the book linked above from chapter 4, since the
first 3 chapters are familiar from discrete math. Then I did chapter 5,
skimmed chapters 6 (a little linear algebra), 7, and 8 (most "transition to
higher math" books contain this stuff), and am currently on chapter 9.

~~~
TimPC
The majority of the calculus you need for deep learning doesn't rise to the
level of real analysis. Real analysis is worth doing if there are benefits to
Fourier transforms on your data sets in the domain you're working in, but
otherwise the payoff is more for studying further math than for studying more
deep learning.

~~~
thousandpounds
Broadly speaking, I want to read books like [1]. It looks like they use quite
a bit of advanced non-discrete probability. Since I prefer books written in
definition-theorem-proof format anyway, I figured I might as well get
analysis out of the way :)

[1][http://www.cs.cornell.edu/jeh/book%20June%2014,%202017pdf.pd...](http://www.cs.cornell.edu/jeh/book%20June%2014,%202017pdf.pdf)
(Foundations of Data Science by Blum/Hopcroft/Kannan)

------
stareatgoats
To me as an AI novice this article seemed like a good overview. It did not
address one problem I have with AI however, which is the inherent lack of
transparency. That is, unlike normal programs, we have an input and an output,
but the reasoning in between is a black box to human intelligence. This
problem has to be solved before we can turn over any really vital tasks to AI
with confidence IMO.

~~~
zEVE16Ug50tV
That's a good and deep point. Let's consider the case of writing an algorithm
to drive a car. There are some ideas people have had:

1) What really matters is what happens in the worst case, so we need
explanations in the worst case, but not necessarily the rest of the time: when
the car is choosing a slightly more efficient trajectory, we don't need an
explanation of why it 'chose' to do that. In the case of a crash, though, we'd
like to know precisely what went wrong. This suggests, perhaps, that we could
have a simpler and more transparent fallback system that usually is uninvolved
in driving, but that takes 'responsibility' and has the ability to control the
vehicle at a minimum level of safety (rather than efficiency).

2) Humans seem to make decisions first and produce explanations later
[http://www.skepticink.com/tippling/2013/11/14/post-hoc-
ratio...](http://www.skepticink.com/tippling/2013/11/14/post-hoc-
rationalisation-reasoning-our-intuition-and-changing-our-minds/): 'To the
question many people ask about politics — Why doesn’t the other side listen to
reason? — Haidt replies: We were never designed to listen to reason. When you
ask people moral questions, time their responses and scan their brains, their
answers and brain activation patterns indicate that they reach conclusions
quickly and produce reasons later only to justify what they’ve decided.' so
perhaps we could also train an ML agent to produce explanations that we find
persuasive?

~~~
stareatgoats
Agreed, the need for transparency arises mainly when something goes wrong.
But is it not also a question of control? Without human understanding there
is no human control, and with no human control we have HAL ... which sort of
answers your second point: no, we should not let AI come up with its own
explanations that we merely find persuasive.

------
lngnmn
The future cannot be defined for any stochastic process with an unknown
number of hidden variables of unknown weights. Everything more complex than a
roll of a die is unpredictable by definition, simply because the model is
incomplete. Even the next roll of a die cannot be predicted.

Probability is not reality. Simulations are cartoons. Map is not the
territory. Models and simulations based on them are different from reality in
the same way a movie is not reality.

People who can't grasp these simple ideas cannot be legitimately called
scientists.

~~~
visarga
You don't need perfect simulation, just a good enough one. Simulation is like
a dynamic, extensible dataset. Neural nets can learn from simulation not only
in game play, but also in genetics, robotics and general reasoning. I think
simulation is at the core of what will lead to AGI.

~~~
lngnmn
My comment was about the inevitable philosophical gap between reality and
_any_ model or simulation, which cannot be bridged in principle.

Yes, simulators are used as sources of sensory input to get _similar_
experience, for example, to train airline pilots. Nevertheless, there is no
airline which trains its pilots on a simulator only. An algorithm, like a
pilot, would learn the simulation, not reality.

~~~
visarga
That's true. I was thinking about using simulation as an addition to being
embedded in the real world. Simulation is necessary to plan complex actions
based on reinforcement learning (model-based RL). When an AI can run
simulations, it can check a few steps ahead to see how things will go, then
act proactively so as to maximize reward. When interacting with people, the
AI will need to model people's state of mind and knowledge in order to infer
how they would act and react. When using a device, an AI would need to
control it and know what to expect from it. In almost any non-trivial
interaction, an AI needs to simulate in order to adapt to situations. Since
it is impossible to rewind reality (like a game) in order to try another
action, the agent needs to do that in its imagination (a simulator).
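As a toy sketch of that lookahead idea (the one-dimensional world, the goal,
and all names here are made up purely for illustration): the agent rolls
every short action sequence forward in its "imagination" and commits only to
the first step of the best one.

```python
from itertools import product

GOAL = 5  # hypothetical goal position on a 1-D line

def simulate(pos, action):
    """Tiny world model: actions move the agent left or right."""
    return pos + (1 if action == "right" else -1)

def reward(pos):
    """Closer to the goal is better."""
    return -abs(GOAL - pos)

def plan(pos, depth=3):
    """Roll every action sequence forward in imagination; return the
    first action of the sequence ending with the highest reward."""
    best_score, best_first = float("-inf"), None
    for seq in product(["left", "right"], repeat=depth):
        p = pos
        for a in seq:
            p = simulate(p, a)  # imagined step, reality untouched
        if reward(p) > best_score:
            best_score, best_first = reward(p), seq[0]
    return best_first

print(plan(0))   # agent at 0, goal at 5 -> "right"
print(plan(10))  # agent at 10, goal at 5 -> "left"
```

Real model-based RL replaces the hand-written `simulate` with a learned
model, but the rewind-in-imagination structure is the same.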

By the way, humans are good at simulating some things but quite bad at
others, and that still doesn't stop us from being the most intelligent
agents. I think an AI only needs to simulate a little ahead in order to act
much more intelligently than today, because today AI is mostly reactive, or
feed-forward, like a simple reflex.

------
halfeatenpie
I'm a PhD candidate working on statistical forecasting and projection in the
natural sciences (hydrology/streamflow projection under climate-change
uncertainty).

The article seems to be a decent introduction that shows what Machine
Learning is about (which is great) and how it can POSSIBLY be applied to
forecasting and prediction. However, I think it would be even better if there
were a simple example or two of each method being applied, showing different
outcomes, and then the significance of each methodology through those
examples.

Also, I'd like to add a comment to this.

This article is great when you look at it from the Machine Learning
perspective. However, when you look at it from a forecasting perspective it
only shows a very small portion of what forecasting/predicting really is.

Algorithms you develop through machine learning are what's known as black-box
models. You know that the input data and the output data you're matching up
are related somehow, but you don't know exactly how. That relationship is
established based on a performance index determined through a trial-and-error
method (depending, of course, on what actual method you use).

There are other methods available, such as ARMA- and ARIMA-based models. In
the physical sciences, there are models that focus on the physical
interactions among the inputs to simulate what is happening inside the
system. ML methods are just a taste of the methods available.
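To give a flavour of the simplest autoregressive idea underlying the ARMA
family: fit x[t] ≈ phi * x[t-1] by least squares and forecast one step ahead.
The numbers below are made up, and a real analysis would use a proper library
(e.g. statsmodels) rather than this hand-rolled AR(1) sketch:

```python
# Toy decaying series (invented data for illustration)
series = [1.0, 0.9, 0.85, 0.7, 0.66, 0.58, 0.5, 0.47]

xs = series[:-1]  # x[t-1]
ys = series[1:]   # x[t]

# Least-squares estimate of phi in x[t] = phi * x[t-1] + noise
phi = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# One-step-ahead forecast
next_value = phi * series[-1]
print(f"phi={phi:.3f}, forecast={next_value:.3f}")
```

ARMA/ARIMA models extend this with more lags, moving-average terms, and
differencing, but the "predict from the recent past" core is the same.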

In regards to programming use (which I'm sure most of you folks here are
used to), ML is a good tool for forecasting if you're really interested in
it. But just like with any forecasting model, you should determine the
performance of your model based on not one index but multiple indices which
cover different parts of your "needs". Percent accuracy only shows how
accurate you are; you should probably also consider how often you're
over-estimating vs. under-estimating, how many runs of overestimation there
are, etc. The most important one in my book, though, is bias correction. Most
ML algorithms do not account for bias, so you, as the modeller, need to
prepare for bias correction. This article kind of glossed over that by saying
"right amount of data" and "combination of data" (which I understand, as it's
an introductory post, but I think this is very important).
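For instance, here's a quick sketch of checking more than one index at once
(the forecast and observed values are invented for illustration):

```python
# Hypothetical forecast vs. observed values
observed  = [10.0, 12.0,  9.0, 14.0, 11.0]
predicted = [11.0, 11.5, 10.0, 13.0, 12.5]

n = len(observed)
errors = [p - o for p, o in zip(predicted, observed)]

bias = sum(errors) / n                   # mean error: + means over-forecasting
mae  = sum(abs(e) for e in errors) / n   # mean absolute error
over  = sum(1 for e in errors if e > 0)  # how often we over-estimate
under = sum(1 for e in errors if e < 0)  # ...and under-estimate

print(f"bias={bias:+.2f}  MAE={mae:.2f}  over={over}  under={under}")
```

A model with a decent MAE can still have a consistently positive bias, which
is exactly the kind of thing a single accuracy number hides.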

Maybe look into applying a method like k-fold cross-validation to make sure
the final output parameters aren't AS biased. It really depends on the
modeller, the model, and the performance indices you use.
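The splitting behind k-fold is simple enough to show by hand (hand-rolled for
illustration; in practice you'd reach for a library implementation such as
scikit-learn's `KFold`):

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) pairs for k roughly equal folds,
    so each sample serves exactly once as validation data."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        stop = n_samples if i == k - 1 else start + fold_size  # last fold takes the remainder
        val = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, val

for train, val in k_fold_indices(10, 3):
    print("train:", train, "val:", val)
```

Averaging your performance indices over the k folds gives an estimate that is
less tied to one lucky (or unlucky) train/test split.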

------
sgt101
"Fundamentally it is software that works like the brain". I stopped reading
there, no good thing can be in any document that contains this phrase*

* Apart from this one

