In fact, the linked article does (IMO) a terrible job of introducing NNs: it displays a large equation without much context, which is possibly the last thing anyone actually needs when trying to grok the basic principles behind a NN.
That being said, the article otherwise seems like a great introduction. Not sure why they chose that title.
After all, if Black Swans were common enough to make prediction a fool's errand most of the time, the bird of that name would never have led to the book of that name, because everyone would be predicting their failure to predict things.
In the real world there are often no controls, and complex systems can be driven by an attractor for a very long time until, one morning, they are not, and every rule you have becomes useless (often worse than useless).
Sources of error are not equal; "Black Swan error" is unusual in that, over time, it may matter more than any other source of error in your domain - the strange attractor that drove the behaviour your classifier learned over the last 20 years may never reassert itself, and if that's the case your classifier will be literally the most wrong thing you could have built!
However, it is not the only type. There is also the stock market, which demonstrates major unpredictability every few years, but which can also be approximated the same way between each of the Black Swans. (And they keep being Black Swans because the gap between them is large enough for people to convince themselves that "This time it's different, this time n̵o̵b̵o̵d̵y̵ ̵w̵i̵l̵l̵ ̵h̵a̵v̵e̵ ̵t̵o̵ ̵b̵e̵ ̵n̵a̵i̵l̵e̵d̵ ̵t̵o̵ ̵a̵n̵y̵t̵h̵i̵n̵g̵ growth will be eternal!")
Point is, it generalises to how wrong you are in your predictions, and the closer your estimate of your error rate is to your actual error rate, the better your model is.
I have to say I rather prefer the black variety over the white.
Great point. We can't know that a machine learning algorithm used to make predictions won't be wrong if the future turns out to be significantly different from the past. A swan-classifier trained on images of white swans would fail hard if given pictures of black swans.
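A toy sketch of that failure mode (the single feature and the numbers are made up purely to illustrate the distribution shift):

    # Toy illustration of the white-swan problem: a classifier trained only on
    # white swans learns that "white plumage" means "swan", so a black swan
    # gets misclassified. The feature and data are invented for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # single feature: plumage whiteness (0 = black, 1 = white)
    X_train = np.array([[0.90], [0.95], [0.85],   # white swans
                        [0.20], [0.30], [0.10]])  # non-swans (dark birds)
    y_train = np.array([1, 1, 1, 0, 0, 0])        # 1 = swan, 0 = not a swan

    clf = LogisticRegression().fit(X_train, y_train)

    black_swan = np.array([[0.05]])        # a swan, but with dark plumage
    print(clf.predict(black_swan))         # [0] -- confidently "not a swan"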
That said, people find it useful to use machine learning algorithms to predict the future, as the future tends to be similar to the past, at least in the limited domains to which machine learning is currently applied. As compute increases and we learn how to write machine learning architectures, we won't need to distinguish as much between 'machine learning' and plain old 'learning', and much of what philosophers have thought over the years about the problem of induction, and about the domains where induction applies, becomes relevant to the topic.
 Or learn them. Jeff Dean mentions experimental success learning RNN architectures: https://www.youtube.com/watch?v=vzoe2G5g-w4
Any time you extend a statistical model temporally it immediately becomes mathematically invalid since probabilistic statistics are only valid for a fixed population at a fixed moment in time.
Unfortunately, business and government are rife with people predicting the future based on statistical models that have no more mathematical validity than reading tea leaves.
Statistics as we know it "works" (can be derived) under the assumption of controlled experimental data. As a thought experiment, think about the weather - we know that if we build a classifier that predicts the weather in my garden tomorrow based on the history of the weather in my garden, it will do very badly. Why? Well, because weather is very, very, very complex; the range of behavior is vast. But worse, it's unstable. The weather in my garden is driven by several complex systems: the ocean, the atmosphere, the Earth's orbit, and Sol! Statistics can't predict the future of the weather in my garden.
Statistics also can't predict other things, like the future of the financial markets (not least because if you find a statistical law about them, then you will act on it and thereby screw it up).
It's important to me to bang on about this because there are loads of people who sit through their introductory courses and read the example of predicting a biased roulette wheel. Years later they end up running the company/country/community that I live in and they have a view that they can use the same principles to do it... and this thinking leads to nasty surprises for me.
Give me hourly readings of temperature, wind speed, wind direction, precipitation, cloud cover and barometric pressure for the last 10 years and I can give you a very accurate prediction of tomorrow's weather in your garden.
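Something like the sketch below is the kind of model I mean - the file and column names are placeholders, and nothing here guarantees the accuracy I'm claiming; it just shows the shape of it:

    # Sketch: predict a day's mean temperature from the previous few days of
    # hourly readings. The CSV file and column names are hypothetical, and
    # the columns are assumed to be numeric readings.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    df = pd.read_csv("garden_weather_hourly.csv", parse_dates=["time"])
    daily = df.set_index("time").resample("D").mean()   # hourly -> daily means

    # features: the previous 3 days of every measurement
    features = pd.concat(
        [daily.shift(lag).add_suffix(f"_lag{lag}") for lag in (1, 2, 3)],
        axis=1,
    ).dropna()
    target = daily["temperature"].loc[features.index]

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(features[:-1], target[:-1])    # hold out the most recent day
    print(model.predict(features[-1:]))      # estimate that day from the 3 days before it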
Many years ago I actually seriously tried to do what you describe above; I tried out all sorts of things around seasonal analysis and other features. What kills it is the chaotic nature of UK weather due to the jetstream and the NAO (North Atlantic Oscillation).
That aside, you are missing a larger point. If you predict the future based on past data, all you are saying is "the future will be the same as the past." You aren't predicting anything. You will be wrong every single time something novel occurs, which is pretty frequently in the real world.
I don't have the data handy, but to the best of my recollection weather forecasting for high/low temperature and precipitation does pretty well in the 24-48 hour range but declines steadily in accuracy, and is no better than a random guess around 2 weeks out.
That said, you are not addressing my other point, which is that "weather prediction" is just saying "things are going to stay the same." You are always starting with a set of conditions and then looking at your records and seeing what happened in similar conditions and predicting that the same thing will happen again.
Predicting that things will stay the same may come out better than a random guess in many cases, but it will still be 100% wrong in cases where something novel happens.
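For what it's worth, the baseline you're describing is what forecasters call a persistence forecast; a minimal sketch:

    # A "persistence" forecast: predict that tomorrow looks like today.
    # Any model worth using has to beat this baseline; it is exactly the
    # "things will stay the same" predictor, and exactly the one that is
    # guaranteed wrong whenever something novel happens.
    def persistence_forecast(history):
        """Return the most recent observation as the prediction for the next one."""
        return history[-1]

    past_temps = [14.2, 15.1, 15.0, 16.3]    # made-up daily readings
    print(persistence_forecast(past_temps))  # 16.3 -- "same as today"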
Obviously, this is more or less feasible in practice, depending on the phenomenon under study. Calling markets unpredictable is not evidence against the existence of rigorous frameworks for statistical prediction.
Don't let bad experiences with inexperienced and overconfident practitioners blind you to established, uncontroversial, mathematical truths.
Stopped here. Moving on.
1. As the other poster noted, people are apt to conflate "strong AI" with what we're actually doing with tensorflow, leading to very weird overreactions that aren't germane.
2. Just as importantly, developers who believe this line of thinking are biased against a more correct understanding of their code, which makes debugging much more difficult and prevents advances in the underlying technology.
The implied abstraction ... is however beyond the computing scientist imbued with the operational approach that the anthropomorphic metaphor induces. In a very real and tragic sense he has a mental block: his anthropomorphic thinking erects an insurmountable barrier between him and the only effective way in which his work can be done well.
For example, I'm learning algorithms, participate at contests, quite comfortable with combinatorics and discrete probability theory but I'm absolute zero at calculus. I would like to read machine learning's math introduction which is friendly to my "discrete" brains.
Quick look at it e.g. here:
I think time invested into studying real analysis pays off because then you can later study measure theory, functional analysis and more advanced probability to deal with curse of dimensionality and whatnot.
edit: I started studying the book linked above from chapter 4, since the first 3 chapters are familiar from discrete math. Then I did chapter 5, skimmed chapters 6 (a little linear algebra), 7 and 8 (most "transition to higher math" books contain this stuff), and am currently in chapter 9.
My previous college calculus was far from the rigour of Rudin, but also far from a cookbook flavour. Unless by 'cookbook' you mean the chain rule and differentials of standard forms. It's not that hard to teach this stuff; at least it's no harder than, say, geometry. I found trigonometry far more difficult.
We were first taught limits, then came differentiation of polynomials using infinitesimals. The chain rule was introduced. Then we ventured into differentiation of other functions. Integration was first introduced much like Riemann integrals, then came integration as the inverse of differentiation.
http://www.cs.cornell.edu/jeh/book%20June%2014,%202017pdf.pd... (Foundations of Data Science by Blum/Hopcroft/Kannan)
Also generating functions are kind of combinatorics + calculus.
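A tiny worked example of that mix (the standard Fibonacci one): the recurrence F_n = F_{n-1} + F_{n-2} with F_0 = 0, F_1 = 1 gives the generating function

    F(x) = sum_{n >= 0} F_n x^n = x / (1 - x - x^2)

Setting up the recurrence is the combinatorics; extracting the coefficients (partial fractions, power series expansion) is the calculus-flavoured part.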
We can also perturb the inputs to study how the outputs depend on them. There are many things we can do to probe a neural net. We might not be able to say for sure what situation will make it fail, but we can try it with millions of tests to see how good it is. That's how neural nets are rated, actually - by testing on loads of new data they haven't seen during training.
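A minimal sketch of that kind of probing (the model is a placeholder with a numeric predict() output; this just nudges one input at a time and watches the output move):

    # Probe a trained model by perturbing one input feature at a time and
    # measuring how much the output changes (finite-difference sensitivity).
    import numpy as np

    def sensitivity(model, x, eps=1e-2):
        """Per-feature sensitivity of the model output around the point x."""
        base = model.predict(x.reshape(1, -1))[0]
        sens = []
        for i in range(x.size):
            nudged = x.copy()
            nudged[i] += eps
            sens.append((model.predict(nudged.reshape(1, -1))[0] - base) / eps)
        return np.array(sens)

    # usage sketch: sensitivity(trained_net, some_test_example)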
On the other hand, human brains really are black boxes. We know much less about how they work or how they reach a conclusion. The fact that we can self-report is not the same as knowing how a human would behave in any reasonable situation. We don't actually know how people will behave. And yet a few people do have access to incredible power to do harm to humanity. We have had to live with that fear since the invention of atomic weapons and engineered viruses.
So I don't think the argument that "neural nets are black boxes" is so powerful. As opposed to what, I would ask? Even our dear president is such a black box. We have no idea how he works, or what he will do next.
1) What really matters is what happens in the worst case, so we need explanations in the worst case, but not necessarily the rest of the time: when the car is choosing a slightly more efficient trajectory, we don't need an explanation of why it 'chose' to do that. In the case of a crash, though, we'd like to know precisely what went wrong. This suggests, perhaps, that we could have a simpler and more transparent fallback system that usually is uninvolved in driving, but that takes 'responsibility' and has the ability to control the vehicle at a minimum level of safety (rather than efficiency).
2) Humans seem to make decisions first and explanations later http://www.skepticink.com/tippling/2013/11/14/post-hoc-ratio...: 'To the question many people ask about politics — Why doesn’t the other side listen to reason? — Haidt replies: We were never designed to listen to reason. When you ask people moral questions, time their responses and scan their brains, their answers and brain activation patterns indicate that they reach conclusions quickly and produce reasons later only to justify what they’ve decided.'
So perhaps we could also train an ML agent to produce explanations that we find persuasive?
Probability is not reality. Simulations are cartoons. Map is not the territory. Models and simulations based on them are different from reality in the same way a movie is not reality.
People who can't grasp these simple ideas cannot be legitimately called scientists.
Yes, simulators are used as sources of sensory input to get similar experience, for example to train airline pilots. Nevertheless, there is no airline which trains its pilots on a simulator only. An algorithm, like a pilot, would learn the simulation, not reality.
By the way, humans are good at simulating some things but quite bad at others, and it still doesn't stop us from being the most intelligent agents. I think AI only needs to simulate a little ahead in order to act much more intelligently than today, because today AI is mostly reactive, or feed-forward, like a simple reflex.
The article seems to be a decent introductory article that shows what Machine Learning is about (which is great) and shows how it can POSSIBLY be applied to forecasting and prediction. However, I think it would be even better if there were a simple example or two of each method being applied, showing different outcomes, and then the significance of each methodology through those examples.
Also, I'd like to add a comment to this.
This article is great when you look at it from the Machine Learning perspective. However, when you look at it from a forecasting perspective it only shows a very small portion of what forecasting/predicting really is.
An algorithm you develop through machine learning is what's known as a black-box model. You know that the input data and the output data you're matching it against are related somehow, but you don't know exactly how. That relationship is established from a performance index via a trial-and-error process (depending, of course, on which method you actually use).
There are other methods available, such as ARMA- and ARIMA-based models. On the physical-science side, there are models that focus on the physical interactions between the inputs to simulate what is happening inside the system. ML methods are just one flavour among the many available.
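For instance, a classical ARIMA fit looks roughly like this (using statsmodels; the series here is just a toy random walk):

    # Classical time-series alternative to a black-box ML model: fit an ARIMA
    # and forecast a few steps ahead. The series is made up for illustration.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    series = np.cumsum(np.random.default_rng(0).normal(size=200))  # toy random walk

    model = ARIMA(series, order=(1, 1, 1))   # (p, d, q): AR, differencing, MA terms
    fitted = model.fit()
    print(fitted.forecast(steps=5))          # next 5 predicted values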
As for programming use (which I'm sure most of you folks here are used to), ML is a good tool for forecasting if you're really interested in it. But just like with any forecasting model, you should probably judge your model's performance not on one index but on multiple indices that cover different parts of your "needs". Percent accuracy only shows how accurate you are; you should probably also consider how frequently you're over-estimating vs under-estimating, how long the runs of over-estimation are, etc. The most important one, in my books, is bias: most ML algorithms do not account for bias, so you, as the modeller, need to handle bias correction yourself. This article kinda glossed over it by saying "right amount of data" and "combination of data" (which I understand, since it's an introductory post, but I think this is very important).
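As a sketch, a few of the indices I mean, side by side (the observations and predictions are made up):

    # Looking at more than one performance index: average error magnitude,
    # bias (systematic over/under-estimation), and how often we over-predict.
    import numpy as np

    obs  = np.array([10.0, 12.0, 11.0, 13.0, 12.5])   # made-up observations
    pred = np.array([11.0, 12.5, 11.5, 12.0, 13.5])   # made-up model output

    errors = pred - obs
    print("MAE: ", np.mean(np.abs(errors)))           # average magnitude of error
    print("bias:", np.mean(errors))                   # >0 means systematic over-estimation
    print("over-predictions:", np.sum(errors > 0), "of", errors.size)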
Maybe look into applying a method like K-fold cross-validation to make sure the final output parameters aren't AS biased. It really depends on the modeller, the model, and the performance indices you use.
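A minimal sketch of that (scikit-learn's KFold; the model and data are placeholders):

    # K-fold cross-validation: every point is used for validation exactly once,
    # so the performance estimate leans less on one lucky (or unlucky) split.
    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                          # placeholder features
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

    scores = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))  # R^2 on the held-out fold

    print(np.mean(scores), np.std(scores))   # average and spread across folds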
* Apart from this one