Predict the future with Machine Learning (zeroequalsfalse.press)
147 points by majikarp 7 months ago | 45 comments

For anyone who still sees ML as a bit of a magic black box, I can't recommend this book highly enough: https://www.amazon.com/Make-Your-Own-Neural-Network-ebook/dp... It does a fantastic job of breaking down the concepts into incredibly straightforward ideas.

In fact, the linked article does (IMO) a terrible job of touching on NNs by displaying a large equation without a great deal of context, possibly the last thing anyone actually needs when trying to grok the basic principles behind a NN.

We shouldn't ever confuse machine learning with predicting the future -- just because you've never encountered a black swan in the wild, doesn't mean they don't exist.

That being said, the article otherwise seems like a great introduction. Not sure why they chose that title.

Black Swans are the error rate of your predictions (the real error rather than your prediction of your error rate) not existential proof that prediction is always doomed.

After all, if Black Swans were common enough to make prediction a fool's errand most of the time, the bird of that name would never have led to the book of that name, because everyone would be predicting their failure to predict things.

I think that a Black Swan is when a new factor appears in your domain. In science we are conditioned from the start to create fair tests in controlled experiments. Control is the fundamental of experiment - and statistics are designed to handle experimental data.

In the real world there are often no controls, and complex systems can be driven by an attractor for a very long time before one morning they are not, and every rule that you have is useless (often worse than useless).

Sources of error are not equal; "Black Swan Error" is unusual in that over time it may be that this source is more important than any other source of data in your domain - the strange attractor that drove the creation of your classifier over the last 20 years may never recapture your function and if that's the case your classifier will be literally the most wrong thing you could have!

That's certainly one type of Black Swan. Taleb's example of that sort of thing is a turkey that predicts it will be fed (because that is what happened every other day of its life) but is actually slaughtered.

However, it is not the only type. There is also the stock market, which demonstrates major unpredictability every few years, but which can also be approximated the same way between each of the Black Swans. (And they keep being Black Swans because the gap between them is large enough for people to convince themselves that "This time it's different, this time n̵o̵b̵o̵d̵y̵ ̵w̵i̵l̵l̵ ̵h̵a̵v̵e̵ ̵t̵o̵ ̵b̵e̵ ̵n̵a̵i̵l̵e̵d̵ ̵t̵o̵ ̵a̵n̵y̵t̵h̵i̵n̵g̵ growth will be eternal!")


Point is, it generalises as how wrong you are in your predictions, and the closer your estimate of your error rate is to your actual error rate, the better your model is.

I always find the term 'black swan' to be interesting, because where I live, black swans are the rule rather than the exception. I think this just makes the analogy even better, since it highlights how much your ability to predict events depends on your environment.

Me too :) It is not a term that gets used much here (Australia).

I have to say I rather prefer the black variety over the white.

> just because you've never encountered a black swan in the wild, doesn't mean they don't exist.

Great point. We can't know that a machine learning algorithm used to make predictions won't be wrong if the future turns out to be significantly different from the past. A swan-classifier trained on images of white swans would fail hard if given pictures of black swans.

That said, people find it useful to use machine learning algorithms to predict the future, as the future tends to be similar to the past, at least in the limited domains to which machine learning is currently applied. As compute increases and we learn how to write machine learning architectures[0], we don't need to distinguish as much between 'machine learning' and plain old 'learning' and much of what philosophers have thought over the years about the problem of induction, and relevant domains of induction, becomes relevant to the topic.

[0] Or learn them. Jeff Dean mentions experimental success learning RNN architectures: https://www.youtube.com/watch?v=vzoe2G5g-w4

It is an unfortunate misconception that statistical probability can be used to predict the future.

Any time you extend a statistical model temporally it immediately becomes mathematically invalid since probabilistic statistics are only valid for a fixed population at a fixed moment in time.

Unfortunately business and government are rife with people predicting the future based on statistical models that have no more mathematical validity than reading tea leaves.

What??? Prediction is certainly a type of extrapolation, but to claim that it's "mathematically invalid" reveals a severe lack of knowledge on your part. In fact, under parametric assumptions about the data generating mechanism, we can exactly quantify the expected coverage of prediction intervals. That's literally a standard topic in an introductory statistics course.
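A minimal sketch of that claim, under assumptions chosen purely for illustration: i.i.d. normal data, 20 past observations, and a hard-coded t quantile (2.093 for 19 degrees of freedom). It simulates how often a 95% prediction interval contains the next observation, first when the future comes from the same process and then when the mean has shifted (the `future_shift` parameter is hypothetical, standing in for a "black swan").

```python
import random
import statistics

N_PAST = 20        # number of past observations per trial
T_CRIT_19 = 2.093  # 97.5% t quantile for N_PAST - 1 = 19 degrees of freedom

def coverage(future_shift=0.0, trials=20000, seed=0):
    """Fraction of trials in which the 95% prediction interval built from
    N_PAST past observations contains one future observation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        past = [rng.gauss(10.0, 3.0) for _ in range(N_PAST)]
        mean = statistics.fmean(past)
        sd = statistics.stdev(past)
        # Half-width of the interval: mean +/- t * s * sqrt(1 + 1/n)
        half_width = T_CRIT_19 * sd * (1 + 1 / N_PAST) ** 0.5
        # The future observation; its mean moves if future_shift != 0.
        future = rng.gauss(10.0 + future_shift, 3.0)
        hits += abs(future - mean) <= half_width
    return hits / trials

stable = coverage()
shifted = coverage(future_shift=6.0)
print(f"future from the same process: {stable:.3f}")
print(f"future with the mean shifted: {shifted:.3f}")
```

When the assumptions hold, the empirical coverage should land close to the nominal 0.95; when the process changes underneath the model, it drops well below that. Both sides of this sub-thread are arguably describing one of those two runs.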

Hello, I think Calafrax is probably right. :o) I think you implicitly agree because you say "under parametric assumptions..." which means you know what's going on; but to make the point:

Statistics as we know it "works" (can be derived) under the assumptions of controlled experimental data. As a thought experiment think about the weather - we know that if we build a classifier that predicts the weather in my garden tomorrow based on the history of the weather in my garden it will do very badly. Why - well because weather is very very very complex; the range of behavior is vast. But worse, it's unstable. The weather in my garden is driven by several complex systems; the ocean, the atmosphere, the earth's orbit and sol! Statistics can't predict the future of the weather in my garden.

Statistics also can't predict other things like the future of the financial markets (not least because if you find a statistical law about that then you will act on it and then screw it up).

It's important to me to bang on about this because there are loads of people who sit through their introductory courses and read the example of predicting a biased roulette wheel. Years later they end up running the company/country/community that I live in and they have a view that they can use the same principles to do it... and this thinking leads to nasty surprises for me.

> if we build a classifier that predicts the weather in my garden tomorrow based on the history of the weather in my garden it will do very badly

Give me hourly readings of temperature, wind speed, wind direction, precipitation, cloud cover and barometric pressure for the last 10 years and I can give you a very accurate prediction of tomorrow's weather in your garden.

Hello, interestingly governments and private industries have invested a very large amount of money in the launch of satellites, development of supercomputers and code and the training of forecasters to interpret them.

Many years ago I actually seriously tried to do what you describe above, I tried out all sorts of things around seasonal analysis and other features. What kills it is the chaotic nature of UK weather due to the jetstream and NAO.

is that a joke? weather predictions are notoriously unreliable even though they are given with extreme granularity.

that aside you are missing a larger point. if you predict the future based on past data all you are saying is "the future will be the same as the past." you aren't predicting anything. you will be wrong every single time something novel occurs, which is pretty frequently in the real world.

The perception that weather forecasting is notoriously unreliable is mostly false: https://mobile.nytimes.com/2012/09/09/magazine/the-weatherma...

From your link : "Why are weather forecasters succeeding when other predictors fail? It’s because long ago they came to accept the imperfections in their knowledge. That helped them understand that even the most sophisticated computers, combing through seemingly limitless data, are painfully ill equipped to predict something as dynamic as weather all by themselves. So as fields like economics began relying more on Big Data, meteorologists recognized that data on its own isn’t enough."

Quantifying uncertainty is one of the main points of statistics. Don't confuse the limitations of point estimates provided by machine learning techniques with all of statistical practice.

i am not sure what that article is supposed to prove. it doesn't contain any study results on the accuracy of meteorological predictions.

I don't have the data handy but to the best of my recollection weather forecasting for high/low temperature and precipitation does pretty well for the range of 24-48 hours but declines steadily in accuracy, and is no better than random guess around 2 weeks out.

That said, you are not addressing my other point, which is that "weather prediction" is just saying "things are going to stay the same." You are always starting with a set of conditions and then looking at your records and seeing what happened in similar conditions and predicting that the same thing will happen again.

Predicting that things will stay the same may come out as better than random guess in many cases but it will still be 100% wrong in cases where something novel happens.

The point is, statistical prediction is definitely a thing, and is not "mathematically invalid" - it's mathematically well defined, with predictable consequences (increasing variance as the extrapolation becomes greater). Certainly, statistical models are not Crystal balls, but they never claimed to be. If you have a reasonable frequentist model and good data about an ongoing process, you should be able to make predictions with reasonable confidence bounds. If you have a reasonable Bayesian model and good data about an ongoing process, you should be able to coherently quantify your uncertainty about the future state of the system.
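For a concrete instance of "increasing variance as the extrapolation becomes greater", take simple linear regression with i.i.d. normal errors (an assumption chosen here for illustration; the thread does not name a model). The standard $100(1-\alpha)\%$ prediction interval for a new observation at $x_0$, with $s$ the residual standard error from $n$ points, is:

```latex
\hat{y}_0 \;\pm\; t_{n-2,\,1-\alpha/2}\; s\,
\sqrt{\,1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,}
```

The $(x_0 - \bar{x})^2$ term widens the interval the further $x_0$ sits from the observed data. The interval is still only as good as the assumption that the fitted relationship keeps holding at $x_0$.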

Obviously, this is more or less feasible in practice, depending on the phenomenon under study. Calling markets unpredictable is not evidence against the existence of rigorous frameworks for statistical prediction.

Don't let bad experiences with inexperienced and overconfident practitioners blind you to established, uncontroversial, mathematical truths.

Any relevant prediction model should account for the probability of black swans existence, even if it may have no idea what a black swan might look like.

This runs against the challenge that (almost?) all statistical methods train by fitting a model to some sort of data. If you have zero examples of a black swan in the data you can agree in principle they might exist but you'd expect a statistical model to get them wrong.

> Fundamentally it is Software that works like our brain..

Stopped here. Moving on.

Why? Is it really that bad of an analogy for an absolute beginner?

It's not terrible for an absolute beginner but it's fairly harmful overall. People tend to use this analogy a lot to conflate specific AI with general AI and argue for regulatory capture based on things completely outside of evidence. The real brain is sparsely connected and has multiple activation networks that reuse nodes. We also have the ability to train from single examples to things we've never seen before so it seems unlikely that our brain operates exclusively by derivatives on error or other data-fitting techniques. Humans still seem more unreasonably effective than deep learning on many tasks and this is despite having a harder problem (Humans have more tasks with unlabelled data as far as I can tell).

I think so. Primarily due to Dijkstra's anti-anthropomorphic stance, which is very important here.

1. as the other poster to you noted, people are more apt to conflate "strong AI" with what we're actually doing with tensorflow, leading to very weird overreactions that aren't germane.

2. just as importantly, developers who believe this line of thinking are biased against a more correct understanding of their code, which makes debugging much more difficult and prevents advances in the underlying technology.

> The implied abstraction ... is however beyond the computing scientist imbued with the operational approach that the anthropomorphic metaphor induces. In a very real and tragic sense he has a mental block: his anthropomorphic thinking erects an insurmountable barrier between him and the only effective way in which his work can be done well.

I would like to read mathematical foundations of machine learning written for those who are bad at calculus but good at discrete mathematics and algorithms.

For example, I'm learning algorithms, participate in contests, and am quite comfortable with combinatorics and discrete probability theory, but I'm an absolute zero at calculus. I would like to read an introduction to machine learning's math that is friendly to my "discrete" brain.

Learn discrete calculus from Knuth's Concrete Mathematics first; it is combinatorics-heavy, and once you get to differences and summation by parts, you can quickly extend it to "normal" calculus as you master the notion of "limit".
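As a taste of the correspondence (the three identities below are picked here for illustration, using the finite-calculus notation of Concrete Mathematics, where $x^{\underline{m}} = x(x-1)\cdots(x-m+1)$ is the falling power):

```latex
\Delta f(x) = f(x+1) - f(x), \qquad
\Delta\, x^{\underline{m}} = m\, x^{\underline{m-1}}, \qquad
\sum u\,\Delta v = u\,v - \sum E v\,\Delta u
\quad\text{where } E v(x) = v(x+1).
```

These mirror the derivative, the power rule $\frac{d}{dx} x^m = m x^{m-1}$, and integration by parts, respectively.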

Quick look at it e.g. here:


I am in the same boat. I get the feeling that most Calculus books are just a compilation of tips and tricks. So I am suggesting you invest time into learning real analysis proper. Right now I am learning from [1]. It follows Rudin closely and as opposed to many other analysis books meant to "better explain" stuff, it goes deep into the trenches and actually tackles the subject.

[1] https://www.amazon.com/Real-Analysis-Lifesaver-Understand-Pr...

I think time invested into studying real analysis pays off because then you can later study measure theory, functional analysis and more advanced probability to deal with curse of dimensionality and whatnot.

edit: I started studying the book linked above starting from chapter 4 since the first 3 chapters are familiar from discrete math. Then did chapter 5, skimmed chapters 6(little linear algebra), 7, 8 (most "transition to higher math" books contain this stuff) and am currently in chapter 9.

By most books I assume you mean pretty much all US college textbooks. I have no experience with them; are they really that bad?

My previous college calculus was far from the rigour of Rudin but also far from a cookbook flavour. Unless by 'cookbook' you mean the chain rule and differentials of standard forms. It's not that hard to teach this stuff at least it's no harder than, say, geometry. I found trigonometry far more difficult.

We were first taught limits, then came differentiation of polynomials using infinitesimals. The chain rule was introduced. Then we ventured into differentiation of other functions. Integration was first introduced much like Riemann integrals, then came integration as an inverse of differentiation.

The majority of the stuff you need from calculus for deep learning doesn't rise to the level of real analysis. Real analysis is worth doing if there are benefits to Fourier transforms on your data sets in the domain you're working in, but otherwise it pays off more for studying more math than for studying more deep learning.

Broadly speaking, I want to read books like [1]. It looks like they use quite a bit of advanced nondiscrete probability. Since I prefer books written in definition - theorem - proof format anyway, I figured I might as well get analysis out the way :)

[1] http://www.cs.cornell.edu/jeh/book%20June%2014,%202017pdf.pd... (Foundations of Data Science by Blum/Hopcroft/Kannan)

Combinatorics is significantly more difficult than the level of calculus required to understand "machine learning" (i.e. gradient descent).
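To give a sense of how little calculus that is, here is a minimal sketch of gradient descent fitting a single weight `w` in `y = w * x` by minimising mean squared error. The data, learning rate, and step count are made up for illustration; the only calculus involved is the derivative of a square.

```python
def gradient_descent(xs, ys, lr=0.01, steps=500):
    """Fit y = w * x by repeatedly stepping against the gradient of the MSE."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

# Made-up data that lies exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = gradient_descent(xs, ys)
print(f"learned w = {w:.6f}")
```

Real networks apply the same update to millions of weights, with the chain rule supplying each gradient.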

Also generating functions are kind of combinatorics + calculus.

Perhaps Khan Academy can help you out with a little effort? https://www.khanacademy.org/math

Or consider learning calculus.

This... It's not like rocket science when u already know math. Just read cs229

To me as an AI novice this article seemed like a good overview. It did not address one problem I have with AI however, which is the inherent lack of transparency. That is, unlike normal programs, we have an input and an output, but the reasoning in between is a black box to human intelligence. This problem has to be solved before we can turn over any really vital tasks to AI with confidence IMO.

It's not a 'black box'. We know what goes in: matrix multiplication and simple math, in millions of similar units. We can also probe the net to know what a neuron does, or what a specific configuration means - but it might be disappointing - each individual neuron or weight might not have much meaning on its own and even if it were removed, the neural net would work just as well (we can remove almost 95% of the neurons and still make it work).

We can also perturb the inputs to study how the outputs depend on them. There are many things we can do to probe a neural net. We might not be able to say for sure what situation will make it fail, but we can try it with millions of tests to see how good it is. That's how neural nets are rated, actually - by testing on loads of new data they haven't seen during training.
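A minimal sketch of the perturbation idea, using a tiny two-hidden-unit network whose weights are invented for illustration (they are not from any trained model). Each input is nudged by a small `eps` and the change in output is recorded, which gives a finite-difference estimate of how strongly the output depends on that input.

```python
import math

# Hypothetical fixed weights: 3 inputs -> 2 tanh hidden units -> 1 output.
# The third input has zero weight everywhere, so the net ignores it.
W_HIDDEN = [[0.5, -1.0, 0.0],
            [1.5, 0.25, 0.0]]
W_OUT = [1.0, -2.0]

def tiny_net(x):
    """Forward pass: weighted sums, tanh, then another weighted sum."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_HIDDEN]
    return sum(w * h for w, h in zip(W_OUT, hidden))

def sensitivity(f, x, eps=1e-4):
    """Finite-difference estimate of d f / d x[i] for each input i."""
    base = f(x)
    result = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        result.append((f(bumped) - base) / eps)
    return result

sens = sensitivity(tiny_net, [0.2, -0.1, 0.7])
for i, s in enumerate(sens):
    print(f"input {i}: sensitivity {s:+.4f}")
```

The ignored input shows up with zero sensitivity, which is the kind of fact a probe like this can establish without reading the weights directly.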

On the other hand, human brains are really black boxes. We know much less about how they work or how they reach a conclusion. The fact that we can self-report is not the same as knowing how a human would behave in any reasonable situation. We don't actually know how people will behave. And yet, a few people do have access to incredible power to do harm to humanity. We have had to live with that fear since the invention of atomics and genetic viruses.

So I don't think the argument that "neural nets are black boxes" is so powerful. As opposed to what, I would say? Even our dear president is such a black box. We have no idea how it works, or what he will do next.

That's a good and deep point. Let's consider the case of writing an algorithm to drive a car. There are some ideas people have had:

1) What really matters is what happens in the worst case, so we need explanations in the worst case, but not necessarily the rest of the time: when the car is choosing a slightly more efficient trajectory, we don't need an explanation of why it 'chose' to do that. In the case of a crash, though, we'd like to know precisely what went wrong. This suggests, perhaps, that we could have a simpler and more transparent fallback system that usually is uninvolved in driving, but that takes 'responsibility' and has the ability to control the vehicle at a minimum level of safety (rather than efficiency).

2) Humans seem to make decisions first and produce explanations later (http://www.skepticink.com/tippling/2013/11/14/post-hoc-ratio...): 'To the question many people ask about politics — Why doesn’t the other side listen to reason? — Haidt replies: We were never designed to listen to reason. When you ask people moral questions, time their responses and scan their brains, their answers and brain activation patterns indicate that they reach conclusions quickly and produce reasons later only to justify what they’ve decided.' So perhaps we could also train an ML agent to produce explanations that we find persuasive?

Agree, the need for transparency occurs mainly when something goes wrong. But is it not also a question of control? Without human understanding there is no human control; with no human control we have HAL ... which sort of answers your second point: no, we should not let AI come up with its own explanations that we just find persuasive.

The future cannot be defined for any stochastic process with an unknown number of hidden variables with unknown weights. Everything more complex than a roll of a die is unpredictable by definition, just because the model is incomplete. Even the next roll of a die cannot be predicted.

Probability is not reality. Simulations are cartoons. Map is not the territory. Models and simulations based on them are different from reality in the same way a movie is not reality.

People who can't grasp these simple ideas cannot be legitimately called scientists.

You don't need perfect simulation, just a good enough one. Simulation is like a dynamic, extensible dataset. Neural nets can learn from simulation not only in game play, but also in genetics, robotics and general reasoning. I think simulation is at the core of what will lead to AGI.

My comment was about the inevitable philosophical gap between reality and any model or simulation, which cannot be bridged in principle.

Yes, simulators as sources of sensory input to get similar experience are used, for example, to train airline pilots. Nevertheless, there is no airline which trains its pilots on a simulator only. An algorithm, like a pilot, would learn a simulation, not reality.

That's true. I was thinking about using simulation as an addition to being embedded in the real world. Simulation is necessary to plan complex actions based on reinforcement learning (model based RL). When AI can do simulations, it can check a few steps ahead to see how things will go and select and act in a proactive way as to maximize rewards. When interacting with people, the AI will need to model people's state of mind and knowledge, in order to infer how they would act and react. When using a device, an AI would need to control it and know what to expect from it. In almost any non-trivial interaction, AI needs to simulate in order to adapt to situations. Since it is impossible to rewind reality (like a game) in order to try another action, the agent needs to do that in its imagination (simulator).
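A minimal sketch of "check a few steps ahead" in a toy setting. Everything here is hypothetical: a one-dimensional world, a hand-written `simulate` function standing in for a learned model, and a brute-force search over all short action sequences rather than any particular RL algorithm.

```python
from itertools import product

GOAL = 5
ACTIONS = (-1, 0, 1)  # step left, stay, step right

def simulate(state, action):
    """Hypothetical world model: returns (next_state, reward) without
    touching the real environment."""
    next_state = state + action
    reward = -abs(GOAL - next_state)  # closer to the goal is better
    return next_state, reward

def plan(state, depth=3):
    """Try every action sequence of length `depth` in the simulator and
    return the first action of the best-scoring sequence."""
    best_return, best_first = float("-inf"), None
    for seq in product(ACTIONS, repeat=depth):
        s, total = state, 0.0
        for a in seq:
            s, r = simulate(s, a)
            total += r
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first

for start in (0, 5, 7):
    print(f"at {start}: take action {plan(start):+d}")
```

The agent only ever commits to the first action in the real world; the rest of the sequence lives in its "imagination" and is recomputed at the next step.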

By the way, humans are good for simulating some things, but quite bad for others, and it still doesn't stop us from being the most intelligent agents. I think AI only needs to simulate a little ahead in order to act much more intelligently than today, because today, AI is mostly reactive, or feed-forward, like a simple reflex.

I'm a PhD candidate related to statistical forecasting and projection related to natural science (hydrology/streamflow projection under climate change uncertainty).

The article seems to be a decent introduction article that shows what Machine Learning is about (which is great) and shows how it can POSSIBLY be applied to forecasting and prediction. However, I think it would be even better if there was a simple example or two with each method being applied and showing different outcomes and then the significance of each methodology through those examples.

Also, I'd like to add a comment to this.

This article is great when you look at it from the Machine Learning perspective. However, when you look at it from a forecasting perspective it only shows a very small portion of what forecasting/predicting really is.

Algorithms you develop through machine learning are what is known as black-box models. You know that the input data and the output data you're matching up with are related somehow, but don't know exactly how they're matched up. That relationship is established based on a performance index determined from a trial-and-error method (of course depending on what actual method you use).

There are different methods available such as ARMA and ARIMA based models. In regards to physical science, there are models that focus on the physical interaction between the input data to simulate what is happening inside the system. ML methods are simply just a taste of other methods available.

In regards to programming use (as I'm sure most of you folks here are used to), ML is a good tool to use for forecasting if you're really interested in it. But just like any forecasting model you use, you should probably determine the performance of your model based on not one index but multiple indices which consider different parts of your "needs". Percent accuracy only shows how accurate you are; you should probably also consider how frequently you're over-estimating vs under-estimating, how many runs of consecutive overestimation there are, etc. The most important one in my book, though, is bias correction. Most ML algorithms do not account for bias, so you, as the modeller, need to prepare for bias correction. However, this article kinda glossed over it by saying "right amount of data" and "combination of data" (I understand it is an introductory post, but I think this is very important).
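A minimal sketch of what "multiple indices" could look like in code. The function name, the choice of indices, and the numbers are all made up for illustration; they are one reading of the list above, not a standard set.

```python
def forecast_diagnostics(actual, predicted):
    """Summarise forecast errors with several indices instead of one."""
    errors = [p - a for a, p in zip(actual, predicted)]  # > 0 means over-estimate
    n = len(errors)
    longest = current = 0
    for e in errors:
        # Track the longest streak of consecutive over-estimates.
        current = current + 1 if e > 0 else 0
        longest = max(longest, current)
    return {
        "mae": sum(abs(e) for e in errors) / n,           # average miss size
        "bias": sum(errors) / n,                          # signed average miss
        "over_fraction": sum(e > 0 for e in errors) / n,  # how often too high
        "longest_over_run": longest,
    }

# Made-up series: every forecast misses by exactly 1, mostly on the high side.
actual = [10, 12, 11, 13, 12, 14]
predicted = [11, 13, 12, 12, 13, 15]
report = forecast_diagnostics(actual, predicted)
for name, value in report.items():
    print(f"{name}: {value}")
```

Here the mean absolute error alone looks unremarkable, while the bias and the over-estimate fraction reveal that the model leans high, which is the situation bias correction is meant to address.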

Maybe look into applying a method like the K-Fold Cross Validation to make sure the final output parameters aren't AS biased. It really depends on the modeller and the model and your performance indices you use.
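A minimal, dependency-free sketch of K-fold cross-validation, assuming a straight-line model and made-up data (neither comes from the article). Each fold is held out once, the line is fitted on the rest, and the held-out error is recorded, so no score is computed on data the fit has seen.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over range(n)."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread the remainder
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, test
        start += size

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def cross_validate(xs, ys, k=5):
    """Return the held-out mean squared error of each fold."""
    scores = []
    for train, test in k_fold_splits(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(slope * xs[i] + intercept - ys[i]) ** 2 for i in test]
        scores.append(sum(errors) / len(errors))
    return scores

# Made-up data: roughly y = 2x + 1 with small hand-written noise.
xs = [float(x) for x in range(10)]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, -0.05, 0.1, -0.15, 0.05]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]
scores = cross_validate(xs, ys)
print("per-fold MSE:", [round(s, 4) for s in scores])
print("mean MSE:", round(sum(scores) / len(scores), 4))
```

One caveat for forecasting specifically: with time-ordered data, plain K-fold trains on observations that come after the held-out fold, so a forward-chaining split (train on the past, test on the next block) is usually the safer variant.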

"Fundamentally it is software that works like the brain". I stopped reading there, no good thing can be in any document that contains this phrase*

* Apart from this one
