
Machine Learning Can't Handle Long-Term Time-Series Data - jocker12
https://www.lesswrong.com/posts/N594EF44CZD2aGkSh/machine-learning-can-t-understand-long-term-time-series-data
======
mjburgess
Time is only a symptom of what's missing: causation.

ML operates with associative models of billions of parameters: trying to learn
thermodynamics by parameterizing for every molecule in a billion images of
them.

Animals operate with causal models of a very small number of parameters: these
models richly describe how an _intervention_ on one variable _causes_ another
to change. These models cannot be inferred from association (hence the last
500 years of science).

They require direct causal intervention in the environment to see how it
changes (i.e., _real_ learning). _And_ a rich background of historical learning
to interpret new observation. You need to have lived a human life to guess
what a pedestrian is going to do.

If you overcome the relevant computational infinities to learn "strategy" you
will still only do so in the narrow horizon of a highly regulated game where
causation has been eliminated by construction (i.e., the space of all possible
moves over the total horizon of the game can be known in an instant).

The state of all possible (past, current, future) configurations of a physical
system cannot be computed -- it's an infinity computational statistics will
never bridge.

The solution to self-driving cars will be to try to gamify the roads:
robotize people so that machines can understand them. This is already
happening on the internet: our behaviour is made more machine-like so it can
be predicted. I'm sceptical real-world behaviour can be so constrained.

~~~
iandanforth
The causal argument suffers from a problem of nomenclature.

On one side we have the colloquial understanding of cause and effect where a
cause is a true impetus of effect. On the other side we have "causal" learning
in biology where you're not actually learning causes, just strong
correlations. We can learn just about any temporal association even if there
is no direct cause-effect relationship. Random reward structures are a way to
illustrate this: present a reinforcing stimulus to an animal at random times
and a random subset of behavior will increase in frequency. The animal
develops a false "causal" belief that a series of its actions is influencing
the presentation of a reward.

That's why I like focusing on "sequence prediction": even colloquially we know
predictions can be wrong. Those predictions can be influenced by low-d world
models, but you don't accidentally elevate that model into a
pure/symbolic/accurate one, as can happen with incautious use of words like
"causal."

~~~
mjburgess
The key element here is _intervention_. Animals learn by changing their
environment.

Superstition in pigeons arises because they believe their actions cause the
reward; it isn't "mere sequence". Any distribution over two variables observed
over time, _for all time_, can change unpredictably given an environmental
change.

Animals have rich models of objects and their behaviour over time, these
models aren't "sequential", and they are brought to bear on deciding whether
mere sequences should be regarded causally.

~~~
iandanforth
'Animals have rich models of objects and their behaviour over time, these
models aren't "sequential"'

This strikes me as patently false, which means I'm probably not understanding
what you mean. What does forward simulation mean if it isn't sequential?

------
nabla9
This is clever crackpottery type brainstorming from a smart person.

The author has an extremely grand set of connections he has developed. It ties
together Buddha, enlightenment, vipassana meditation, artificial intelligence,
cybernetics, fractals and neuroscience. Nothing wrong with that, of course.

A creative thinker should have these kinds of crazy ideas and connections
every day, or at least once a week. I carry a notebook that is full of them.

Most ideas die as 'premature babies'. They may be interesting to think about
and write down, but they are not fully developed and never fit together as
well as you initially thought. Filtering and picking some of them to work with
is important. Giving them up is the difference between crackpot and
non-crackpot.

Forcing grand connections prematurely is what makes this the crackpottery
type. Sharing the creative brainstorm in an essay that does not try to force
the connections would be easier to read.

~~~
tgflynn
The line between crackpottery and genius is a fine one. If blogs had existed
in 1900 and a certain patent clerk had written a post on his ideas about clock
synchronization somehow being related to electromagnetism, many would have
dismissed him as a crackpot as well.

The questions this article relates to are among the most profound and
difficult that human reason has ever attempted to confront. I think one should
be careful in labeling such ambitious speculation as crackpottery just because
it doesn't yet amount to a fully coherent and formally testable theory.

~~~
throwawaymath
Okay...but Einstein didn't write a blog. He published a paper for peer review.
Some of his ideas remained controversial for decades, but he had a
sufficiently mature, cogent and well-specified theory that he could at least
work through hypotheses and publish results.

~~~
tgflynn
He published that paper in 1905. Is it absurd to think that if the Internet
existed in his time he might have blogged about his preliminary ideas before
publishing a formal paper?

~~~
throwawaymath
Absurd is a really strong word. I will say that I really doubt he'd blog about
his ideas instead of just publishing them, even on arXiv, because all the
examples of groundbreaking new work in the modern era have been blogged about
contemporaneously with, or after, peer review of formal papers.

I'll also go further and say that, while there's a kernel of validity to your
analogy, it's not the right analogy with which to deliver your overarching
point. I don't think the publishing method for one of the most significant
scientific advancements of the previous century is a particularly good lens
for analyzing this blog post.

The critical content of this post is far below the threshold usually
associated with an idea sufficiently well formed to be publishable. Einstein
had a minimum viable theory before he solicited feedback; and when he did
solicit that feedback, it was through what we'd consider orthodox channels.

~~~
tgflynn
What you say is true.

The main point I was trying to make was that given a speculative post of such
breadth, which touches on such difficult issues as AGI, how the brain works
and perhaps even the nature of conscious experience, and which makes some
claims that are at least interesting, I think it's quite presumptuous to
assert that these ideas are all nonsense without a deeper exploration of them.
I certainly would not want to make such an assertion, despite being troubled
by what I think are some inaccuracies in the author's description of certain
physical concepts.

A secondary issue: as far as I'm aware, major scientific discoveries have
typically been published initially in much more developed form, and have thus
been the work of a single individual or of a relatively small group of closely
affiliated individuals. I'm not convinced, however, that this historical model
of very small-scale scientific collaboration is necessarily the only one, nor
the best one, in light of modern means of communication.

It seems at least conceivable to me that there is a possible future in which
the following hold:

* There is some kernel of validity in this author's ideas.

* A small number of other people find them intriguing and choose to collaborate with the author to further elaborate them.

* This collaboration leads to major progress in our understanding of one or more of the areas mentioned above.

For me, the admittedly very small likelihood of such an outcome justifies the
author's post and its appearance on HN.

------
skunkworker
This article seems out of date by 5 years or more even though it was published
today, and I am unsure as to why.

It calls out long short-term memory (LSTM) networks but doesn't mention
recent (last 5 years) improvements like Gated Recurrent Units (GRUs) or
Transformers (GPT-2, huggingface/transformers), which have shown significant
improvements over the traditional LSTM model. These can handle time series
data much better than older models could.
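For a sense of the gating idea that makes GRUs work, here's a minimal single-cell forward pass in numpy (random untrained weights, purely illustrative; real use would go through a library like PyTorch's `nn.GRU`):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU step: gates decide how much past state to keep.

    W, U, b each hold the (update, reset, candidate) parameters
    stacked along the first axis.
    """
    Wz, Wr, Wh = W
    Uz, Ur, Uh = U
    bz, br, bh = b
    z = sigmoid(x @ Wz + h @ Uz + bz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)  # candidate state
    return (1 - z) * h + z * h_tilde               # interpolate old/new state

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = rng.normal(size=(3, d_in, d_h))
U = rng.normal(size=(3, d_h, d_h))
b = np.zeros((3, d_h))

h = np.zeros(d_h)
for t in range(10):                                # run over a short sequence
    h = gru_step(rng.normal(size=d_in), h, W, U, b)
print(h.shape)  # (4,)
```

The update gate is what lets the cell carry information across many steps without the vanishing-gradient problem of a plain RNN.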

~~~
hnews_account_1
Any links to heavy time series based machine learning algorithms? I'm in
finance, and while I know how to establish and run a random forest or gradient
boost regressor using standard libraries, I've never had a good handle on
them.

~~~
cbsmith
Most everything Eamonn Keogh publishes:
[https://www.cs.ucr.edu/%7Eeamonn/selected_publications.htm](https://www.cs.ucr.edu/%7Eeamonn/selected_publications.htm)
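For the basic mechanics, the usual framing for the models you mention is to unroll the series into lagged rows before handing it to a forest or boosting regressor. A rough numpy sketch of just that windowing step (toy data):

```python
import numpy as np

def make_lag_matrix(series, n_lags):
    """Unroll a 1-D series into (X, y) rows for a tabular regressor:
    each row is [x[t-n_lags], ..., x[t-1]] and the target is x[t]."""
    X = np.array([series[t - n_lags:t] for t in range(n_lags, len(series))])
    y = series[n_lags:]
    return X, y

# Toy series: a random walk (purely illustrative).
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200))

X, y = make_lag_matrix(series, n_lags=5)
print(X.shape, y.shape)  # (195, 5) (195,)
# X and y can now go straight into RandomForestRegressor or
# GradientBoostingRegressor; just split train/test by time, not at random.
```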

------
joe_the_user
This claim seems plausible.

The reason seems even simpler than the article's. Deep learning requires lots
of training data, and that data naturally needs to be more or less "the same":
to follow "the same" logic.

A long enough time series is going to involve a change in the logic of the
real world, a change that the network won't be trained for.
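A toy numpy illustration of that failure mode (synthetic data; the sign flip at t=100 stands in for a real-world logic change, and a line fit stands in for any learner):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
# Regime change at t=100: the generating "logic" flips sign.
y = np.where(t < 100, 0.5 * t, 50 - 0.5 * (t - 100)) + rng.normal(0, 1, 200)

# Fit on the first regime only (what the model was "trained" on).
coef = np.polyfit(t[:100], y[:100], deg=1)
pred = np.polyval(coef, t)

err_in = np.mean(np.abs(pred[:100] - y[:100]))   # error on the training regime
err_out = np.mean(np.abs(pred[100:] - y[100:]))  # error after the change
print(err_in, err_out)  # the second number is dramatically larger
```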

~~~
s_Hogg
Deep learning doesn't necessarily require a huge amount of data. What DL does
is allow you to fit more complex relationships between input and output than
would be the case with, say, a bog-standard linear model. If a relationship is
complex and also clearly defined in your data, then you don't necessarily need
much. In practice that often isn't the case, but that doesn't make "deep
learning requires lots of training data" true; it only means "the data used
for deep learning models is typically noisy on top of representing a complex
relationship".

It's a semantic difference, but a very important one if we're to avoid going
down the road of just mindlessly throwing compute at every problem. And if we
do that, we'll just wind up with millions of Rube Goldberg machines instead of
actually solving problems.

The change in logic of the real world thing is absolutely spot on, though.
Over enough time it becomes basically impossible to disentangle effects.

~~~
joe_the_user
_Deep Learning doesn 't necessarily require a huge amount of data._

References? I mean, I know "one shot" and similar approaches, but as far as I
know these involve extending a neural network that has already been trained on
massive data to handle a little bit more.

------
scottlocklin
Machine learning does anywhere from just fine to _extremely well_ at long-term
time series data; there are entire branches of machine learning dedicated to
this. The fact that this imbecile has never heard of these tools is why nobody
should be reading his essay.

Uber's engineers didn't do this for their human finder because:

1) Image recognition stuff isn't explicitly built to do this (though it easily
could be jury-rigged to do so)

2) Uber's engineers apparently never heard of the concepts of "moving averages"
and "thresholds", which would have worked just fine.

"More precisely, today's machine learning (ML) systems cannot infer a fractal
structure from time series data."

-look at this idiot using words he doesn't understand. Muh fractals.

~~~
justapassenger
“Imbecile”, “idiot”? Please refrain from personal insults as they add nothing
to the discussion.

~~~
scottlocklin
"Imbecile" and "idiot" were measured and reasonable adjectives for the
gibbering nonsense published above. The drooling lackwit who wrote this should
be tarred and feathered for such frippery and nonsense.

As I said above: machine learning does anywhere from just fine to extremely
well at long-term time series data; there are entire branches of machine
learning dedicated to this.

------
cbsmith
I feel like this is entirely missing the whole world of Matrix Profiles and
Time Series Chains...
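For the unfamiliar: the matrix profile records, for each window of a series, the distance to its nearest non-trivial match elsewhere in the series, which makes motifs and anomalies pop out directly. A brute-force numpy sketch of the idea (real implementations like STOMP/SCRIMP, or the STUMPY library, are far faster):

```python
import numpy as np

def matrix_profile(series, m):
    """Brute-force matrix profile: for each length-m window, the
    z-normalized Euclidean distance to its nearest neighbor window,
    excluding trivial matches near the window itself."""
    n = len(series) - m + 1
    windows = np.array([series[i:i + m] for i in range(n)])
    # z-normalize each window so matches are shape-based, not level-based
    z = (windows - windows.mean(axis=1, keepdims=True)) \
        / windows.std(axis=1, keepdims=True)
    profile = np.full(n, np.inf)
    excl = m // 2  # exclusion zone around each window
    for i in range(n):
        d = np.linalg.norm(z - z[i], axis=1)
        d[max(0, i - excl):i + excl + 1] = np.inf  # ignore self-matches
        profile[i] = d.min()
    return profile

# A repeated motif embedded in noise: its two copies find each other.
rng = np.random.default_rng(0)
series = rng.normal(size=120)
motif = np.sin(np.linspace(0, 2 * np.pi, 20))
series[10:30] += 3 * motif
series[80:100] += 3 * motif

mp = matrix_profile(series, m=20)
# The two motif windows match each other, so their profile values are low.
print(mp[10], mp[80])
```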

------
NPMaxwell
The article I would like to read is about the challenges of including a few
prior states in navigation. I'm amazed that, when I drive over or under a
bridge, my online mapping software changes instructions as if my car were able
to levitate 20 feet onto the roadway above or below, even when that roadway is
a highway without an exit or entrance within a mile.

------
longemen3000
I recall that neural differential equations are better suited to representing
time series data; I saw them being used a lot for pharmacological processes.
Any additional ideas or insights related to this?

