
From the linked article:

> The method, in the form of an algorithm, takes in data that have been collected over time, such as the changing populations of different species in a marine environment. From those data, the method measures the interactions between every variable in a system and estimates the degree to which a change in one variable (say, the number of sardines in a region over time) can predict the state of another (such as the population of anchovy in the same region).

I also read the introduction of the paper. Maybe I misunderstood something about causal inference, but I thought from data alone one could only infer correlations or associations (in general). To talk about "causal" links, I thought you need either to assume a particular model of the data generation process, or perform some interventions on the system to be able to decide the direction of the arrows in the "links" in general.

I'm not saying that the paper is wrong or anything, it looks super useful! It's just that one should be careful when writing/reading the word "causal".




That's the first-order truth. If you don't have any knowledge about the system, you can't infer causality from observations alone.

With some generic assumptions, or prior knowledge about the system, you can do causal discovery.

For example, just the assumption that there is additive random noise enables discovering causal arrows just by observing the system.
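A minimal sketch of that idea, assuming a made-up toy system `y = x**3 + noise` (not the method from the linked paper). It regresses each variable on the other using bin means and checks whether the leftover noise looks the same everywhere. The helper `residual_spread_by_bin` is hypothetical and is only a crude stand-in for a proper independence test between residuals and input. Note that the trick needs a nonlinear mechanism or non-Gaussian noise; with a linear relationship and Gaussian noise both directions fit equally well.

```python
import numpy as np

def residual_spread_by_bin(cause, effect, n_bins=50):
    """Regress `effect` on `cause` using bin means, then return the
    coefficient of variation of the residual variance across bins.
    A small value means the residual noise looks similar for every value
    of `cause`, which is what an additive noise model predicts."""
    order = np.argsort(cause)
    bins = np.array_split(effect[order], n_bins)
    # Variance around each bin's own mean = residual variance in that bin.
    variances = np.array([chunk.var() for chunk in bins])
    return variances.std() / variances.mean()

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-2.0, 2.0, size=n)
y = x**3 + rng.normal(size=n)  # toy ground truth: x causes y, additive noise

score_x_to_y = residual_spread_by_bin(x, y)
score_y_to_x = residual_spread_by_bin(y, x)

# Prefer the direction whose residuals look more like input-independent noise.
inferred = "x -> y" if score_x_to_y < score_y_to_x else "y -> x"
print(score_x_to_y, score_y_to_x, inferred)
```

In the true direction the residual variance is roughly constant across bins. In the reverse direction it shrinks sharply for large |y|, so the asymmetry shows up in purely observational data.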


Any citation on the additive random noise?


I was not aware of the additive noise part. I will have to look into that, thanks for the info!


Correlation also assumes a model of the data-generating process, but you are correct that talking about causal links imposes even stronger assumptions on the model and the data structure. On top of that, you have to adopt a very narrow and convenient interpretation of what causality means (e.g. it can't hold at the level of individual samples, can't manifest through cycles or loops in the variables, etc.), which is an even more vexing philosophical question than the thorny ones in classical statistical inference.


You are correct, as far as I know. I'm wondering if there's some sense in which one can infer such a model from conditional correlations.


You can always restrict the meaning of "causality".

E.g. Granger causality means that A is typically detected before B and not the other way around (so not mere correlation). It's a mostly useful concept.
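A rough sketch of what that looks like in practice, on simulated series where `a` drives `b` with a one-step delay. The usual formulation asks whether past values of A improve the prediction of B beyond what B's own past already gives. The helper `granger_gain` and the coefficients below are illustrative assumptions; a real analysis would use a proper F-test rather than this raw variance ratio.

```python
import numpy as np

def residual_variance(target, predictors):
    """Least-squares fit of target on predictors plus an intercept;
    returns the variance of the residuals."""
    X = np.column_stack([np.ones(len(target))] + predictors)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return (target - X @ coef).var()

def granger_gain(cause, effect, lag=1):
    """Fractional drop in the residual variance of `effect` when lagged
    values of `cause` are added to a model that already uses the lagged
    values of `effect` itself."""
    n = len(effect)
    y = effect[lag:]
    own_past = [effect[lag - k:n - k] for k in range(1, lag + 1)]
    other_past = [cause[lag - k:n - k] for k in range(1, lag + 1)]
    restricted = residual_variance(y, own_past)
    full = residual_variance(y, own_past + other_past)
    return 1.0 - full / restricted

rng = np.random.default_rng(0)
n = 2000
a = np.zeros(n)
b = np.zeros(n)
for t in range(1, n):
    a[t] = 0.5 * a[t - 1] + rng.normal()
    b[t] = 0.5 * b[t - 1] + 0.8 * a[t - 1] + rng.normal()  # a leads b

gain_a_to_b = granger_gain(a, b)
gain_b_to_a = granger_gain(b, a)
print(gain_a_to_b, gain_b_to_a)
```

The gain should be large in the `a -> b` direction and close to zero the other way. A hidden third variable driving both series with different delays would produce the same pattern, which is why this is a restricted notion of causality.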


"collected over time" is the operative phrase. You should be able to determine causality, at least partially, if you know that the changes in correlated variables occur at different times.



