Which book should you read?
https://www.bradyneal.com/which-causal-inference-book
It is worth mentioning that the "Causal Inference: What If" book in the flowchart is a free download. The dead-tree version is still in the works.
https://www.hsph.harvard.edu/miguel-hernan/causal-inference-...
It's good to see people working on this. They're focusing on the right part of the problem, too - predicting a few seconds ahead in a physical environment. Basic survival is about not really screwing up in the next 10 seconds. (People who do robotics are very aware of this.)
Don't get tangled up in the philosophy of causality. That's not the immediate problem.
As someone thoroughly tangled up in said philosophy, I concur. Here be dragons.
Focusing on the short term also seems like it'd make paperclip/Skynet scenarios less likely.
Species survival requires looking much farther ahead, but it’s harder to do, and even harder to monetize.
I recommend "The Book of Why" by Judea Pearl as an introduction to the topic. It is targeted at a general (but educated) audience.
The epilogue to his book Causality is a reasonably short read and does a good job of introducing the concepts, imo. It's basically the same content he has used in his Turing Award lecture and similar presentations over the last decade or so:
http://bayes.cs.ucla.edu/BOOK-2K/causality2-epilogue.pdf
For some shorter reads, I saw this recommendation for biology undergrads yesterday:
> I would really love all biology students to read Elliott Sober's "Apportioning Causal Responsibility", Susan Oyama's "Causal Democracy and Causal Contributions in Developmental Systems Theory", and Richard Lewontin's "The Analysis of Variance and the Analysis of Causes".
https://twitter.com/hanemaung/status/1321488068717719552?s=2...
These are available through ResearchGate for those without library access.
(First-order) causality actually isn't hard to determine in environments where first-order effects dominate. The way to determine causality here is through a combination of physical laws and controlled experimentation [1]. In fact, we have plenty of causal models (e.g. 1st principles physics-based models, or design-of-experiments models). Without these models, machines/control systems/etc would not work.
The trouble is, outside of these 1st-order effect dominant, deterministic environments, causality becomes much harder. In complex systems, stochasticity, nonlinearity, feedback loops and higher-order effects dominate. There's also emergent behavior -- properties that are true in the small are not true in the large.
Consider a complex system like human society -- can we truly determine the causal effects of broad interventions? Likely not in a first-order way, as in the physical sciences. We can do it imperfectly through tools like causal inference in the Rubin tradition, which makes much more modest claims about the "strength" of effects (the average causal effect). Randomized controlled trials (RCTs) are another tool for making causal claims.
But in a complex world, 2nd-, 3rd- and higher-order effects dominate, and so the notion of root causes itself becomes ambiguous. Richard I. Cook once said "post-accident attribution to a 'root cause' is fundamentally wrong". Though humans are attracted to the idea of a chain of simple causes (which is why we have the myth of Mrs O'Leary's cow kicking over a lantern and starting the Great Chicago Fire of 1871), there is typically no easily identified root cause. First-order causal thinking assumes a directed acyclic graph (DAG) of causal chains converging into a set of effects, but such a DAG, if it can be represented at all, is likely to be infinitely complex in a complex environment.
First-order causal thinking is an insufficient mental model in complex environments.
Instead of aiming for a deep understanding of epistemic causality (where we try to know and represent causality), I think it's probably more useful to focus on instrumental causality (where we aim to know the main points of leverage that are effective in changing the system). We'll likely get very far just by finding the knobs that have the most effect on the variables we would like to change (and that don't simultaneously change variables we wouldn't want to change).
[1] to determine causality, we typically have to perturb the system -- determining causality through observational data is possible, e.g. via natural experiments, but there are many epistemic restrictions which limit the claims that can be made.
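To make footnote [1] concrete, here is a minimal synthetic sketch (all names and numbers are made up for illustration): a hidden confounder biases the naive observational estimate of an effect, while perturbing the system by randomizing the treatment recovers it.

    # Why perturbing the system helps: a hidden confounder Z drives both the
    # "treatment" T and the outcome Y, so the naive observational contrast is
    # biased; randomizing T removes the bias.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    true_effect = 2.0

    # Observational regime: Z confounds T and Y.
    z = rng.normal(size=n)
    t_obs = (z + rng.normal(size=n) > 0).astype(float)
    y_obs = true_effect * t_obs + 3.0 * z + rng.normal(size=n)
    naive = y_obs[t_obs == 1].mean() - y_obs[t_obs == 0].mean()

    # Interventional regime: we perturb the system by assigning T at random.
    t_rct = rng.integers(0, 2, size=n).astype(float)
    y_rct = true_effect * t_rct + 3.0 * z + rng.normal(size=n)
    rct = y_rct[t_rct == 1].mean() - y_rct[t_rct == 0].mean()

    print(f"naive observational estimate: {naive:.2f}")  # biased well above 2.0
    print(f"randomized estimate:          {rct:.2f}")    # close to 2.0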
Where is a conditional average treatment effect insufficient? We want to say that, seeing X right now, if we do a1, we'll see a change of d (+/- D), compared to doing a0. Or being able to predict y1 and y0 corresponding to actions a1 and a0. That would be a huge step, and something some of ML's most useful methods can't do.
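One common way to produce exactly those predictions, assuming randomized (or otherwise unconfounded) actions, is a simple "T-learner": fit separate outcome models under a1 and a0 and difference their predictions at the current X. A rough sketch on synthetic data (illustrative names only):

    # Predict y1 and y0 for the current context X, then report d = y1 - y0.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)
    n = 20_000
    X = rng.normal(size=(n, 3))
    a = rng.integers(0, 2, size=n)                        # randomized action a0/a1
    tau = 1.0 + X[:, 0]                                   # effect varies with context
    y = X[:, 1] + tau * a + rng.normal(scale=0.5, size=n)

    m1 = RandomForestRegressor(min_samples_leaf=25).fit(X[a == 1], y[a == 1])
    m0 = RandomForestRegressor(min_samples_leaf=25).fit(X[a == 0], y[a == 0])

    x_now = np.array([[1.0, 0.0, 0.0]])                   # "seeing X right now"
    y1_hat, y0_hat = m1.predict(x_now)[0], m0.predict(x_now)[0]
    print(f"y1={y1_hat:.2f}, y0={y0_hat:.2f}, d={y1_hat - y0_hat:.2f}")  # d near 2.0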
DAGs are not the only valid approach to causality, nor do they subsume other approaches (I have previously commented on this during the Gelman&Pearl debates).
As you suggest, the treatment-effect and econometrics literature is currently on a semi-parametric trend: since nobody believes one can actually produce a complete, believable causal model (or DAG), one tries to estimate a treatment effect that does not depend on parametric or functional-form assumptions.
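A minimal sketch of that semi-parametric idea (Robinson-style partialling out, the core of "double ML"), assuming synthetic data and illustrative names: flexible nuisance models absorb the confounding, and a residual-on-residual regression recovers the treatment effect without a parametric model of the outcome.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(2)
    n = 10_000
    X = rng.normal(size=(n, 5))
    g = np.sin(X[:, 0]) + X[:, 1] ** 2            # nonlinear confounding
    T = g + rng.normal(size=n)
    theta = 1.5                                   # true treatment effect
    Y = theta * T + 2.0 * g + rng.normal(size=n)

    # Out-of-fold nuisance predictions for E[Y|X] and E[T|X]
    # (a crude stand-in for proper cross-fitting).
    y_hat = cross_val_predict(GradientBoostingRegressor(), X, Y, cv=3)
    t_hat = cross_val_predict(GradientBoostingRegressor(), X, T, cv=3)

    res_y, res_t = Y - y_hat, T - t_hat
    theta_hat = (res_t @ res_y) / (res_t @ res_t)   # residual-on-residual OLS
    print(f"estimated effect: {theta_hat:.2f}")     # roughly 1.5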
> DAGs are not the only valid approach to causality
It has been a truism to some that "most efforts in policy are responses to previous efforts in policy." In SPC (statistical process control) you can measure noise and overcontrol. With a long-lived policy-making institution [unusual?], honesty might admit to facing the effects of previous well-intentioned policies.
> to determine causality, we typically have to perturb the system
Couldn't agree more! For computer models to understand causality, they must be able to interact with the environment and probe it. I think understanding causality is one and the same as reinforcement learning, where a computer model learns to interact with its environment.
The majority of the books listed in the first link in these comments are about whether and when we can determine causality given observational data - data where the system is perturbed, just not by us (and not as we would ideally like).
I have not looked at the links, but aren't there people who use Bode plots to estimate direction-of-causality, given time-series data? IIRC there are basic relationships between wide-frequency phase behavior and, e.g., impulse-response functions. There is apparent phase information beyond correlation and ANOVA.
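Not the Bode-plot machinery alluded to above, but a toy lag-based cousin of the same idea: when x drives y with a delay, lagged x predicts y far better than lagged y predicts x. Purely illustrative, synthetic data:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5_000
    x = rng.normal(size=n)
    y = np.zeros(n)
    y[2:] = 0.8 * x[:-2] + 0.3 * rng.normal(size=n - 2)   # x causes y, 2-step delay

    def lagged_r2(driver, target, lag):
        """R^2 of predicting target[t] from driver[t - lag]."""
        d, t = driver[:-lag], target[lag:]
        beta = (d @ t) / (d @ d)
        return 1.0 - (t - beta * d).var() / t.var()

    print("x -> y, lag 2:", round(lagged_r2(x, y, 2), 3))  # high
    print("y -> x, lag 2:", round(lagged_r2(y, x, 2), 3))  # near zero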
But many datasets will be full of recorded perturbations already. Can't you usually hunt for the evidence you need in the data you have?
The Michelson-Morley experiment was enough data to get special relativity, for example.
Einstein was inspired to come up with special relativity from that experiment because he had the relevant concepts to draw upon to think through various thought experiments. That and the math to back up his intuition.
Yes, but Einstein was able to learn cause and effect by interacting with the physical world as a child. Without being able to interact with an environment, I believe the ability to learn causality is limited, if not impossible.
Alternatively, we need to extend our typical notion of a dataset to include where it came from and how it was perturbed. If you give your best statistician a dataset without column names or a description of the experiment, they aren't going to be able to do causal inference. We need to make those things available to the machines too, in a structured way.
I was under the impression understanding causality is like, the secret of space, time and existence.
To understand causality, you must first have understood causality.
https://en.wikipedia.org/wiki/Unmoved_mover#First_cause
One should mention the anthropic principle in this context. This would then lead one to speculate about a connection between observing causal order and self-awareness.
I have a fun story about causality inference. I pressed a light switch and immediately thunder crashed. I pressed it again, no sound was heard. Followed by nervous laughter.
I take this to mean that we have a notion of "effectiveness", and that consequences are attributed to preceding effective actions.
There have been a lot of advances recently.
https://arxiv.org/abs/2010.12237v1
Why does anyone think AI can pick up on causation when humans can't even do it?
I think it's clear that humans at least try to pick out the causal factors, and reason causally about the outcomes of specific actions. Non human animals do too; they can learn to take specific actions for food rewards in a Skinner Box for example. Now I agree humans get it wrong a lot of the time, and other animals might get it wrong even more than we do.
I don't think our fallibility in causal reasoning makes it useless to pursue as a goal in artificially intelligent systems. It doesn't need to be perfect, just useful and better than not having it. After all, our perception systems are pretty fallible too, otherwise things like optical illusions wouldn't exist.
Humans can and we are really good at it.
I suspect you made that comment because you're thinking in terms of rigorous detection of causality, not the everyday, heuristic detection of causality that works well enough in practice.
Nowadays yes. For most of history though (and still commonly today), causal explanations for plenty of things involved some flavor of the supernatural.
Yes - humans do it abductively, which isn't rigorous but serves our purpose most of the time. You can't trust a self driving system with abduction though - it's the style of reasoning that gave us rain dances and homeopathy.
>You can't trust a self driving system with abduction though
All wetware-based self-driving systems on the road use heuristics in their wetware.
You can't hope to have self-driving without heuristics. That's what deep learning is.
I'm sure there was a philosopher who said that his deepest wish was to know just one cause. Been trying to find out who said it for ages with no luck, so maybe I'll just attribute it to myself.
David Hume perhaps? He reasoned that causality was not empirical and therefore was a habit of thinking humans acquired from constant conjunction of events. Kant was troubled by that so he elevated causality to a category of thought, like space and time, which the mind used to structure sensations.
+1 to Hume. He described causality as being a glorified expectation which is subject to change at any time.
The area is under-explored because, AFAIK, there is no introduction that lets someone adept in machine learning pick up the basics through coding projects (in, say, PyTorch) that show where causal inference adds benefit. If you want to advance causality research, lower the barrier to entry, just as has been done with deep learning.
It wasn't too long ago that interpretable models were seen as unimportant in the field. Prediction was considered so much more valuable than any other result of applying machine learning; explanatory models were dismissed as fuddy-duddy econometric pseudo-science.
... and humans, tbh.
Humans don't understand causality at all, and yet they are considered intelligent.
Anyone working on GPT-4? It might figure some of that out on its own.
The article is missing an important point: you _cannot_ learn causality from observational data alone. It's not about shortcomings of this or that model, it's a theoretical impossibility.
Reinforcement learning is uniquely positioned to build machines that understand cause-and-effect on their own because the algorithm is allowed to interact with the world, observe the results, gather more data, rule out hypotheses, and so on.
That's not correct.
1. First let's get through the easy part: reinforcement learning (RL) is _not_ unique in its ability to identify cause-and-effect - this was achieved long ago through the use of randomized controlled trials. RL merely streamlines the task of _reacting_ to such information (as well as optimizing experiments w.r.t. a desired goal).
2. Now the trickier part: you _can_ learn causality from observational data alone if you combine this with understanding of a mechanism. Indeed, the whole field of causal inference is an attempt to formalize and extend such methods.
There is a massive space of problems where experimentation (whether old-fashioned A/B testing or more sophisticated online learning) is simply not possible, whether for ethical reasons, cost, or because the study must be non-destructive. These problems are common in medicine, economics, and physics. In such problems the only data is observational, and causal inference is very valuable here.
You are completely right. I made too many leaps to reach that conclusion, and I was very imprecise in stating it.
Let me try again: the most interesting and frequent setting is where you cannot control the experimental conditions and you have no idea about the causal mechanism at play. And that is where traditional ML cannot help, but RL can.
What you are describing seems to be essentially the process of science: Develop, step by step, the causal mechanisms at play until you solve the problem.
It should not be underestimated that our collective knowledge of causal inference with statistical methods is good, but still improving.
I mean, in that sense, ML is helpful. Heck, it is already used in causal inference in two-step estimators and the like.
While RCTs are the gold standard for establishing causality, that doesn't mean all is lost with observational data alone. Causal inference is a rich field, with a lot of work in recent decades. There's so much more to causal inference than the old dismissive chestnut about correlation.
That's not true. Look up doubly robust estimators for a neat counterexample. Or IPTW for an extension of a dataset that can help with this (X, t, y, and propensities, instead of just X, t, y). Causal inference has a super rich literature that's quickly growing.
Or better yet (stretching and sidestepping how you meant it), give your friendly neighborhood statistician/econometrician just a dataset and they can't do causal inference. Give them propensities and column names/descriptions and a writeup of the experiment/where the data came from, and suddenly they might be able to do causal inference. It points to a need to augment our observations with more structured metadata, if we want to do causal inference with data that's just lying around.
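To make the IPTW idea above concrete, here's a small synthetic sketch (illustrative setup, not from the thread): estimate propensities e(x) = P(t=1 | x), then reweight outcomes so the treated and control groups each resemble the full population.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    n = 50_000
    x = rng.normal(size=(n, 2))
    p_true = 1.0 / (1.0 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))  # confounded assignment
    t = rng.binomial(1, p_true)
    y = 1.0 * t + 2.0 * x[:, 0] + rng.normal(size=n)           # true ATE = 1.0

    e = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]  # estimated propensities
    ipw_ate = np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))
    naive = y[t == 1].mean() - y[t == 0].mean()

    print(f"naive difference: {naive:.2f}")    # biased by confounding via x[:, 0]
    print(f"IPTW estimate:    {ipw_ate:.2f}")  # close to 1.0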
Can you elaborate on this? If agent A interacts with a system and can learn causal relationships, how could agent B who observes all of agent A's experiments not be capable of drawing the same conclusions?
It seems that any theorem that rules out learning causality from observational data alone would also rule out learning causality from any kind of interactions.
Unless you're assuming that agent A "knows" it has free will so its own actions have no cause, while agent B can't tell whether the environment caused agent A's actions or vice-versa. But if that's what the proof hinges on, it's pretty shallow, because agent A has no such guarantee that its own choices have no root cause.
Sorry, I was quite sloppy in my previous comment. Agent B certainly can learn everything that A learns from the same observations.
What I meant is that you cannot learn from "general" observational data unless it is structured in a certain way (e.g. the randomized controlled trial mentioned in a sibling comment). RL is able to gather data on its own, while other ML methods must make do with what they are given. This means that RL could eventually discover the causal relationships, while non-RL cannot (except when the data comes from an RCT).
Do I understand you correctly that you mean: "Learn automatically without further human input?"
Because in statistics, causal inference is certainly possible without RCTs.
Thank you for the follow-up! That's much clearer.
>you cannot learn causality from observational data alone.
As others have pointed out, this is kind of the point of much of Pearl's work on causality. Specifically, do-calculus provides a set of primitive operations that can be used to convert queries in interventional/counterfactual (causal) distributions to estimands in a purely observational distribution.
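A toy numeric version of that point, with made-up numbers: when an observed confounder z closes the backdoor path, the interventional quantity P(y=1 | do(t=1)) can be computed from purely observational frequencies via the adjustment formula P(y | do(t)) = sum_z P(y | t, z) P(z).

    import numpy as np

    rng = np.random.default_rng(5)
    n = 500_000
    z = rng.binomial(1, 0.3, size=n)                  # confounder
    t = rng.binomial(1, np.where(z == 1, 0.8, 0.2))   # z influences treatment
    y = rng.binomial(1, 0.2 + 0.3 * t + 0.4 * z)      # z and t influence outcome

    # Naive observational conditional (confounded):
    p_naive = y[t == 1].mean()

    # Backdoor adjustment, using only observational data:
    p_adj = sum(y[(t == 1) & (z == v)].mean() * (z == v).mean() for v in (0, 1))

    print(f"P(y=1 | t=1)     = {p_naive:.3f}")  # inflated by confounding (~0.75)
    print(f"P(y=1 | do(t=1)) = {p_adj:.3f}")    # ~0.62 = 0.2 + 0.3 + 0.4 * 0.3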
Agreed. But even RL is often difficult to transfer to other domains of knowledge.
Though I agree that this is an important next challenge (and causality has been "the next challenge" for at least the last 5 years), it's often more easily solved these days by mixing human expert knowledge into the equation (that is, using ML alongside human expertise).
What of astronomy, then? I think it is possible, but certainly more difficult, to infer causality from observation.