Building Tomorrow explores the ways technology, innovation, and entrepreneurship are creating a freer, wealthier, and more peaceful world.

Aug 23, 2018

‘Correlation Isn’t Causation’…Except When It Is

A Review of The Book of Why

By moving beyond a basic understanding of correlation & causation, we, & the AI systems we design, can better understand why things happen.

In their day-to-day lives, people do a decent job of figuring out what causes the things that happen to them — this mental power has done a lot of good for Homo sapiens. But the theory around causation remains problematic. As a former critical thinking instructor, I know all too well what happens when you invite students to think more analytically about causation. No matter the observation, 30 sophomoric voices will chime in unison, “Correlation doesn’t equal causation!” And as a producer of light science journalism, I also know what you have to do when confronted with such an audience: endless qualifications and pussyfooting. Those who dare to make causal claims (especially in front of polarized internet audiences) will be asked to construct a complete metaphysical justification for any argument from the ground up. There has to be a better way.

So I was pleased to hear about The Book of Why, by computer scientist-statistician-philosopher Judea Pearl and mathematician-science writer Dana Mackenzie. This book can be viewed in many different ways. On one level, it offers a rollicking historical introduction to statistics. On another level, it’s about artificial intelligence and the limits of “deep learning.” Along the way, it offers new explanations for classical paradoxes of probability. By the end, you even get a theory of moral learning.

Though Pearl & Mackenzie insist the book is for non-expert readers, I, despite years of philosophy and calculus under my belt, found it challenging in places. Yet The Book of Why is a completely appropriate next stop for anyone thoughtful who’s become ambivalent about making specifically causal claims, but who’s not willing to refrain from ever making them either.

According to Dr. Pearl, the field of statistics itself has gone through the same phases as my critical thinking students. When statistics was in its infancy, it became expedient to eliminate the messiness of causality per se or to explain causality away as a special, limiting case of regular correlation. For instance, if we can empirically observe that a rooster always crows when the sun rises, what does it really add to further specify which causes the other? Perhaps it only muddies the situation.

Unfortunately, this simplification of statistics calcified into ideology. That’s a problem, because “while probabilities encode our beliefs about a static world, causality tells us whether and how probabilities change when the world changes” (p. 51) — and we do often need the latter. If you already have good reason to believe that the rooster crowing doesn’t cause the sun to rise (and nothing depends on it anyway), then there’s no real harm in representing that relationship as a 1-to-1 correlation.

But at times, the correlations are obvious and the causal relationships are exactly what we want to find out. Poverty and crime sure seem related. But does poverty lead people to crime or does crime lead to jail time, which breaks apart families and causes poverty? Or does lead exposure cause both low IQ (which causes underemployment) and criminal tendencies? Or maybe disordered environments are the culprit. We need a way to tackle these questions without relying too heavily on the assumptions we bring to the data, because those assumptions are loaded and unreliable. But regular old statistics isn’t it.

Thus, according to Pearl, statistics has outgrown its toolbox. “Big data” — both the sets and the tools that work on them — are important and valuable. But crunching more and more data with more and more computing power never wrings causal blood from the observational stone. Multivariate statistics can begin to disentangle things like multiple risk factors for poverty or health conditions. These findings are real and valuable. They can provide a certain degree of action-guiding information for individuals and governments and point the way for future research.

Unfortunately, sometimes statistical information on correlations raises as many questions as it settles. An observed correlation between vandalized environments and crime didn’t settle anything, and for 20 years people have argued whether “broken windows”-type policing works. If it works, then (setting aside moral issues) more resources should be dedicated to it. But if broken windows policing mistakes consequences of crime for its causes, then the money is wasted. Either way, there is a human cost to getting things like this wrong. The situation is multiply confounded by other variables, including unobservable or unknown ones, and randomized controlled experiments aren’t possible. What now?

Moreover, the language of statistics doesn’t even have any way of notating causal relationships (you can’t study what you can’t talk about). Instead, to solve very real problems—of medicine, public policy, and more—we must use tools like Pearl’s “Causal Ladder.”

At the lowest level of the Ladder of Causation, we find association. To discover associations in the world, we just look. The act of observing has no causal impact on the objects and processes being observed. Noticing that crime and poverty coincide belongs on this rung.

One rung up on the Ladder of Causation, we find intervention: questions like “if I do X, will Y happen?” Intervention is about “wiggling” one piece of the world and checking if something else wiggles in return. When NYC police commissioner William Bratton famously launched his program of broken windows policies, he conducted an intervention.
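To make the gap between the first two rungs concrete, here is a minimal Python sketch. It is not taken from the book: the three binary variables (a hidden common cause Z, a "wiggled" factor X, and an outcome Y) and every probability are made-up assumptions, chosen so that X has no effect on Y at all. Looking still shows a gap in Y between X=1 and X=0, while wiggling X by hand shows none.

```python
from itertools import product

# Toy structural model (all numbers are invented for illustration):
#   Z -> X and Z -> Y, with NO arrow from X to Y.
P_Z = {0: 0.7, 1: 0.3}             # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}    # P(X = 1 | Z = z)
P_Y1_GIVEN_Z = {0: 0.1, 1: 0.5}    # P(Y = 1 | Z = z); X plays no role


def joint(z, x, y):
    """Observational probability P(Z=z, X=x, Y=y) implied by the model."""
    px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    py = P_Y1_GIVEN_Z[z] if y == 1 else 1 - P_Y1_GIVEN_Z[z]
    return P_Z[z] * px * py


def p_y1_given_x(x):
    """Rung one: P(Y=1 | X=x), what we would see by just looking."""
    numerator = sum(joint(z, x, 1) for z in (0, 1))
    denominator = sum(joint(z, x, y) for z, y in product((0, 1), repeat=2))
    return numerator / denominator


def p_y1_do_x(x):
    """Rung two: P(Y=1 | do(X=x)).

    Setting X by hand cuts the Z -> X arrow, so Z keeps its natural
    distribution whatever value of x we force. Because Y ignores X in
    this toy model, x does not appear in the sum.
    """
    return sum(P_Z[z] * P_Y1_GIVEN_Z[z] for z in (0, 1))


observed_gap = p_y1_given_x(1) - p_y1_given_x(0)
intervention_gap = p_y1_do_x(1) - p_y1_do_x(0)
print(f"observed gap:     {observed_gap:.3f}")
print(f"intervention gap: {intervention_gap:.3f}")
```

With these invented numbers the observed gap is roughly 0.21 and the intervention gap is exactly zero, which is the sense in which a real-world intervention can answer a question that passive observation cannot.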

Counterfactual claims rest at the top of the Ladder of Causation, on the third level. Unlike observing and intervening, counterfactual activities are necessarily intangible and imaginary. On Pearl’s view, that counterfactuals aren’t observable doesn’t make counterfactual knowledge impossible. Well-grounded claims about causal relationships give us a richer understanding of the past as well as the present, plus a heightened ability to predict the future. Perhaps another type of police work or social program (e.g. lead abatement) would reduce crime more than broken windows policing did. With enough relevant information about the choices and the situation, we can make this counterfactual assertion about crime prevention confidently.

On the one hand, trading in counterfactual claims seems so normal for humans. On the other hand, skepticism about knowing what didn’t or won’t happen abounds. So where can counterfactual knowledge come from?

You can capture information about interventions in causal diagrams: drawings within which causes lead to effects via arrows. Plus, to make a long and technical story about the “do-calculus” absurdly short, Pearl has developed techniques for formalizing the information captured by a causal diagram and making it mathematically manipulable. At a very high level, the causal tools that Pearl has helped to develop and now describes in The Book of Why provide an algorithmic way to combine qualitative and quantitative information on probabilities.
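As one small illustration of combining a diagram with data, the sketch below applies the standard back-door adjustment, P(Y=1 | do(X=x)) = sum over z of P(Y=1 | X=x, Z=z) P(Z=z). This is only one result from the causal toolkit, not the do-calculus as a whole. The qualitative input is the assumption that the diagram is Z -> X, Z -> Y, X -> Y, so that Z is the only confounder. The quantitative input is an observational table whose numbers, like the helper names `build_joint`, `prob`, `naive`, and `backdoor_adjusted`, are hypothetical.

```python
from itertools import product


def build_joint():
    """Build a made-up observational table P(Z=z, X=x, Y=y).

    The generating numbers are used only to fill the table; the estimators
    below see nothing but the table itself.
    """
    p_z = {0: 0.6, 1: 0.4}
    p_x1_given_z = {0: 0.3, 1: 0.7}
    p_y1_given_xz = {(0, 0): 0.2, (1, 0): 0.3, (0, 1): 0.6, (1, 1): 0.7}
    table = {}
    for z, x, y in product((0, 1), repeat=3):
        px = p_x1_given_z[z] if x else 1 - p_x1_given_z[z]
        py = p_y1_given_xz[(x, z)] if y else 1 - p_y1_given_xz[(x, z)]
        table[(z, x, y)] = p_z[z] * px * py
    return table


def prob(table, **fixed):
    """Sum the table over every cell that matches the fixed z, x, y values."""
    names = ("z", "x", "y")
    return sum(
        p for cell, p in table.items()
        if all(cell[names.index(k)] == v for k, v in fixed.items())
    )


def naive(table, x):
    """Plain conditional probability P(Y=1 | X=x), ignoring the diagram."""
    return prob(table, x=x, y=1) / prob(table, x=x)


def backdoor_adjusted(table, x):
    """Back-door adjustment: average P(Y=1 | x, z) over the marginal of Z."""
    total = 0.0
    for z in (0, 1):
        p_y1_given_xz = prob(table, z=z, x=x, y=1) / prob(table, z=z, x=x)
        total += p_y1_given_xz * prob(table, z=z)
    return total


JOINT = build_joint()
naive_gap = naive(JOINT, 1) - naive(JOINT, 0)
adjusted_gap = backdoor_adjusted(JOINT, 1) - backdoor_adjusted(JOINT, 0)
print(f"naive gap:    {naive_gap:.3f}")
print(f"adjusted gap: {adjusted_gap:.3f}")
```

In this toy table the naive comparison suggests a gap of about 0.25, while the adjusted estimate is 0.10, which matches the effect built into the made-up numbers. The answer is only as good as the diagram: if Z were not the right variable to adjust for, the same arithmetic would return a confident wrong number.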

After all, you go to causation problems with the data you have, not the data you want. Sometimes you genuinely need to do a randomized controlled trial, but sometimes you actually don’t. For instance, Pearl (and his disciples) offer a technique for determining external validity. This can help researchers to decide whether the results they got from an experiment may be applied to some other group on which we have only observational data (aka whether the experiment has “transportability”). Related work on “meta-synthesis” offers ways to combine data from more than one study, when the studies were conducted on different groups, in an attempt to find the effect that you’d get with still another group.
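The simplest version of the transportability idea can be sketched in a few lines of Python. The sketch assumes, and this is a strong assumption, that the two groups differ only in the mix of one measured trait, and that the effect measured within each level of that trait carries over unchanged. Under that assumption the experimental result is reweighted by the target group's mix. The function name `transport_effect`, the age strata, and all numbers are hypothetical, and the general method described in the book covers far more cases than this.

```python
def transport_effect(stratum_effects, target_shares):
    """Reweight per-stratum experimental effects by a target population's mix.

    stratum_effects: effect of the treatment measured in the experiment,
                     separately for each stratum (e.g. age group).
    target_shares:   share of each stratum in the population we only
                     have observational data on; must sum to 1.
    """
    if set(stratum_effects) != set(target_shares):
        raise ValueError("strata must match between the two inputs")
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    return sum(stratum_effects[s] * target_shares[s] for s in stratum_effects)


# Invented numbers: the treatment helps older participants more.
effects = {"young": 0.10, "old": 0.30}
source_mix = {"young": 0.8, "old": 0.2}   # who was in the experiment
target_mix = {"young": 0.3, "old": 0.7}   # who we want to apply it to

effect_in_source = transport_effect(effects, source_mix)
effect_in_target = transport_effect(effects, target_mix)
print(f"average effect in the experiment:   {effect_in_source:.2f}")
print(f"transported effect in target group: {effect_in_target:.2f}")
```

Here the same experiment implies an average effect of 0.14 in the group that was studied and 0.24 in the older target group, so copying the headline number across would understate the effect.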

Like the vast majority of The Book of Why’s intended audience, I am unqualified to provide an informed critique of this portion of the book. I will leave that to more technical experts (for instance, see this review by a data scientist).

But there’s plenty to say about what the do-calculus might mean for the general public, assuming that the Causal Revolution comes to fruition. Let’s just stipulate that, in due time, relevant empirical scientists (epidemiologists, meteorologists, geologists, biologists, economists) and relevant facilitators (statisticians, computer scientists, etc) will take up the do-calculus as a method of analysis. Then what?

Pearl’s compelling examples of causal tools being brought to bear on policy issues look backwards, of course, and hindsight is 20/20. For instance, Pearl repeatedly describes mid-20th century attempts to disentangle the effects of tobacco smoke on human health. At first, the very obvious correlation between smoking and lung cancer remained open to alternate causal explanations. Perhaps some gene variant itself causes both smoking and lung cancer!

Debaters at the time didn’t yet have the do-calculus or sophisticated tools for statistical analysis. Still, some of them did eventually reason using a causal consideration (now called “Cornfield’s Inequality”) to figure out that the strength of the smoking-lung cancer correlation was much greater than could be accommodated by an unobserved genetic confounder. This was correct, and a critical mass of experts shifted over. Pearl’s self-proclaimed Whiggishness shows a bit here. His version of the history of statistics and AI leads irresistibly towards his team’s super-powerful causal techniques, but this plot arc does feel a bit too good to be true.
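The logic behind that inequality can be checked with a short Python sketch. Assume smoking has no effect at all and a single binary hidden factor (say, a gene) does all the work. Then the relative risk that shows up between smokers and non-smokers can never exceed how many times more common the factor is among smokers, nor how strongly the factor itself raises risk. The function name and every number below are illustrative assumptions, including the observed relative risk of 9 used as a stand-in for a very strong association.

```python
def spurious_relative_risk(p_u_exposed, p_u_unexposed, rr_u_disease):
    """Relative risk produced purely by a binary confounder U.

    Assumes the exposure has NO effect on the disease.
    p_u_exposed:   P(U = 1) among the exposed (e.g. smokers)
    p_u_unexposed: P(U = 1) among the unexposed
    rr_u_disease:  how many times U multiplies the disease risk
    """
    excess = rr_u_disease - 1
    return (1 + excess * p_u_exposed) / (1 + excess * p_u_unexposed)


OBSERVED_RR = 9.0  # illustrative input, not a figure from the review

# A gene twice as common among smokers that raises risk tenfold.
moderate = spurious_relative_risk(0.6, 0.3, 10.0)

# A gene nine times as common among smokers that raises risk a hundredfold.
extreme = spurious_relative_risk(0.9, 0.1, 100.0)

for label, rr in (("moderate gene", moderate), ("extreme gene", extreme)):
    verdict = "could" if rr >= OBSERVED_RR else "cannot"
    print(f"{label}: spurious RR = {rr:.2f}, {verdict} explain RR = {OBSERVED_RR:.0f}")
```

The moderate gene yields a spurious relative risk of about 1.73 and the extreme one about 8.27, so even a confounder nine times more common among smokers falls short of an observed ninefold risk. A skeptic would have to posit a hidden factor even more lopsided than the association it was meant to explain away.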

Thanks to these examples, The Book of Why is about as much of a page-turner as a book in its genre could be. The history is complicated and the math is thorny. Pearl treats each with aplomb, at least for the purposes of the intended reader. But The Book of Why makes the commonplace mistake of breezing through some of the most provocative (i.e. moral) claims, especially towards the end.

Take another prime example: climate change. Big Data has made some inroads here, but all the data points in the world can’t congeal into useful (i.e. predictive) models unless we pin down the causal relationships between various environmental factors. Unfortunately, the causal relationships are just what’s at issue in the first place. A proper causal diagram to use to further the science isn’t going to fall out of the sky. Instead, it must be drawn by specific actors with their own biases, interests, etc. 

This is not to say that any causal diagram is ok to draw and use, or that no single one is better than any other. Ordinary scientific desiderata apply: parsimony, coherence, fit with other known data. Drawing an arrow from X to Y in the causal diagram says that “some probability rule or function specifies how Y would change if X were to change,” even if we don’t know the rule or function yet (p. 45). So you can’t just draw any old arrows. Still, with so many holes, theories about climate (and other complex phenomena) are very often doomed to underdetermination.

Pearl earnestly points out that “climate models reflect more than a century of study by physicists, meteorologists, and climate scientists… the climate models are strong and compelling evidence” etc. (p. 296). But I see no reason why history shouldn’t repeat itself. People with scientific, political, or personal reasons to deny the consensus will band together (just as in the smoking/lung cancer example) to raise trouble around the inevitable confounders and uncertainties that arise. Nothing about a diagram in a scientific paper can change that.

So, for all their elegance as presented in the book, causal diagrams simply aren’t a cut and dried answer to actual empirical messes (even if the whole do-calculus totally works, which is a big assumption). As Pearl admits, what makes the causal diagrams powerful is the assumptions they represent; in a way you get out what you put in. This sounds like a fine bullet to bite in theory, worth doing to unlock the wonders of the do-calculus. But in practice, it means there will be just as much room as there ever was for theory-laden disagreement about causation and everything that flows therefrom (especially with regard to public policy).

For instance, Pearl mentions in passing that tools for quantitative causal reasoning might help us generate an answer to “how much the unemployment rate will go up if we raise the minimum wage.” At the risk of belaboring the much-abused “is-ought” distinction, nothing about that answer will necessarily put an end to the minimum wage debate. Facts about the world are inputs for public policy debate, but the process does not work deterministically.

If the answer is 0, plenty of people will still argue that a wage floor violates employer and employee rights to free contract. Their side of the issue might weaken when not paired with actual bad effects in practice, but it’s not altogether obviated. Likewise, if unemployment really does go up significantly as a result of increased minimum wages, people in favor of the latter will say we should help the unemployed in some other way.

The more general issue is that public policy debates aren’t fundamentally about facts (or counterfactuals!); they’re about values, which might be a feature, and not a bug, depending on who you ask. In my causal diagram of the world, supported by some research, moral beliefs cause factual beliefs at the same time as factual beliefs impact moral ones. Will politics really improve if we shift the subject of debate from controversy regarding speculation on climate change to controversy regarding who drew the diagrams powering the models? This might matter a lot to genuine subject matter experts, but it won’t matter to anyone else, that is if the difference even gets communicated by the popular media at all.

I do not mean to say we should continue to push for totally “objective” means for generating public policy-relevant info. This is the mistake Pearl claims earlier statisticians made in hoping that their science could be totally stripped of subjective influence, and I will not repeat it. Instead, I mean to say that no such thing is ever coming and even if it did people wouldn’t care very much. Policy isn’t primarily suffering for lack of precise models. Instead, policy suffers for lack of proper voter and politician incentives. 

If Pearl’s optimism about do-calculus resolving public policy was a little quick, then his conclusions about free will and his hopeful take on artificial intelligence moved at the speed of light. Pearl is a compatibilist who holds that the problem of free will is just a clash between levels of evaluation: on a molecular level, the physical world is determined by laws of physics. But on a psychological level, choices seem undeniably real — making them real in the relevant sense.

Once artificial intelligences have been programmed with the capacity for counterfactuals, they can exercise the same type of free will as humans by taking their own objects of thought as inputs in causal reasoning. These strong AIs will also be able to communicate well with humans, since humans use causal claims as a powerful tool for education and assessment, including morally.

The upshot of all of this, Pearl adds in passing, is that strong AIs with counterfactual powers can excel at moral reasoning. After all, what is empathy but the ability to counterfactually place yourself in another being’s shoes? With all necessary data inputs and more sheer processing power, AIs can crunch moral causal numbers better than any mere mortal.

As much as I liked most of the book, I can’t follow Pearl that far. It makes sense that robots might be able to choose between two counterfactual outcomes when they are commensurable in some way, for example if outcome X from choice A is better than outcome Y from choice B in virtue of being larger.

But how is the robot supposed to choose between actions A and B when outcomes X and Y are not obviously commensurable? Perhaps X includes 10 additional utiles for Alice, while Y results in 10 utiles for Bob. Or X has a higher fairness score, but Y has a greater compassion score. If we follow G.E. Moore, moral goodness is not reducible to any natural property. To decide how to weight Alice vs. Bob, or fairness vs. compassion, the AI will have to look at something else or receive another input via the causal diagram — and the only option is data on human moral judgment, whether somehow aggregated or selected top-down (i.e. by fiat). As such, a strong AI could get closer and closer to existing humans’ moral judgments (no small feat!), or perhaps decide which humans’ moral judgments were outliers. Admittedly, these would be huge achievements in AI, but even an AI of this kind could not surpass those judgments while arriving at an independently specified moral truth.

Still, if you have thought about anything related to causality for more than a few minutes, The Book of Why is the big book of causality you didn’t know you needed. Despite being intended for a lay audience, this is not exactly beach reading. But the effort is worth it. You might come away from The Book of Why with the impression that causality and its various problems have in some way been solved. That’s unfortunately not quite true. Still, The Book of Why will shed plenty of light on how we got to where we are now, causally, and where we might go from here, counterfactually.