Video 1 of 6 of the Understanding Statistics guide

Why Do We Use Statistics? 15:02

Antony Davies distinguishes stochastic relationships from deterministic ones and explains why we need statistical tools to understand the former.


Antony Davies: Before we start talking about statistics, I think it’s important to ask the question: why do we use statistics at all? These are painful things, nobody likes to study … well, some people like to study them. Generally, people aren’t interested in these things, right? Why mess with them? The reason we mess with statistics is because the world is divided into two sets of relationships, [00:00:30] and we tend to think in terms of one set, ignoring the other.

Here’s what I mean by that. Virtually all laws of physics are what we call deterministic relationships. That is, they always hold, right? We’ve got Bernoulli’s Law, this is the thing that enables us to fly in airplanes. If you have an airplane wing, and it’s going through the air at a certain speed, by going through the air at a certain speed, the air goes over it and creates pressure [00:01:00] and the airplane lifts. If air pressure and temperature are held constant, this wing, which goes through at a certain speed, gets a certain lift every single time. It works this way all the time. If you increase the speed, you get more lift all the time. This is a deterministic relationship. It’s always true, in every single instance. This is what we’re used to dealing with, right? You drop a glass, and it, you know, drops to the floor at a certain speed and it smashes. [00:01:30] That happens every single time you drop the glass. This is the way we’re used to dealing with life: deterministic relationships.

That’s well and good for things like the laws of physics, but it doesn’t apply to the laws of economics. Laws of economics are stochastic relationships. Stochastic relationships are very different. They’re true on average. They’re not true in every instance. For example, [00:02:00] if you raise the minimum wage, it is the case that on average, you’ll get less employment. Now, what happens is it doesn’t happen in every single case. You might have a friend who, you know, when the minimum wage didn’t rise, he lost his job. When the minimum wage did rise later on, he didn’t lose his job. You point to your friend and say, well look, this minimum wage nonsense must not apply, because I know people who did not lose their jobs when the minimum wage went up. [00:02:30] The reason they didn’t lose their jobs when the minimum wage went up is not an indictment of minimum wage; rather, what’s going on is, you’re dealing with a stochastic relationship. It’s true on average, even if it isn’t true every single time, right?

A good example of the difference between the two: a deterministic relationship is four is greater than three. Four is always greater than three, right? It doesn’t matter what your circumstances are, what the temperature is outside, whether or not you woke up on the left side of the bed. [00:03:00] Four is always greater than three, it’s a deterministic relationship. If you ever find an instance in which four is not greater than three, just one example, you would have disproven the relationship. That’s deterministic. A stochastic relationship: dogs are bigger than cats. Generally, that’s true. Now you can find some cats that are particularly large, and some dogs that are particularly small. You could point to this example and say look at this, this is a gargantuan cat and a very small dog. [00:03:30] I’ve disproven your statement that dogs are bigger than cats. No you haven’t, because the statement dogs are bigger than cats is a stochastic statement. It talks about a stochastic relationship. On average, dogs are bigger than cats even though you’ll find some examples in which it’s not true.
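
To make the dogs-and-cats point concrete, here is a minimal Python sketch. The weight distributions and their parameters are made up for illustration; they are assumptions, not measurements. It shows a claim that holds on average while individual counterexamples still exist.

```python
import random
from statistics import mean

# Hypothetical weights in kilograms. The distribution parameters are
# illustrative assumptions, not real measurements.
rng = random.Random(0)  # fixed seed so the run is repeatable
dogs = [max(1.0, rng.gauss(20, 10)) for _ in range(1000)]
cats = [max(1.0, rng.gauss(5, 2)) for _ in range(1000)]

# The stochastic claim: dogs are bigger than cats on average.
dogs_bigger_on_average = mean(dogs) > mean(cats)

# The anecdotes: individual pairings where the cat outweighs the dog.
counterexamples = sum(1 for d, c in zip(dogs, cats) if c > d)

print("Dogs bigger on average:", dogs_bigger_on_average)
print("Pairs where the cat is bigger:", counterexamples, "of", len(dogs))
```

A single large cat appears in the counterexample count, but it does not change the comparison of the averages.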

All right, why does this matter? Well, it matters because to understand stochastic relationships, we have to do more than simply observe what’s happening. We have to process data about what’s happening, and that’s the [00:04:00] nastiness of stats. The reason we have to do this is because our usual method of talking about relationships is anecdotes, right? We use anecdotes, we say things like well, you know the tax rate went up and because of that, I can see that I have less money in my checking account, this is an anecdote. The minimum wage went up and my uncle lost his job, this is an anecdote. Anecdotes [00:04:30] work very well when you’re talking about deterministic relationships, because the deterministic relationship is true in every single instance. Every anecdote you find is a perfect encapsulation of this deterministic relationship.

Stochastic relationships are only true on average. They might be true in some specific cases, right? Other cases they may not be, but they’re true on average, which means that anecdotes really don’t work. You can pick an anecdote that fits [00:05:00] the stochastic relationship, you can also pick an anecdote that doesn’t fit the stochastic relationship. This is the thing people … often when they talk, debate back and forth about economic issues, one side or the other at some point will accuse the other one of cherry picking the data. That’s what cherry picking means. That I’ve picked an anecdote that happens to support my position, right? Anecdotes, when it comes to stochastic relationships, are very useful for [00:05:30] illustrating the relationship, but they aren’t at all good for understanding the relationship. To understand the relationship, we need lots of data and we need to apply the rules of statistics to analyze this.

All right. One of the things that happens when we talk about statistics is that we run into a roadblock, that statistics is discussed in the language of mathematics. But when we talk about things to which statistics [00:06:00] apply, we use English, right? We say things like China’s economy is bigger than the US economy, right? Some people don’t care about that, some people get bent out of shape about it, right? But we say this thing, and we say it in English. China’s economy is bigger than the US economy. What we’re missing when we say this in English are the particular nuances that statistics in the language of mathematics can convey, but the English language has problems with. For example, it is the case [00:06:30] that if you add up all of the productivity that goes on in China, the total productivity is somewhat larger than total productivity in the US. But measure the productivity on a per person basis: take the total economic productivity in China, divide it by the number of Chinese workers. Take the total amount of economic productivity in the United States, divide it by the number of American workers. What you find now is the numbers are very different, right? American productivity per person is something like [00:07:00] $54,000. The average American worker generates about $54,000 worth of goods and services. The average Chinese worker generates about $13,000 worth of goods and services.

What’s happening here? What’s happening here is we have to be very careful when we talk about statistics, to remember that we’re talking about them in English, but their native language is mathematics. Be careful when you say things like the Chinese economy’s larger than [00:07:30] the US economy. Be careful to underline precisely what you mean by that, right? That’s going to take more than just a single sentence. You’ve gotta say things like well, I’m talking about on a per capita basis and I’m adjusting for differences in costs of living, right? All of that’s nasty stuff that causes people’s eyes to glaze over, but it’s important in understanding what it is you’re really trying to convey.
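
The total-versus-per-worker distinction can be sketched in a few lines of Python. The two economies and all their figures below are invented for illustration; they are not the Chinese or US numbers, and `output_per_worker` is a hypothetical helper name.

```python
def output_per_worker(total_output: float, workers: float) -> float:
    """Total output divided by the number of workers."""
    return total_output / workers

# Two hypothetical economies. All figures are invented for illustration.
economy_a = {"total_output": 1_200.0, "workers": 100.0}
economy_b = {"total_output": 1_000.0, "workers": 20.0}

a_per_worker = output_per_worker(**economy_a)
b_per_worker = output_per_worker(**economy_b)

# "Bigger" flips depending on which measure is meant.
a_bigger_in_total = economy_a["total_output"] > economy_b["total_output"]
b_bigger_per_worker = b_per_worker > a_per_worker

print("A is bigger in total:", a_bigger_in_total)
print("B is bigger per worker:", b_bigger_per_worker)
```

Both statements are true at once, which is why the sentence "A's economy is bigger than B's" needs the measure spelled out.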

People (and when I say people, I point the finger largely at the media here) tend to be [00:08:00] comfortable talking about averages. Even talking about medians, right. People tend to understand that the average income in the US is somewhat different than the median income in the US, and that there’s a fundamental difference in these things that deserves note, right? If we have the five of us, six of us in a room, and Jack’s income is $2,000,000 a year, our incomes are all $50,000 [00:08:30] a year. On average, the average income in this room is quite high. Of course the problem is well, that’s not really indicative of what’s going on here, right? Because what’s happening is Jack’s this one guy with a very high income, this is dragging up the numbers. Rather than talking about averages, we can talk about medians. With median, we line us all up, poorest person to richest person, and take the guy in the middle. If we’re all earning $50,000 and Jack over here is earning $2,000,000, the guy [00:09:00] in the middle … well he’s a guy with a $50,000 income. Jack’s $2,000,000 income doesn’t affect this median, right? The median in some sense, under some circumstances, is more indicative of what it is we’re trying to convey. People’s, you know, average income.
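
The room example can be checked directly. This is a small Python sketch that assumes six people in the room: Jack at $2,000,000 and five others at $50,000 each.

```python
from statistics import mean, median

# Six people in the room: Jack plus five others (an assumption based on
# "the five of us, six of us in a room").
incomes = [2_000_000] + [50_000] * 5

average_income = mean(incomes)   # pulled upward by Jack's income
median_income = median(incomes)  # the middle of the sorted line-up

print("Average income:", average_income)
print("Median income:", median_income)
```

Under that assumption the average works out to $375,000, an income nobody in the room actually has, while the median stays at $50,000.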

The media and people in general tend to be comfortable talking about averages and even, on occasion, talking about medians. What you find people almost never [00:09:30] talking about, outside of statisticians, is standard deviation or variance. Standard deviation or variance tells us the degree to which a random phenomenon wanders away from its average. Give you an example, picture that you have a guitar string and you pluck this guitar string. What happens? If you look at it, you see it vibrating, right? It’s moving very fast, [00:10:00] and what you see is it kind of forms this pattern of this … you can see the edges where it vibrates to the left and right and you know, you can see kind of the center point where it is and it’s kind of a blur. That center point, that’s the average. You see the string vibrating back and forth, on average its location is the center point. The blur around it, the blur is the standard deviation. It turns out this blur, the standard deviation, becomes under some circumstances incredibly important. [00:10:30] Sometimes more important than the average.

Give you an example. The average temperature in February in Fairbanks, Alaska is about -2 Fahrenheit. The average temperature on the moon is 5 Fahrenheit. If you look at those two averages, you would conclude that the environment on the moon, absent the fact that there’s no air, is more hospitable than the environment in [00:11:00] Fairbanks, Alaska, right? We’re looking just at the averages. Well here’s what the averages don’t tell you. In February, in Fairbanks, Alaska, the temperature varies … of course the average is -2, but it can vary as high as maybe, you know, 15 degrees and maybe as low as -20, right? The average is -2 and you’ve got this plus or minus 10, 15 degrees on it.

The plus or minus represents … it’s kind of like that vibrating guitar string. Usually, we find the February temperature [00:11:30] in Fairbanks, Alaska to be -2, sometimes a little bit higher, sometimes a little bit lower. But it’s in this range, -2 plus or minus, you know, 10, 15, 20 degrees. On the moon, the average temperature is five, but it varies by about 250 degrees. On, you know, the highs, you’re getting 250 degrees Fahrenheit, on the lows you’re getting -200 some degrees Fahrenheit. What you see is, although the average temperature on [00:12:00] the moon is higher than the average temperature in Fairbanks, Alaska, when you account for the standard deviation, you find that the environment on the moon is way less hospitable, right? In Fairbanks, you’re going to be incredibly uncomfortable, but you’ll survive. On the moon, no. The difference is not the averages, it’s the standard deviations. The degree to which this thing we’re measuring vibrates around its mean.
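
The Fairbanks-versus-moon comparison can be sketched in Python. The readings below are invented so that their averages and ranges roughly match the figures quoted in the talk; they are illustrative assumptions, not measured temperatures.

```python
from statistics import mean, pstdev

# Hypothetical temperature readings in degrees Fahrenheit, chosen only to
# mirror the averages and ranges quoted in the talk.
fairbanks = [-20, -11, -2, 8, 15]
moon = [-240, -120, 5, 130, 250]

fairbanks_mean = mean(fairbanks)
moon_mean = mean(moon)

# Population standard deviation: the typical distance from the average.
fairbanks_sd = pstdev(fairbanks)
moon_sd = pstdev(moon)

print("Fairbanks: mean", fairbanks_mean, "sd", round(fairbanks_sd, 1))
print("Moon:      mean", moon_mean, "sd", round(moon_sd, 1))
```

The two averages differ by only a few degrees, while the spread of the moon readings is more than ten times that of the Fairbanks readings.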

Student: I’m no physicist, but I’m curious if it’s a contentious subject to say [00:12:30] that laws of physics are deterministic in a sense.

Antony Davies: Yeah, that’s a good question. I’m not a physicist either, right, but I can tell you a little bit about it. There are relationships in the physical world that are not deterministic, they’re stochastic. In fact, a good example of this is the field of fluid dynamics. The set of physical rules that describe how fluids behave [00:13:00] as they’re flowing through whatever it is that fluids flow through, right? Here’s the interesting thing, the laws of fluid dynamics work very well. It’s like Bernoulli’s Law, right? The equations predict in a deterministic fashion what’s going on, provided that your fluid has lots of molecules in it. If you have a bit of fluid that has very few molecules in it, right, like a thimbleful of water or something, you’ll find that the laws of fluid dynamics start to break [00:13:30] down. Not because the laws don’t apply, but rather because the random behavior of the individual molecules starts to become more noticeable because you have fewer of them.

This is an interesting thing with economics when people say, well you know, the problem with economics is it’s really not a science because people behave randomly. To which as an economist I reply, it is, and people do behave partially randomly, but they also partially [00:14:00] behave in predictable ways. When you get lots of humans, like having lots of molecules of the fluid, the individual random behaviors cancel out. What you start to see more predominantly is the underlying predictable behavior, similarly with the fluids. Yes, I described it as this binary thing of relationships that are either deterministic or stochastic; in [00:14:30] fact there’s a continuum. Physical laws tend to fall over here, social science laws tend to fall over here, but there’s also a point where we start to overlap depending on how many people you’re dealing with.
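
The idea that individual randomness cancels out in large groups can be illustrated with a short simulation. This is a sketch under made-up assumptions: each individual's behavior is modeled as a predictable part (1.0) plus random noise, and `group_average` is a hypothetical helper, not anything from the talk.

```python
import random
from statistics import pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

def group_average(n_people: int) -> float:
    """Average behavior of a group: each person contributes a predictable
    part (1.0) plus individual random noise. Both are assumptions."""
    return sum(1.0 + rng.gauss(0, 1) for _ in range(n_people)) / n_people

# Repeat the experiment 100 times for a small and a large group, then
# measure how much the group average wanders from run to run.
spread_small = pstdev([group_average(10) for _ in range(100)])
spread_large = pstdev([group_average(2_000) for _ in range(100)])

print("Spread of the average, 10 people:   ", round(spread_small, 3))
print("Spread of the average, 2,000 people:", round(spread_large, 3))
```

With few people the group average moves around noticeably between runs; with many people it settles close to the predictable part, which is the continuum described above.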