Video 5 of 6 of the Understanding Statistics guide

How Are Two Things Different? 37:19

Professor Davies explains how to interpret the results of a difference-of-means test and a simple regression analysis.

Transcript

Antony Davies: I want to turn my attention now to some practical matters of calculating statistics. One of the very common types of questions that statistics can answer is how are two things different. Let me show you some examples. We have here the unemployment rate for Pennsylvania over time, January 2005 through July 2015. You can see the [00:00:30] unemployment rate is sometimes low, it’s sometimes high. You can see the big bump from 2008 for the recession and then coming back down again. That’s the unemployment rate over time for Pennsylvania. Here’s the unemployment rate over time for New York. It’s a similar thing. It’s low. It goes back up again. It’s high, comes back down. An interesting question that statistics can address is which state has the lower unemployment rate? Or is there a difference between the unemployment rates at all?

[00:01:00] One way to approach this question, and it’s the naïve approach, is to simply take the average unemployment rate for Pennsylvania over this period, the average unemployment rate for New York over the period, and whichever one is lower, that’s the state that had the lower unemployment rate. What that approach ignores is the fact that the unemployment rate itself is what we call a stochastic variable, that is, it has a random component to it. [00:01:30] I can look up the unemployment rate for Pennsylvania right now and what I get is not exactly the unemployment rate. There’s noise for all the reasons that we’ve talked about earlier, of how the thing is measured, what it’s measuring, who’s measuring it, all of this stuff. What I get back is not really the unemployment rate for Pennsylvania, but one view. It’s a dirty view of this unemployment rate. The thing [00:02:00] I want to measure, the unemployment rate, I’m measuring with some error. There’s noise in the system.

Similarly, with New York, if I want New York’s unemployment rate, again, there’s noise in the system. The way we describe noise is with standard deviation. Standard deviation, if you picture the unemployment rate, is this number that vibrates. Sometimes it’s high, sometimes it’s low. The degree of vibration around this unemployment rate, this is the standard deviation. I can [00:02:30] use the standard deviation of the unemployment rate for Pennsylvania and the standard deviation of the unemployment rate for New York, and I can employ them in answering the question, which of these states has the greater unemployment rate?

If we look at a number line, you could see the two unemployment rates here. Pennsylvania’s averaged over this period 6.4%. New York averaged whatever that is, 6.7%. Look at this. That is plus or minus one standard deviation [00:03:00] for Pennsylvania’s unemployment rate. Very roughly speaking, you could imagine that Pennsylvania’s unemployment rate, yes, it averaged 6.4%, but you could find it as low as 4.8, you could find it as high as 7.9. It fluctuated around this range. New York’s unemployment rate, similarly, on average it was 6.7%, but depending on what month you’re talking about, you could find it as low as five percent or as high as 8.3%. [00:03:30] It wanders around.

If you overlap the two, notice something. The ranges are almost the same. Pennsylvania’s unemployment rate wanders around the same kind of area that New York’s unemployment rate does. This overlap of the wandering around, we can take into account and when we do all of our statistical analysis, we would say something like, “There’s not [00:04:00] a strong statistical difference between the unemployment rate in New York and the unemployment rate in Pennsylvania.” There’s not a strong statistical difference in the unemployment rates. What I’ve said now are two apparently contradictory things. The first thing is that the average unemployment rate in Pennsylvania is 6.4, the average unemployment rate in New York is 6.7. First statement, the average unemployment rate in Pennsylvania is lower.

The second thing, [00:04:30] which kind of sounds like it contradicts that, is that when you account for the variation in the unemployment rates, there isn’t much difference between them at all. The way we reconcile this apparent contradiction is by talking about what economists call P values. What a P value measures is the likelihood that the data you’re observing coincides with, or confirms, or is [00:05:00] not contradictory to the thing that you believe. Does the data match your belief? In this case, my belief is that there’s no difference between the unemployment rate in New York and the unemployment rate in Pennsylvania. If I get a low P value, it means that the data’s contradicting my belief. If I get a high P value, it means that the data is not contradicting my belief.

There’s more that goes into it than that, but as a rough gut level approximation, that’s [00:05:30] not a bad way to think of P values. P values you’ll see repeated. Every time you see them, think of them as the statistician’s measure for the extent to which this data that I’m looking at is in agreement with the thing I believe to be true to begin with. The technical term that statisticians use is the null hypothesis. The null hypothesis is the thing I’m believing or I’m assuming to be true. It gets a little more complicated because the statistician will [00:06:00] also state what he calls the alternative hypothesis, the thing that he assumes will be true if the null hypothesis isn’t true, and you can see we’re getting way down in the weeds.

At a very high level view, this P value, you can think of it as accompanying this null hypothesis or the thing I believe to be true. The question is, I’m going to look at the world, I’m going to assume that something’s true, my null hypothesis, and ask the question do I see data out there that confirms [00:06:30] or denies this thing? That P value is what’s measuring this. A high P value is confirming what I’m already assuming to be true. A low P value is rejecting or contradicting this thing I believe to be true.

Student: What types of hypotheses can you set as a null hypothesis? Can it just be any claim or does it have to be something like there is no relationship or there’s no significant difference? Null implies a not. Is your default usually no relationship, no significant difference, [00:07:00] that sort of thing?

Antony Davies: The default tends to be, and there are mathematical and statistical reasons that cause you to form your hypothesis a certain way. Usually, the null hypothesis or the thing that you’re assuming is that there’s no difference. There’s no difference between Pennsylvania and New York, or you assume that there’s no effect, that if I increase stimulus spending, I should see no effect on the economy. The null hypothesis, generally speaking, is [00:07:30] a no effect or no difference sort of assumption.

We can look at Pennsylvania and New York and you see how the unemployment rates vary. You can see there’s a tremendous amount of overlap here, which, given the gut-level sense of P values we just described, would indicate that the P value here should probably be high. My going-in assumption is there isn’t much difference between Pennsylvania’s and New York’s unemployment rates, and indeed, the overlap is quite substantial [00:08:00] here. This would be like having a high P value. In fact, if you run the statistical test, you do get a high P value here. The statistician would look at the data that we’ve seen and say, “There is no compelling evidence here that New York’s unemployment rate is markedly different from Pennsylvania’s.”
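As a rough illustration of the overlap idea, the sketch below compares the plus-or-minus one standard deviation ranges quoted earlier for Pennsylvania (4.8 to 7.9) and New York (5.0 to 8.3). The function name `one_sd_overlap` is made up for this example, and measuring overlap as a share of each range is only an informal gut check, not a statistical test.

```python
def one_sd_overlap(range_a, range_b):
    """Length of the overlap between two (low, high) ranges; 0.0 if they do not touch."""
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return max(0.0, high - low)


# Plus-or-minus one standard deviation ranges quoted in the lecture (percent).
pennsylvania = (4.8, 7.9)
new_york = (5.0, 8.3)

overlap = one_sd_overlap(pennsylvania, new_york)
share_of_pa = overlap / (pennsylvania[1] - pennsylvania[0])
share_of_ny = overlap / (new_york[1] - new_york[0])

print(f"overlap: {overlap:.1f} percentage points")
print(f"share of Pennsylvania's range: {share_of_pa:.0%}")
print(f"share of New York's range: {share_of_ny:.0%}")
```

Most of each state's range is shared with the other's, which is the picture that goes along with a high P value.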

Let’s look at Pennsylvania and California, which you see here. Again, there’s overlap. Look at California’s. It’s markedly different. On the low end, [00:08:30] over the same period, California’s unemployment rate dropped to 5.5% and on the high end it was up around 11.2. It’s fluctuating around this range. That range appears to be markedly different from Pennsylvania’s. It’s not just that Pennsylvania’s average is different than California’s average, but when you account for the range over which the unemployment rate would tend to wander on average, those ranges are different. In fact here, you do get [00:09:00] a very low P value. My going-in belief here would be let’s assume that Pennsylvania’s unemployment rate is the same as California’s, and let’s run some statistical tests. You run the tests, you get a very low P value, indicating that the evidence is contradicting this assumption of yours. My going-in assumption is that the unemployment rates are the same, the evidence contradicts this, and I get a very low P value.
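To make the two comparisons concrete, here is a minimal sketch of a difference-of-means test built only from the summary figures quoted in the lecture. Several things are assumptions of this sketch, not statements about the test actually run in the lecture: each of the 127 months (January 2005 through July 2015) is treated as an independent observation, the P value uses a normal approximation, each standard deviation is taken as half the width of the quoted range, and California's mean is taken as the midpoint of its quoted range. Monthly unemployment figures move together from one month to the next, so the numbers are illustrative only.

```python
import math


def two_sided_p_value(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Approximate two-sided P value for the null hypothesis 'the two means are equal'."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Normal approximation: P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


MONTHS = 127  # January 2005 through July 2015, inclusive

# Means from the lecture; standard deviations assumed to be half of each quoted range.
pa_mean, pa_sd = 6.4, (7.9 - 4.8) / 2
ny_mean, ny_sd = 6.7, (8.3 - 5.0) / 2
# Assumption: California's mean is the midpoint of its quoted 5.5 to 11.2 range.
ca_mean, ca_sd = (5.5 + 11.2) / 2, (11.2 - 5.5) / 2

p_pa_ny = two_sided_p_value(pa_mean, pa_sd, MONTHS, ny_mean, ny_sd, MONTHS)
p_pa_ca = two_sided_p_value(pa_mean, pa_sd, MONTHS, ca_mean, ca_sd, MONTHS)

print(f"Pennsylvania vs New York:   P = {p_pa_ny:.3f}")
print(f"Pennsylvania vs California: P = {p_pa_ca:.2e}")
```

Under these assumptions the Pennsylvania and New York comparison comes out above 10 percent, while the Pennsylvania and California comparison comes out far below one percent, matching the direction of the results described above.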

If you’re interested in this sort of thing, I encourage [00:09:30] you to, as a contrary view, take a look at some of Deirdre McCloskey’s work. Deirdre McCloskey doesn’t like P values and makes some interesting arguments why economists should just stop talking about them. Being a statistician and an economist, this is what I do, so I’m going to talk about them. P values range over this range of zero to one. By the way, I’m going to take a moment out here. This will be as technical as we get as we talk about stats. The reason for this is you see P values repeating themselves over and over again. [00:10:00] It’s something that common people should get into their vocabulary because what people often do, and we’ll talk about this later on, common people, non-statisticians, will talk about correlation. Oftentimes, they will talk about correlation when really, they should be talking about P values. If there are two things that the average man on the street needs to know about stats, I’d say those are the two. The correlation and the P values.

Here are P values. They range [00:10:30] from zero to one, and generally speaking, what we’re doing is walking in with this hypothesis that we’ve got two things that are the same or that there’s no effect here. That’s our going-in assumption. No effect or no difference. We look at the P values. If you get a very low P value, below one percent, we say, “This is very strong evidence. Very strong evidence from the data that your hypothesis is incorrect.” A P value between one percent and five [00:11:00] percent we say is strong evidence. There’s strong evidence here that your hypothesis that there’s no difference or no effect, that this hypothesis is wrong.

Economists tend not to use them, but people in other business fields will, P values in the five to 10 percent range. They say, “This is weak evidence that the data’s contradicting you.” Again, the question is to what extent is the data contradicting your hypothesis? Five to 10 percent we’d say is weak evidence. If you get [00:11:30] above 10%, we say, “There’s just no evidence in the data to contradict your initial belief.”
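The rule of thumb just described can be written down as a small lookup. This is only a sketch: the function name `evidence_against_null` is invented here, and how to treat a P value that lands exactly on a boundary (exactly one, five, or 10 percent) is an assumption, since the lecture does not say.

```python
def evidence_against_null(p_value):
    """Label the strength of evidence against the null hypothesis for a given P value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p_value < 0.01:
        return "very strong evidence"
    if p_value < 0.05:
        return "strong evidence"
    if p_value < 0.10:
        return "weak evidence"
    return "no evidence"


for p in (0.0002, 0.03, 0.07, 0.5):
    print(f"P = {p}: {evidence_against_null(p)} against the null hypothesis")
```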

You might see something that’s common here, this two-pronged approach of we have an initial belief and we look at P values and ask do they contradict these things. What you’re seeing is the scientific method at work. We learn in grade school the scientific method is you form a hypothesis, you collect some [00:12:00] data, and you ask, “Does the data confirm or deny your hypothesis?” That’s exactly what’s happening here. The difference with economics and social sciences is that we find it very hard, not impossible, but very hard to conduct experiments. We have to look at data that already exists. Yet, even though it is hard, sometimes impossible to conduct experiments, we can still use the scientific method by walking in the door saying, “I hypothesize something,” and then asking, “Does the data support [00:12:30] or refute my hypothesis?”

What can we do with this? We see some interesting things. We talked about inequality earlier. What you see here is income inequality, measured by the Gini coefficient, for countries that are less economically free and more economically free. We’re measuring economic freedom using the Fraser Institute’s measure, which asks questions like what percentage of [00:13:00] your economy is due to government spending, the question you asked earlier. How much of your economy is sucked up by the government in taxes? What’s your top marginal income tax rate? How many restrictions are there on labor markets? Can you go get a job wherever it is that you want or is the government going to interfere in your negotiation with your prospective employer? What kind of protections are there for property rights? What kind of [00:13:30] rule of law is there? Is the money sound or is your central bank inflating it away? These sorts of questions.

Fraser asks a whole suite of questions like this of every country and puts on the country a stamp that represents the amount of economic freedom that’s there. What you see are all the countries that report, 127 I believe, split into two groups. On the right we have those that are above the median for economic freedom, and on the left, [00:14:00] those that are below the median for economic freedom. The size of the bar vertically is the amount of income inequality within that country. You can see over on the left, the second bar in, has tremendous income inequality. You move further in, you have some countries that have much less income inequality. The question here is, is it the case that countries that are more economically free have less income inequality?

This is [00:14:30] a fascinating statistical question, because like the unemployment rate, inequality is this stochastic process. It’s a random thing. Sometimes it’s high, sometimes it’s low, it could be high for various reasons and have nothing to do with economic freedom. It could be low for reasons that have nothing to do with economic freedom. It fluctuates like the unemployment rate does. What we’ve done is split into two parts. Like we had New York and Pennsylvania, now we have less free countries, more free countries. We asked the question, [00:15:00] on average, am I seeing a difference in income inequality for the less free versus the more free?

I’m going to walk into this problem with the null hypothesis, my initial belief that there is no difference. My null hypothesis is income inequality is the same for less free countries as it is for more free countries, on average. We can conduct a statistical test, and if you conduct the statistical test on this, what [00:15:30] you get is a P value of .0002. Incredibly low. It falls way into the range of very strong evidence that the data rejects my hypothesis. My hypothesis going in is that there’s no difference in the average inequalities here. I look at the data and the data comes back very strongly and says, “No, that hypothesis is wrong.” You can tell by looking at the bars. If you calculate the averages, [00:16:00] what you find is the average income inequality for the more free countries is lower than is the average income inequality for the less free countries.
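The grouping step behind that comparison can be sketched as follows. The six countries, their freedom scores, and their Gini values are entirely hypothetical placeholders, not Fraser Institute or United Nations figures, and placing a country that sits exactly on the median into the more free group is an assumption of this sketch.

```python
from statistics import mean, median


def group_means_by_median(countries):
    """Split (name, freedom, inequality) records at the median freedom score
    and return (mean inequality of less free, mean inequality of more free)."""
    cutoff = median([freedom for _, freedom, _ in countries])
    less_free = [ineq for _, freedom, ineq in countries if freedom < cutoff]
    more_free = [ineq for _, freedom, ineq in countries if freedom >= cutoff]
    return mean(less_free), mean(more_free)


# Hypothetical records: (country, economic freedom score, Gini coefficient).
hypothetical_countries = [
    ("A", 5.1, 48.0),
    ("B", 5.8, 44.0),
    ("C", 6.3, 46.0),
    ("D", 7.0, 38.0),
    ("E", 7.6, 35.0),
    ("F", 8.1, 33.0),
]

less_free_mean, more_free_mean = group_means_by_median(hypothetical_countries)
print(f"less free: average Gini {less_free_mean:.1f}")
print(f"more free: average Gini {more_free_mean:.1f}")
```

The two averages produced this way are what the difference-of-means test then compares, taking the spread within each group into account.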

I would walk away from this picture with the conclusion that there appears to be strong evidence that economic freedom is associated with less income inequality. Notice my choice of words here. This is very important. Economic freedom is associated with less [00:16:30] income inequality. The statistical test doesn’t tell me anything about causation. It’s not telling me that one causes the other. All I know walking away is that when I look at the more free countries, the income inequality on average is less. When I look at the less free, the income inequality on average is more. We use this term they’re associated.

Some fun stuff you can do is look at some other things. This is gender inequality. The interesting thing here is while Fraser is providing [00:17:00] the economic freedom measures that let us divide the countries into more free and less free, the United Nations is providing the gender inequality data. When the U.N. measures gender inequality, it doesn’t ask any questions about economic freedom. When Fraser measures economic freedom, it doesn’t ask any questions about gender inequality. Yet, if you cross reference these two completely separate data sets, you get what you see here, which is that [00:17:30] the more free countries exhibit less gender inequality than the less free countries do. The more free countries exhibit less gender inequality than the less free. Again, you get a very small P value here indicating that you walk in with the hypothesis that there is no difference between the less free and the more free, the statistical test tells you the data strongly rejects that hypothesis. There seems to be a very significant difference.

Student: What was the source [00:18:00] of the data for the first graph?

Antony Davies: The first graph, income inequality comes from the United Nations, and again, it’s this interesting thing, the United Nations does not consider economic freedom when it evaluates income inequality, and Fraser does not consider income inequality when it evaluates economic freedom. They’re two completely separate data sets, and yet you find this phenomenon, that the more free countries have less income inequality, the less free have more income inequality.

We’ve talked about using statistics to determine whether [00:18:30] two things are different from each other, so unemployment rate in Pennsylvania versus unemployment rate in New York. Inequality in less free countries, inequality in more free countries. This detecting differences in things that we’ve been talking about is called a difference in means test. It’s useful for determining whether two things are different, but it’s not useful for determining the relationship between two things. The relationship [00:19:00] between two things is a much richer thing to talk about. I’ll give you an example. We’ve seen here the countries split into two parts, the more free and the less free for gender inequality. Here’s the same data, but presented slightly differently.

What you see here is economic freedom measured across the horizontal axis, so to the right is more economic freedom, to the left is less economic freedom. Up and down is gender inequality. Up is more inequality, down is less inequality. [00:19:30] You see these dots. Every dot is a country. There are two things. First off, there’s noise in the system. There are dots all over the place. There’s also a trend. On average, as you move to the right, the dots fall. There are exceptions, but it’s the underlying trend that’s interesting. Remember, we talked about that when you see a phenomenon in the social sciences, in economics and psychology [00:20:00] and so forth, what you’re seeing is human behavior, which is comprised of two components.

One component is a random component. Humans will sometimes just do random things. The other is a more deterministic component. It’s a component that causes the human to do certain things based on his surroundings. For example, say I go to the coffee shop in the morning and I always buy a bagel, [00:20:30] and today the price of bagels has gone up from what it was yesterday. There are two things playing on me. One is my desire for the bagel or the habit that I always buy a bagel, this sort of thing, or maybe I had breakfast or didn’t have breakfast earlier and that influences whether or not I like the bagel. Those are random things. That’s one thing that plays into my decision to buy the bagel or not.

The other thing that plays into my decision to buy the bagel or not is the fact that the price has gone up. What economics [00:21:00] tells us is when the price goes up, people buy less of the thing. That’s true on average. It’s not necessarily true in every single case. In this case, maybe I really want the bagel and the price went up and it doesn’t matter, I’m still going to buy it. This is not a violation of the laws of economics, rather, you’re looking at one person. The laws of economics apply in the aggregate. Although I might still buy [00:21:30] the bagel anyway, you might not. Maybe he would, but he wouldn’t. Maybe he wouldn’t either.

What happens is, when you put us all together, in total, the number of bagels we buy declines. What’s important to keep in mind is, there’s this randomness when you look at the individual observations, the individual people or the individual countries. There’s also this underlying trend. The underlying trend is the deterministic part, the thing that we, as economists or statisticians, are interested [00:22:00] in seeing. Part of, and you’ll see this theme recurring over the next couple of topics, one of the things that we’re very much interested in is taking this messy, dirty data world and stripping out the noise. Blow it away so that what we see underlying is the phenomenon, the behavior that we actually want to examine.

What you’re seeing in this picture now is the same data, the economic freedom, the gender inequality, but now we’ve shown [00:22:30] each dot as a country and you see this trend that as you go to the right, more economic freedom, you get less gender inequality. I can superimpose on top of this cloud of data points a line. This line, some people call it a trend line, the statisticians call it a regression line. This is a line that comes as close as possible to approximating the data points that you see here. Think of it as the underlying average trend in the data. As you move to the right, the line [00:23:00] declines. What this line gives us is this thing we didn’t have before. It’s not just that more economically free countries have less gender inequality. This line can tell us by how much. This line tells me the relationship between economic freedom and gender inequality.

The way we go about getting this line is we start off by saying, “Let’s propose that there is a relationship [00:23:30] between gender inequality and economic freedom and that that relationship can be shown mathematically.” I propose this thing and you see here, gender inequality is A plus B times economic freedom plus U. What are these things? A and B are numbers. We’re going to put them to the side for the moment, but basically they are what gives me some function that relates economic freedom to gender inequality. U is the noise in the system. It’s all the things that [00:24:00] would affect gender inequality other than economic freedom.

If I put all this together, what I have here is a proposed, now my proposed relationship may be faulty, but it’s the relationship I propose exists between economic freedom and gender inequality. You take economic freedom, you multiply it by some number, and you add some other number to it, and that plus the noise in the system gives you gender inequality. Our [00:24:30] goal as statisticians is to find values for A and B that cause this line to come as close to the data points that we have as possible. There are technical things that go on in the background, but for the moment, if we run this thing through the appropriate statistical analyses, what will come back is this value for A and this value for B. The regression analysis, as we call it, would come back and tell me the [00:25:00] best value for A is 1.38. The best value for B is negative .15. Those two values will give you the red line that comes closest to the data points, to as many data points as possible.
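One common way to make "as close to the data points as possible" precise is ordinary least squares, which picks A and B to minimize the sum of squared vertical distances from the points to the line. The lecture does not name the method, so treating it as least squares is an assumption here. The five data points below are hypothetical: they are built from the lecture's line (A of 1.38, B of negative .15) plus a made-up noise term U chosen so that the fit recovers the line exactly.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: how y moves, on average, per unit of x
    a = mean_y - b * mean_x  # intercept: the line passes through the point of means
    return a, b


# Hypothetical data: the lecture's line plus a made-up noise term U.
economic_freedom = [4.0, 5.0, 6.0, 7.0, 8.0]
noise_u = [0.1, -0.2, 0.2, -0.2, 0.1]
gender_inequality = [
    1.38 - 0.15 * ef + u for ef, u in zip(economic_freedom, noise_u)
]

a, b = fit_line(economic_freedom, gender_inequality)
print(f"A = {a:.2f}, B = {b:.2f}")  # prints "A = 1.38, B = -0.15"
```

With real data the noise will not cancel so neatly, and the estimated A and B will only approximate whatever relationship generated the data.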

What we walk away with then, is the idea that there are two things happening here in the relationship between economic freedom and gender inequality. The first is that economic freedom is associated with gender inequality [00:25:30] according to this equation that we see here. The other thing is, oh yes, and there’s all that noise that I can’t account for but it’s noise on the back end. What happens is when I blow away the noise, this is the relationship that I find. A couple of things that come out of this. First, there will be what many people are used to talking about, what people refer to as the R squared. The technical term is the squared multiple correlation coefficient. [00:26:00] People would use the term correlation or many people, even non-statisticians will talk about the R squared and they’ll ask questions like, “You’ve got this relationship between economic freedom and gender inequality, but what’s the R squared?”

The R squared, or the squared multiple correlation coefficient, ranges from zero to one. The higher it is, the more closely the data points fall on your trend line. [00:26:30] Right now, you can see that the data points, they’re a cloud around the trend line. The R squared in this model is .44. If you could imagine, picture those data points getting closer and closer and closer to the trend line, that would be the R squared going up and getting closer and closer to one. As the data points moved away from the trend line so they start to become this amorphous cloud, that’s the R squared going down towards zero. R squared is one thing we want to look at.
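A small sketch of how R squared behaves as points move toward or away from the line. It computes R squared as one minus the ratio of squared residuals to total squared variation, the usual definition for a least squares fit. The data are hypothetical: the same underlying line each time, with a made-up noise pattern that is shrunk or enlarged.

```python
def r_squared(x, y):
    """R squared of the least squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_residual / ss_total


# Hypothetical data: one underlying line, with noise scaled up or down.
x = [4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.2, 0.2, -0.2, 0.1]


def noisy_points(scale):
    return [1.38 - 0.15 * xi + scale * u for xi, u in zip(x, noise)]


r2_tight = r_squared(x, noisy_points(0.25))  # points hug the line
r2_loose = r_squared(x, noisy_points(3.0))   # points form a loose cloud
print(f"tight cloud: R squared = {r2_tight:.2f}")
print(f"loose cloud: R squared = {r2_loose:.2f}")
```

The underlying line is identical in both cases; only the amount of noise around it changes, and R squared moves with it.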

Another thing we want to look at is the value of that [00:27:00] variable B. Actually, we call it a parameter. Parameter means this is a value in the regression model that does not correspond to data, but it’s something we’re going to calculate. B in this case, we would call a parameter. This value of negative .15, what does it mean? What it means is on average, there are exceptions, but on average, a one unit increase in economic freedom is associated [00:27:30] with a .15 unit decline in gender inequality. On average, a one unit increase in economic freedom is associated with a .15 unit decline in gender inequality.

That parameter B, we get the value of negative .15 for it, is telling me about the magnitude of the relationship between economic freedom and gender inequality. There’s a third number [00:28:00] that isn’t listed here. It’s a P value. It’s the same kind of P value we’ve talked about before. We talked about P values as measuring the extent to which data contradict my hypothesis. I walk in with my hypothesis that Pennsylvania unemployment is the same as New York unemployment. A low P value says the data contradicts that hypothesis. There does seem to be a difference here. Here for regression analysis, the hypothesis is always that your parameter is zero. [00:28:30] In other words, our hypothesis here would be that the true value of B is zero. That is, there is no relationship between economic freedom and gender inequality.

A value of B, of zero, means there’s no relationship between economic freedom and gender inequality. That’s our going in assumption. When we look at the data, what comes back is a P value of .00001 something, a very low P value. Meaning [00:29:00] that the data contradict that hypothesis. The data contradict my claim that there is no relationship between economic freedom and gender inequality. Let’s pause here for a moment because we’ve discussed three measures, and these are three key measures that we’re going to take away from a regression analysis. First is the P value, which is measuring the significance of the relationship. High P [00:29:30] value, there’s not any relationship here. Low P value, there appears to be a relationship. The P value is measuring the significance of the relationship.
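A sketch of how the P value for B can be computed: estimate B, estimate how much B would wobble from sample to sample (its standard error), and ask how many standard errors the estimate sits away from zero. The data are simulated from a hypothetical line with random noise, and the P value uses a normal approximation in place of the t distribution that statistical packages normally use; both are assumptions of this sketch.

```python
import math
import random


def slope_and_p_value(x, y):
    """Least squares slope b and an approximate two-sided P value for the null b = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope: residual variance spread over the variation in x.
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    t_statistic = b / standard_error
    # Normal approximation to the two-sided P value (reasonable for large n).
    p_value = math.erfc(abs(t_statistic) / math.sqrt(2))
    return b, p_value


# Simulated, hypothetical data: 100 "countries" scattered around a falling line.
rng = random.Random(0)
freedom = [rng.uniform(3.0, 9.0) for _ in range(100)]
inequality = [1.38 - 0.15 * ef + rng.gauss(0.0, 0.2) for ef in freedom]

b, p_value = slope_and_p_value(freedom, inequality)
print(f"estimated B = {b:.3f}, P value = {p_value:.2e}")
```

A very small P value here says that a slope this far from zero would be very unlikely to show up if the true B were zero.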

The value of B, the B parameter, that’s the magnitude of the relationship. Changes in economic freedom are associated with what kind of changes in gender inequality. Big changes, small changes, positive changes, negative changes. The B parameter gives me that, the magnitude of the relationship between economic freedom and [00:30:00] gender inequality. Last, the R squared, which you might think of as the precision of the relationship. I’ll give you an analogy that ties these three together. You’re at a party and you’re at one side of the room. There’s lots of people in the room and they’re making lots of noise and singing, dancing on the tables, and what not. Your friend is over there by the door, and you notice that you’re running low on beer. You yell to [00:30:30] your friend, “Get more beer.”

First, your friend’s over there and lots of things are hitting his ears. The noise of the people, dancing, and the music, and this table cracked because someone stepped on it wrong. One of the things that hits his ears is your voice yelling, “Get more beer.” R squared measures what fraction of all the noise that hits his ears is coming from your voice. [00:31:00] A high R squared would mean that the noise in the room is very low. Most of what hits his ears is your voice. A low R squared would mean that the room is very noisy, so a small percentage of the total that hits his ear is your voice. A large percentage is other things.

Notice right off, the R squared, a low R squared [00:31:30] doesn’t mean that there isn’t a relationship. It simply means that there are other things included in the relationship other than your voice. R squared. The P value is like measuring whether your friend understood what you said. Just because he hears my voice doesn’t mean that he [00:32:00] heard and understood my words. I could’ve been slurring or speaking a language he doesn’t understand, or maybe I’m not slurring and I’m speaking English, but the noise in the room, and people are saying things such that he’s only catching every other word. The P value is attempting to measure how much of the words that I am saying he is understanding as opposed to hearing. How many is he understanding [00:32:30] as opposed to hearing.

Finally, the value of B, this parameter, measures the magnitude of the relationship. That’s like measuring the extent to which my words have spurred him to action. If he hops up and runs out the door to buy more beer, I have spurred him to action. That’s like having [00:33:00] a large B. If I don’t spur him to action, that’s like having a small B. The magnitude of the relationship isn’t that large. Notice three things going on. I’m yelling to my friend. First he’s got to hear me, second he’s got to understand what I’ve said, and third, I want to spur him to action. Those are these three measures. The R squared, the P value, and the magnitude of B.

I make a [00:33:30] big point about this because many people who don’t know statistics but do, at a gut level, understand R squared, get fixated on it. I’ll show a picture like this one that has the economic freedom and gender inequality and people will say, “Yes, okay, that looks very good, but what’s the R squared?” What they really mean is, “Is this relationship significant between these two or is it simply noise?” [00:34:00] The answer is that the question of whether the relationship between freedom and gender inequality is significant or is noise isn’t answered by the R squared. It’s answered by the P value. The P value, did you understand what I said, tells me whether economic freedom actually does have some relationship with gender inequality. The R squared simply tells me are there lots of other things that also influence gender inequality. [00:34:30] There may be lots of other things that also influence gender inequality. The question I’m asking here is, does this specific thing influence it? That’s a P value question.
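For a regression with a single explanatory variable, the test statistic for B can be written in terms of R squared and the number of observations n: the squared statistic equals R squared times (n minus 2), divided by (1 minus R squared). The sketch below uses that identity to show why R squared alone does not settle significance. The sample sizes are hypothetical (the lecture does not give n for this regression), and the P value again uses a normal approximation.

```python
import math


def slope_p_value_from_r_squared(r_squared, n):
    """Approximate two-sided P value for the slope in a one-variable regression,
    given only R squared and the number of observations n."""
    t_statistic = math.sqrt(r_squared * (n - 2) / (1.0 - r_squared))
    return math.erfc(t_statistic / math.sqrt(2))  # normal approximation


# Hypothetical sample sizes paired with a moderate and a low R squared.
cases = [
    (0.44, 127),   # R squared from the lecture, assumed n
    (0.05, 1000),  # low R squared, many observations
    (0.05, 30),    # same low R squared, few observations
]
for r2, n in cases:
    p = slope_p_value_from_r_squared(r2, n)
    print(f"R squared = {r2:.2f}, n = {n:4d}: P value = {p:.3g}")
```

The same low R squared of .05 goes with a tiny P value in one case and a P value above 10 percent in the other, so a low R squared by itself says nothing about whether the relationship is there.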

Student: In this specific equation, what does the A stand for and why is there no U in the second equation?

Antony Davies: Both good questions. The A in this example, in some regression models the A has more meaningful meaning [00:35:00] and in others it has less meaningful meaning. This one is one of these cases where it has somewhat less meaningful meaning. Here, A would be the gender inequality we would expect to exist in the absence of economic freedom. In other words, if economic freedom were zero, then our regression model predicts that the amount of gender inequality you should see is simply A. That measure isn’t very useful here in this particular example, but we’ll look at an example later in which A has a really nice [00:35:30] interpretation. Whether or not it has an interpretation depends in part on the data you’re looking at.

Your other question, what happens to U? We’re looking at two equations. The top one is the regression model. In other words, this is the thing I walk into the room assuming, that there is this relationship and I want to measure it. This equation says gender inequality is A, I don’t know what A is, plus B, I don’t know what that is yet either, times economic freedom plus [00:36:00] U. U is all the things other than economic freedom that might influence gender inequality. Picture that equation as corresponding to the blue dots. The second equation is the estimated regression model. That is, when I run my data through my statistical package, the statistical package comes back and says, “Our best guess for A is 1.38 and our best guess for B is negative .15.” Our estimated regression model says this, we expect or we estimate [00:36:30] gender inequality to be 1.38 minus .15 times economic freedom.

Where’s the U? The U has disappeared because our whole purpose here was to brush away the noise so that all we’re left with is the relationship between economic freedom and gender inequality. The U is gone because the U is the noise. Our whole goal was to brush it away, to get rid of it so we could see this underlying relationship.
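The estimated regression model can be used directly as a small prediction function. The values of A and B below are the ones quoted in the lecture; the observed country at the end is a hypothetical example, included only to show that U reappears as the gap between what a country actually reports and what the line predicts.

```python
A = 1.38   # estimated intercept quoted in the lecture
B = -0.15  # estimated slope quoted in the lecture


def predicted_gender_inequality(economic_freedom):
    """Estimated regression model: no U term, only the underlying relationship."""
    return A + B * economic_freedom


# A is the prediction when economic freedom is zero.
at_zero = predicted_gender_inequality(0.0)

# B is the change in the prediction for a one unit increase in economic freedom.
one_unit_change = predicted_gender_inequality(7.0) - predicted_gender_inequality(6.0)

# Hypothetical country: freedom score 6.0, observed gender inequality 0.55.
observed = 0.55
leftover_noise = observed - predicted_gender_inequality(6.0)  # this country's U

print(f"prediction at zero freedom: {at_zero:.2f}")
print(f"change per unit of freedom: {one_unit_change:.2f}")
print(f"hypothetical country's U:   {leftover_noise:.2f}")
```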