Which quality tool is useful in deciding if there is a correlation between the values of two variables multiple choice question?

Use regression analysis to describe the relationships between a set of independent variables and the dependent variable. Regression analysis produces a regression equation where the coefficients represent the relationship between each independent variable and the dependent variable. You can also use the equation to make predictions.

As a statistician, I should probably tell you that I love all statistical analyses equally, like parents with their kids. But, shhh, I have a secret! Regression analysis is my favorite because it provides tremendous flexibility, which makes it useful in so many different circumstances. In fact, I’ve described regression analysis as taking correlation to the next level!

In this blog post, I explain the capabilities of regression analysis, the types of relationships it can assess, how it controls the variables, and generally why I love it! You’ll learn when you should consider using regression analysis.

Related post: What are Independent and Dependent Variables?

Use Regression to Analyze a Wide Variety of Relationships

Regression analysis can handle many things. For example, you can use regression analysis to do the following:

These capabilities are all cool, but they don’t include an almost magical ability. Regression analysis can unscramble very intricate problems where the variables are entangled like spaghetti. For example, imagine you’re a researcher studying any of the following:

  • Do socio-economic status and race affect educational achievement?
  • Do education and IQ affect earnings?
  • Do exercise habits and diet affect weight?
  • Are drinking coffee and smoking cigarettes related to mortality risk?
  • Does a particular exercise intervention have an impact on bone density that is a distinct effect from other physical activities?

More on the last two examples later!

All these research questions have entwined independent variables that can influence the dependent variables. How do you untangle a web of related variables? Which variables are statistically significant and what role does each one play? Regression comes to the rescue because you can use it for all of these scenarios!

Use Regression Analysis to Control the Independent Variables

As I mentioned, regression analysis describes how the changes in each independent variable are related to changes in the dependent variable. Crucially, regression also statistically controls every variable in your model.

What does controlling for a variable mean?

When you perform regression analysis, you need to isolate the role of each variable. For example, I participated in an exercise intervention study where our goal was to determine whether the intervention increased the subjects’ bone mineral density. We needed to isolate the role of the exercise intervention from everything else that can impact bone mineral density, which ranges from diet to other physical activity.

To accomplish this goal, you must minimize the effect of confounding variables. Regression analysis does this by estimating the effect that changing one independent variable has on the dependent variable while holding all the other independent variables constant. This process allows you to learn the role of each independent variable without worrying about the other variables in the model. Again, you want to isolate the effect of each variable.

Regression models help you prevent spurious correlations from confusing your results by controlling for confounders.

How do you control the other variables in regression?

A beautiful aspect of regression analysis is that you hold the other independent variables constant by merely including them in your model! Let’s look at this in action with an example.

A recent study analyzed the effect of coffee consumption on mortality. The first results indicated that higher coffee intake is related to a higher risk of death. However, coffee drinkers frequently smoke, and the researchers did not include smoking in their initial model. After they included smoking in the model, the regression results indicated that coffee intake lowers the risk of mortality while smoking increases it. This model isolates the role of each variable while holding the other variable constant. You can assess the effect of coffee intake while controlling for smoking. Conveniently, you’re also controlling for coffee intake when looking at the effect of smoking.
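The sign flip described above can be reproduced with simulated data. The following is a minimal sketch, not the study's data or model: the effect sizes, the 40% smoking rate, and the helper name `ols` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (an assumption, not the study's data):
# smokers drink more coffee, smoking raises risk, coffee slightly lowers it.
smoking = rng.binomial(1, 0.4, size=n).astype(float)
coffee = 2.0 + 2.0 * smoking + rng.normal(0.0, 1.0, size=n)
risk = 1.0 * smoking - 0.1 * coffee + rng.normal(0.0, 0.5, size=n)

def ols(y, *predictors):
    """Least-squares coefficients; the intercept is returned first."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Model 1 omits smoking, so coffee absorbs part of the smoking effect.
coffee_only = ols(risk, coffee)[1]

# Model 2 includes smoking, so each coefficient holds the other variable constant.
coffee_adj, smoking_adj = ols(risk, coffee, smoking)[1:]

print("coffee coefficient, smoking omitted:  ", coffee_only)
print("coffee coefficient, smoking included: ", coffee_adj)
print("smoking coefficient:                  ", smoking_adj)
```

In this simulation the coffee coefficient is positive when smoking is left out and negative once smoking is included, which is the pattern the text attributes to omitted variable bias.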

Note that the study also illustrates how excluding a relevant variable can produce misleading results. Omitting an important variable causes it to be uncontrolled, and it can bias the results for the variables that you do include in the model. This warning is particularly applicable for observational studies where the effects of omitted variables might be unbalanced. On the other hand, the randomization process in a true experiment tends to distribute the effects of these variables equally, which lessens omitted variable bias.

Related post: Confounding Variables and Omitted Variable Bias

How to Interpret Regression Output

To answer questions using regression analysis, you first need to fit and verify that you have a good model. Then, you look through the regression coefficients and p-values. When you have a low p-value (typically < 0.05), the independent variable is statistically significant. The coefficients represent the average change in the dependent variable given a one-unit change in the independent variable (IV) while controlling the other IVs.

For instance, if your dependent variable is income and your IVs include IQ and education (among other relevant variables), you might see output like this:

[Image: regression output table with coefficients and p-values for IQ and education]

The low p-values indicate that both education and IQ are statistically significant. The coefficient for IQ indicates that each additional IQ point increases your income by an average of approximately $4.80 while controlling everything else in the model. Furthermore, an additional unit of education increases average earnings by $24.22 while holding the other variables constant.
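To make the "one-unit change" reading concrete, here is a small sketch that applies the two coefficients quoted above. The function name is hypothetical, and it assumes a purely additive linear model with no interaction terms, which the text does not confirm.

```python
# Coefficients quoted in the text; everything else here is illustrative.
COEF_IQ = 4.80
COEF_EDUCATION = 24.22

def expected_income_change(delta_iq=0.0, delta_education=0.0):
    """Average change in income for the given changes in the IVs,
    holding every other variable in the model constant."""
    return COEF_IQ * delta_iq + COEF_EDUCATION * delta_education

# Ten more IQ points, education unchanged.
print(expected_income_change(delta_iq=10))
# Two more units of education, IQ unchanged.
print(expected_income_change(delta_education=2))
```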

Regression analysis is a form of inferential statistics. The p-values help determine whether the relationships that you observe in your sample also exist in the larger population. I’ve written an entire blog post about how to interpret regression coefficients and their p-values, which I highly recommend.

Obtaining Trustworthy Regression Results

With the vast power of using regression comes great responsibility. Sorry, but that’s the way it must be. To obtain regression results that you can trust, you need to do the following:

Using regression analysis gives you the ability to separate the effects of complicated research questions. You can disentangle the spaghetti noodles by modeling and controlling all relevant variables, and then assess the role that each one plays.

There are many different regression analysis procedures. Read my post to determine which type of regression is correct for your data.

If you’re learning regression and like the approach I use in my blog, check out my eBook!


In Correlation Basic Concepts we define the correlation coefficient, which measures the size of the linear association between two variables. We now extend this definition to the situation where there are more than two variables.

Multiple Correlation Coefficient

Definition 1: Given variables x, y, and z, we define the multiple correlation coefficient

$$R_{z,xy} = \sqrt{\frac{r_{xz}^2 + r_{yz}^2 - 2 r_{xz} r_{yz} r_{xy}}{1 - r_{xy}^2}}$$

where $r_{xz}$, $r_{yz}$, and $r_{xy}$ are as defined in Definition 2 of Basic Concepts of Correlation. Here x and y are viewed as the independent variables and z is the dependent variable.
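The definition can be checked numerically. The sketch below assumes the standard three-variable formula, R = sqrt((r_xz^2 + r_yz^2 - 2 r_xz r_yz r_xy) / (1 - r_xy^2)), and compares it with the correlation between z and its least-squares fit on x and y. The data are simulated and the function name is hypothetical.

```python
import numpy as np

def multiple_correlation(r_xz, r_yz, r_xy):
    """Multiple correlation of z with x and y from the pairwise correlations."""
    num = r_xz**2 + r_yz**2 - 2.0 * r_xz * r_yz * r_xy
    return np.sqrt(num / (1.0 - r_xy**2))

# Simulated sample (an assumption for illustration only).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)
z = 1.0 * x - 0.7 * y + rng.normal(size=200)

r = np.corrcoef([x, y, z])          # rows are the variables x, y, z
R = multiple_correlation(r[0, 2], r[1, 2], r[0, 1])

# Cross-check: correlation between z and its fitted values from regressing z on x and y.
X = np.column_stack([np.ones_like(x), x, y])
coef, *_ = np.linalg.lstsq(X, z, rcond=None)
R_fit = np.corrcoef(z, X @ coef)[0, 1]

print(R, R_fit)  # the two values agree up to rounding error
```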

Coefficient of Determination

We also define the multiple coefficient of determination to be the square of the multiple correlation coefficient.

Often the subscripts are dropped and the multiple correlation coefficient and multiple coefficient of determination are written simply as $R$ and $R^2$ respectively. These definitions may also be expanded to more than two independent variables. With just one independent variable the multiple correlation coefficient is simply r.

Unfortunately, R is not an unbiased estimate of the population multiple correlation coefficient; the bias is most evident for small samples. A relatively unbiased version is given by the adjusted R.

Definition 2: If R is $R_{z,xy}$ as defined above (or similarly for more variables) then the adjusted multiple coefficient of determination is

$$R_{adj}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

where k = the number of independent variables and n = the number of data elements in the sample for z (which should be the same as the samples for x and y).
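As a quick illustration, the adjustment can be written as a one-line function. This sketch assumes the usual form, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1); the function name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination.

    r2: unadjusted R^2, n: sample size, k: number of independent variables.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for the adjustment to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# With the same R^2 and k, the adjustment shrinks as the sample grows.
print(adjusted_r2(0.5, n=20, k=2))
print(adjusted_r2(0.5, n=200, k=2))
```

The adjusted value is always at most the unadjusted one, and the gap is largest for small n relative to k.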

Data Analysis Tools

Excel Data Analysis Tools: In addition to the various correlation functions described elsewhere, Excel provides the Covariance and Correlation data analysis tools. The Covariance tool calculates the pairwise population covariances for all the variables in the data set. Similarly, the Correlation tool calculates the various correlation coefficients as described in the following example.

Example 1: We expand the data in Example 2 of Correlation Testing via the t Test to include a number of other statistics. The data for the first few states are displayed in Figure 1.


Figure 1 – Data for Example 1

Using Excel’s Correlation data analysis tool we can compute the pairwise correlation coefficients for the various variables in the table in Figure 1. The results are shown in Figure 2.


Figure 2 – Correlation coefficients for data in Example 1
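For readers working outside Excel, roughly the same two tables can be produced with NumPy. This is only a sketch: the numbers below are made up and are not the data from Figure 1, and the three columns stand in for poverty, infant mortality, and white.

```python
import numpy as np

# Made-up sample: rows are observations, columns are three variables.
data = np.array([
    [12.0, 7.1, 70.0],
    [ 9.5, 6.0, 82.0],
    [15.2, 8.4, 65.0],
    [11.1, 6.8, 77.0],
    [17.8, 9.0, 60.0],
    [10.3, 5.9, 85.0],
])

# Pairwise population covariances (divide by n, hence ddof=0).
pop_cov = np.cov(data, rowvar=False, ddof=0)

# Pairwise Pearson correlation coefficients.
corr = np.corrcoef(data, rowvar=False)

print(np.round(pop_cov, 3))
print(np.round(corr, 3))
```

Passing `rowvar=False` tells NumPy that each column, not each row, is a variable, which matches the layout of a typical worksheet.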

We can also single out the first three variables, poverty, infant mortality, and white (i.e. the percentage of the population that is white) and calculate the multiple correlation coefficients, assuming poverty is the dependent variable, as defined in Definitions 1 and 2. We use the pairwise correlations in Figure 2 as the inputs to these formulas.


Partial and Semi-Partial Correlation

Definition 3: Given x, y, and z as in Definition 1, the partial correlation of x and z holding y constant is defined as follows:

$$r_{xz,y} = \frac{r_{xz} - r_{xy} r_{yz}}{\sqrt{(1 - r_{xy}^2)(1 - r_{yz}^2)}}$$

In the semi-partial correlation, the correlation between x and y is eliminated, but not the correlation between x and z and y and z:

$$r_{z(x,y)} = \frac{r_{xz} - r_{xy} r_{yz}}{\sqrt{1 - r_{xy}^2}}$$
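One way to see what these two quantities measure is to compare the formulas with correlations of regression residuals. The sketch below assumes the standard formulas, partial = (r_xz - r_xy r_yz) / sqrt((1 - r_xy^2)(1 - r_yz^2)) and semi-partial = (r_xz - r_xy r_yz) / sqrt(1 - r_xy^2). The data are simulated and all function names are hypothetical.

```python
import numpy as np

def partial_corr(r_xz, r_yz, r_xy):
    """Correlation of x and z with y removed from both."""
    return (r_xz - r_xy * r_yz) / np.sqrt((1 - r_xy**2) * (1 - r_yz**2))

def semipartial_corr(r_xz, r_yz, r_xy):
    """Correlation of z with the part of x that is not explained by y."""
    return (r_xz - r_xy * r_yz) / np.sqrt(1 - r_xy**2)

def residuals(a, b):
    """Residuals of a after a simple linear regression of a on b."""
    slope, intercept = np.polyfit(b, a, 1)
    return a - (slope * b + intercept)

# Simulated sample (an assumption for illustration only).
rng = np.random.default_rng(2)
y = rng.normal(size=300)
x = 0.6 * y + rng.normal(size=300)
z = 0.5 * x + 0.4 * y + rng.normal(size=300)

r = np.corrcoef([x, y, z])
r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]

pr = partial_corr(r_xz, r_yz, r_xy)
sr = semipartial_corr(r_xz, r_yz, r_xy)

# Residual-based versions: remove y from both x and z (partial),
# or from x only (semi-partial).
pr_resid = np.corrcoef(residuals(x, y), residuals(z, y))[0, 1]
sr_resid = np.corrcoef(residuals(x, y), z)[0, 1]

print(pr, pr_resid)
print(sr, sr_resid)
```

Each pair of printed values agrees up to rounding error, which is why the partial correlation is often described as a correlation between residuals.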

Causation

Suppose we look at the relationship between GPA (grade point average) and Salary 5 years after graduation and discover there is a high correlation between these two variables. As has been mentioned elsewhere, this is not to say that doing well in school causes a person to get a higher salary. In fact, it is entirely possible that there is a third variable, say IQ, that correlates well with both GPA and Salary (although this would not necessarily imply that IQ is the cause of the higher GPA and higher salary).

In this case, it is possible that the correlation between GPA and Salary is a consequence of the correlation between IQ and GPA and between IQ and Salary. To test this we need to determine the correlation between GPA and Salary eliminating the influence of IQ from both variables, i.e. the partial correlation of GPA and Salary holding IQ constant.
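The scenario can be simulated. In the sketch below, which uses invented standardized data rather than any real GPA or salary figures, IQ drives both GPA and Salary and there is no direct link between the two. The raw correlation is clearly positive, while the partial correlation holding IQ constant is close to zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Hypothetical standardized scores: IQ influences both outcomes,
# and GPA has no direct effect on Salary.
iq = rng.normal(size=n)
gpa = 0.6 * iq + rng.normal(size=n)
salary = 0.6 * iq + rng.normal(size=n)

r = np.corrcoef([gpa, salary, iq])
r_gs, r_gi, r_si = r[0, 1], r[0, 2], r[1, 2]

# Partial correlation of GPA and Salary holding IQ constant.
partial = (r_gs - r_gi * r_si) / np.sqrt((1 - r_gi**2) * (1 - r_si**2))

print("raw correlation:    ", r_gs)
print("partial correlation:", partial)
```

A near-zero partial correlation in real data would be consistent with the third-variable explanation, although it would not by itself establish that IQ is the cause.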

Property

Property 1:

$$r_{z(x,y)}^2 = R_{z,xy}^2 - r_{yz}^2$$

$$r_{xz,y}^2 = \frac{R_{z,xy}^2 - r_{yz}^2}{1 - r_{yz}^2}$$

Proof: The first assertion follows since

$$R_{z,xy}^2 - r_{yz}^2 = \frac{r_{xz}^2 + r_{yz}^2 - 2 r_{xz} r_{yz} r_{xy} - r_{yz}^2 (1 - r_{xy}^2)}{1 - r_{xy}^2}$$

$$= \frac{r_{xz}^2 - 2 r_{xz} r_{yz} r_{xy} + r_{xy}^2 r_{yz}^2}{1 - r_{xy}^2} = \frac{(r_{xz} - r_{xy} r_{yz})^2}{1 - r_{xy}^2} = r_{z(x,y)}^2$$

The second assertion follows since:

$$r_{xz,y}^2 = \frac{(r_{xz} - r_{xy} r_{yz})^2}{(1 - r_{xy}^2)(1 - r_{yz}^2)} = \frac{r_{z(x,y)}^2}{1 - r_{yz}^2} = \frac{R_{z,xy}^2 - r_{yz}^2}{1 - r_{yz}^2}$$
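The relationships among the multiple, partial, and semi-partial correlations can also be checked with arbitrary numbers. The sketch below assumes the identities take the form (semi-partial)^2 = R^2 - r_yz^2 and (partial)^2 = (R^2 - r_yz^2) / (1 - r_yz^2). The three input correlations are made-up values, not the ones from Example 1.

```python
import math

# Made-up pairwise correlations (any valid triple would do).
r_xz, r_yz, r_xy = 0.5, 0.3, 0.4

# Squared multiple correlation of z with x and y.
R2 = (r_xz**2 + r_yz**2 - 2 * r_xz * r_yz * r_xy) / (1 - r_xy**2)

# Semi-partial and partial correlations of x and z with respect to y.
semipartial = (r_xz - r_xy * r_yz) / math.sqrt(1 - r_xy**2)
partial = (r_xz - r_xy * r_yz) / math.sqrt((1 - r_xy**2) * (1 - r_yz**2))

print(semipartial**2, R2 - r_yz**2)
print(partial**2, (R2 - r_yz**2) / (1 - r_yz**2))
```

Each printed pair matches up to floating-point rounding.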

Example 2: Calculate the partial and semi-partial correlations for the data in Example 1.

Property 1 holds for this data, as can be checked by substituting the values from Example 2.

Partitioning Variance

Since the coefficient of determination is a measure of the portion of variance attributable to the variables involved, we can look at the meaning of the concepts defined above using the following Venn diagram, where the rectangle represents the total variance of the poverty variable.


Figure 3 – Breakdown of variance for poverty

Using the data from Example 1, we can calculate the breakdown of the variance for poverty in Figure 4:


Figure 4 – Breakdown of variance for poverty continued

Note that we can calculate B in a number of ways: (A + B) – A, (B + C) – C, (A + B + C) – (A + C), etc., and get the same answer in each case. The remaining portion of the variance is D = 1 – (A + B + C).


Figure 5 – Breakdown of variance for poverty continued

Property 2: From Property 1, it follows that:

$$R_{z,xy}^2 = r_{yz}^2 + r_{z(x,y)}^2$$

If the independent variables are mutually independent, this reduces to

$$R_{z,xy}^2 = r_{xz}^2 + r_{yz}^2$$
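The reduced form can be seen in a small constructed example. The sketch below assumes the reduced identity is R^2 = r_xz^2 + r_yz^2 when the two independent variables are uncorrelated. The design and coefficients are invented so that x and y have a sample correlation of exactly zero.

```python
import numpy as np

rng = np.random.default_rng(3)

# Balanced +/-1 patterns make x and y exactly uncorrelated in the sample.
x = np.tile([1.0, -1.0, 1.0, -1.0], 25)
y = np.tile([1.0, 1.0, -1.0, -1.0], 25)
z = 2.0 * x + 1.0 * y + rng.normal(size=100)

r_xy = np.corrcoef(x, y)[0, 1]
r_xz = np.corrcoef(x, z)[0, 1]
r_yz = np.corrcoef(y, z)[0, 1]

# R^2 from an ordinary least-squares fit of z on x and y.
X = np.column_stack([np.ones_like(x), x, y])
coef, *_ = np.linalg.lstsq(X, z, rcond=None)
fitted = X @ coef
r2_fit = 1 - np.sum((z - fitted) ** 2) / np.sum((z - z.mean()) ** 2)

print(r_xy)                       # zero up to rounding
print(r2_fit, r_xz**2 + r_yz**2)  # equal up to rounding
```

When the predictors are correlated, the simple sum of squared correlations no longer matches R^2, and the semi-partial term is needed instead.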

Worksheet Functions

Real Statistics Functions: The Real Statistics Resource Pack contains the following functions where the samples for z, x, and y are contained in the arrays or ranges R, R1, and R2 respectively.

CORREL_ADJ(R1, R2) = adjusted correlation coefficient for the data sets defined by ranges R1 and R2

MCORREL(R, R1, R2) = multiple correlation of dependent variable z with x and y

PART_CORREL(R, R1, R2) = partial correlation $r_{zx,y}$ of variables z and x holding y constant

SEMIPART_CORREL(R, R1, R2) = semi-partial correlation $r_{z(x,y)}$

Multiple Correlation for more than 3 variables

Definition 1 defines the multiple correlation coefficient $R_{z,xy}$ and the corresponding multiple coefficient of determination for three variables x, y, and z. We can extend these definitions to more than three variables as described in Advanced Multiple Correlation.

E.g. if R1 is an m × n array containing the data for n variables then the Real Statistics function RSquare(R1, k) calculates the multiple coefficient of determination for the kth variable with respect to the other variables in R1. The multiple correlation coefficient for the kth variable with respect to the other variables in R1 can then be calculated by the formula =SQRT(RSquare(R1, k)).

Thus if R1, R2, and R3 are the three columns of the m × 3 data array or range R, with R1 and R2 containing the samples for the independent variables x and y and R3 containing the sample data for dependent variable z, then =MCORREL(R3, R1, R2) yields the same result as =SQRT(RSquare(R, 3)).
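For comparison outside Excel, a similar calculation can be sketched in Python. This is not the Real Statistics implementation: the function name `r_square` is hypothetical, and it relies on the known identity that the R^2 of variable k on the remaining variables equals 1 - 1/c_kk, where c_kk is the kth diagonal entry of the inverse correlation matrix.

```python
import numpy as np

def r_square(data, k):
    """R^2 of column k (1-based, mirroring the text) on all other columns."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    return 1.0 - 1.0 / inv[k - 1, k - 1]

# Simulated m x 3 array: columns play the roles of x, y, and z.
rng = np.random.default_rng(5)
data = rng.normal(size=(50, 3))
data[:, 2] += data[:, 0] - 0.5 * data[:, 1]

# Multiple correlation of the third column with the first two.
R_general = np.sqrt(r_square(data, 3))

# Same quantity from the three-variable formula in Definition 1.
c = np.corrcoef(data, rowvar=False)
r_xz, r_yz, r_xy = c[0, 2], c[1, 2], c[0, 1]
R_three = np.sqrt((r_xz**2 + r_yz**2 - 2 * r_xz * r_yz * r_xy) / (1 - r_xy**2))

print(R_general, R_three)  # equal up to rounding
```

The matrix form has the advantage of working unchanged for any number of columns, provided the correlation matrix is invertible.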

Similarly, the definition of the partial correlation coefficient (Definition 3) can be extended to more than three variables as described in Advanced Multiple Correlation.

