ANCOVA

In short: Analysis of covariance (ANCOVA) is a statistical test that compares the means of two or more groups while controlling for the "noise" caused by a covariate that is not of experimental interest. This makes it possible to see the effect of the variable of interest on the dependent variable more clearly.

Prerequisites

Prerequisite knowledge

An understanding of ANOVA

Main effects and interaction effects
Sum of squares
Mean squares
ANOVA tables
F-statistics and significance values
Post-hoc analysis (Tukey, Bonferroni, etc.)

An understanding of Linear Regression

Regression slopes
p-values
Coefficients

Definition

Analysis of covariance (ANCOVA) is a statistical test that compares the means of two or more groups while controlling for the "noise" caused by a covariate that is not of experimental interest. This is done in order to see the effect of the variable of interest on the dependent variable more clearly. The fundamental idea of ANCOVA is hypothesis testing: its goal is to evaluate mutually exclusive hypotheses about the data. The null hypothesis claims that the group means, adjusted for the covariate, are equal, while the alternative hypothesis claims that at least one of them differs. The null hypothesis is rejected when the p-value falls below the chosen significance level. In addition, ANCOVA partitions the sum of squares, which shows how much variation each variable explains relative to the error (unexplained) variation. Since ANCOVA historically combines linear regression and ANOVA, the assumptions inherited from both, as well as those specific to ANCOVA, should be met before running the test.
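
For reference, the one-way ANCOVA model with a single covariate is commonly written as below. The symbols are notation chosen here for illustration and do not come from the text above: y_ij is the dependent variable for observation j in group i, mu the overall mean, tau_i the effect of group i, beta the common regression slope, x_ij the covariate with overall mean x-bar, and epsilon_ij the error term.

```latex
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

H_0 : \tau_1 = \tau_2 = \dots = \tau_k = 0
\qquad \text{vs.} \qquad
H_1 : \tau_i \neq 0 \ \text{for at least one } i
```

In this notation, the assumptions listed in the next section correspond to linearity in x, a common variance sigma squared, normally distributed errors, and a single slope beta shared by all groups.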

Assumptions

Regression assumptions

The relationship between the dependent variable and the covariate must be linear within each treatment group.

ANOVA assumptions

Variances are homogeneous across groups.
Residuals are independent and normally distributed.

Specific ANCOVA assumptions

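A further specific (but optional) assumption is homogeneity of slopes, i.e. the regression slope of the dependent variable on the covariate is the same in every group. It is optional because it is only required to simplify the model for the estimation of adjusted means.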
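
Homogeneity of slopes is often checked by testing the interaction between the factor and the covariate. The following is a minimal sketch on simulated data, using base R only. The data frame `sim` and its columns `group`, `covariate` and `outcome` are made up for illustration and are not part of the example analysed later.

```r
# Minimal sketch: test homogeneity of slopes via the group-by-covariate interaction.
# All data below are simulated for illustration only.
set.seed(42)

n_per_group <- 20
sim <- data.frame(
  group     = factor(rep(c("A", "B", "C"), each = n_per_group)),
  covariate = rnorm(3 * n_per_group, mean = 15, sd = 2)
)

# Same slope (0.8) in every group but different intercepts,
# so the assumption holds by construction
group_shift <- c(A = 0, B = -1, C = -3)
sim$outcome <- 2 + 0.8 * sim$covariate +
  unname(group_shift[as.character(sim$group)]) +
  rnorm(nrow(sim), sd = 1)

# Model with one common slope vs. model that lets the slope differ by group
fit_common   <- lm(outcome ~ covariate + group, data = sim)
fit_separate <- lm(outcome ~ covariate * group, data = sim)

# F-test on the interaction terms: a large p-value gives no evidence
# against equal slopes, a small one suggests the slopes differ
slope_test <- anova(fit_common, fit_separate)
print(slope_test)
```

If the interaction is clearly significant, a single adjusted mean per group is hard to interpret, because the group difference then depends on the value of the covariate.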

What is One-way ANCOVA?

One-way ANCOVA compares the variation among group means for a single independent variable (factor) with two or more categorical groups while accounting for the covariate. Since ANCOVA is a hypothesis-based test, we need a good understanding of our data and a well-developed question before we can formulate a hypothesis and run the test.

Data preparation

To demonstrate a one-way ANCOVA, we use the balanced dataset "anxiety" from the "datarium" package. The data contain anxiety scores, measured at three time points, for three groups of individuals practicing physical exercise at different levels (grp1: basal, grp2: moderate and grp3: high). The question is: "Which treatment type has the largest effect on anxiety level?"

# Install the datarium package, which provides several example datasets (run once)
install.packages("datarium")

Exploring the data

# Load the dataset and get a general overview of its structure
data("anxiety", package = "datarium")
data <- anxiety
str(data)

# Spearman correlation between the covariate (t1, pre-treatment score)
# and the dependent variable (t3, post-treatment score)
cor(data$t1, data$t3, method = "spearman")

# Set the plot size (the repr options apply when running in a Jupyter notebook)
library(repr)
options(repr.plot.width = 4, repr.plot.height = 4)

boxplot(t3 ~ group,
        data   = data,
        main   = "Score by the treatment type",
        xlab   = "Treatment type",
        ylab   = "Post treatment score",
        col    = "yellow",
        border = "blue")

Based on the boxplot, the median anxiety score is highest for individuals who practiced the basal (grp1) type of exercise and lowest for individuals who practiced the high (grp3) type. These observations rest on descriptive statistics only and do not take into account each individual's baseline ability to control their anxiety level. Let us test them statistically with the help of ANCOVA.
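
The visual impression from the boxplot can be backed up with per-group summary statistics. The sketch below assumes the datarium package is installed and works on the `anxiety` data directly. The name `group_summary` is introduced here for illustration.

```r
# Load the example data (assumes the datarium package is installed)
data("anxiety", package = "datarium")

# Mean, median and standard deviation of the pre-treatment (t1) and
# post-treatment (t3) scores within each exercise group
group_summary <- aggregate(
  cbind(t1, t3) ~ group,
  data = anxiety,
  FUN  = function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)
print(group_summary)
```

Comparing the t1 and t3 columns side by side shows how much the groups already differed before the treatment, which is the variation the covariate is meant to absorb.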

Check assumptions

The linearity assumption can be assessed with a plot of the fitted regression lines for each treatment group. Based on the plot below, the relationship between the anxiety score before and after treatment appears linear in each group.

# Scatter plot of pre- vs. post-treatment scores, colored by treatment group
plot(x    = data$t1,
     y    = data$t3,
     col  = data$group,
     main = "Score by the treatment type",
     pch  = 15,
     xlab = "anxiety score before the treatment",
     ylab = "anxiety score after the treatment")

legend("topleft",
       legend = levels(data$group),
       col    = 1:3,
       cex    = 1,
       pch    = 15)

# Add one fitted regression line per treatment group, drawn in the group's color
abline(lm(t3 ~ t1, data = data, subset = group == "grp1"), col = 1)
abline(lm(t3 ~ t1, data = data, subset = group == "grp2"), col = 2)
abline(lm(t3 ~ t1, data = data, subset = group == "grp3"), col = 3)
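
As a rough numerical complement to the plot, the per-group regressions can be summarised in a small table. This is a sketch that assumes the datarium package is installed. The name `per_group` is introduced here for illustration, and R-squared is only an informal indicator of fit, not a formal test of linearity.

```r
data("anxiety", package = "datarium")

# Fit t3 ~ t1 separately within each group and collect
# the intercept, the slope and R-squared of each fit
per_group <- t(sapply(split(anxiety, anxiety$group), function(d) {
  fit <- lm(t3 ~ t1, data = d)
  c(intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared)
}))
print(round(per_group, 3))
```

Each row describes one treatment group, so the slope column also gives a first impression of whether the slopes are similar across groups.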

To evaluate whether the residuals are unbiased and whether their variance is equal across groups, a plot of the residuals against the fitted values can be produced. Based on the plot below, we can conclude that the variances between the groups are homogeneous (homoscedastic). For help with interpreting the plot, please refer to Fig. 1.


Fig. 1. Reading the residuals (Source: condor.depaul.edu/sjost/it223/documents/resid-plots.gif)
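
The residual check described above can be sketched as follows. The code assumes the datarium package is installed and that the ANCOVA model is `t3 ~ t1 + group`, i.e. the post-treatment score adjusted for the pre-treatment score. Bartlett's test and the Shapiro-Wilk test are added here as formal complements to the plot and are one possible choice among several.

```r
data("anxiety", package = "datarium")

# Assumed ANCOVA model: post-treatment score adjusted for the pre-treatment score
model <- aov(t3 ~ t1 + group, data = anxiety)
res   <- residuals(model)

# Residuals vs. fitted values: look for an even, patternless band around zero
plot(fitted(model), res,
     col  = anxiety$group,
     pch  = 15,
     xlab = "fitted values",
     ylab = "residuals",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)

# Formal complements to the visual check
variance_test  <- bartlett.test(x = res, g = anxiety$group)  # homogeneity of variances
normality_test <- shapiro.test(res)                          # normality of residuals
print(variance_test)
print(normality_test)
```

For both tests, a p-value above the chosen significance level means there is no evidence against the respective assumption. It does not prove that the assumption holds, so the plot should still be inspected.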