Nonparametric Two-Way ANOVA

By Charles Holbert

April 16, 2022

Introduction

Often data from more than two groups within a single factor needs to be evaluated, usually on the basis of a representative value from each group. For uncensored data, comparisons among groups are made using the parametric analysis of variance (ANOVA) and the nonparametric Kruskal-Wallis (KW) test. These tests determine whether the groups essentially belong to the same underlying population, or whether there are significant differences between the group distributions. This is the simplest type of ANOVA design and is called one-way ANOVA. Two-way ANOVA is when two factors or variables potentially affect the response variable. For conventional two-way ANOVA with several observations or data values for each combination of factors, it is possible to test for factor interaction. Factor interaction refers to the case where each factor not only exerts an effect on the response variable (described as main effects), but also may interact with the other factor or factors to exert additional joint effects on the response variable (known as interaction effects)

If the data meet normality and homoscedasticity (i.e., equality of variance) requirements of parametric ANOVA, methods are available for computing two-way ANOVA. However, the KW test for nonparametric ANOVA cannot be directly extended to the case where there are two or more factors as is done with parametric two-way ANOVA. Instead, a nonparametric approximate approach is to transform the data values. One method that is not unduly affected by violations of the normality and homoscedasticity assumptions of parametric ANOVA is to transform the data values to their ranks, and then compute a parametric two-way ANOVA on the data ranks. This rank transformation procedure can be used to determine whether the ranks differ from group to group. Post-hoc comparisons can be performed on the data ranks to determine which groups are different from each other. An alternative approach is to use ordinal logistic regression to provide general contrasts on the log odds ratio scale (Harrell 2015).

In this post, we will evaluate whether sample depth and/or site location affect arsenic concentrations measured in soil. To address non-normality and heteroscedasticity, two-way ANOVA will be performed using the rank-transformation of the data values.

Data

Arsenic concentrations were measured in soil from two depth intervals, shallow (SS) and deep (SB), at five sites (RA1 through RA5). We would like to determine whether arsenic concentrations in soil not only exhibit spatial variability between site locations but also may be affected by sample depth. This constitutes a two-way ANOVA problem, with one factor as location (represented by the five sites) and the second factor as matrix (represented by the two soil sample depths).

First, let’s load the data from the comma-delimited file and then make the LOCATION and MATRIX variables ordered factors.

# Load libraries
library(dplyr)
library(car)


# Load data
dat <- read.csv('data/Soil_Arsenic.csv', header = T)
dat$LOGDATE <- as.POSIXct(dat$LOGDATE, format = '%m/%d/%Y')

# Make matrix variable ordered factor
dat$MATRIX <- factor(
  dat$MATRIX, ordered = TRUE,
  levels = c('SS', 'SB')
)

# Make matrix variable ordered factor
dat$LOCATION <- factor(
  dat$LOCATION, ordered = TRUE,
  levels = c('RA1', 'RA2', 'RA3', 'RA4', 'RA5')
)

str(dat)

## 'data.frame':	266 obs. of  11 variables:
##  $ LOCATION: Ord.factor w/ 5 levels "RA1"<"RA2"<"RA3"<..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ SACODE  : chr  "N" "N" "N" "N" ...
##  $ LOGDATE : POSIXct, format: "2019-08-15" "2019-08-15" ...
##  $ MATRIX  : Ord.factor w/ 2 levels "SS"<"SB": 1 1 1 1 1 1 1 1 1 1 ...
##  $ MEDIA   : chr  "Soil" "Soil" "Soil" "Soil" ...
##  $ DILUTION: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ PARAM   : chr  "Arsenic" "Arsenic" "Arsenic" "Arsenic" ...
##  $ DETECTED: chr  "Y" "Y" "Y" "Y" ...
##  $ RESULT  : num  2.95 2.83 2.96 2.55 4.02 3.7 3.52 3.03 3.27 3.52 ...
##  $ MDL     : num  1 0.8 0.8 0.8 0.8 1 0.8 0.8 0.8 0.8 ...
##  $ UNITS   : chr  "ppb" "ppb" "ppb" "ppb" ...

We see that the data contain 266 observations and 11 variables. Let’s create a frequency table to see how the observations are distributed between the MATRIX and LOCATION groups.

# Create frequency table
with(dat, table(LOCATION, MATRIX))

##         MATRIX
## LOCATION SS SB
##      RA1 25 23
##      RA2 25 11
##      RA3 25 25
##      RA4 25  7
##      RA5 25 75

For the SS soil depth, there are 25 samples per location. For the SB soil depth, the sample size ranges from 7 for RA4 to 75 for RA5. Thus, we have an unbalanced design. When sample sizes are substantially dissimilar, accurate interpretation of the main and interaction effects of the factors on the response variable becomes complicated, and the unequal sample sizes have a confounding effect on the factors.

Let’s calculate summary statistics for each data group.

# Compute summary statistics
dat %>%
  group_by(MATRIX, LOCATION) %>%
  summarise(
    count = n(),
    mean = round(mean(RESULT, na.rm = TRUE), 2),
    median = round(median(RESULT, na.rm = TRUE), 2),
    sd = round(sd(RESULT, na.rm = TRUE), 2),
    cv = round(sd/mean, 2),
  ) %>%
  ungroup()

## # A tibble: 10 x 7
##    MATRIX LOCATION count  mean median    sd    cv
##    <ord>  <ord>    <int> <dbl>  <dbl> <dbl> <dbl>
##  1 SS     RA1         25  6.37   6.12  1.57  0.25
##  2 SS     RA2         25  8.54   8.48  1.46  0.17
##  3 SS     RA3         25  3.59   3.27  1.01  0.28
##  4 SS     RA4         25  7.21   7.17  2.04  0.28
##  5 SS     RA5         25  6.6    6.68  0.8   0.12
##  6 SB     RA1         23  4.82   3.62  3.01  0.62
##  7 SB     RA2         11  4.58   4.57  2.83  0.62
##  8 SB     RA3         25  3.1    3.09  0.61  0.2 
##  9 SB     RA4          7  4.57   4.2   3.04  0.67
## 10 SB     RA5         75  6.33   6.24  0.64  0.1

As expected, the central tendency and spread in arsenic concentrations vary between sample depth and location. The highest variability in concentration is associated with samples collected from the SB soil depth at locations RA1, RA2, and RA4.

Let’s visually inspect the data using box plots and an interaction plot. Box plots show the central tendency, degree of symmetry, range of variation, and potential outliers of a data set. The upper value of the box represents the 75th percentile for the data and the lower value of the box is the 25th percentile for the data. Thus, 50 percent of the data fall within the box. The top of the whisker represents the 75th percentile plus 1.5 times the interquartile range (IQR), where the IQR is the 75th percentile minus the 25th percentile. The bottom of the whisker is the 25th percentile minus 1.5 times the IQR. Any value outside of this range is considered a potential statistical outlier, which is represented by a dot on the plot. An interaction plot displays the levels of one variable on the x-axis and has a separate line for the means of each level of the other variable. The y-axis is the dependent or response variable. We will use the R package ggpubr to create the plots.

# Load library
library(ggpubr)
library(cowplot)

# Box plot
p1 <- ggboxplot(
  dat, x = 'LOCATION', y = 'RESULT', fill = 'MATRIX',
  palette = c('#257ABA', '#C9B826')
)

# Line plot
p2 <- ggline(
  dat, x = 'LOCATION', y = 'RESULT', color = 'MATRIX',
  add = c('mean_se', 'dotplot'), size = 1,
  palette = c('#257ABA', '#C9B826'))

plot_grid(p1, p2, ncol = 1, align = 'v')

Based on these plots, it appears that arsenic concentrations are significantly different across the two soil depths for locations RA2 and RA4.

Two-Way ANOVA

Two-way ANOVA with replication can be used to test three null hypotheses: that the means of observations grouped by one factor are the same; that the means of observations grouped by the other factor are the same; and that there is no interaction between the two factors. When the interaction term is significant, each factor is evaluated separately using a one-way ANOVA.

Unbalanced two-way ANOVA is generally performed using Type-III sum of squares (SS). Estimates based on Type-III SS tell us how much of the residual variability in the response variable Y can be accounted for by factor A after having taken into account factor B and the A:B interaction, and how much of the residual variability in Y can be accounted for by factor B after having taken into account factor A and the A:B interaction. When using Type-III SS, factors should be set to use effects coding rather than relying on the default treatment codes. This can be done by adding the appropriate sum to zero contrasts as an argument to R’s aov function.

Let’s first perform two-way ANOVA on the original data.

# ANOVA procedure on original data
aov.org <- aov(
  RESULT ~ MATRIX * LOCATION, data = dat,
  contrasts = list(
    MATRIX = 'contr.sum',
    LOCATION = 'contr.sum'
  )
)
Anova(aov.org, type = 'III')

## Anova Table (Type III tests)
## 
## Response: RESULT
##                 Sum Sq  Df   F value    Pr(>F)    
## (Intercept)     5851.4   1 2399.7704 < 2.2e-16 ***
## MATRIX           149.7   1   61.4003 1.248e-13 ***
## LOCATION         339.5   4   34.8078 < 2.2e-16 ***
## MATRIX:LOCATION   93.0   4    9.5304 3.377e-07 ***
## Residuals        624.2 256                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now, let’s perform two-way ANOVA on the log-transformed data.

# ANOVA procedure on log-transformed data
aov.log <- aov(
  log(RESULT) ~ MATRIX * LOCATION, data = dat,
  contrasts = list(
    MATRIX = 'contr.sum',
    LOCATION = 'contr.sum'
  )
)
Anova(aov.log, type = 'III')

## Anova Table (Type III tests)
## 
## Response: log(RESULT)
##                 Sum Sq  Df  F value    Pr(>F)    
## (Intercept)     479.08   1 4379.867 < 2.2e-16 ***
## MATRIX            8.24   1   75.355 4.684e-16 ***
## LOCATION         14.40   4   32.914 < 2.2e-16 ***
## MATRIX:LOCATION   4.63   4   10.571 6.079e-08 ***
## Residuals        28.00 256                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Finaly, let’s perform two-way ANOVA on the rank-transformed data. Ranking is one of many procedures used to transform data that do not meet the assumptions of normality. Conover and Iman (1981) provided a review of the four main types of rank transformations. One method replaces each original data value by its rank (from 1 for the smallest to N for the largest, where N is the combined data sample size). This rank-based procedure has been recommended as being robust to non-normal errors, resistant to outliers, and highly efficient for many distributions.

# ANOVA procedure on rank-transformed data
aov.rnk <- aov(
  rank(RESULT) ~ MATRIX * LOCATION, data = dat,
  contrasts = list(
    MATRIX = 'contr.sum',
   LOCATION = 'contr.sum'
  )
)
Anova(aov.rnk, type = 'III')

## Anova Table (Type III tests)
## 
## Response: rank(RESULT)
##                  Sum Sq  Df F value    Pr(>F)    
## (Intercept)     2911804   1 936.506 < 2.2e-16 ***
## MATRIX           168683   1  54.252 2.419e-12 ***
## LOCATION         442327   4  35.566 < 2.2e-16 ***
## MATRIX:LOCATION  101199   4   8.137 3.431e-06 ***
## Residuals        795960 256                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results indicate that soil depth (MATRIX), site location (LOCATION), and the interaction between depth and location (MATRIX:LOCATION) are statistically significant (p-value < 0.05). Let’s inspect the grand means and the group means.

model.tables(aov.org, type = "means", se = TRUE)

## Tables of means
## Grand mean
##          
## 5.840902 
## 
##  MATRIX 
##          SS      SB
##       6.463   5.289
## rep 125.000 141.000
## 
##  LOCATION 
##        RA1    RA2    RA3    RA4     RA5
##      5.567  7.069  3.312  6.265   6.659
## rep 48.000 36.000 50.000 32.000 100.000
## 
##  MATRIX:LOCATION 
##       LOCATION
## MATRIX RA1   RA2   RA3   RA4   RA5  
##    SS   6.37  8.54  3.59  7.21  6.60
##    rep 25.00 25.00 25.00 25.00 25.00
##    SB   4.82  4.58  3.10  4.57  6.33
##    rep 23.00 11.00 25.00  7.00 75.00

Let’s inspect the residuals from each model for normality. The residuals from the models using the orginal data and the log-transformed data deviate signficantly form nonmality. Residuals from the rank-transformed data model appear much better, although ranks aren’t expected to be normally distributed; barring any ties, ranks within any given group act like a sample without replacement from a discrete uniform distribution.

res.org = aov.org$resid
res.log = aov.log$resid
res.rnk = aov.rnk$resid
qqnorm(
  res.org, pch = 20, main = "Original Data",
  cex.lab = 1, cex.axis = 0.7, cex.main = 1
)
qqline(res.org)

qqnorm(
  res.log, pch = 20, main = "Log-Transformed",
  cex.lab = 1, cex.axis = 0.7, cex.main = 1
)
qqline(res.log)

qqnorm(
  res.rnk, pch = 20, main = "Rank-Transformed",
  cex.lab = 1, cex.axis = 0.7, cex.main = 1
)
qqline(res.rnk)

Now, let’s inspect the residuals versus the fitted values as a visual check for homogeneity of variances.

plot(aov.org, 1, main = "Original Data")
plot(aov.log, 1, main = "Log-Transformed")
plot(aov.rnk, 1, main = "Rank-Transformed")

There is no evident relationships between residuals and the fitted values. Several points are identified as outliers, which can affect normality and homogeneity of variance.

Nature of Interactions

Let’s inspect the nature of the interactions using the R package emmeans. The emmeans package includes functions to facilitate post-hoc comparisons and contrast analysis. Let’s compare LOCATION separately for each MATRIX. The first thing to investigate is the significance of the interaction. If it is nonsignificant, we can focus on the main effects. However, if the interaction is significant, the main effects will not be helpful, as we will need to explore when each factor is significant given the levels of the other factor.

# Compute estimated marginal means for factor combinations
library(emmeans)
emmeans(aov.rnk, pairwise ~ LOCATION | MATRIX)

## $emmeans
## MATRIX = SS:
##  LOCATION emmean    SE  df lower.CL upper.CL
##  RA1       150.7 11.15 256    128.8    172.7
##  RA2       229.3 11.15 256    207.4    251.3
##  RA3        52.2 11.15 256     30.2     74.1
##  RA4       175.1 11.15 256    153.2    197.1
##  RA5       163.7 11.15 256    141.7    185.7
## 
## MATRIX = SB:
##  LOCATION emmean    SE  df lower.CL upper.CL
##  RA1       105.0 11.63 256     82.1    127.9
##  RA2        95.7 16.81 256     62.6    128.8
##  RA3        37.2 11.15 256     15.2     59.1
##  RA4        84.0 21.08 256     42.5    125.5
##  RA5       150.0  6.44 256    137.3    162.7
## 
## Results are given on the rank (not the response) scale. 
## Confidence level used: 0.95 
## 
## $contrasts
## MATRIX = SS:
##  contrast  estimate   SE  df t.ratio p.value
##  RA1 - RA2   -78.62 15.8 256  -4.985  <.0001
##  RA1 - RA3    98.54 15.8 256   6.248  <.0001
##  RA1 - RA4   -24.40 15.8 256  -1.547  0.5328
##  RA1 - RA5   -12.98 15.8 256  -0.823  0.9234
##  RA2 - RA3   177.16 15.8 256  11.233  <.0001
##  RA2 - RA4    54.22 15.8 256   3.438  0.0061
##  RA2 - RA5    65.64 15.8 256   4.162  0.0004
##  RA3 - RA4  -122.94 15.8 256  -7.795  <.0001
##  RA3 - RA5  -111.52 15.8 256  -7.071  <.0001
##  RA4 - RA5    11.42 15.8 256   0.724  0.9508
## 
## MATRIX = SB:
##  contrast  estimate   SE  df t.ratio p.value
##  RA1 - RA2     9.29 20.4 256   0.455  0.9911
##  RA1 - RA3    67.86 16.1 256   4.212  0.0003
##  RA1 - RA4    21.02 24.1 256   0.873  0.9065
##  RA1 - RA5   -44.96 13.3 256  -3.383  0.0073
##  RA2 - RA3    58.57 20.2 256   2.903  0.0325
##  RA2 - RA4    11.73 27.0 256   0.435  0.9925
##  RA2 - RA5   -54.26 18.0 256  -3.014  0.0235
##  RA3 - RA4   -46.84 23.8 256  -1.964  0.2865
##  RA3 - RA5  -112.83 12.9 256  -8.762  <.0001
##  RA4 - RA5   -65.99 22.0 256  -2.994  0.0249
## 
## Note: contrasts are still on the rank scale 
## P value adjustment: tukey method for comparing a family of 5 estimates

We see that the effect of MATRIX depends on the levels of LOCATION and vice versa. The next step is to determine the nature of the interaction. Let’s perform something akin to a one-way ANOVA for MATRIX within each level of LOCATION. However, the F-test will utilize all of the information from the full two-way factorial model.

em_out_category <- emmeans(aov.rnk,  ~ MATRIX | LOCATION) 
print(em_out_category)

## LOCATION = RA1:
##  MATRIX emmean    SE  df lower.CL upper.CL
##  SS      150.7 11.15 256    128.8    172.7
##  SB      105.0 11.63 256     82.1    127.9
## 
## LOCATION = RA2:
##  MATRIX emmean    SE  df lower.CL upper.CL
##  SS      229.3 11.15 256    207.4    251.3
##  SB       95.7 16.81 256     62.6    128.8
## 
## LOCATION = RA3:
##  MATRIX emmean    SE  df lower.CL upper.CL
##  SS       52.2 11.15 256     30.2     74.1
##  SB       37.2 11.15 256     15.2     59.1
## 
## LOCATION = RA4:
##  MATRIX emmean    SE  df lower.CL upper.CL
##  SS      175.1 11.15 256    153.2    197.1
##  SB       84.0 21.08 256     42.5    125.5
## 
## LOCATION = RA5:
##  MATRIX emmean    SE  df lower.CL upper.CL
##  SS      163.7 11.15 256    141.7    185.7
##  SB      150.0  6.44 256    137.3    162.7
## 
## Results are given on the rank (not the response) scale. 
## Confidence level used: 0.95

Next, we pipe this object into the pairs() function and use the joint = TRUE option in the test() function to get the joint test of significance (i.e., the ANOVA F-test). The test function is similar to running multiple one-way ANOVAs, except we are using the error term from the full two-way factorial model. The results include the following:

df1: degrees of freedom between groups
df2: degrees of freedom within groups
F.ratio: F statistic for MATRIX within each level of LOCATION
p.value: probability of a Type-I error for each category

em_out_category %>% 
  pairs() %>% 
  test(joint = TRUE)

##  LOCATION df1 df2 F.ratio p.value
##  RA1        1 256   8.046  0.0049
##  RA2        1 256  43.861  <.0001
##  RA3        1 256   0.907  0.3418
##  RA4        1 256  14.604  0.0002
##  RA5        1 256   1.134  0.2879

We find that MATRIX is significant for RA1, RA2, and RA4; the effect of MATRIX is not significant for RA3 and RA5. Because MATRIX only has two levels, we do not need to do any further pairwise comparisons. The following code tells us which pairwise difference led to the significant joint F-test.

pairs(em_out_category)

## LOCATION = RA1:
##  contrast estimate   SE  df t.ratio p.value
##  SS - SB      45.7 16.1 256   2.837  0.0049
## 
## LOCATION = RA2:
##  contrast estimate   SE  df t.ratio p.value
##  SS - SB     133.6 20.2 256   6.623  <.0001
## 
## LOCATION = RA3:
##  contrast estimate   SE  df t.ratio p.value
##  SS - SB      15.0 15.8 256   0.952  0.3418
## 
## LOCATION = RA4:
##  contrast estimate   SE  df t.ratio p.value
##  SS - SB      91.1 23.8 256   3.821  0.0002
## 
## LOCATION = RA5:
##  contrast estimate   SE  df t.ratio p.value
##  SS - SB      13.7 12.9 256   1.065  0.2879
## 
## Note: contrasts are still on the rank scale

Multiple Comparisons

Although the original two-way ANOVA indicated evidence of interaction between soil depth (MATRIX) and site (LOCATION), let’s perform multiple comparisons using the glht() function in the R package multcomp. glht stands for general linear hypothesis tests and is used to perform multiple pairwise-comparisons for parametric models. Let’s perform post-hoc pairwise-comparisons between the means of groups using the Tukey Honest Significant Differences (HSD) test. We don’t need to perform the test for MATRIX because it has only two levels, which have been shown to be significantly different by the ANOVA test. Therefore, the Tukey HSD test will be done only for LOCATION.

library(multcomp)
summary(glht(aov.rnk, linfct = mcp(LOCATION = "Tukey")))

## 
## 	 Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = rank(RESULT) ~ MATRIX * LOCATION, data = dat, contrasts = list(MATRIX = "contr.sum", 
##     LOCATION = "contr.sum"))
## 
## Linear Hypotheses:
##                Estimate Std. Error t value Pr(>|t|)    
## RA2 - RA1 == 0   34.663     12.909   2.685   0.0572 .  
## RA3 - RA1 == 0  -83.201     11.273  -7.381   <0.001 ***
## RA4 - RA1 == 0    1.689     14.388   0.117   1.0000    
## RA5 - RA1 == 0   28.972     10.312   2.809   0.0410 *  
## RA3 - RA2 == 0 -117.864     12.804  -9.205   <0.001 ***
## RA4 - RA2 == 0  -32.974     15.617  -2.111   0.2136    
## RA5 - RA2 == 0   -5.690     11.967  -0.475   0.9892    
## RA4 - RA3 == 0   84.890     14.294   5.939   <0.001 ***
## RA5 - RA3 == 0  112.173     10.180  11.019   <0.001 ***
## RA5 - RA4 == 0   27.283     13.550   2.014   0.2569    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

The results indicate that arsenic concentrations in soil at location RA3 are statistically different than those at the other four site locations.

References

Conover, W.J. and R.L. Iman, R. L. 1981. Rank transformations as a bridge between parametric and nonparametric statistics. American Statistician. 35 (3): 124-129.

Posted on:: April 16, 2022

Length:: 17 minute read, 3441 words

See Also: