Mann-Kendall Power Analysis Revisited

By Charles Holbert

April 5, 2022

Introduction

Detection of a long-term, temporal trend in environmental data is affected by a number of factors, including the size of the trend to be detected, the time span of the data, and the magnitude of variability and autocorrelation of the noise in the data. Trends are often statistically modelled as being a smooth linear change, generally expressed in percent change per unit of time. Environmental data often contain significant variation on different timescales in addition to the linear trend. For independent (uncorrelated) time-series data the sample size required to detect the trend is determined by the magnitude of variation of the noise. Because environmental data often exhibit positive autocorrelation, there is an increase in the required sample size to detect a given trend.

Many different statistical approaches are available for detecting and estimating trends that may be present in environmental data. These include simple correlation and regression analyses, time-series analyses, and methods based on nonparametric statistics. The Mann-Kendell test (Mann 1945; Kendall 1975; Gilbert 1987) is one of the most popular nonparametric tests for determining temporal trends. The test is well suited for analyzing non-normally distributed data that may contain outliers or have missing observations. In this post, the power of the Mann-Kendall test for trend detection will be estimated by Monte Carlo simulation for various combinations of trend, variability, and sample size.

Statistical Approach

The Mann-Kendall test is a nonparametric procedure used to assess if there is a monotonic upward or downward trend of the variable of interest over time. A monotonic upward (downward) trend means that the variable consistently increases (decreases) through time, but the trend may or may not be linear. The data values are evaluated as an ordered time series. Each data value is compared to all subsequent data values. Thus, the test can be viewed as a nonparametric test for zero slope of the linear regression of time-ordered data versus time, as illustrated by Hollander and Wolfe (1973, p. 201). A requirement of the Mann-Kendall test is that the data are independent. Successive measurements in environmental time series are often correlated with one another. This means that pairs of consecutive measurements taken in a series will be positively correlated, exhibiting a stronger similarity than expected from pairs collected at random times. The existence of serial correlation will affect the ability of the Mann-Kendall test to assess the significance of a trend (von Storch 1995).

The significance level, or a Type I error, \(\alpha\) of the Mann-Kendall test is the probability of rejecting the null hypothesis when there is no trend in the data. The calculated probability (p-value) for the test represents the probability that any observed trend would occur purely by chance (given the variability and sample size of the data set). Typically, a significance level of 0.05, corresponding to a 95% confidence level, is used to test the null hypothesis that there is no trend in the data. The result could be a statistically significant increasing or decreasing trend, or a nonsignificant result, indicating the null hypothesis of no trend cannot be rejected at the specified confidence. The power of the Mann-Kendall test is the probability of rejecting the null hypothesis, 1- \(\beta\), when there is a trend in the data. A Type II error, \(\beta\), is the probability of accepting the null hypothesis of no trend when it is false.

For this evaluation, the power of the Mann-Kendall test will be evaluated under various combinations of trend, variability, and sample size. Trend will vary in 2% increments from no trend to 22% and the coefficient of variation (CV) will range from 0.05 to 0.5. The CV is a relative measure of variation described by the ratio of the sample standard deviation to the sample mean. Values less than 1 indicate that the data form a relatively close group about the mean value. Values larger than 1 indicate that the data show a greater degree of scatter about the mean. These values of CV represent idealized conditions as most environmental data exhibit CV values greater than 0.5. The Mann-Kendall test will be calculated at a significance level of 0.05 (95% confidence) for sample sizes ranging from 4 to 32. Power will be estimated using 1,000 simulations for each combination of the variables, which are defined below.

# Load Required libraries
library(dplyr)
library(Kendall)
library(ggplot2)
library(scales)
library(cowplot)

# Set seed for reproducibility
seed <- 56784567

# Set number of cores
ncores = 4
if (ncores > parallel::detectCores())
  stop("You requested more cores than are available on the machine.")

# Assign initial parameters
dist <- 'norm'  # select 'norm' or 'lnorm'
alpha <- 0.05   # significance level
nsim <- 1000    # number of simulations
cv <- 0.0       # initialize CV of the constant term
slope <- 0      # starting slope
ncv <- 10       # number of CVs
nslope <- 11    # number of slopes
obs_start <- 4  # initial sample size
obs_end <- 32   # end sample size
mu <- 100       # mean of the constant term
slope <- slope + 2 * seq(0, nslope, by = 1)    # varying trend slope
cv <- cv + 0.05 * seq(1, ncv, by = 1)          # varying cv
nobs <- seq(obs_start, obs_end, by = 2)        # varying sample size

Data

The power of a test can be determined only when the true situation is known. In a trend test, this requires knowledge of the trend. For this evaluation, synthetic time series with trend will be generated using the following equation:

\begin{equation} y_t = \alpha + \beta t + N_t, \quad t = 1, 2,..., n \end{equation}

where \(\alpha\) is a constant parameter, \(\beta\) is the slope, t is time, and \(N_t\) is the noise. For the time series \(y_t\), \(\alpha + \beta t\) represents the trend term and \(N_t\) represents the random term. The random term is assumed to be autocorrelated and is represented by an autoregressive model of first order, AR(1):

\begin{equation} N_t = \phi N_{t-1} + \varepsilon_t \end{equation}

where \(\varepsilon_t\) are independent random variables with zero mean and constant variance \(\sigma_\varepsilon^2\), such that \(\varepsilon_t\) is independent of \({y_t,..., y_{t-1}}\) (Weatherhead et al. 1998). It also is assumed that \(-1<\phi<1\), so the noise process is stationary. This model allows the noise to be autocorrelated among successive measurements. The variance of the noise \(N_t\) is directly related to the variance \(\sigma_\varepsilon^2\) of the white noise \({\varepsilon_t}\) in the model by the following:

\begin{equation} \sigma_N^2 = \frac{\sigma_\varepsilon^2}{1 - \phi^2} \end{equation}

Using these equations, let’s create a function to generate time series based on the given parameters using the code provided below. This function will be used in our Monte Carlo simulation. Let’s also create a data frame for storing our results.

# Generate the data frame for storing results
param <- expand.grid(
  nobs = nobs,
  slope = slope,
  cv = cv,
  var_ratio = NA,
  var_r = NA,
  nsim = nsim
)
param$var_r <- with(param, (mu * cv)^2)
param$var_ratio <- with(param, (nobs^2 - 1) * slope^2 / (12 * var_r))

# Function to generate data
get_data <- function(x) {
  t <- seq(from = 1, to = x[[1]], by = 1)
  trend_comp <- x[[2]] * t
  if (dist == 'norm') {
    y <- trend_comp + rnorm(x[[1]], mean = mu, sd = mu * x[[3]])
  }
  if (dist == 'lnorm') {
    location <- 0.5 * log(mu^2 / (1 + x[[3]]^2))
    shape <- sqrt(log(1 + x[[3]]^2))
    y <- trend_comp + rlnorm(x[[1]], location, shape)
  }
  data.frame(t, y)
}

Power of the Mann-Kendall Test

Let’s run the 1,000 simulations and calculate the power of the Mann-Kendall test under the various combinations of the given parameters using the code provided below. We will use the base R parallel package to run computations in parallel to minimize simulation time.

# Function to run tests on nsim datasets
simulate <- function(x){
  
  # Get data
  dt <- t(replicate(nsim, get_data(x)))
  
  # Run Mann-Kendall test
  mktest <- apply(dt, 1, function(x) MannKendall(x[[2]])$sl)
  mktest_power <- sum(mktest < alpha)/nsim
  list(Power_MK = mktest_power)           
}

# Get test results for nsim datasets
if (ncores <= 1) {

  if (!is.null(set.seed)) set.seed(seed)
  zz <- apply(param, 1, simulate)

} else {

  cl <- parallel::makeCluster(ncores)
  parallel::clusterExport(
    cl,
    list("alpha", "dist", "nsim", "mu", "get_data", "MannKendall")
  )

  if (!is.null(seed)) parallel::clusterSetRNGStream(cl, seed)
  zz <- parallel::parApply(cl, param, 1, simulate)
  parallel::stopCluster(cl)
}

# Convert list of lists to data frame
zz_df <- do.call(rbind, lapply(zz, as.data.frame))

# Combine output with param
param <- bind_cols(param, zz_df)
str(param)
## 'data.frame':	1800 obs. of  7 variables:
##  $ nobs     : num  4 6 8 10 12 14 16 18 20 22 ...
##  $ slope    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cv       : num  0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
##  $ var_ratio: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ var_r    : num  25 25 25 25 25 25 25 25 25 25 ...
##  $ nsim     : num  1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Power_MK : num  0 0.014 0.033 0.043 0.043 0.054 0.031 0.053 0.033 0.054 ...
##  - attr(*, "out.attrs")=List of 2
##   ..$ dim     : Named int [1:6] 15 12 10 1 1 1
##   .. ..- attr(*, "names")= chr [1:6] "nobs" "slope" "cv" "var_ratio" ...
##   ..$ dimnames:List of 6
##   .. ..$ nobs     : chr [1:15] "nobs= 4" "nobs= 6" "nobs= 8" "nobs=10" ...
##   .. ..$ slope    : chr [1:12] "slope= 0" "slope= 2" "slope= 4" "slope= 6" ...
##   .. ..$ cv       : chr [1:10] "cv=0.05" "cv=0.10" "cv=0.15" "cv=0.20" ...
##   .. ..$ var_ratio: chr "var_ratio=NA"
##   .. ..$ var_r    : chr "var_r=NA"
##   .. ..$ nsim     : chr "nsim=1000"

The simulation results are saved in our param data frame and consist of 1,800 records and 7 variables. The variables include our initial combinations of trend, variability, and sample size as well as the power of the Mann-Kendall test calculated for the 1,000 simulations. Let’s plot the power as a function of sample size for different values of CV.

# Function to create plots
make_plot <- function(df0, slopeID) {
  p <- ggplot(data = df0, aes(x = nobs, y = Power_MK, color = cv)) +
    geom_point() +
    geom_line() +
    scale_x_continuous(limits = c(obs_start, obs_end),
                       breaks = seq(obs_start, obs_end, 4)) +
    scale_y_continuous(limits = c(0, 1),
                       breaks = seq(0, 1, 0.2)) +
    labs(title = paste0('Trend Slope = ', slopeID),
         color = 'CV',
         x = 'Sample Size',
         y = 'Power') +
    scale_color_hue(l = 50) +
    theme_bw(base_size = 9) +
    theme(plot.title = element_text(size = 9, hjust = 0.5),
          axis.title.y = element_text(size = 8, color = 'black'),
          axis.title.x = element_text(size = 8, color = 'black'),
          axis.text.x = element_text(size = 8, color = 'black'),
          axis.text.y = element_text(size = 8, color = 'black'),
          legend.key.height = unit(0.75, 'line'))
  
  p <- p +  theme(legend.position = 'none')
  
  return(p)
}

# Create plot list
param$slope <- as.factor(sprintf("%1.1f%%", param$slope))
param$cv <- as.factor(sprintf("%0.2f", round(param$cv, digits = 2)))

slopes <- unique(param$slope)
plots <- list()
i <- 1

for (slp in slopes) {
  dfs <- param[with(param, slope == slp),]
  plt <- make_plot(dfs, slp)
  plots[[i]] <- plt
  i <- i + 1
}

# Create plot grid
prow <- plot_grid(plotlist = plots, ncol = 3)

# Extract legend from first plot
legend <- get_legend(
  plots[[1]] + 
    guides(color = guide_legend(nrow = 1)) +
    theme(legend.position = "bottom")
)

# Add legend underneath the plot grid giving it 10% the height
# of one plot (via rel_heights).
plot_grid(prow, legend, ncol = 1, rel_heights = c(1, 0.1))
Results of the simulation indicate the power of the Mann-Kendall test depends on sample size and the amount of variation within the time series. As the sample size increases, the test becomes more powerful; however, the power of the test decreases as the amount of variation increases within the time series. The magnitude of the trend also has a significant impact on the power of the test. For time series exhibiting small trends, large sample sizes are needed to achieve a minimum acceptable power in detecting the trend. The low power of the Mann-Kendall test to detect trends of low magnitude is exacerbated for time series with moderate variability, corresponding to CV values of about 0.5 or greater. A sample size of 20 or more is needed under these conditions to achieve an acceptable minimum power of 80%.

Normal Versus Lognormal Distribution

Yue et al. (2002) found that the power of the Mann-Kendall test is dependent upon distribution type. The lowest power was reported to be associated with a lognormal distribution. This is an interesting result indicating that the power of the Mann-Kendall test is dependent on distribution type, in contrast to common thinking that states that the test is distribution free.

Let’s explore the sensitivity of the power of the Mann-Kendall test to the distribution type of sample data. The power-slope curves for normal and lognormal distribution type for a CV value of 0.5 are shown below.

# Set cv value for plotting a single cv
cv_value <- 0.5

# Read data in comma-delimited file
dat_norm <- read.csv('data/MK_power_results_norm.csv', header = T)
dat_lnorm <- read.csv('data/MK_power_results_lnorm.csv', header = T)

# Add type to each data set and combine
dat_norm$Type <- 'norm'
dat_lnorm$Type <- 'lnorm'
dat <- bind_rows(dat_norm, dat_lnorm) %>%
  filter(cv == cv_value)

# Make ordered factor
dat$Type <- factor(
  dat$Type,
  levels = c('norm', 'lnorm'),
  ordered = TRUE)

# Function to create plots
make_plot <- function(df0, slopeID) {
  p <- ggplot(data = df0, aes(x = nobs, y = Power_MK, color = Type)) +
    geom_point() +
    geom_line() +
    scale_x_continuous(limits = c(4, 32),
                       breaks = seq(4, 32, 4)) +
    scale_y_continuous(limits = c(0, 1),
                       breaks = seq(0, 1, 0.2)) +
    labs(title = paste0('Trend Slope = ', as.factor(sprintf("%1.1f%%", slopeID))),
         color = 'Type',
         x = 'Sample Size',
         y = 'Power') +
    scale_color_hue(l = 50) +
    theme_bw(base_size = 9) +
    theme(plot.title = element_text(size = 9, hjust = 0.5),
          axis.title.y = element_text(size = 8, color = 'black'),
          axis.title.x = element_text(size = 8, color = 'black'),
          axis.text.x = element_text(size = 8, color = 'black'),
          axis.text.y = element_text(size = 8, color = 'black'),
          legend.key.height = unit(0.75, 'line'))

  p <- p +  theme(legend.position = 'none')

  return(p)
}

# Create plot list
slopes <- sort(unique(dat$slope))
plots <- list()
i <- 1

for (slp in slopes) {
  dfs <- dat[with(dat, slope == slp),]
  plt <- make_plot(dfs, slp)
  plots[[i]] <- plt
  i <- i + 1
}

# Create plot grid
prow <- plot_grid(plotlist = plots, ncol = 3)

# Extract legend from first plot
legend <- get_legend(
  plots[[1]] + 
    guides(color = guide_legend(nrow = 1)) +
    theme(legend.position = "bottom")
)

# Add legend underneath the plot grid giving it 10% the height
# of one plot (via rel_heights).
plot_grid(prow, legend, ncol = 1, rel_heights = c(1, 0.1))

In the case of no trend, the power of the test remains the same for the two different distribution types and is equal to the pre-assigned significance level of 0.05. This indicates that the null distribution of the Mann-Kendall test statistic is not sensitive to the distribution type of the time series. However, in the case when a trend exists, the power of the test is different for the two distribution types. In general, the lognormal distribution exhibits a higher power than the normal distribution. This finding indicates that when a trend does exist, the power of the Mann-Kendall test is also dependent on the distribution type, which is in contrast to the common thought that the Mann-Kendall test is rank-based and would be distribution-free. This also contrasts with the findings of Yue et al. (2002) that reported the lowest power was associated with a lognormal distribution.

Conclusions

The rank-based, nonparametric Mann-Kendall test is commonly used to assess the significance of trends in environmental time-series data. Simulation results indicate the power of the Mann-Kendall test depends on the magnitude of trend, sample size, and the amount of variation within a time series. The bigger the absolute magnitude of trend, the more powerful the test; as the sample size increases, the test becomes more powerful; and as the amount of variation increases within a time series, the power of the test decreases. Large sample sizes generally are required to detect small trends even for data with small autocorrelation and low variability. The large sample sizes required for trend detection are a result of the random variability in the data. The power of the test is also dependent on distribution type. Because most environmental data are thought to be lognormally distributed, such increases/decreases in power may inadvertently affect the interpretation of significance, especially when comparing variables that have different statistical properties.

The results presented in this evaluation provide a better understanding of the power of the Mann-Kendall test for detecting trends in time series. Two major factors influencing trend detection are autocorrelation and variance of the noise in the data. These two parameters can vary substantially from site to site with the result being that one site may require many more observations to detect a given trend than another site. Establishing an appropriate level of power when designing a sampling program is as important as establishing the significance level of the test. If the power is too low, meaning the probability of rejecting the null hypothesis of no trend when in fact there is a trend, the performance of the test suffers. It is important to understand the statistical power and detectible magnitudes of change associated with a test for temporal trend. Given the small sample sizes and highly variable nature of environmental data, caution is advised when performing and interpreting the results of the Mann-Kendall test.

References

Gilbert, R.O. 1987. Statistical Methods for Environmental Pollution Monitoring. Wiley, New York.

Hollander, M. and D.A. Wolf. 1973. Nonparametric Statistical Methods. Wiley, New York.

Kendall, M.G. 1975. Rank Correlation Techniques, 4th edition. Charles Griffen. London.

Mann, H.B. 1945. Nonparametric Tests Against Trend. Econometrica 13, 245-259.

von Storch, V.H. 1995. Misuses of statistical analysis in climate research. In H.V. Storch and A. Navarra (eds), Analysis of Climate Variability: Applications of Statistical Techniques, SpringerVerlag Berlin, 11-26.

Weatherhead, E.C., G.C. Reinsel, G.C. Tiao, X. Meng, D. Choi, W. Cheang, T. Keller, J.DeLuisi, D.J. Wuebbles, J.B., Kerr, A.J. Miller, S.J. Oltmans, and J.E. Frederick. Factors affecting the detection of trends: Statistical considerations and ap[lications to environmental data. J. Geophys. Res. 103, 17,149-17,161.

Yue, S., P. Pilon, and G. Caradias. 2002. Power of the Mann-Kendall and Spearman’s rho tests for detecting monotonic trends in hydrological series. J. Hydrol. 259, 254-271.

Posted on:
April 5, 2022
Length:
15 minute read, 3053 words
See Also: