Blog

Data analytics, statistics, and more

Groundwater Detection Monitoring: Importance of Limiting the Number of Constituents

Detection monitoring uses statistical analyses to distinguish natural groundwater variation from changes caused by landfill activities. These monitoring programs prioritize two key performance characteristics: adequate statistical power and a low sitewide false positive rate (SWFPR), which is distributed across all statistical tests performed in a year. Because the SWFPR is shared among all tests, fewer tests allow a higher per-test false positive rate, which lowers the per-test false negative rate and therefore improves statistical power. To illustrate this concept, the per-test false positive rate and the corresponding power are calculated for semiannual testing at four compliance wells, first with 10 constituents and then with 100. This post aims to correct the misconception that increasing the number of constituents enhances the statistical power of detection monitoring.

March 12, 2025
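
The per-test calculation described in the detection monitoring summary above can be sketched as follows. This is a minimal illustration, not the post's own code: the 10% annual SWFPR target, the independence of tests, and the use of a one-sided z-test with a 3-standard-deviation shift as a stand-in for the actual detection test are all assumptions made here, and the function names are hypothetical.

```python
from statistics import NormalDist

# ASSUMPTION: annual sitewide false positive rate target; the summary does not state a value.
SWFPR = 0.10


def per_test_alpha(swfpr, wells, constituents, events_per_year):
    """Per-test false positive rate that keeps the annual sitewide rate at
    `swfpr`, assuming all tests are statistically independent."""
    n_tests = wells * constituents * events_per_year
    alpha = 1.0 - (1.0 - swfpr) ** (1.0 / n_tests)
    return alpha, n_tests


def z_test_power(alpha, shift_in_sd):
    """Power of a one-sided z-test to detect an upward shift of `shift_in_sd`
    standard deviations. ASSUMPTION: a simplified stand-in for the
    prediction-limit style tests used in practice."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_crit - shift_in_sd)


# Four compliance wells, semiannual testing, 10 versus 100 constituents.
for constituents in (10, 100):
    alpha, n_tests = per_test_alpha(SWFPR, wells=4,
                                    constituents=constituents,
                                    events_per_year=2)
    power = z_test_power(alpha, shift_in_sd=3.0)
    print(f"{constituents:>3} constituents: {n_tests} tests/year, "
          f"per-test alpha = {alpha:.6f}, power = {power:.3f}")
```

Under these assumptions, the 100-constituent program must use a much smaller per-test false positive rate than the 10-constituent program, and its power to detect the same shift is correspondingly lower.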

Test for Stochastic Dominance Using the Wilcoxon Rank Sum Test

The two-sample Wilcoxon Rank Sum (WRS) test is often perceived as a procedure for comparing medians. That interpretation rests on the assumption that the two populations differ only by a consistent shift, a condition that is infrequently met in practice. The test's actual purpose is to determine whether one distribution stochastically dominates another. This post clarifies the WRS test’s true function through a simulation involving two samples with the same median but different distributions. For non-symmetric data, alternative methods such as quantile regression and bootstrapping are recommended; these are nonparametric alternatives that do not rely on rank-based assumptions.

March 7, 2025
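
A simulation in the spirit of the WRS summary above might look like the sketch below. The choice of distributions is an assumption made for illustration and is not taken from the post: one sample is exponential (right-skewed) and the other is its mirror image about the shared median ln 2 (left-skewed), so the medians are equal while the probability that a value from the first sample exceeds a value from the second is about 0.60. The test is implemented with the large-sample normal approximation for continuous data without ties; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import math
import random
from statistics import NormalDist


def wrs_p_value(x, y):
    """Two-sided Wilcoxon rank sum p-value using the normal approximation.
    Assumes continuous data with no ties."""
    n, m = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the ranks (1-based) held by the first sample.
    rank_sum_x = sum(rank for rank, (_, group) in enumerate(combined, start=1)
                     if group == 0)
    u = rank_sum_x - n * (n + 1) / 2
    mean_u = n * m / 2
    sd_u = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mean_u) / sd_u
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate(n=200, n_sims=500, alpha=0.05, seed=1, mirrored=True):
    """Fraction of simulated data sets in which the WRS test rejects."""
    rng = random.Random(seed)
    median = math.log(2.0)  # median of the Exponential(1) distribution
    rejections = 0
    for _ in range(n_sims):
        x = [rng.expovariate(1.0) for _ in range(n)]
        if mirrored:
            # Reflection about the median: same median, opposite skew.
            y = [2.0 * median - rng.expovariate(1.0) for _ in range(n)]
        else:
            # Same distribution as x, used as a sanity check.
            y = [rng.expovariate(1.0) for _ in range(n)]
        if wrs_p_value(x, y) < alpha:
            rejections += 1
    return rejections / n_sims


print("equal medians, different shapes:", rejection_rate(mirrored=True))
print("identical distributions:        ", rejection_rate(mirrored=False))
```

With equal medians but different shapes, the test rejects far more often than the nominal 5%, which would be puzzling if it were a test of medians. The identical-distribution case stays near the nominal level.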

Statistical Properties of Autocorrelated Data

In classical statistical analysis, positive autocorrelation leads to an underestimation of the standard error because standard methods assume that observations are independent. This underestimation inflates test statistics and increases the risk of incorrectly rejecting the null hypothesis. Autocorrelation means that each observation is related to nearby values, which reduces the degrees of freedom and makes the effective sample size smaller than the actual sample size. This post uses Monte Carlo simulation to explore the effect of autocorrelation on a hypothesis test of whether an observed data set is drawn from a population with mean zero.

November 6, 2024
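
The Monte Carlo experiment outlined in the autocorrelation summary above can be sketched as follows. The details are assumptions made for illustration rather than the post's own setup: the data follow a stationary AR(1) process with true mean zero, the test is an ordinary two-sided one-sample t-test at the 5% level, and the sample size is fixed at 50 so that the hard-coded critical value applies.

```python
import math
import random
from statistics import mean, stdev

N_OBS = 50
# Two-sided 5% critical value of the t distribution with 49 degrees of freedom.
T_CRIT_DF49 = 2.0096


def ar1_series(n, phi, rng):
    """Stationary AR(1) series with mean zero and unit innovation variance:
    x[t] = phi * x[t-1] + e[t]."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))  # stationary start
    series = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def rejection_rate(phi, n_sims=2000, seed=42):
    """Fraction of simulations in which the t-test rejects 'mean = 0',
    even though the true mean is zero (the empirical type I error rate)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        series = ar1_series(N_OBS, phi, rng)
        t_stat = mean(series) / (stdev(series) / math.sqrt(N_OBS))
        if abs(t_stat) > T_CRIT_DF49:
            rejections += 1
    return rejections / n_sims


for phi in (0.0, 0.3, 0.7):
    # Common large-sample approximation to the effective sample size for AR(1).
    n_eff = N_OBS * (1.0 - phi) / (1.0 + phi)
    print(f"phi = {phi:.1f}: effective n ~ {n_eff:.1f}, "
          f"type I error rate = {rejection_rate(phi):.3f}")
```

With no autocorrelation, the rejection rate should sit near the nominal 5%. As the autocorrelation grows, the effective sample size shrinks and the test rejects a true null hypothesis far more often.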

Lognormal Kriging and Bias-Corrected Back-Transformation

Kriging assumes spatial stationarity and does not require a specific distribution for the estimated variable. However, non-symmetric distributions, which are common in the earth sciences, can complicate variogram calculations and lead to over-prediction, especially when high values are present. To address these challenges, data are often transformed using the natural logarithm. The difficulty then lies in back-transforming predictions and variances from the log scale to the original scale: simple exponentiation is insufficient because the kriging prediction is a weighted sum of log-transformed data. This post explores the mathematical formulations essential for effective back-transformation in lognormal kriging.

August 15, 2024
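
As a rough illustration of the back-transformation issue raised in the lognormal kriging summary above, the sketch below applies the standard lognormal moment formulas to a log-scale prediction and its kriging variance. It assumes simple kriging of a field whose logarithm is Gaussian. Ordinary kriging involves an additional Lagrange multiplier term that is not shown here, and the post's own formulation may differ. The function name and input values are hypothetical.

```python
import math


def back_transform(log_pred, log_var):
    """Back-transform a simple-kriging prediction made on the log scale.

    ASSUMPTION: log(Z) is Gaussian, so the conditional distribution of Z is
    lognormal with log-scale mean `log_pred` and variance `log_var`.
      mean     = exp(log_pred + log_var / 2)
      variance = mean^2 * (exp(log_var) - 1)
    """
    mean = math.exp(log_pred + 0.5 * log_var)
    variance = mean ** 2 * math.expm1(log_var)
    return mean, variance


# Hypothetical log-scale prediction and kriging variance at one location.
log_pred, log_var = 1.2, 0.5

naive = math.exp(log_pred)  # plain exponentiation: the median, not the mean
corrected, variance = back_transform(log_pred, log_var)

print(f"naive exp(prediction): {naive:.3f}")
print(f"bias-corrected mean:   {corrected:.3f}")
print(f"original-scale var:    {variance:.3f}")
```

Plain exponentiation returns the median of the lognormal distribution, which is always below the mean when the log-scale variance is positive. The size of that gap grows with the kriging variance.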

Predictive Modelling of Traffic Accidents in the U.S.

Motor vehicle accidents are a central topic in traffic safety research. Analyzing the factors that contribute to accidents and to their severity is critical for improving road safety standards. This post explores patterns in U.S. traffic accident data using machine-learning techniques.

August 9, 2024