Blog

Data analytics, statistics, and more

Statistical Properties of Autocorrelated Data

In classical statistical analysis, positive autocorrelation leads to underestimation of the standard error because standard methods assume independent observations. This underestimation inflates test statistics, increasing the risk of incorrectly rejecting the null hypothesis. In autocorrelated data, each observation shares information with nearby values, so the effective sample size is smaller than the nominal sample size and the degrees of freedom are reduced. In this post, Monte Carlo simulation is used to explore the effect of autocorrelation on a hypothesis test of whether an observed data set is drawn from a population with mean zero.
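As a minimal sketch of the kind of Monte Carlo experiment described above (the AR(1) process, sample sizes, and significance level here are illustrative assumptions, not the post's actual setup), one can simulate autocorrelated series with true mean zero and count how often a standard one-sample t-test falsely rejects:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def ar1_series(n, phi, rng):
    """Generate a mean-zero AR(1) series x_t = phi * x_{t-1} + e_t."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1 - phi**2)  # draw x_0 from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def rejection_rate(phi, n=100, n_sims=1000, alpha=0.05):
    """Fraction of simulations in which the t-test rejects H0: mu = 0."""
    rejections = 0
    for _ in range(n_sims):
        x = ar1_series(n, phi, rng)
        _, p = stats.ttest_1samp(x, 0.0)
        rejections += p < alpha
    return rejections / n_sims

print(rejection_rate(0.0))  # near the nominal 0.05 for independent data
print(rejection_rate(0.7))  # well above 0.05 under positive autocorrelation
```

With independent data the empirical rejection rate stays close to the nominal level, while strong positive autocorrelation inflates it well beyond 5%, illustrating the underestimated standard error.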

November 6, 2024

Lognormal Kriging and Bias-Corrected Back-Transformation

Kriging assumes spatial stationarity but does not require the estimated variable to follow a specific distribution. However, the skewed distributions common in the earth sciences can complicate variogram estimation and lead to over-prediction, especially in the presence of a few high values. To address these challenges, data are often transformed with the natural logarithm. The difficulty arises when back-transforming predictions and variances from the log scale to the original scale: because the kriging estimate is a weighted sum of log-transformed values, simple exponentiation yields a biased estimate of the original-scale mean. This post explores the mathematical formulations needed for bias-corrected back-transformation in lognormal kriging.
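The core of the bias problem can be seen without any spatial machinery: for a lognormal variable Y = exp(Z) with Z ~ N(mu, sigma^2), the mean on the original scale is E[Y] = exp(mu + sigma^2/2), not exp(mu). A small simulation sketch (synthetic parameters, not the post's kriging workflow) checks the correction:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.8
z = rng.normal(mu, sigma, size=200_000)  # log-scale values
y = np.exp(z)                            # original-scale (lognormal) values

naive = np.exp(z.mean())                    # exp of the log-scale mean: biased low
corrected = np.exp(z.mean() + z.var() / 2)  # lognormal bias-corrected back-transform

print(y.mean(), naive, corrected)
```

The naive back-transform recovers roughly the median rather than the mean and under-predicts; adding the half-variance term brings the estimate in line with the simulated original-scale mean. In lognormal kriging the same idea applies, with the kriging variance (and, for ordinary kriging, the Lagrange multiplier) entering the correction.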

August 15, 2024

Predictive Modelling of Traffic Accidents in the U.S.

Motor vehicle accidents are a central concern of traffic safety research. Analyzing the factors that contribute to accidents and to accident severity is critical for improving road safety standards. In this post, patterns in U.S. traffic accident data are explored using machine-learning techniques.

August 9, 2024

Generalized Least Squares Regression

In OLS regression, assumptions such as independent and identically distributed errors are essential for valid estimation and inference. Under heteroskedasticity, or unequal residual variances, OLS coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are incorrect, distorting hypothesis tests and confidence intervals. Alternatives such as GLS and WLS regression can be considered when the OLS assumptions are violated: GLS handles dependent errors, while WLS handles errors that are independent but not identically distributed.
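A minimal sketch of the GLS idea (the AR(1) error covariance and true coefficients below are illustrative assumptions): with error covariance S, the GLS estimator is beta = (X' S^-1 X)^-1 X' S^-1 y, which accounts for dependence that plain OLS ignores.

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi = 200, 0.6

# AR(1) error covariance: S[i, j] = phi ** |i - j|
idx = np.arange(n)
S = phi ** np.abs(idx[:, None] - idx[None, :])

# Design matrix, true coefficients, and errors drawn from N(0, S)
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
beta_true = np.array([2.0, 0.5])
e = np.linalg.cholesky(S) @ rng.standard_normal(n)
y = X @ beta_true + e

# GLS: whiten by the inverse error covariance; OLS ignores the dependence
S_inv = np.linalg.inv(S)
beta_gls = np.linalg.solve(X.T @ S_inv @ X, X.T @ S_inv @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_gls, beta_ols)
```

Both estimators are unbiased here, but GLS uses the known error covariance and is the efficient one; in practice S is usually estimated (feasible GLS) rather than known.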

April 17, 2024

Weighted Least Squares Regression

Heteroscedasticity in regression analysis refers to error variance that changes across observations, visible as varying scatter in the residuals. Its presence leaves OLS coefficient estimates unbiased but inefficient, and it biases the usual standard errors, leading to misleading inference. When errors are independent but not identically distributed, weighted least squares regression addresses heteroscedasticity by placing more weight on observations with smaller error variance. This results in smaller standard errors and more precise estimators.
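The weighting scheme can be sketched directly (the variance model sigma_i proportional to x_i and the true coefficients are illustrative assumptions): with diagonal weight matrix W whose entries are w_i = 1 / Var(e_i), the WLS estimator is beta = (X' W X)^-1 X' W y.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])

sigma = 0.5 * x                              # error spread grows with x: heteroscedastic
y = X @ np.array([1.0, 2.0]) + rng.normal(0, sigma)

w = 1.0 / sigma**2                           # weights = inverse error variances (known here)
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_wls, beta_ols)
```

Down-weighting the noisiest observations makes WLS the efficient estimator under this variance structure; with unknown variances the weights are typically estimated from the residuals.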

March 19, 2024