Missing Data Imputation in Time Series of Air Pollution

ISEE-0841

Background and Objective: Missing data are a frequent problem in epidemiological studies of the effects of air pollution on health. Air quality monitoring stations can fail and remain offline for several days. These gaps can distort the exposure assessment, since the missing data mechanism is often ignored in the analysis. Analyses based only on the available observations can yield biased estimates of the association and can overestimate its precision. We propose an imputation procedure, based on the EM algorithm, for multivariate time series data such as daily concentrations of atmospheric contaminants. The time component of the series can be modelled using splines, regression models, or ARIMA models with multiple covariance regimes.

Methods: A simulation study was carried out to evaluate the validity of the proposed method and to compare it with the methods available by default in most statistical software packages. The accuracy and agreement of the methods were also evaluated, and a penalty criterion for the lost information was proposed to account for the imputation uncertainty in the analysis.

Results: (i) Data analysis using only the complete units tended to underestimate the association between the air contaminants and the health events, even when only a small amount of data was missing; (ii) mean and median imputation overestimated the association, and the estimates showed high dispersion and low agreement with the original values; (iii) multivariate procedures showed better performance and accuracy than the univariate ones; (iv) multivariate methods with temporal adjustment showed higher accuracy and precision. This last approach also showed smaller prediction error and higher agreement between the imputed and the original values.

Conclusion: The methods proposed in this work are implemented as an open source library called mtsdi for the statistical software R.
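To make the EM-based idea in the abstract more concrete, the following is a minimal sketch of EM imputation for two pollutant series under a plain bivariate normal model. It is not the mtsdi implementation: the function name `em_impute_bivariate` and its parameters are hypothetical, the temporal component (splines, regression, or ARIMA) is omitted entirely, and the sketch assumes each series has at least two distinct observed values.

```python
def em_impute_bivariate(rows, max_iter=200, tol=1e-10):
    """Fill None entries in (x, y) pairs by EM under a bivariate normal model.

    Hypothetical illustration only; it ignores the time structure of the series.
    Returns the filled rows and the estimates (mx, my, sxx, syy, sxy).
    """
    n = len(rows)
    xs = [x for x, _ in rows if x is not None]
    ys = [y for _, y in rows if y is not None]
    # Start from observed-data means and variances, with zero covariance.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / len(xs)
    syy = sum((y - my) ** 2 for y in ys) / len(ys)
    sxy = 0.0
    filled = list(rows)
    for _ in range(max_iter):
        filled = []
        sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0.0
        for x, y in rows:
            # E-step: conditional mean of each missing value, plus the
            # conditional (co)variance that the second moments require.
            vxx = vyy = vxy = 0.0
            if x is None and y is None:
                ex, ey = mx, my
                vxx, vyy, vxy = sxx, syy, sxy
            elif x is None:
                ex, ey = mx + sxy / syy * (y - my), y
                vxx = max(0.0, sxx - sxy ** 2 / syy)
            elif y is None:
                ex, ey = x, my + sxy / sxx * (x - mx)
                vyy = max(0.0, syy - sxy ** 2 / sxx)
            else:
                ex, ey = x, y
            filled.append((ex, ey))
            sum_x += ex
            sum_y += ey
            sum_xx += ex * ex + vxx
            sum_yy += ey * ey + vyy
            sum_xy += ex * ey + vxy
        # M-step: update the mean vector and covariance matrix from the
        # expected sufficient statistics.
        new_mx, new_my = sum_x / n, sum_y / n
        new = (new_mx, new_my,
               sum_xx / n - new_mx ** 2,
               sum_yy / n - new_my ** 2,
               sum_xy / n - new_mx * new_my)
        change = max(abs(a - b) for a, b in zip(new, (mx, my, sxx, syy, sxy)))
        mx, my, sxx, syy, sxy = new
        if change < tol:
            break
    return filled, (mx, my, sxx, syy, sxy)


# Made-up daily values for two pollutants; None marks a monitor outage.
series = [(0, 1), (1, 3), (2, 5), (4, 9), (5, 11), (3, None)]
completed, estimates = em_impute_bivariate(series)
```

Unlike mean or median imputation, each missing value here is replaced by its conditional expectation given the other series, so the cross-pollutant correlation informs the fill. Adding a temporal adjustment, as the abstract describes, would require a richer model than this sketch.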
