Quality control methods for multivariate data are generally based on robust estimates of the parameters of the multivariate normal (MVN) distribution. However, many multivariate data-generating processes do not produce elliptical contours, and in such cases, error detection using the MVN distribution would lead to many legitimate observations being erroneously flagged. In this work, we develop a semi-parametric method for identifying errors in skewed multivariate data that also have a functional component. In the first step, we remove potential outliers by assigning each multivariate observation or function a depth score and removing those observations that fall beyond a given threshold. The remaining observations are used to estimate the parameters of a multivariate skew-t (MVST) distribution, and this estimated distribution is then used to assign every observation a probability of having been generated from this MVST. We test the performance of this two-step approach in simulation against a more common MVN method adapted for functional data. When the observations are skewed, our approach has a higher percentage of correctly identified outliers and a lower percentage of false positives. Finally, we show how our method can be used in practice with radiosonde launches at a station in Denver, Colorado, using horizontal and vertical wind components measured at 8 vertical pressure levels.
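The two-step structure described in the abstract (trim by depth, fit on the remaining observations, then score every observation) can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the data are bivariate, the depth score is a Mahalanobis depth, and the second step fits a Gaussian as a plain stand-in for the MVST fit, which the Python standard library does not provide. The function names, `trim_fraction`, and `alpha` are hypothetical.

```python
import math
import random


def mean_cov_2d(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance of p, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def two_step_flags(points, trim_fraction=0.1, alpha=0.01):
    """Return (flags, tail_probabilities) for every observation."""
    # Step 1: score each observation by depth (assumed: Mahalanobis depth)
    # and drop the shallowest trim_fraction of the sample.
    mean, cov = mean_cov_2d(points)
    depth = [1.0 / (1.0 + mahalanobis_sq(p, mean, cov)) for p in points]
    n_keep = len(points) - int(trim_fraction * len(points))
    order = sorted(range(len(points)), key=lambda i: depth[i], reverse=True)
    kept = [points[i] for i in order[:n_keep]]

    # Step 2: refit on the retained observations only, then score ALL
    # observations. A Gaussian is used here as a stand-in for the MVST fit;
    # in two dimensions its tail probability is exp(-d^2 / 2).
    mean_k, cov_k = mean_cov_2d(kept)
    tail = [math.exp(-0.5 * mahalanobis_sq(p, mean_k, cov_k)) for p in points]
    return [t < alpha for t in tail], tail


# Usage: 200 synthetic clean points plus one gross error appended last.
random.seed(0)
sample = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(200)]
sample.append((50.0, 50.0))
flags, tail = two_step_flags(sample)
```

Refitting on the trimmed sample keeps the gross error from inflating the fitted scale, so the second-step probabilities are not masked by the very observations they are meant to detect. With skewed data, the Gaussian stand-in would need to be replaced by a skewed family for the second step to match the approach the abstract describes.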
(2014). A Semi-Parametric Method for Robust Multivariate Error Detection in Skewed Functional Data with Application to Historical Radiosonde Winds [conference presentation]. Retrieved from http://hdl.handle.net/10446/31703
A Semi-Parametric Method for Robust Multivariate Error Detection in Skewed Functional Data with Application to Historical Radiosonde Winds
2014-01-01