Title: Missing Information Imputation for Traffic Incident Likelihood Prediction for Urban Expressways
Accession Number: 01626316
Record Type: Component
Abstract: Crash likelihood prediction is an important problem in traffic safety research. Missing data or imperfect information can reduce the accuracy of crash likelihood prediction models. This paper proposes a novel approach that combines probabilistic principal component analysis (PPCA) for missing data imputation with support vector machines (SVMs) as the crash likelihood prediction model. To avoid potential overfitting, backward sequential feature selection is used to select the optimal combination of explanatory variables. To examine how PPCA affects the prediction accuracy of SVMs with three kernel types (linear, Gaussian, and polynomial), the approach is applied in a field study of an urban expressway. The models are trained and tested with 123 crash records and 5 months of traffic flow data. Five-fold cross-validation is used to train the classifiers and assess their prediction accuracy under different percentages of missing information. Numerical results show that SVM models using the full set of explanatory variables (without feature selection) yield similar values of the area under the curve (AUC) of the receiver operating characteristic (ROC) as the missing ratio increases from 0 to 40%. This confirms the stable performance of PPCA in missing data imputation for crash likelihood prediction. In contrast, SVM models built on the optimal variables chosen by backward sequential feature selection show unsatisfactory AUC values as the missing rate increases, although they outperform SVM models with the full variable set when the missing ratio is close to zero. This indicates that explanatory variable selection and PPCA-based missing data imputation may not be suitable for simultaneous use when the missing rate of observations is high.
The approach of combining missing information imputation with crash likelihood prediction is generic and can be applied to other machine learning classification models.
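For readers unfamiliar with PPCA-based imputation, the following is a minimal sketch, assuming NumPy is available, of how the imputation step can work. It is not the authors' implementation: it refits the closed-form maximum-likelihood PPCA solution (Tipping and Bishop) on the currently filled matrix, then replaces each row's missing entries with their Gaussian conditional mean given that row's observed entries, repeating until the fill stabilizes. Unlike full EM for PPCA, it ignores the conditional covariance of the filled entries, so it is a simplified variant. The function names ppca_fit and ppca_impute and the latent dimension q are hypothetical; the record does not state how many components or which estimation procedure the authors used.

```python
import numpy as np


def ppca_fit(X, q):
    """Closed-form maximum-likelihood PPCA fit on a complete (n, d) matrix."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)       # ML sample covariance
    evals, evecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    evals, evecs = evals[::-1], evecs[:, ::-1]   # reorder to descending
    # Noise variance: mean of the d - q discarded eigenvalues (floored for stability).
    sigma2 = max(evals[q:].mean(), 1e-12)
    # Loadings W = U_q (Lambda_q - sigma2 I)^(1/2), taking the rotation R = I.
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def ppca_impute(X, q=2, n_iter=100, tol=1e-6):
    """Fill NaNs in X using an iteratively refit PPCA model (hypothetical helper, not full EM)."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    if not 0 < q < d:
        raise ValueError("latent dimension q must satisfy 0 < q < d")
    miss = np.isnan(X)
    Xf = np.where(miss, np.nanmean(X, axis=0), X)  # initial fill: column means
    for _ in range(n_iter):
        mu, W, sigma2 = ppca_fit(Xf, q)
        C = W @ W.T + sigma2 * np.eye(d)            # PPCA marginal covariance of x
        X_new = Xf.copy()
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            if not o.any():
                X_new[i, m] = mu[m]                 # nothing observed: use the mean
                continue
            # Gaussian conditional mean E[x_m | x_o] under N(mu, C).
            resid = np.linalg.solve(C[np.ix_(o, o)], Xf[i, o] - mu[o])
            X_new[i, m] = mu[m] + C[np.ix_(m, o)] @ resid
        delta = np.max(np.abs(X_new - Xf))
        Xf = X_new
        if delta < tol:
            break
    return Xf
```

Observed entries are never modified; only the NaN cells are filled. In the pipeline the abstract describes, the imputed feature matrix would then be passed to the SVM classifiers (linear, Gaussian, or polynomial kernel) evaluated by 5-fold cross-validation and ROC AUC; that step is omitted here.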
Supplemental Notes: This paper was sponsored by TRB committee ANB20 Standing Committee on Safety Data, Analysis and Evaluation.
Monograph Accession #: 01618707
Report/Paper Numbers: 17-04420
Language: English
Corporate Authors: Transportation Research Board, 500 Fifth Street, NW
Authors: Ke, Jintao; Zhang, Shuaichao; Chen, Xiqun Michael
Pagination: 21p
Publication Date: 2017
Conference: Transportation Research Board 96th Annual Meeting
Location: Washington DC, United States
Media Type: Digital/other
Features: Appendices; Figures; Maps; References (25); Tables
Subject Areas: Highways; Safety and Human Factors
Source Data: Transportation Research Board Annual Meeting 2017 Paper #17-04420
Files: TRIS, TRB, ATRI
Created Date: Dec 8 2016 11:42AM