TRB Pubsindex
Title:

Missing Information Imputation for Traffic Incident Likelihood Prediction for Urban Expressways

Accession Number:

01626316

Record Type:

Component

Abstract:

Crash likelihood prediction has been an important issue in traffic safety studies. Missing data or imperfect information may reduce the accuracy of crash likelihood prediction models. This paper proposes a novel approach that combines probabilistic principal component analysis (PPCA) for missing data imputation with support vector machines (SVMs) as the crash likelihood prediction model. To avoid potential overfitting, backward sequential feature selection is conducted to select the optimal combination of explanatory variables. To verify how the PPCA method affects the prediction accuracy of SVMs with three kinds of kernels (linear, Gaussian, and polynomial), the proposed approach is applied to a field study on an urban expressway. The models are trained and tested with 123 crash records and 5 months of traffic flow data. Five-fold cross-validation is employed to train the classifier and verify its prediction accuracy under different percentages of missing information. Numerical results show that SVM models using the full set of explanatory variables, without feature selection, yield similar values of the area under the curve (AUC) of the receiver operating characteristic (ROC) as the missing ratio increases from 0 to 40%. This verifies the stable performance of PPCA in missing data imputation for crash likelihood prediction. On the other hand, SVM models built on the optimal variables selected by backward sequential feature selection show unsatisfactory AUC values as the missing rate increases, although they provide better prediction performance than SVM models with full variables when the missing ratio is close to zero. This indicates that the selection of explanatory variables and PPCA-based missing data imputation may not be suited to simultaneous use when the missing rate of observations is high. The approach of incorporating missing information imputation with crash likelihood prediction is generic and can be applied to other machine learning classification models.
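
The workflow described in the abstract (impute missing values, then score SVMs with three kernels by 5-fold cross-validated AUC at several missing ratios) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the data are synthetic (generated with scikit-learn's `make_classification`), the helper names `ppca_impute` and `mask_at_random` are hypothetical, and the imputation is a simplified iterative approximation of PPCA (closed-form fit on the filled matrix, then reconstruction of the missing cells) rather than a full EM treatment of missing data. The number of latent components and all other settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def ppca_impute(X, n_components=2, n_iter=50):
    """Fill NaNs by alternating a closed-form PPCA fit with reconstruction.

    Simplified stand-in for PPCA-based imputation (an assumption, not the
    paper's exact procedure). Requires n_components < number of columns.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        centered = filled - mu
        cov = centered.T @ centered / len(filled)
        eigvals, eigvecs = np.linalg.eigh(cov)             # eigh returns ascending order
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending
        sigma2 = max(eigvals[n_components:].mean(), 1e-9)   # noise variance estimate
        scale = np.sqrt(np.maximum(eigvals[:n_components] - sigma2, 0.0))
        W = eigvecs[:, :n_components] * scale               # PPCA loading matrix
        M = W.T @ W + sigma2 * np.eye(n_components)
        latent = centered @ W @ np.linalg.inv(M)            # posterior mean of latent variables
        recon = latent @ W.T + mu
        filled[missing] = recon[missing]                    # observed entries stay untouched
    return filled


def mask_at_random(X, ratio, rng):
    """Return a copy of X with roughly `ratio` of its entries set to NaN."""
    X_missing = X.copy()
    X_missing[rng.random(X.shape) < ratio] = np.nan
    return X_missing


# Synthetic stand-in for the crash / non-crash samples (not the paper's data).
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)
rng = np.random.default_rng(0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for ratio in (0.0, 0.2, 0.4):
    # Simplification: imputation runs once on all rows, before the CV split.
    X_imputed = ppca_impute(mask_at_random(X, ratio, rng))
    for kernel in ("linear", "rbf", "poly"):  # "rbf" is the Gaussian kernel
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
        auc = cross_val_score(model, X_imputed, y, cv=cv, scoring="roc_auc").mean()
        results[(ratio, kernel)] = auc
        print(f"missing={ratio:.0%} kernel={kernel:<6} mean AUC={auc:.3f}")
```

Backward sequential feature selection is left out of the sketch for brevity; one way to add it would be to wrap the SVM in scikit-learn's `SequentialFeatureSelector` with `direction="backward"` and compare the resulting AUC values against the full-variable models at each missing ratio.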

Supplemental Notes:

This paper was sponsored by TRB committee ANB20 Standing Committee on Safety Data, Analysis and Evaluation.

Monograph Accession #:

01618707

Report/Paper Numbers:

17-04420

Language:

English

Corporate Authors:

Transportation Research Board

500 Fifth Street, NW
Washington, DC 20001 United States

Authors:

Ke, Jintao
Zhang, Shuaichao
Chen, Xiqun Michael

Pagination:

21p

Publication Date:

2017

Conference:

Transportation Research Board 96th Annual Meeting

Location: Washington DC, United States
Date: 2017-1-8 to 2017-1-12
Sponsors: Transportation Research Board

Media Type:

Digital/other

Features:

Appendices; Figures; Maps; References (25); Tables

Identifier Terms:

Uncontrolled Terms:

Subject Areas:

Highways; Safety and Human Factors

Source Data:

Transportation Research Board Annual Meeting 2017 Paper #17-04420

Files:

TRIS, TRB, ATRI

Created Date:

Dec 8 2016 11:42AM