A Natural Language Processing method for the identification of the factors influencing road accident severity
Although road safety has improved in recent decades, the rate of accidents with severe or fatal consequences still exceeds safety objectives (European Commission 2019; World Health Organization 2018).
This work explores the use of Natural Language Processing (NLP) techniques to automatically extract knowledge from road accident reports, with the objective of supporting the safety management of the road infrastructure system (Persia et al. 2016).
To this end, we consider databases of textual road accident reports provided by local public authorities. These reports contain descriptions of the accidents and the results of the post-accident investigations. The reports are analyzed with NLP to extract the features that most influence the accidents, in order to inform road safety management.
To analyze the reports, we develop a method that combines Hierarchical Dirichlet Processes (HDPs) (Teh et al. 2006), Artificial Neural Networks (ANNs) and a feature selection technique based on the Sequential Forward Selection (SFS) strategy (Marcano-Cedeño et al. 2010). HDPs represent each report as a mixture of topics, i.e. distributions over words that co-occur in the reports. In practice, each report is transformed into a vector whose elements are its degrees of membership in each topic, i.e. a measure of how much each topic contributes to the description of the report. ANNs are then used to classify the reports, represented by the extracted vectors, into classes characterizing the severity of the accident consequences. Finally, the SFS technique is used to identify the topics that most influence the classification of the reports. In this way, the factors causing the accidents and influencing their evolution are automatically extracted. The developed method is validated on a database of real accident reports.
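The feature selection step can be illustrated with a minimal Python sketch of Sequential Forward Selection over topic-membership vectors. This is not the authors' implementation: the topic vectors and severity labels below are invented toy values, the function names are hypothetical, and a simple nearest-centroid classifier stands in for the ANN so that the example stays self-contained.

```python
from typing import Callable, List, Sequence, Tuple


def nearest_centroid_accuracy(
    X: Sequence[Sequence[float]], y: Sequence[int], features: Sequence[int]
) -> float:
    """Training accuracy of a nearest-centroid classifier using only `features`.

    Assumption: this simple classifier is a stand-in for the ANN of the abstract.
    """
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, t in zip(X, y):
        # Assign the report to the class with the closest centroid
        # (squared Euclidean distance over the chosen topics).
        predicted = min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )
        correct += predicted == t
    return correct / len(y)


def sequential_forward_selection(
    n_features: int, score: Callable[[List[int]], float], k: int
) -> Tuple[List[int], List[float]]:
    """Greedily add, one at a time, the feature that maximizes `score`."""
    selected: List[int] = []
    remaining = list(range(n_features))
    history: List[float] = []
    while remaining and len(selected) < k:
        scored = [(score(selected + [f]), f) for f in remaining]
        best_score, best_feature = max(scored, key=lambda pair: pair[0])
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append(best_score)
    return selected, history


# Toy data (invented): each row holds one report's membership in 4 topics.
topic_vectors = [
    [0.10, 0.20, 0.60, 0.10],
    [0.30, 0.10, 0.55, 0.05],
    [0.15, 0.15, 0.50, 0.20],
    [0.30, 0.50, 0.10, 0.10],
    [0.10, 0.45, 0.05, 0.40],
    [0.20, 0.20, 0.15, 0.45],
]
severity = [1, 1, 1, 0, 0, 0]  # 1 = severe, 0 = not severe (hypothetical labels)

selected, scores = sequential_forward_selection(
    n_features=4,
    score=lambda feats: nearest_centroid_accuracy(topic_vectors, severity, feats),
    k=2,
)
print("selected topics:", selected)
print("scores after each step:", scores)
```

In this toy data the topic at index 2 separates the two severity classes on its own, so the greedy search picks it first. In a real study the score would more plausibly be the accuracy of the trained classifier on held-out reports rather than on the training set.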
European Commission. 2019. “EU Road Safety Policy Framework 2021-2030 - Next Steps towards ‘Vision Zero.’” Brussels, 19.6.2019, SWD(2019) 283 final.
Marcano-Cedeño, A, J Quintanilla-Domínguez, M G Cortina-Januchs, and D Andina. 2010. “Feature Selection Using Sequential Forward Selection and Classification Applying Artificial Metaplasticity Neural Network.” In IECON 2010 - 36th Annual Conference on IEEE Industrial Electronics Society, Glendale, AZ, 2845–50. https://doi.org/10.1109/IECON.2010.5675075.
Persia, Luca, Davide Shingo, Flavia De Simone, Véronique Feypell, De La Beaumelle, George Yannis, Alexandra Laiou, et al. 2016. “Management of Road Infrastructure Safety.” Transportation Research Procedia 14: 3436–45. https://doi.org/10.1016/j.trpro.2016.05.303.
Teh, Yee Whye, Michael I Jordan, Matthew J Beal, and David M Blei. 2006. “Hierarchical Dirichlet Processes.” Journal of the American Statistical Association, 1566–81. https://doi.org/10.1198/016214506000000302.
World Health Organization. 2018. “Global Status Report on Road Safety.” Geneva: World Health Organization.
Road Safety, Natural Language Processing, Feature selection
This work is licensed under a Creative Commons Attribution 3.0 Unported License.