Research article

Automating crash diagram generation using vision-language models: a case study on multilane roundabouts

  • Published: 02 March 2026
  • Crash diagrams are essential tools in transportation safety analysis, yet their manual preparation remains time-consuming and prone to human variability. This study investigates the use of Vision-Language Models (VLMs) to automate crash diagram generation from police crash reports, focusing on multilane roundabouts as a challenging test case. A three-part structured prompt framework was developed to guide model reasoning through interpretation, extraction, and visual synthesis, while a 10-metric evaluation system was designed to assess diagram quality in terms of semantic accuracy, spatial fidelity, and visual clarity. Three popular models (GPT-4o, Gemini-1.5-Flash, and Janus-4o) were tested on 79 crash reports. GPT-4o achieved the highest average score (6.29 out of 10), followed by Gemini-1.5-Flash (5.28) and Janus-4o (3.64). The analysis revealed GPT-4o's superior spatial reasoning and closer alignment between extracted and visualized crash data. These results highlight both the promise and current limitations of VLMs in engineering visualization tasks. The study lays the groundwork for integrating generative AI into crash analysis workflows to improve efficiency, consistency, and interpretability.

    Citation: Xiao Lu, Hao Zhen, Jidong J. Yang. Automating crash diagram generation using vision-language models: a case study on multilane roundabouts[J]. Applied Computing and Intelligence, 2026, 6(1): 38-57. doi: 10.3934/aci.2026003
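
    The abstract reports each model's average score out of 10 under a 10-metric evaluation system. The abstract does not list the metrics or say how they are scored, so the following is only a minimal sketch of how such an average could be computed. It assumes each metric is rated pass/fail (0 or 1) and summed per diagram; the metric names, their grouping under the three stated quality dimensions, and the toy ratings are all hypothetical and are not taken from the paper.

```python
from statistics import mean

# Hypothetical metric names grouped under the three dimensions named in the
# abstract. The paper's actual 10 metrics are NOT listed there.
METRICS = {
    "semantic_accuracy": ["vehicle_count", "crash_type",
                          "vehicle_labels", "direction_of_travel"],
    "spatial_fidelity": ["roundabout_geometry", "lane_assignment",
                         "impact_location"],
    "visual_clarity": ["legibility", "symbol_consistency", "no_clutter"],
}
ALL_METRICS = [name for group in METRICS.values() for name in group]


def diagram_score(ratings: dict[str, int]) -> int:
    """Score one generated diagram out of 10.

    Assumption: every metric is rated 0 (fail) or 1 (pass) and the
    diagram score is the plain sum of the ten ratings.
    """
    missing = set(ALL_METRICS) - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    if any(ratings[name] not in (0, 1) for name in ALL_METRICS):
        raise ValueError("each rating must be 0 or 1")
    return sum(ratings[name] for name in ALL_METRICS)


def model_average(per_report_ratings: list[dict[str, int]]) -> float:
    """Average diagram score for one model across all rated crash reports."""
    return float(mean(diagram_score(r) for r in per_report_ratings))


# Toy ratings for two diagrams (made up purely for illustration).
all_pass = {name: 1 for name in ALL_METRICS}
partial = {name: 0 for name in ALL_METRICS}
for name in ["vehicle_count", "crash_type", "legibility", "no_clutter"]:
    partial[name] = 1

toy_average = model_average([all_pass, partial])
```

    Under this assumed scheme, a model's reported figure would correspond to `model_average` applied to its ratings on all 79 crash reports. A weighted or graded (for example, 0 to 1 in steps) scheme would change `diagram_score` but not the averaging step.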

  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)