Abstract
Manual analysis of diagrams and legend sheets in engineering projects is time-consuming and needs automation. The lack of standardized legend formats complicates creating a general method for automated information extraction. Existing approaches require training and custom rules for each project. This study proposes a novel solution combining optical character recognition (OCR) with vision language models (VLMs) and multimodal prompt engineering to automate information extraction from diverse legend sheets without training. It integrates legend information with information extracted from diagrams, unlike studies that focus only on diagrams. Our study shows that VLMs, guided by multimodal prompts, can accurately extract information from diverse legend sheets, enabling automatic information extraction in diagrams across engineering projects. We validate our method through a case study involving the extraction of instruments from piping and instrumentation diagrams (P&IDs) and their legends across three projects with varied formats and standards. The proposed method achieved 100% accuracy in legend classification and information extraction, and 99.68% precision and 95.91% recall in generating instrument listings. The results demonstrate the effectiveness of our approach, significantly enhancing the accuracy and efficiency of information extraction from diagrams. This method can be adapted to different legend formats and diagrams, providing a versatile solution for various industries.
| Original language | English |
|---|---|
| Article number | e70072 |
| Number of pages | 19 |
| Journal | Journal of Software: Evolution and Process |
| Volume | 37 |
| Issue number | 12 |
| DOIs | |
| Status | Published - Dec. 2025 |
Funding
This work is supported by McDermott Inc. and Software Center (Gothenburg, Sweden) and conducted in collaboration with Eindhoven University of Technology, the Netherlands. During the preparation of this work, the authors used the GPT-4o model in order to improve the readability of the text. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.