TY - GEN
T1 - Suitability of Optical Character Recognition (OCR) for Multi-domain Model Management
AU - Torres, Weslley
AU - van den Brand, Mark G.J.
AU - Serebrenik, Alexander
PY - 2020/9/30
Y1 - 2020/9/30
N2 - The development of systems following model-driven engineering can include models from different domains. For example, to develop a mechatronic component one might need to combine expertise about mechanics, electronics, and software. Although these models belong to different domains, changes in one model can affect other models, causing inconsistencies in the entire system. There are, however, only a limited number of tools that support management of models from different domains. These models are created using different modeling notations, and it is not plausible to use a multitude of parsers geared towards each and every modeling notation. Therefore, to ensure maintenance of multi-domain systems, we need a uniform approach that is independent of the peculiarities of the notation. Such an approach can only be based on something that is present in all those models, i.e., text, boxes, and lines. In this study we investigate the suitability of optical character recognition (OCR) as a basis for such a uniform approach. We selected graphical models from various domains that typically combine textual and graphical elements, and we focused on text recognition without looking for additional shapes. We analyzed the performance of Google Cloud Vision and Microsoft Cognitive Services, two off-the-shelf OCR services. Google Cloud Vision performed better than Microsoft Cognitive Services, detecting the text of 70% of model elements. Errors made by Google Cloud Vision are due to the absence of support for text common in engineering formulas, e.g., Greek letters, equations, and subscripts, as well as text typeset on multiple lines. We believe that once these shortcomings are addressed, OCR can become a crucial technology supporting multi-domain model management.
AB - The development of systems following model-driven engineering can include models from different domains. For example, to develop a mechatronic component one might need to combine expertise about mechanics, electronics, and software. Although these models belong to different domains, changes in one model can affect other models, causing inconsistencies in the entire system. There are, however, only a limited number of tools that support management of models from different domains. These models are created using different modeling notations, and it is not plausible to use a multitude of parsers geared towards each and every modeling notation. Therefore, to ensure maintenance of multi-domain systems, we need a uniform approach that is independent of the peculiarities of the notation. Such an approach can only be based on something that is present in all those models, i.e., text, boxes, and lines. In this study we investigate the suitability of optical character recognition (OCR) as a basis for such a uniform approach. We selected graphical models from various domains that typically combine textual and graphical elements, and we focused on text recognition without looking for additional shapes. We analyzed the performance of Google Cloud Vision and Microsoft Cognitive Services, two off-the-shelf OCR services. Google Cloud Vision performed better than Microsoft Cognitive Services, detecting the text of 70% of model elements. Errors made by Google Cloud Vision are due to the absence of support for text common in engineering formulas, e.g., Greek letters, equations, and subscripts, as well as text typeset on multiple lines. We believe that once these shortcomings are addressed, OCR can become a crucial technology supporting multi-domain model management.
KW - Model management
KW - OCR
KW - Systems engineering
UR - http://www.scopus.com/inward/record.url?scp=85094128794&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-58167-1_11
DO - 10.1007/978-3-030-58167-1_11
M3 - Conference contribution
SN - 9783030581664
VL - 1262
T3 - Communications in Computer and Information Science
SP - 149
EP - 162
BT - Systems Modelling and Management - 1st International Conference, ICSMM 2020, Proceedings
A2 - Babur, Onder
A2 - Denil, Joachim
A2 - Vogel-Heuser, Birgit
PB - Springer
ER -