Regular Expression Learning from Positive Examples Based on Integer Programming

Juntao Gao (Corresponding author), Yingqian Zhang (Corresponding author)

Research output: Contribution to journal, Journal article, Academic, peer-reviewed

2 Citations (Scopus)


This paper presents a novel method to infer regular expressions from positive examples. The method consists of a candidate construction phase and an optimization phase. In the candidate construction phase, we first propose multiscaling sample augmentation to capture cycle patterns within single examples. We then use common substrings to build regular expressions that capture patterns across multiple examples, and we show that this algorithm is more general than those based on common prefixes or suffixes. Furthermore, we propose a pruning mechanism to improve the efficiency of mining useful common substrings, which is an important part of the common-substring-based expression-building algorithm. Finally, in the optimization phase, we model the problem of choosing a set of regular expressions with the lowest cost as an integer linear program, which can be solved to optimality. Experimental results on synthetic and real-life samples demonstrate the effectiveness of our approach in inferring concise and semantically meaningful regular expressions for string datasets.
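The optimization phase can be illustrated with a small sketch. The abstract does not give the cost function or the constraints, so the following assumes a set-cover style 0/1 program: each candidate expression gets a binary variable, every example must be fully matched by at least one chosen expression, and the cost of an expression is simply its length. The function name `choose_expressions` and the candidate list are hypothetical, and exhaustive enumeration stands in for a real integer programming solver.

```python
import re
from itertools import product


def choose_expressions(examples, candidates, cost=len):
    """Exhaustively solve a tiny set-cover style 0/1 program (illustrative only).

    minimize   sum_j cost(r_j) * x_j
    subject to sum_{j : r_j matches e_i} x_j >= 1   for every example e_i
               x_j in {0, 1}

    Assumption: cost defaults to the expression's length; the paper's
    actual cost function is not stated in the abstract.
    """
    # covers[j] = indices of the examples fully matched by candidate j
    covers = [
        {i for i, e in enumerate(examples) if re.fullmatch(r, e)}
        for r in candidates
    ]
    all_examples = set(range(len(examples)))
    best_cost, best_choice = None, None

    # Enumerate every 0/1 assignment; only feasible for a handful of candidates.
    for x in product((0, 1), repeat=len(candidates)):
        covered = set().union(*(covers[j] for j, xj in enumerate(x) if xj))
        if covered != all_examples:
            continue  # covering constraint violated
        total = sum(cost(candidates[j]) for j, xj in enumerate(x) if xj)
        if best_cost is None or total < best_cost:
            best_cost, best_choice = total, x

    if best_choice is None:
        return None  # no subset of candidates covers every example
    return [candidates[j] for j, xj in enumerate(best_choice) if xj]


examples = ["abab", "ababab", "cd"]
candidates = [r"(ab)+", r"abab", r"ababab", r"cd"]
print(choose_expressions(examples, candidates))
```

With these inputs the cyclic expression `(ab)+` together with `cd` costs 7, while the three literal expressions cost 12, so the cyclic form is selected. For realistic candidate sets the enumeration would be replaced by an integer programming solver.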
Original language: English
Pages (from-to): 1443-1479
Number of pages: 37
Journal: International Journal of Software Engineering and Knowledge Engineering
Issue number: 10
Status: Published - Oct 2020

