The central task in graph mining is to find subgraphs, called patterns that occur frequently in either a collection of graphs, or in one large graph. Especially in the singlegraph setting, the notion of frequency, however, is not at all straightforward. For example, the naive solution of taking the number of instances of the pattern as its frequency has the undesirable property that extending a pattern (i.e., making it more restrictive), may increase its frequency. Hence, as pointed out by Vanetik, Gudes and Shimony (2006), a good frequency measure must be anti-monotonic, i.e., the frequency of a super-pattern may not be higher as that of a subpattern. Not only the correctness, but also the efficiency of most existing graph pattern miners relies critically on this property, as it allows for pruning large parts of the search space. An important class of anti-monotonic support measures in the single graph setting is based on the notion of an overlap graph — a graph in which each vertex corresponds to a match of the pattern and two vertices are connected by an edge if the corresponding matches overlap. Vanetik, Gudes and Shimony proved necessary and sufficient conditions for anti-monotonicity in the single, labeled graph setting, in which the vertices of the overlap graph represent subgraphs of the data set isomorphic to the pattern, and the edges represent edge overlap (2006) between the subgraphs. In the context of graph mining, however, not only subgraph isomorphism and labeled graphs are important. On the one hand, the importance of homeomorphic based graph mining increased drastically with the study of biological networks (Bandyopadhyay et al., 2006; Grunewald et al., 2007). On the other hand, in applications where vertices can play several roles (e.g. social networks) homomorphism is more suitable. Homomorphism in the context of data mining has been thoroughly investigated in the field of inductive logic programming (Muggleton & Raedt, 1994). The main contributions of this paper are: (1) We extend the anti-monotonicity results of Vanetik, Gudes and Shimony to all 24 combinations of iso-, homo-, or homeomorphism, on labeled or unlabeled, directed or undirected graphs, with edge- or vertex-overlap. (2) We show that (under reasonable assumptions) the maximum independent set measure (MIS) of Vanetik, Gudes and Shimony (2006) is the smallest anti-monotonic measure in the class of overlap-graph based frequency measures. We also introduce the new minimum clique partition measure (MCP) which represents the largest possible one. (3) In general, both theMIS and theMCP measure are NP-hard in the size of the overlap graph. We introduce the polynomial time computable Lovasz measure, which is is sandwiched between the former two, and show that is anti-monotonic.
|Title of host publication||Informal Proceedings 6th International Workshop on Mining and Learning with Graphs (MLG 2008, Helsinki, Finland, July 4-5, 2008)|
|Place of Publication||Helsinki|
|Publisher||Helsinki University of Technology|
|Publication status||Published - 2008|