Term based semantic clusters for very short text classification

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

Abstract

Very short texts, such as tweets and invoices, present challenges in classification. Although term occurrences are strong indicators of content, the sparsity of very short texts makes it difficult to capture important semantic relationships. A solution calls for a method that not only considers term occurrence but also handles sparseness well. In this work, we introduce such an approach, Term Based Semantic Clusters (TBSeC), which employs terms to create distinctive semantic concept clusters. These clusters are ranked using a semantic similarity function, which in turn defines a semantic feature space that can be used for text classification. Our method is evaluated on an invoice classification task. Compared to well-known content representation methods, the proposed method performs competitively.
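
The abstract gives only a high-level description, so the following is a minimal sketch of the general idea (scoring a short text against term clusters to obtain a feature vector), not the authors' TBSeC implementation. The toy embeddings, the cluster contents, the centroid representation, the max-cosine scoring, and the names `EMBEDDINGS`, `CLUSTERS`, and `cluster_features` are all assumptions made for illustration.

```python
import math

# Hypothetical 2-d "embeddings"; real word vectors would come from a trained model.
EMBEDDINGS = {
    "electricity": (0.9, 0.1),
    "gas": (0.8, 0.2),
    "energy": (0.85, 0.15),
    "laptop": (0.1, 0.9),
    "monitor": (0.2, 0.8),
    "keyboard": (0.15, 0.85),
}

# Hypothetical concept clusters, each defined by a handful of terms.
CLUSTERS = {
    "utilities": ["electricity", "gas", "energy"],
    "hardware": ["laptop", "monitor", "keyboard"],
}


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(terms):
    """Component-wise mean of the embeddings of the given terms."""
    vectors = [EMBEDDINGS[t] for t in terms]
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cluster_features(text):
    """Return one similarity score per cluster, in sorted cluster-name order.

    Each score is the highest cosine similarity between any known token in
    the text and the cluster centroid (an assumed scoring rule).
    """
    tokens = [t for t in text.lower().split() if t in EMBEDDINGS]
    features = []
    for name in sorted(CLUSTERS):
        center = centroid(CLUSTERS[name])
        sims = [cosine(EMBEDDINGS[t], center) for t in tokens]
        features.append(max(sims) if sims else 0.0)
    return features
```

Calling `cluster_features("Gas energy invoice")` yields a two-element vector ordered as hardware, utilities, with the utilities score being the larger one. Such a vector could then be passed to any standard classifier; a text with no known terms maps to all zeros, which is one simple way to handle out-of-vocabulary input in this sketch.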

Original language: English
Title of host publication: International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings
Editors: Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
Publisher: Incoma Ltd
Pages: 878-887
Number of pages: 10
ISBN (Electronic): 9789544520557
DOI: https://doi.org/10.26615/978-954-452-056-4_102
Publication status: Published - 2019
Event: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019 - Varna, Bulgaria
Duration: 2 Sep 2019 to 4 Sep 2019

Conference

Conference: 12th International Conference on Recent Advances in Natural Language Processing, RANLP 2019
Country: Bulgaria
City: Varna
Period: 2/09/19 to 4/09/19

Keywords

  • text classification
  • term extraction
  • character word embeddings
  • invoice classification

Cite this

Paalman, J., Mullick, S., Zervanou, K., & Zhang, Y. (2019). Term based semantic clusters for very short text classification. In G. Angelova, R. Mitkov, I. Nikolova, & I. Temnikova (Eds.), International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings (pp. 878-887). Incoma Ltd. https://doi.org/10.26615/978-954-452-056-4_102
@inproceedings{6a0569e4815b43af9195331835febaad,
title = "Term based semantic clusters for very short text classification",
abstract = "Very short texts, such as tweets and invoices, present challenges in classification. Although term occurrences are strong indicators of content, the sparsity of very short texts makes it difficult to capture important semantic relationships. A solution calls for a method that not only considers term occurrence but also handles sparseness well. In this work, we introduce such an approach, Term Based Semantic Clusters (TBSeC), which employs terms to create distinctive semantic concept clusters. These clusters are ranked using a semantic similarity function, which in turn defines a semantic feature space that can be used for text classification. Our method is evaluated on an invoice classification task. Compared to well-known content representation methods, the proposed method performs competitively.",
keywords = "text classification, term extraction, character word embeddings, invoice classification",
author = "Jasper Paalman and Shantanu Mullick and Kalliopi Zervanou and Yingqian Zhang",
year = "2019",
doi = "10.26615/978-954-452-056-4_102",
language = "English",
pages = "878--887",
editor = "Galia Angelova and Ruslan Mitkov and Ivelina Nikolova and Irina Temnikova",
booktitle = "International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings",
publisher = "Incoma Ltd",
}

TY - GEN
T1 - Term based semantic clusters for very short text classification
AU - Paalman, Jasper
AU - Mullick, Shantanu
AU - Zervanou, Kalliopi
AU - Zhang, Yingqian
PY - 2019
Y1 - 2019
AB - Very short texts, such as tweets and invoices, present challenges in classification. Although term occurrences are strong indicators of content, the sparsity of very short texts makes it difficult to capture important semantic relationships. A solution calls for a method that not only considers term occurrence but also handles sparseness well. In this work, we introduce such an approach, Term Based Semantic Clusters (TBSeC), which employs terms to create distinctive semantic concept clusters. These clusters are ranked using a semantic similarity function, which in turn defines a semantic feature space that can be used for text classification. Our method is evaluated on an invoice classification task. Compared to well-known content representation methods, the proposed method performs competitively.
KW - text classification
KW - term extraction
KW - character word embeddings
KW - invoice classification
UR - http://www.scopus.com/inward/record.url?scp=85076470251&partnerID=8YFLogxK
U2 - 10.26615/978-954-452-056-4_102
DO - 10.26615/978-954-452-056-4_102
M3 - Conference contribution
AN - SCOPUS:85076470251
SP - 878
EP - 887
BT - International Conference on Recent Advances in Natural Language Processing in a Deep Learning World, RANLP 2019 - Proceedings
A2 - Angelova, Galia
A2 - Mitkov, Ruslan
A2 - Nikolova, Ivelina
A2 - Temnikova, Irina
PB - Incoma Ltd
ER -