Term based semantic clusters for very short text classification

Research output: Contribution to conferenceAbstract

7 Downloads (Pure)

Abstract

Very short texts, such as tweets and invoices, present challenges in classification. Such texts abound in ellipsis, grammatical errors, misspellings, and semantic variation. Although term occurrences are strong indicators of content, in very short texts, sparsity makes it difficult to capture enough content for a semantic classifier A solution calls for a method that not only considers term occurrence, but also handles sparseness well. In this work, we introduce such an approach for the classification of short invoice descriptions, in such a way that each class reflects a different group of products or services. The developed algorithm is called Term Based Semantic Clusters (TBSeC).

Original languageEnglish
Number of pages12
Publication statusPublished - 8 Nov 2019
Event31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019 - Brussels, Belgium
Duration: 6 Nov 20198 Nov 2019

Conference

Conference31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019
CountryBelgium
CityBrussels
Period6/11/198/11/19

Fingerprint

Semantics
Classifiers

Cite this

Paalman, J., Mullick, S., Zervanou, K., & Zhang, Y. (2019). Term based semantic clusters for very short text classification. Abstract from 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019, Brussels, Belgium.
Paalman, Jasper ; Mullick, Shantanu ; Zervanou, Kalliopi ; Zhang, Yingqian. / Term based semantic clusters for very short text classification. Abstract from 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019, Brussels, Belgium.12 p.
@conference{6ce690815e7a4d26b266a90b9ac95b58,
title = "Term based semantic clusters for very short text classification",
abstract = "Very short texts, such as tweets and invoices, present challenges in classification. Such texts abound in ellipsis, grammatical errors, misspellings, and semantic variation. Although term occurrences are strong indicators of content, in very short texts, sparsity makes it difficult to capture enough content for a semantic classifier A solution calls for a method that not only considers term occurrence, but also handles sparseness well. In this work, we introduce such an approach for the classification of short invoice descriptions, in such a way that each class reflects a different group of products or services. The developed algorithm is called Term Based Semantic Clusters (TBSeC).",
author = "Jasper Paalman and Shantanu Mullick and Kalliopi Zervanou and Yingqian Zhang",
year = "2019",
month = "11",
day = "8",
language = "English",
note = "31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019 ; Conference date: 06-11-2019 Through 08-11-2019",

}

Paalman, J, Mullick, S, Zervanou, K & Zhang, Y 2019, 'Term based semantic clusters for very short text classification', 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019, Brussels, Belgium, 6/11/19 - 8/11/19.

Term based semantic clusters for very short text classification. / Paalman, Jasper; Mullick, Shantanu; Zervanou, Kalliopi; Zhang, Yingqian.

2019. Abstract from 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019, Brussels, Belgium.

Research output: Contribution to conferenceAbstract

TY - CONF

T1 - Term based semantic clusters for very short text classification

AU - Paalman, Jasper

AU - Mullick, Shantanu

AU - Zervanou, Kalliopi

AU - Zhang, Yingqian

PY - 2019/11/8

Y1 - 2019/11/8

N2 - Very short texts, such as tweets and invoices, present challenges in classification. Such texts abound in ellipsis, grammatical errors, misspellings, and semantic variation. Although term occurrences are strong indicators of content, in very short texts, sparsity makes it difficult to capture enough content for a semantic classifier A solution calls for a method that not only considers term occurrence, but also handles sparseness well. In this work, we introduce such an approach for the classification of short invoice descriptions, in such a way that each class reflects a different group of products or services. The developed algorithm is called Term Based Semantic Clusters (TBSeC).

AB - Very short texts, such as tweets and invoices, present challenges in classification. Such texts abound in ellipsis, grammatical errors, misspellings, and semantic variation. Although term occurrences are strong indicators of content, in very short texts, sparsity makes it difficult to capture enough content for a semantic classifier A solution calls for a method that not only considers term occurrence, but also handles sparseness well. In this work, we introduce such an approach for the classification of short invoice descriptions, in such a way that each class reflects a different group of products or services. The developed algorithm is called Term Based Semantic Clusters (TBSeC).

UR - http://www.scopus.com/inward/record.url?scp=85075078856&partnerID=8YFLogxK

M3 - Abstract

AN - SCOPUS:85075078856

ER -

Paalman J, Mullick S, Zervanou K, Zhang Y. Term based semantic clusters for very short text classification. 2019. Abstract from 31st Benelux Conference on Artificial Intelligence and the 28th Belgian Dutch Conference on Machine Learning, BNAIC/BENELEARN 2019, Brussels, Belgium.