TY - JOUR
T1 - Exploiting multi-word similarity for retrieval in medical document collections
T2 - The TSRM approach
AU - Drymonas, Euthymios
AU - Zervanou, Kalliopi
AU - Petrakis, Euripides G.M.
PY - 2010/10/1
Y1 - 2010/10/1
N2 - In this paper, we investigate potential improvements to Information Retrieval (IR) models related to document representation and conceptual, topic retrieval in medical document collections. We propose the TSRM (Term Similarity and Retrieval Model) approach, where document representations are based on multi-word domain terms rather than the single keywords typically applied in traditional IR. The proposed representation is semantically compact and more efficient, being reduced to a limited number of meaningful multi-word terms (phrases) rather than large vectors of single words, part of which may be void of distinctive content semantics. In computing document similarity, contrary to other state-of-the-art methods examined in this work, TSRM adopts a knowledge-poor solution, namely an approach which does not require any existing knowledge resources, such as ontologies or thesauri. The evaluation of TSRM is based on OHSUMED, a standard TREC collection of medical documents, and illustrates the efficiency of TSRM over other well-established general-purpose IR models.
AB - In this paper, we investigate potential improvements to Information Retrieval (IR) models related to document representation and conceptual, topic retrieval in medical document collections. We propose the TSRM (Term Similarity and Retrieval Model) approach, where document representations are based on multi-word domain terms rather than the single keywords typically applied in traditional IR. The proposed representation is semantically compact and more efficient, being reduced to a limited number of meaningful multi-word terms (phrases) rather than large vectors of single words, part of which may be void of distinctive content semantics. In computing document similarity, contrary to other state-of-the-art methods examined in this work, TSRM adopts a knowledge-poor solution, namely an approach which does not require any existing knowledge resources, such as ontologies or thesauri. The evaluation of TSRM is based on OHSUMED, a standard TREC collection of medical documents, and illustrates the efficiency of TSRM over other well-established general-purpose IR models.
KW - Document representation
KW - Information Retrieval
KW - Medical information retrieval
KW - Term extraction
KW - Term Similarity
UR - http://www.scopus.com/inward/record.url?scp=79960229245&partnerID=8YFLogxK
M3 - Article
VL - 8
SP - 315
EP - 321
JO - Journal of Digital Information Management
JF - Journal of Digital Information Management
SN - 0972-7272
IS - 5
ER -