Exploiting multi-word similarity for retrieval in medical document collections: The TSRM approach

Euthymios Drymonas, Kalliopi Zervanou, Euripides G.M. Petrakis

Research output: Contribution to journal › Article › Academic › peer-review

3 Citations (Scopus)
29 Downloads (Pure)

Abstract

In this paper, we investigate potential improvements to Information Retrieval (IR) models with respect to document representation and conceptual (topic) retrieval in medical document collections. We propose the TSRM (Term Similarity and Retrieval Model) approach, in which document representations are based on multi-word domain terms rather than the single keywords typically used in traditional IR. The proposed representation is semantically compact and more efficient. It is reduced to a limited number of meaningful multi-word terms (phrases) rather than large vectors of single words, some of which may carry no distinctive content semantics. To compute document similarity, TSRM, unlike the other state-of-the-art methods examined in this work, adopts a knowledge-poor solution: an approach that does not require existing knowledge resources such as ontologies or thesauri. TSRM is evaluated on OHSUMED, a standard TREC collection of medical documents, and the evaluation illustrates the efficiency of TSRM over other well-established general-purpose IR models.
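The abstract does not specify how TSRM extracts terms or computes term similarity, so the following is only a rough illustration of the general idea, not the TSRM method. It assumes each document has already been reduced to a short list of multi-word terms, and it swaps in a deliberately simple knowledge-poor measure: word-overlap (Jaccard) similarity between two terms, with no ontology or thesaurus involved. A document is then scored against a query by averaging, over the query terms, the best match found among the document's terms. All function names and example terms are hypothetical.

```python
from typing import Dict, List, Tuple


def lexical_term_similarity(t1: str, t2: str) -> float:
    """Hypothetical knowledge-poor term similarity: Jaccard overlap of the
    words in two multi-word terms. Uses no external knowledge resource."""
    w1, w2 = set(t1.lower().split()), set(t2.lower().split())
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / len(w1 | w2)


def document_similarity(doc_terms: List[str], query_terms: List[str]) -> float:
    """Average, over the query terms, of the best-matching document term."""
    if not doc_terms or not query_terms:
        return 0.0
    total = 0.0
    for q in query_terms:
        total += max(lexical_term_similarity(q, d) for d in doc_terms)
    return total / len(query_terms)


def rank(docs: Dict[str, List[str]], query_terms: List[str]) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending score."""
    scores = [(doc_id, document_similarity(terms, query_terms))
              for doc_id, terms in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Each document is represented by a few multi-word terms, not a word vector.
    docs = {
        "d1": ["myocardial infarction", "beta blocker therapy"],
        "d2": ["renal failure", "dialysis"],
    }
    query = ["acute myocardial infarction", "beta blocker"]
    # d1 scores 2/3 (each query term shares 2 of 3 words with a d1 term); d2 scores 0.0
    print(rank(docs, query))
```

Because each document is only a handful of phrases, scoring compares a few term pairs instead of long sparse vectors. The cost of this crude word-overlap stand-in is that it misses related terms that share no words (for example "heart attack" and "myocardial infarction"), which is the kind of gap a more careful term-similarity model would aim to close.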

Original language: English
Pages (from-to): 315-321
Number of pages: 7
Journal: Journal of Digital Information Management
Volume: 8
Issue number: 5
Publication status: Published - 1 Oct 2010
Externally published: Yes

Keywords

  • Document representation
  • Information Retrieval
  • Medical information retrieval
  • Term extraction
  • Term similarity
