Exploiting multi-word similarity for retrieval in medical document collections: The TSRM approach

Euthymios Drymonas, Kalliopi Zervanou, Euripides G.M. Petrakis

Research output: Contribution to journal › Article › Academic › peer-review

3 Citations (Scopus)

Abstract

In this paper, we investigate potential improvements to Information Retrieval (IR) models related to document representation and conceptual, topic retrieval in medical document collections. We propose the TSRM (Term Similarity and Retrieval Model) approach, where document representations are based on multi-word domain terms, rather than the mere single keywords typically applied in traditional IR. The proposed representation is semantically compact and more efficient, being reduced to a limited number of meaningful multi-word terms (phrases), rather than large vectors of single words, part of which may be void of distinctive content semantics. In computing document similarity, contrary to other state-of-the-art methods examined in this work, TSRM adopts a knowledge-poor solution, namely an approach which does not require any existing knowledge resources, such as ontologies or thesauri. The evaluation of TSRM is based on OHSUMED, a standard TREC collection of medical documents, and illustrates the efficiency of TSRM over other well-established general-purpose IR models.

Original language: English
Pages (from-to): 315-321
Number of pages: 7
Journal: Journal of Digital Information Management
Volume: 8
Issue number: 5
Publication status: Published - 1 Oct 2010
Externally published: Yes

Keywords

  • Document representation
  • Information Retrieval
  • Medical information retrieval
  • Term extraction
  • Term Similarity

Cite this

@article{0daecd7ccbe9434784aa257f35257258,
title = "Exploiting multi-word similarity for retrieval in medical document collections: The TSRM approach",
abstract = "In this paper, we investigate potential improvements to Information Retrieval (IR) models related to document representation and conceptual, topic retrieval in medical document collections. We propose the TSRM (Term Similarity and Retrieval Model) approach, where document representations are based on multi-word domain terms, rather than the mere single keywords typically applied in traditional IR. The proposed representation is semantically compact and more efficient, being reduced to a limited number of meaningful multi-word terms (phrases), rather than large vectors of single words, part of which may be void of distinctive content semantics. In computing document similarity, contrary to other state-of-the-art methods examined in this work, TSRM adopts a knowledge-poor solution, namely an approach which does not require any existing knowledge resources, such as ontologies or thesauri. The evaluation of TSRM is based on OHSUMED, a standard TREC collection of medical documents, and illustrates the efficiency of TSRM over other well-established general-purpose IR models.",
keywords = "Document representation, Information Retrieval, Medical information retrieval, Term extraction, Term Similarity",
author = "Euthymios Drymonas and Kalliopi Zervanou and Petrakis, {Euripides G.M.}",
year = "2010",
month = "10",
day = "1",
language = "English",
volume = "8",
pages = "315--321",
journal = "Journal of Digital Information Management",
issn = "0972-7272",
publisher = "Digital Information Research Foundation",
number = "5",

}

Exploiting multi-word similarity for retrieval in medical document collections: The TSRM approach. / Drymonas, Euthymios; Zervanou, Kalliopi; Petrakis, Euripides G.M.

In: Journal of Digital Information Management, Vol. 8, No. 5, 01.10.2010, p. 315-321.

TY - JOUR

T1 - Exploiting multi-word similarity for retrieval in medical document collections

T2 - The TSRM approach

AU - Drymonas, Euthymios

AU - Zervanou, Kalliopi

AU - Petrakis, Euripides G.M.

PY - 2010/10/1

Y1 - 2010/10/1

AB - In this paper, we investigate potential improvements to Information Retrieval (IR) models related to document representation and conceptual, topic retrieval in medical document collections. We propose the TSRM (Term Similarity and Retrieval Model) approach, where document representations are based on multi-word domain terms, rather than the mere single keywords typically applied in traditional IR. The proposed representation is semantically compact and more efficient, being reduced to a limited number of meaningful multi-word terms (phrases), rather than large vectors of single words, part of which may be void of distinctive content semantics. In computing document similarity, contrary to other state-of-the-art methods examined in this work, TSRM adopts a knowledge-poor solution, namely an approach which does not require any existing knowledge resources, such as ontologies or thesauri. The evaluation of TSRM is based on OHSUMED, a standard TREC collection of medical documents, and illustrates the efficiency of TSRM over other well-established general-purpose IR models.

KW - Document representation

KW - Information Retrieval

KW - Medical information retrieval

KW - Term extraction

KW - Term Similarity

UR - http://www.scopus.com/inward/record.url?scp=79960229245&partnerID=8YFLogxK

M3 - Article

VL - 8

SP - 315

EP - 321

JO - Journal of Digital Information Management

JF - Journal of Digital Information Management

SN - 0972-7272

IS - 5

ER -