On the reusability of open test collections

S.H. Hashemi, C.L.A. Clarke, A. Dean-Hall, J. Kamps, Y. Kiseleva

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    3 Citations (Scopus)

    Abstract

    Creating test collections for modern search tasks is increasingly challenging due to the growing scale and dynamic nature of content, and the need for richer contextualization of the statements of request. To address these issues, the TREC Contextual Suggestion Track explored an open test collection, where participants were allowed to submit any web page as a result for a personalized venue recommendation task. This prompts the question of the reusability of the resulting test collection: How does the open nature affect the pooling process? Can participants reliably evaluate variant runs with the resulting qrels? Can other teams evaluate new runs reliably? In short, does the set of pooled and judged documents effectively produce a post hoc test collection? Our main findings are the following: First, while there is a strongly significant rank correlation, the effect of pooling is notable and results in underestimation of performance, implying that the evaluation of non-pooled systems should be done with great care. Second, we extensively analyze the impact of the open corpus on the fraction of judged documents, explaining how low recall affects reusability, and how personalization and low pooling depth aggravate that problem. Third, we outline a potential solution by deriving a fixed corpus from the open web submissions.
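    The abstract's first two findings concern rank correlation between evaluations and the fraction of judged documents in non-pooled runs. The following minimal sketch (not taken from the paper; all system names, scores, run data, and judgments are hypothetical) illustrates the kind of analysis described: comparing system rankings under two sets of scores with Kendall's tau, and measuring judged@k for a single run.

    # Illustrative sketch only; data below is made up for demonstration.
    from scipy.stats import kendalltau

    # Hypothetical per-system scores: evaluated against full judgments
    # vs. against judgments restricted to the pooled documents.
    full_scores   = {"sysA": 0.42, "sysB": 0.38, "sysC": 0.51, "sysD": 0.29}
    pooled_scores = {"sysA": 0.35, "sysB": 0.33, "sysC": 0.47, "sysD": 0.21}

    systems = sorted(full_scores)
    tau, p_value = kendalltau([full_scores[s] for s in systems],
                              [pooled_scores[s] for s in systems])
    print(f"Kendall's tau between rankings: {tau:.2f} (p={p_value:.3f})")

    def judged_at_k(ranked_docs, judged_docs, k=10):
        """Fraction of a run's top-k documents that received a judgment."""
        top_k = ranked_docs[:k]
        return sum(d in judged_docs for d in top_k) / len(top_k)

    # Hypothetical ranked run and judgment pool for a single topic.
    run    = ["d3", "d7", "d1", "d9", "d4", "d8", "d2", "d6", "d5", "d0"]
    judged = {"d1", "d2", "d3", "d4", "d5"}
    print(f"judged@10 = {judged_at_k(run, judged):.2f}")

    A low judged@k for a run that contributed nothing to the pool is the symptom the paper associates with open, personalized collections at shallow pooling depths.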
    Original language: English
    Title of host publication: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15, Santiago, Chile, August 9-13, 2015)
    Place of publication: New York
    Publisher: Association for Computing Machinery, Inc.
    Pages: 827-830
    ISBN (Print): 978-1-4503-3621-5
    DOIs:
    Publication status: Published - 2015

