Triple-based Factual Correctness Evaluation of AI-Generated Summaries

Research output: Contribution to journal › Conference article › peer-review


Abstract

Existing methods for evaluating the factual correctness of summaries typically lack an explicit representation of facts, which undermines their explainability. A promising alternative is to represent facts in a structured form, e.g. as (subject, relation, object) triples. With such a representation, factual correctness can be assessed by measuring the triple overlap between the summary and the source document. However, existing triple-based methods correlate less well with human judgment than non-explainable approaches. To address this, we introduce a triple-based approach that generates multiple triple representations for both the summary and the document and measures the triple overlap between them. We also propose an averaging heuristic for combining the multiple triple sets; experiments on the FRANK dataset show that selecting the single best representation almost always matches human judgments and yields higher correlations than state-of-the-art metrics, whereas the averaging variant still falls below those metrics.
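The triple-overlap idea from the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes facts are already extracted as (subject, relation, object) tuples, scores a summary by exact-match overlap with the document's triples, and contrasts an averaging combination of multiple triple sets (the paper's heuristic) with selecting the single best-scoring representation. The function names and the exact-match criterion are assumptions for illustration.

```python
def triple_precision(summary_triples, document_triples):
    """Fraction of summary triples supported by the document triples.

    A summary triple counts as supported only on exact match here;
    a real system would likely use softer matching.
    """
    if not summary_triples:
        return 0.0
    doc = set(document_triples)
    return sum(t in doc for t in summary_triples) / len(summary_triples)


def score_multiple(summary_sets, document_sets, combine="mean"):
    """Score every pairing of generated triple representations.

    combine="mean" mirrors an averaging heuristic over all pairings;
    combine="max" mirrors keeping the single best representation.
    """
    scores = [
        triple_precision(s, d)
        for s in summary_sets
        for d in document_sets
    ]
    if combine == "max":
        return max(scores)
    return sum(scores) / len(scores)
```

For example, a summary set whose triples all appear in one document representation but in none of another scores 1.0 under "max" but only 0.5 under "mean", which is the kind of gap the paper's comparison of the two variants turns on.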
Original language: English
Pages (from-to): 226-231
Number of pages: 6
Journal: IFAC-PapersOnLine
Volume: 59
Issue number: 27
DOIs
Publication status: Published - 2025
Event: 7th IFAC Symposium on Telematics Applications, TA 2025 - Padova, Italy
Duration: 15 Sept 2025 - 18 Sept 2025

Funding

This work was supported by the AIMS5.0 project under grant agreement no. 101112089.

Keywords

  • Large Language Models (LLMs)
  • Knowledge Graph
  • Structured Fact Representation
  • Summarization
  • Factual Correctness Evaluation
