Abstract
Existing methods for evaluating factual correctness typically lack an explicit representation of facts, which undermines their explainability. A promising alternative is to represent facts in a structured form, e.g. as triples. With a triple representation, factual correctness can be assessed by measuring the triple overlap between the summary and the document. However, existing triple-based methods correlate worse with human judgment than non-explainable approaches. To address this, we introduce a triple-based approach that generates multiple triple representations for both the summary and the document and measures the triple overlap between them. We also introduce an averaging heuristic for combining multiple triple sets; yet experiments on the FRANK dataset reveal that selecting the single best representation almost always matches human judgments and yields higher correlations than state-of-the-art metrics, while the averaging variant still performs below those metrics.
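The core scoring idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: triple extraction (e.g. via an LLM) is assumed to have already happened, exact-match comparison stands in for whatever matching the method actually uses, and all function names are hypothetical.

```python
def triple_overlap(summary_triples, document_triples):
    """Fraction of summary triples supported by the document (precision-style).

    A triple is a (subject, relation, object) tuple; exact match is an
    illustrative simplification.
    """
    if not summary_triples:
        return 0.0
    doc = set(document_triples)
    supported = sum(1 for t in summary_triples if t in doc)
    return supported / len(summary_triples)


def score_multi(summary_sets, document_sets, combine="max"):
    """Score every pairing of summary/document triple sets, then combine.

    combine="mean" mimics an averaging heuristic over all pairings;
    combine="max" selects the single best-scoring representation pair.
    """
    scores = [triple_overlap(s, d)
              for s in summary_sets
              for d in document_sets]
    if combine == "max":
        return max(scores)
    return sum(scores) / len(scores)
```

For example, with one faithful and one hallucinated summary representation against a single document triple set, `combine="max"` rewards the best pairing while `combine="mean"` dilutes the score across all pairings.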
| Original language | English |
|---|---|
| Pages (from-to) | 226-231 |
| Number of pages | 6 |
| Journal | IFAC-PapersOnLine |
| Volume | 59 |
| Issue number | 27 |
| DOIs | |
| Publication status | Published - 2025 |
| Event | 7th IFAC Symposium on Telematics Applications, TA 2025, Padova, Italy, 15–18 Sept 2025 |
Funding
This work was supported by the AIMS5.0 project under grant agreement no. 101112089.
Keywords
- Large Language Models (LLMs)
- Knowledge Graph
- Structured Fact Representation
- Summarization
- Factual Correctness Evaluation