Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain

Shintaro Ozaki¹, Yuta Kato², Siyuan Feng², Masayo Tomita², Kazuki Hayashi¹,
¹Nara Institute of Science and Technology ²The University of Tokyo ³NEC Corporation

The focus of our research is to analyze whether RAG improves the confidence of the model response.

Abstract

Retrieval Augmented Generation (RAG) complements the knowledge of Large Language Models (LLMs) by leveraging external information to enhance response accuracy for queries.
This approach is widely applied across fields because it can inject the most up-to-date information, and researchers are working to understand and improve this capability to unlock the full potential of RAG in high-stakes applications.
However, the mechanisms behind the confidence of RAG outputs remain underexplored, even though confidence is critical in domains such as finance, healthcare, and medicine.
Our study focuses on the impact of RAG on confidence within the medical domain under various configurations and models.
We evaluate confidence by taking the probability the model assigns to its output and calculating Expected Calibration Error (ECE) and Adaptive Calibration Error (ACE) scores from these probabilities and the accuracy.
In addition, we analyze whether the order of retrieved documents within prompts affects the calibration of confidence.
Our findings reveal large variation in confidence and accuracy depending on the model, settings, and the format of input prompts.
These results underscore the necessity of optimizing configurations based on the specific model and conditions.
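
To make the two calibration scores named in the abstract concrete, here is a minimal sketch of how ECE and ACE can be computed from per-question confidences and correctness labels. It is an illustration under stated assumptions, not the paper's evaluation code: the function names, the default of 10 bins, and the simplified top-label form of ACE (equal-mass bins over the predicted answer only, with an unweighted mean of the per-bin gaps) are all choices made here for the example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over [0, 1] (hypothetical helper)."""
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def adaptive_calibration_error(confidences, correct, n_bins=10):
    """Simplified top-label ACE with equal-mass bins (hypothetical helper)."""
    pairs = sorted(zip(confidences, correct))
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one prediction")
    n_bins = min(n_bins, n)  # never create more bins than samples
    gaps = []
    for b in range(n_bins):
        # Slice the sorted predictions into bins of (nearly) equal size.
        members = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        gaps.append(abs(accuracy - avg_conf))
    # Unweighted mean of the per-bin gaps.
    return sum(gaps) / len(gaps)


# Toy data: two confident correct answers, two less confident answers (one wrong).
confidences = [0.95, 0.95, 0.65, 0.65]
correct = [1, 1, 1, 0]
ece = expected_calibration_error(confidences, correct, n_bins=10)
ace = adaptive_calibration_error(confidences, correct, n_bins=2)
```

For both scores, lower values mean the stated confidence tracks accuracy more closely. The difference lies in the binning: equal-width bins can leave some ranges nearly empty when a model's confidences cluster, while equal-mass bins place the same number of predictions in each bin.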

BibTeX

@misc{ozaki2024understandingimpactconfidenceretrieval,
        title={Understanding the Impact of Confidence in Retrieval Augmented Generation: A Case Study in the Medical Domain}, 
        author={Shintaro Ozaki and Yuta Kato and Siyuan Feng and Masayo Tomita and Kazuki Hayashi and Ryoma Obara and Masafumi Oyamada and Katsuhiko Hayashi and Hidetaka Kamigaito and Taro Watanabe},
        year={2024},
        eprint={2412.20309},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        url={https://arxiv.org/abs/2412.20309}, 
  }