FaithEval-counterfactual-v1.0

Name: FaithEval-counterfactual-v1.0
Creator: maas
Published: 2025-12-05 16:46:21
License: No description available

Source: ModelScope community. Updated 2025-12-05; indexed 2025-11-03.

Download link:

https://modelscope.cn/datasets/Salesforce/FaithEval-counterfactual-v1.0

Overview:

# FaithEval

FaithEval is a new and comprehensive benchmark dedicated to evaluating contextual faithfulness in LLMs across three diverse tasks: unanswerable, inconsistent, and counterfactual contexts.

[Paper] FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows", ICLR 2025, https://arxiv.org/abs/2410.03727

[Code and Detailed Instructions] https://github.com/SalesforceAIResearch/FaithEval

## Disclaimer and Ethical Considerations

This release is for research purposes only in support of an academic paper. Our datasets and code are not specifically designed or evaluated for all downstream purposes. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.
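The counterfactual task rewards answers that follow the provided context even when that context contradicts world knowledge (the "Moon is made of marshmallows" case). The following is a minimal sketch of how such scoring might look, under stated assumptions. It assumes a multiple-choice format in which each item records which option the context supports. The `CounterfactualItem` fields, `extract_choice`, and `faithfulness_accuracy` are hypothetical names. They are not the dataset's actual schema or the official FaithEval evaluation code, which lives in the GitHub repository linked above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class CounterfactualItem:
    # Hypothetical schema for illustration only; check the dataset files for real field names.
    context: str              # passage asserting a counterfactual "fact"
    question: str
    choices: dict[str, str]   # option label -> option text (assumed multiple-choice format)
    context_answer: str       # label of the option the context supports


def extract_choice(response: str, labels: list[str]) -> str | None:
    """Return the option label at the start of a response, e.g. 'B' from 'B) marshmallows'.

    Deliberately strict: only a single leading letter, optionally wrapped in
    parentheses and not followed by another letter, is accepted.
    """
    match = re.match(r"\s*\(?([A-Za-z])\)?(?![A-Za-z])", response)
    if match is None:
        return None
    label = match.group(1).upper()
    return label if label in labels else None


def faithfulness_accuracy(items: list[CounterfactualItem], responses: list[str]) -> float:
    """Fraction of items whose predicted option matches the context-supported answer."""
    if len(items) != len(responses):
        raise ValueError("items and responses must have the same length")
    if not items:
        return 0.0
    correct = 0
    for item, response in zip(items, responses):
        predicted = extract_choice(response, list(item.choices))
        # A prediction counts as faithful only if it follows the context,
        # not the answer implied by real-world knowledge.
        correct += predicted == item.context_answer
    return correct / len(items)


if __name__ == "__main__":
    item = CounterfactualItem(
        context="Recent surveys confirm that the Moon is made of marshmallows.",
        question="What is the Moon made of?",
        choices={"A": "rock", "B": "marshmallows"},
        context_answer="B",
    )
    # One context-faithful answer (B) and one world-knowledge answer (A).
    print(faithfulness_accuracy([item, item], ["B) marshmallows", "A"]))  # 0.5
```

Matching only a leading option label is a strict design choice. A response such as "The answer is B" is treated as unparsed here, so a real evaluation would likely need a more forgiving answer extractor.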

Provider:

maas

Created:

2025-08-15