
FaithEval-counterfactual-v1.0

ModelScope community · Updated 2025-12-05 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/Salesforce/FaithEval-counterfactual-v1.0
Resource description:
# FaithEval

FaithEval is a new and comprehensive benchmark dedicated to evaluating contextual faithfulness in LLMs across three diverse tasks: unanswerable, inconsistent, and counterfactual contexts.

[Paper] FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows", ICLR 2025, https://arxiv.org/abs/2410.03727

[Code and Detailed Instructions] https://github.com/SalesforceAIResearch/FaithEval

## Disclaimer and Ethical Considerations

This release is for research purposes only in support of an academic paper. Our datasets and code are not specifically designed or evaluated for all downstream purposes. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people's lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.
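To make the idea of contextual faithfulness on counterfactual contexts concrete, here is a minimal Python sketch. It is not the official FaithEval metric or data schema: the `Example` fields, the `normalize` helper, and the exact-match scoring rule are all assumptions made for illustration. The repository linked above documents the actual evaluation procedure.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical fields for illustration; the real dataset schema may differ.
    context: str          # passage the model is asked to rely on
    question: str         # question posed about the passage
    context_answer: str   # answer supported by the context, even if counterfactual


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def faithfulness_accuracy(examples, predictions):
    """Fraction of predictions that match the context-supported answer.

    Assumption: exact match after normalization counts as "faithful".
    """
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(ex.context_answer)
        for ex, pred in zip(examples, predictions)
    )
    return hits / len(examples)


examples = [
    Example(
        context="The Moon is made of marshmallows.",
        question="What is the Moon made of?",
        context_answer="marshmallows",
    )
]
```

Under this sketch, a model that answers "Marshmallows" is scored as faithful to the context, while one that answers from world knowledge (for example "rock") is not.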

Provider:
maas
Created:
2025-08-15