
hcaoaf/MoleculeQA

Source: Hugging Face | Updated: 2024-11-26 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/hcaoaf/MoleculeQA
Overview:
MoleculeQA is a question-answering dataset designed to evaluate factual accuracy in molecular comprehension. It contains 62K QA pairs covering 23K molecules. Each QA pair consists of a manually generated question, one correct option, and three incorrect options, all of which are semantically consistent with molecular descriptions from authoritative corpora. MoleculeQA is the first benchmark to evaluate molecular factual correctness and also the largest molecular QA dataset to date. A comprehensive evaluation of existing molecular language models on MoleculeQA reveals their deficiencies in specific aspects and identifies crucial factors for molecular modeling. Additionally, MoleculeQA is used in reinforcement learning to mitigate model hallucinations, thereby improving the factual correctness of generated information.
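To make the described item structure concrete (one question, one correct option, three incorrect options), here is a minimal Python sketch of how such an item could be represented and scored. The class name, field names, and the sample item are assumptions made for illustration; they are not taken from the dataset's actual schema or contents.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class MoleculeQAItem:
    """Hypothetical in-memory form of one QA pair (field names are assumptions)."""

    molecule: str                        # e.g. a SMILES string for the molecule
    question: str
    options: Tuple[str, str, str, str]   # one correct option plus three incorrect ones
    answer_index: int                    # position of the correct option in `options`

    def __post_init__(self) -> None:
        if len(self.options) != 4:
            raise ValueError("expected exactly four options")
        if not 0 <= self.answer_index < 4:
            raise ValueError("answer_index must be in the range 0..3")


def accuracy(
    items: Sequence[MoleculeQAItem],
    predict: Callable[[MoleculeQAItem], int],
) -> float:
    """Fraction of items for which `predict` returns the correct option index."""
    if not items:
        return 0.0
    correct = sum(1 for item in items if predict(item) == item.answer_index)
    return correct / len(items)


# Illustrative item written for this sketch, not drawn from MoleculeQA.
example = MoleculeQAItem(
    molecule="CCO",  # ethanol
    question="Which functional group does this molecule contain?",
    options=("Hydroxyl group", "Carboxyl group", "Amino group", "Nitro group"),
    answer_index=0,
)
```

With four options per item, a predictor that guesses uniformly at random would be expected to score about 25% under this accuracy measure, which gives a rough baseline when reading model results.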
Provider:
hcaoaf