
MoleculeQA

Source: arXiv. Updated: 2024-03-13. Indexed: 2024-06-21.
Download link:
https://github.com/IDEA-XL/MoleculeQA
Description:
MoleculeQA is a dataset for evaluating factual accuracy in molecular comprehension, created by the International Digital Economy Academy (IDEA). It contains 62,000 question-answer pairs covering 23,000 molecules. Each QA pair consists of a manually written question, one correct option, and three distractor options, all kept semantically consistent with the molecule's description in authoritative molecular databases. MoleculeQA is presented as the first benchmark for evaluating factual bias in molecular understanding and as the largest QA dataset in molecular research. The dataset is built in two stages: constructing a domain taxonomy, then generating QA pairs guided by that taxonomy, a process intended to make the dataset comprehensive, diverse, and high quality. It is meant to serve as an evaluation tool for the factual accuracy of molecular comprehension models.
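The four-option format described above (one correct option, three distractors) can be illustrated with a minimal Python sketch. The field names, the `build_item` and `accuracy` helpers, and the sample question are all hypothetical and chosen only for illustration; they are not the schema or evaluation code of the MoleculeQA repository.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeQAItem:
    """One multiple-choice item. Field names are hypothetical, not the dataset's schema."""
    molecule: str              # assumed here to be a SMILES string
    question: str
    options: tuple[str, ...]   # four options: one correct, three distractors
    answer_index: int          # position of the correct option in `options`


def build_item(molecule, question, correct, distractors, rng):
    """Shuffle one correct option and three distractors into a four-way item."""
    if len(distractors) != 3:
        raise ValueError("expected exactly three distractors")
    if correct in distractors:
        raise ValueError("a distractor duplicates the correct option")
    options = [correct, *distractors]
    rng.shuffle(options)  # avoid always placing the answer first
    return MoleculeQAItem(molecule, question, tuple(options), options.index(correct))


def accuracy(items, predictions):
    """Fraction of items whose predicted option index matches the answer index."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        raise ValueError("cannot score an empty item list")
    hits = sum(1 for item, pred in zip(items, predictions) if pred == item.answer_index)
    return hits / len(items)


# Made-up example item (ethanol); not taken from the dataset.
item = build_item(
    molecule="CCO",
    question="Which functional group does this molecule contain?",
    correct="hydroxyl",
    distractors=["carboxyl", "amino", "nitro"],
    rng=random.Random(0),
)
print(accuracy([item], [item.answer_index]))
```

With four options per question, uniform random guessing yields an expected accuracy of 25%, which is a useful baseline when reading scores on a benchmark of this shape.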
Provider:
International Digital Economy Academy (IDEA)
Created:
2024-03-13