
Dataset Automatic scoring on Mole calculation

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://data.mendeley.com/datasets/z2nsknmksd
Description:
This training dataset targets one type of chemical calculation: finding the mass of a compound from its number of moles and its molar mass. It is intended for training deep learning NLP models to interpret and carry out such calculations when they are posed as text. The dataset is structured for transformer-based models such as BERT or GPT and consists of calculation questions, the numerical values for moles and molar mass, and the expected answers in units of mass. Each entry is also annotated with the calculation category, the formula used, and a step-by-step explanation. The data is distributed in JSON Lines (jsonl) format, which supports both batch processing and inspection of individual items.

Beyond the calculation itself, the dataset covers natural language understanding in a chemistry context: extracting numerical and contextual information from the question and generating a human-readable textual answer. After training, models are evaluated on a separate test dataset to check that they understand the questions, extract the relevant data correctly, and produce accurate numerical answers. Performance is reported as accuracy, precision, and recall for question understanding, together with the numerical accuracy of the answers. The dataset is meant to support research and development of NLP models that apply chemical knowledge to quantitative problems, with applications in chemical education and research.
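As an illustration of the calculation and the numerical-accuracy check described above, the following is a minimal Python sketch. The field names (`question`, `moles`, `molar_mass`, `answer`), the two sample records, and the relative tolerance are assumptions made for this example; the dataset's actual schema and scoring rule are not specified in the description and may differ.

```python
import json
import math

# Hypothetical JSONL records. The field names and values are assumptions
# for illustration only, not the dataset's documented schema.
SAMPLE_JSONL = """\
{"question": "What is the mass of 2.0 mol of water (molar mass 18.015 g/mol)?", "moles": 2.0, "molar_mass": 18.015, "answer": 36.03}
{"question": "What is the mass of 0.5 mol of NaCl (molar mass 58.44 g/mol)?", "moles": 0.5, "molar_mass": 58.44, "answer": 29.22}
"""


def mass_in_grams(moles: float, molar_mass: float) -> float:
    """Return mass (g) = amount of substance (mol) x molar mass (g/mol)."""
    return moles * molar_mass


def load_records(text: str) -> list:
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def numerical_accuracy(records, predict, rel_tol=1e-3) -> float:
    """Fraction of records whose predicted mass matches the expected answer
    within a relative tolerance (the tolerance value is an assumption)."""
    if not records:
        return 0.0
    correct = sum(
        math.isclose(predict(r), r["answer"], rel_tol=rel_tol) for r in records
    )
    return correct / len(records)


records = load_records(SAMPLE_JSONL)

# Reference "model": applies the formula directly to the structured fields.
baseline = lambda r: mass_in_grams(r["moles"], r["molar_mass"])

print(numerical_accuracy(records, baseline))
```

In practice `predict` would wrap a trained model that reads only the `question` text; the formula-based baseline here simply confirms that the sample answers are consistent with mass = moles x molar mass.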
Created:
2024-03-13