Multi-Label Physics K-12 Dataset

Mendeley Data | Updated 2020-03-29 | Indexed 2026-04-09

Download link:

https://data.mendeley.com/datasets/tb6bdmgtv3

Description:

The "Multi-Label Physics K-12 Dataset" is a novel dataset that we created to help expand research in the domain of educational question answering. The dataset consists of grade 6 to grade 12 physics concepts from the CBSE curriculum (federally supported in India). It includes topics from every chapter of the widely followed NCERT textbooks for those grades. To ensure that the dataset covers the concepts exhaustively, we collected online notes for the different chapters from various publicly available educational websites focused on grade 6-12 physics pedagogy (the file Sources.png lists all the websites we scraped).

All the chapter notes we scraped are divided into paragraphs. Our goal was to capture the inherent context stated within each paragraph. To do this, we identified nine distinct label types, which are required to satisfy three key conditions:

* The labels should span the space of contextual information as widely as possible. For example, if a paragraph consists of a definition followed by a reasoning-based statement, the labels should be robust enough to capture both the definition and the reasoning entirely.
* The labels should be mutually exclusive. This means that every label should represent unique thematic information about a paragraph. To ensure that this condition is not violated, we enforce that no two labels have overlapping contexts. For example, if a paragraph contains a formula, an example and then a property, three unique labels must describe it, namely "Formula", "Examples" and "Property".
* The label types should be decided keeping in mind the content of the NCERT textbooks.

To meet the conditions above, we came up with the following label types: ["Definition", "Causes", "Examples", "Reasoning", "Property", "Types", "Formula", "Effects", "Relation"].
Each paragraph in the dataset therefore has a minimum of one and a maximum of nine labels. There are 8 CSV files, each in the following format: ("paragraph", "label_1, label_2, … , label_n"). "Grade_i.csv" contains the data for the i-th grade. "Multi-Label Physics K-12 Dataset_Training.csv" contains the training data, which can be used to reproduce our results. There are 4812 data samples in total across the seven grade files. The number of data samples per file is:

* Grade 6: 180
* Grade 7: 954
* Grade 8: 741
* Grade 9: 439
* Grade 10: 546
* Grade 11: 954
* Grade 12: 998
* Training: 4209

This dataset has been created for academic research purposes, to support research towards building AI-based tools that help students learn.
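The file format described above can be read and turned into multi-hot label vectors along the following lines. This is a minimal sketch, not the authors' own loading code. It assumes that the CSV files have no header row and that all labels of a paragraph sit in a single comma-separated second field, as the ("paragraph", "label_1, label_2, … , label_n") format suggests. The names `read_rows`, `to_multi_hot`, `LABELS` and `GRADE_COUNTS` are hypothetical, and the sample row is invented for illustration.

```python
import csv
import io

# The nine label types, in the order the description lists them.
LABELS = ["Definition", "Causes", "Examples", "Reasoning", "Property",
          "Types", "Formula", "Effects", "Relation"]

# Per-grade sample counts as stated in the description.
GRADE_COUNTS = {6: 180, 7: 954, 8: 741, 9: 439, 10: 546, 11: 954, 12: 998}


def read_rows(handle):
    """Yield (paragraph, labels) pairs from a two-column CSV file object.

    Assumption: no header row, and all labels sit in one comma-separated field.
    """
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        labels = [part.strip() for part in row[1].split(",") if part.strip()]
        yield row[0], labels


def to_multi_hot(labels):
    """Return a 9-element 0/1 list marking which label types are present."""
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in LABELS]


# Hypothetical row, invented for illustration; it is not taken from the dataset.
sample = io.StringIO('"Force is mass times acceleration: F = ma.","Formula, Relation"\n')
rows = [(text, to_multi_hot(labels)) for text, labels in read_rows(sample)]

# Sanity check on the stated counts: the seven grade files sum to the total.
total_samples = sum(GRADE_COUNTS.values())
```

To read one of the real files, an open file object can be passed instead of the in-memory sample, for example `open("Grade_6.csv", newline="", encoding="utf-8")`. The file name follows the "Grade_i.csv" pattern and the encoding is an assumption. If the files turn out to have a header row, skip the first row before encoding.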

Created:

2020-03-29
