Dataset for classifying English words into difficulty levels by undergraduate and postgraduate students

Name: Dataset for classifying English words into difficulty levels by undergraduate and postgraduate students
Creator: Mendeley Data
License: No description available

Indexed from doi.org on 2025-01-15

Download link:

http://doi.org/10.17632/p2wrs7hm4z.5

Description:

The dataset lists English words in column B. For each word, the other columns give its frequency (fre), length (len), part of speech (PS), the number of undergraduate students who marked it as difficult (difficult_ug), and the number of postgraduate students who marked it as difficult (difficult_pg).

The dataset contains 5368 unique words in total. Of these, 680 were marked as difficult by undergraduate students and 151 by postgraduate students; the remaining 4537 words are considered easy, having been marked as difficult by neither undergraduate nor postgraduate students. A hyphen (-) in the difficult_ug column means that the word does not appear in the text circulated to undergraduate students. Likewise, a hyphen (-) in the difficult_pg column means that the word does not appear in the text circulated to postgraduate students.

The data was collected from students in Jammu and Kashmir (a Union Territory of India). Latitude and longitude: 32.2778° N, 75.3412° E.

The attached files are as follows:

The dataset_english CSV file is the original dataset, containing the English words, their length, frequency and part of speech, and the number of undergraduate and postgraduate students who marked each word as difficult.

The dataset_numerical CSV file contains the original dataset with its string fields converted to numerical values.

The English language difficulty level measurement -Questionnaire (1-6) & PG1,PG2,PG3,PG4 .docx files contain the questionnaires given to college and university students, who were asked to underline difficult words in the English text.

The IGNOU English.zip file contains the Indira Gandhi National Open University (IGNOU) English textbooks for undergraduate and postgraduate students. The texts for the questionnaires above were taken from these IGNOU English textbooks.
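As a rough illustration of how the difficult_ug and difficult_pg columns might be read, the following Python sketch turns each row into difficulty labels and treats a hyphen as "word not in that group's text". It rests on several assumptions not confirmed by the description: the word column is called `word` here (the description only says the words are in column B), counts are stored as plain integers, and a word counts as difficult for a group when at least one student marked it. The sample rows are invented for the example and are not taken from the dataset.

```python
import csv
import io

MISSING = "-"  # per the description: word absent from that group's text


def parse_count(cell):
    """Return the student count as an int, or None for a hyphen or empty cell."""
    cell = cell.strip()
    if cell == MISSING or cell == "":
        return None
    return int(cell)  # assumption: counts are stored as plain integers


def label_word(row):
    """Derive per-group labels from one CSV row (a dict keyed by column name)."""
    ug = parse_count(row["difficult_ug"])
    pg = parse_count(row["difficult_pg"])
    return {
        "word": row["word"],  # hypothetical column name for column B
        "in_ug_text": ug is not None,
        "in_pg_text": pg is not None,
        # assumption: difficult means at least one student marked the word
        "difficult_for_ug": bool(ug),
        "difficult_for_pg": bool(pg),
    }


def load_labels(stream):
    """Read CSV text from a file-like object and label every row."""
    return [label_word(row) for row in csv.DictReader(stream)]


# Invented sample rows, for illustration only.
SAMPLE = """word,fre,len,PS,difficult_ug,difficult_pg
alpha,12,5,noun,3,-
beta,40,4,verb,0,0
gamma,2,5,adjective,-,7
"""

labels = load_labels(io.StringIO(SAMPLE))
for item in labels:
    print(item)
```

To run this against the real file, one would pass an opened dataset_english CSV to `load_labels` in place of the in-memory sample, after checking the actual header names.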

Provider:

Mendeley Data
