Dataset for Word Difficulty Prediction

Source: IEEE; updated 2020-10-04; indexed 2026-04-17
Download link:
https://ieee-dataport.org/open-access/dataset-word-difficulty-prediction
Description:
Most text-simplification systems require an indicator of word complexity. The prevalent approaches to word difficulty prediction rely on manual feature engineering, and deep learning models remain largely unexplored for this task because of their comparatively poor performance. We explore one such model for predicting word difficulty, treating the problem as binary classification. We trained traditional machine learning models and evaluated their performance on the task; one of our primary aims was to remove the dependency on the frequency of previously acquired words when measuring difficulty. We then analyzed a convolutional neural network based prediction model that operates at the character level and evaluated its efficiency against the other models.

This dataset contains 40481 data instances. The column headers are: Word, Length, Freq_HAL, Log_Freq_HAL, I_Mean_RT, I_Zscore, I_SD, Obs, I_Mean_Accuracy.

Further details of the dataset and the method used to obtain the difficulty labels are given in the research publication whose link is attached. For open access to the publication, visit https://garain.codes. Please cite both the dataset and the conference paper if you use the dataset.
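As a rough illustration of working with the columns listed above, the following Python sketch reads the data and checks that each described column is present. It assumes the dataset is distributed as a comma-separated file with a header row, which the description does not confirm; the function name `read_rows` is hypothetical, and values are left as strings because the description does not state column types.

```python
import csv
from typing import Dict, List, TextIO

# Column headers as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Word", "Length", "Freq_HAL", "Log_Freq_HAL",
    "I_Mean_RT", "I_Zscore", "I_SD", "Obs", "I_Mean_Accuracy",
]


def read_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read CSV rows and verify that every described column is present.

    Assumption: the file is comma-separated and starts with a header row.
    Values stay as strings because the source does not specify their types.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)
```

To use it, open the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the handle to `read_rows`. If the assumptions hold, the number of returned rows should match the 40481 instances stated in the description.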
Provided by:
Naskar, Sudip Kumar; Garain, Avishek; Basu, Arpan
Created:
2020-10-04