agentlans/text-stats
Source: Hugging Face. Updated 2024-12-14; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/agentlans/text-stats
Resource summary:
This dataset combines three sub-datasets: text quality, readability, and sentiment analysis. Its main purpose is to gather a large amount of data in one place for easier training and evaluation. The data preparation and transformation section describes the normalization of quality scores and the calculation of readability scores: quality scores were normalized with Ordered Quantile normalization, and readability scores were transformed with the Box-Cox method. The dataset size section lists the number of examples in the training and test sets.
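The two transformations named above can be illustrated with a minimal, standard-library-only sketch. It is not the dataset author's code: the function names, the sample scores, and the Box-Cox lambda are all hypothetical, and the rank-to-quantile mapping uses one common definition of ordered quantile normalization, applying the inverse standard-normal CDF to (rank - 0.5) / n.

```python
import math
from statistics import NormalDist


def ordered_quantile_normalize(values):
    """Map values to standard-normal scores via their ranks (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # (rank - 0.5) / n always lies strictly between 0 and 1.
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Made-up scores for illustration only; they are not taken from the dataset.
quality = [0.2, 0.9, 0.5, 0.5, 0.1]
readability = [4.0, 9.0, 16.0]

quality_z = ordered_quantile_normalize(quality)
readability_t = box_cox(readability, lam=0.5)  # lambda picked arbitrarily here
```

In practice the Box-Cox lambda is usually estimated from the data (for example by maximum likelihood) rather than fixed by hand; the source does not state which value or tool was used.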
Provider:
agentlans



