sofia-che/segmented_dataset

Name: sofia-che/segmented_dataset
Creator: sofia-che
Published: 2025-02-12 02:49:17
License: not specified

Source: Hugging Face. Updated 2025-02-12; indexed 2025-02-15.

Download link:

https://hf-mirror.com/datasets/sofia-che/segmented_dataset

Description:

The dataset contains text segmented into sentences, words, and bigrams. Statistical analysis shows low lexical diversity, a high staticity coefficient (noun forms are used more often than verb forms), and an uneven distribution of sentence lengths.
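The listing does not say how the segmentation or the statistics were produced. As a rough illustration only, the sketch below shows one way to split text into sentences, words, and bigrams and to compute a type-token ratio (a common proxy for lexical diversity) along with sentence-length spread. The function names, the regex-based splitting rules, and the choice of type-token ratio are assumptions made for this example, not the dataset author's method. The staticity coefficient is left out because it needs part-of-speech tags.

```python
import re
from statistics import mean, pstdev


def segment(text):
    """Split text into sentences, words, and bigrams (naive, regex-based)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Words are lowercased runs of word characters, kept per sentence.
    words_per_sentence = [re.findall(r"\w+", s.lower()) for s in sentences]
    words = [w for ws in words_per_sentence for w in ws]
    # Bigrams are built inside each sentence so pairs never cross a boundary.
    bigrams = [(a, b) for ws in words_per_sentence for a, b in zip(ws, ws[1:])]
    return sentences, words, bigrams


def type_token_ratio(words):
    """Distinct words divided by total words; lower means less diverse."""
    return len(set(words)) / len(words) if words else 0.0


def sentence_length_stats(sentences):
    """Mean and population standard deviation of sentence length in words."""
    lengths = [len(re.findall(r"\w+", s)) for s in sentences]
    return mean(lengths), pstdev(lengths)


if __name__ == "__main__":
    sample = "The cat sat. The cat ran away! Why?"
    sents, words, bigrams = segment(sample)
    print(sents)
    print(bigrams)
    print(type_token_ratio(words))
    print(sentence_length_stats(sents))
```

A large standard deviation relative to the mean would correspond to the uneven sentence length distribution mentioned above, and a low type-token ratio to low lexical diversity.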

Provider:

sofia-che
