tsch00001/vi-nsp-ttr1

Name: tsch00001/vi-nsp-ttr1
Creator: tsch00001
Published: 2025-04-08 07:02:52
License: not specified (no description provided)

Source: Hugging Face; updated 2025-04-08; indexed 2025-10-25

Download link:

https://hf-mirror.com/datasets/tsch00001/vi-nsp-ttr1

Description:

The dataset has the following features: a sentence pair (`sentence1` and `sentence2`), a label (`label`), the type-token ratio (TTR, `ttr`), vocabulary size (`vocab`), token count (`token_count`), and mean TTR (`mean_ttr`). It contains a single training split of 10,673,074 examples, totaling 8,583,736,670 bytes (about 8.6 GB). A default configuration specifies the path to the training data.
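The type-token ratio is the number of distinct tokens (types) divided by the total number of tokens. The card does not document how `ttr`, `vocab`, `token_count`, or `mean_ttr` were computed. The sketch below shows one plausible reading, and every choice in it is an assumption. It uses lowercase whitespace tokenization. It scores `sentence1` and `sentence2` as one concatenated text. It treats `mean_ttr` as the arithmetic mean of per-example TTRs. The helper names (`ttr_stats`, `example_stats`, `mean_ttr`) are hypothetical. If the text is Vietnamese, as the `vi` in the name suggests, whitespace splitting counts syllables rather than words.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class TTRStats:
    token_count: int  # total number of tokens in the text
    vocab: int        # number of distinct tokens (types)
    ttr: float        # vocab / token_count


def ttr_stats(text: str) -> TTRStats:
    """Compute type-token statistics for one text.

    Assumption: lowercase whitespace tokenization. The dataset's own
    tokenizer is not documented and may differ.
    """
    tokens = text.lower().split()
    count = len(tokens)
    vocab = len(set(tokens))
    ttr = vocab / count if count else 0.0  # avoid division by zero on empty text
    return TTRStats(token_count=count, vocab=vocab, ttr=ttr)


def example_stats(sentence1: str, sentence2: str) -> TTRStats:
    # Hypothetical choice: score the sentence pair as a single concatenated text.
    return ttr_stats(f"{sentence1} {sentence2}")


def mean_ttr(stats: list[TTRStats]) -> float:
    # Assumed meaning of mean_ttr: arithmetic mean of the per-example TTRs.
    return fmean(s.ttr for s in stats) if stats else 0.0


if __name__ == "__main__":
    pairs = [
        ("the cat sat on the mat", "the cat slept"),
        ("tôi đi học", "tôi đi làm"),
    ]
    stats = [example_stats(a, b) for a, b in pairs]
    for s in stats:
        print(s)
    print(round(mean_ttr(stats), 4))
```

For the first pair, the nine tokens contain six distinct types, which gives a TTR of 6/9. To check these assumptions against the real data, compare the output on a few rows with the stored `ttr`, `vocab`, and `token_count` values.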

Provider:

tsch00001
