amuvarma/contentonly-train-1m-1dups
Source: Hugging Face. Updated 2024-11-26; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/amuvarma/contentonly-train-1m-1dups
Resource description:
The dataset has two features: transcript and facodec_1. transcript is a string holding the text transcript. facodec_1 is a sequence of int64 values, possibly an encoding or feature representation of the corresponding content. There is a single train split with 1,000,000 examples and a total size of 7,420,042,712 bytes; the download size is 1,278,453,133 bytes. The dataset's configuration file defines one default configuration, with data files matching the path pattern train-*.
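As a rough illustration of the figures above, the following Python sketch derives the average size per example and the download-to-dataset size ratio, and shows one way the train split might be streamed. The helper names (bytes_per_example, download_ratio, first_example) are hypothetical and introduced only for this example. first_example assumes the third-party `datasets` library is installed and that the repository is publicly reachable; it is not called here.

```python
# Figures copied from the dataset description above.
REPO_ID = "amuvarma/contentonly-train-1m-1dups"
NUM_EXAMPLES = 1_000_000
DATASET_BYTES = 7_420_042_712
DOWNLOAD_BYTES = 1_278_453_133


def bytes_per_example() -> float:
    """Average dataset size per example, in bytes."""
    return DATASET_BYTES / NUM_EXAMPLES


def download_ratio() -> float:
    """Download size as a fraction of the total dataset size."""
    return DOWNLOAD_BYTES / DATASET_BYTES


def first_example() -> dict:
    """Stream the train split and return its first record.

    Assumption: the `datasets` package is installed and the repository is
    accessible. Streaming avoids fetching the full download up front.
    """
    from datasets import load_dataset  # third-party; imported lazily

    stream = load_dataset(REPO_ID, split="train", streaming=True)
    return next(iter(stream))


if __name__ == "__main__":
    print(f"average bytes per example: {bytes_per_example():.1f}")
    print(f"download / dataset size:   {download_ratio():.3f}")
```

Per the description, a record returned by first_example would be expected to carry a transcript string and a facodec_1 list of integers; the sketch does not verify this.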
Provider:
amuvarma



