BroDeadlines/TEST.PART_SUMMERIZE.raptor.edu_tdt_data
Source: Hugging Face | Updated: 2024-06-24 | Indexed: 2024-06-22
Download link:
https://hf-mirror.com/datasets/BroDeadlines/TEST.PART_SUMMERIZE.raptor.edu_tdt_data
Resource summary:
The dataset contains multiple feature fields such as summaries, level, cluster, doc_ids, etc., with data types including string, integer, and float. The dataset is divided into several splits, such as TEST.basic_tdt_raptor, TEST.medium_tdt_raptor, etc., each containing different numbers of bytes and examples. The total download size of the dataset is 6746455 bytes, and the total size is 156527446 bytes. The configuration file lists the data file paths corresponding to each split. Additionally, a JSON description of a collection named TEST.basic_tdt_raptor is provided, which includes two fields: vec_idx and text_idx.
Provider:
BroDeadlines
Original information summary
Dataset information
Features
- summaries: string
- level: int64
- cluster: float64
- doc_ids: string
- level_id: string
- index_level_0: int64
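The feature list above can be read as a row schema. The following is a minimal sketch of checking a row against it, assuming the plain Python mapping string -> str, int64 -> int, float64 -> float. The helper name `schema_errors` and the example row are hypothetical and are not taken from the dataset.

```python
# Field names and dtypes as listed above. The Python types are an assumed
# mapping (string -> str, int64 -> int, float64 -> float).
EXPECTED_SCHEMA = {
    "summaries": str,
    "level": int,
    "cluster": float,
    "doc_ids": str,
    "level_id": str,
    "index_level_0": int,
}


def schema_errors(row: dict) -> list:
    """Return a list of problems found in `row`; an empty list means it conforms."""
    errors = []
    for name, expected in EXPECTED_SCHEMA.items():
        if name not in row:
            errors.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(row[name], bool) or not isinstance(row[name], expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(row[name]).__name__}"
            )
    return errors


# Hypothetical row, invented purely for illustration.
example_row = {
    "summaries": "placeholder summary text",
    "level": 1,
    "cluster": 0.0,
    "doc_ids": "doc-a,doc-b",
    "level_id": "1-0",
    "index_level_0": 0,
}

print(schema_errors(example_row))  # []
```

A real row may hold missing values (for example a null cluster), which this sketch would report as a type mismatch.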
Data splits
- TEST.basic_tdt_raptor:
  - Bytes: 56226
  - Examples: 19
- TEST.medium_tdt_raptor:
  - Bytes: 2693311
  - Examples: 332
Dataset size
- Download size: 540447 bytes
- Dataset size: 2749537 bytes
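As a quick consistency check on the figures in this listing, the per-split byte counts can be summed and compared with the reported dataset size. The sketch below uses only the numbers given above; the variable names are illustrative.

```python
# Figures copied from the split and size listings above.
SPLITS = {
    "TEST.basic_tdt_raptor": {"num_bytes": 56226, "num_examples": 19},
    "TEST.medium_tdt_raptor": {"num_bytes": 2693311, "num_examples": 332},
}
DOWNLOAD_SIZE = 540447
DATASET_SIZE = 2749537

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The per-split byte counts should add up to the reported dataset size.
print(total_bytes == DATASET_SIZE)  # True
print(total_examples)               # 351

# Average bytes per example in each split.
for name, s in SPLITS.items():
    print(name, round(s["num_bytes"] / s["num_examples"]))

# Download size as a fraction of the dataset size.
print(round(DOWNLOAD_SIZE / DATASET_SIZE, 3))
```

The two listed splits account for the full 2749537 bytes, so the size figures in this listing describe exactly these two splits.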
Configuration
- default configuration:
  - Data files:
    - TEST.basic_tdt_raptor: data/TEST.basic_tdt_raptor-*
    - TEST.medium_tdt_raptor: data/TEST.medium_tdt_raptor-*
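The configuration maps each split to a glob pattern over data files. Below is a minimal sketch of how such patterns resolve a file path to a split, using the standard-library `fnmatch` module. The shard file name in the example is hypothetical, since the listing gives only the patterns. The `load_split` helper assumes the Hugging Face `datasets` package and network access, so it is defined here but not called.

```python
from fnmatch import fnmatch
from typing import Optional

REPO_ID = "BroDeadlines/TEST.PART_SUMMERIZE.raptor.edu_tdt_data"

# Split name -> data file pattern, as listed in the default configuration.
DATA_FILES = {
    "TEST.basic_tdt_raptor": "data/TEST.basic_tdt_raptor-*",
    "TEST.medium_tdt_raptor": "data/TEST.medium_tdt_raptor-*",
}


def split_for_file(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none matches."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_split(split: str):
    """Load one split from the Hub (requires `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the rest runs without it

    return load_dataset(REPO_ID, split=split)


# Hypothetical shard name: the actual file names are not given in the listing.
print(split_for_file("data/TEST.basic_tdt_raptor-00000-of-00001.parquet"))
```

With the package installed, `load_split("TEST.basic_tdt_raptor")` would be the corresponding call for the smaller split.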



