BroDeadlines/TEST.edu_tdt_data
Hugging Face · Updated 2024-06-01 · Indexed 2024-06-15
Download link:
https://hf-mirror.com/datasets/BroDeadlines/TEST.edu_tdt_data
Resource description:
---
pretty_name: Basic Evaluation dataset for IR
dataset_info:
  features:
  - name: content
    dtype: string
  - name: url
    dtype: string
  - name: doc_id
    dtype: string
  - name: shards
    dtype: int64
  - name: splits
    sequence: string
  splits:
  - name: train
    num_bytes: 9162325
    num_examples: 100
  - name: INDEX.medium_index_TDT
    num_bytes: 24652078
    num_examples: 344
  - name: TEST.basic_test_tdt_dataset
    num_bytes: 9162325
    num_examples: 100
  download_size: 6105780
  dataset_size: 42976728
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: INDEX.medium_index_TDT
    path: data/INDEX.medium_index_TDT-*
  - split: TEST.basic_test_tdt_dataset
    path: data/TEST.basic_test_tdt_dataset-*
---
# Preprocess text
test-basic_test_tdt_dataset
```json
{
"vector_index": "test-basic_test_tdt_dataset",
"method": "window_slide",
"step": 50,
"chunk_size": 1500
}
```
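The card names the chunking method `window_slide` with `step: 50` and `chunk_size: 1500`, but it does not state the unit (characters or tokens) or whether `step` is the stride between window starts or the overlap between consecutive windows. The Python below is a minimal sketch of a character-based sliding window, assuming an explicit stride; the function name `sliding_window_chunks` is hypothetical and not part of the dataset's tooling.

```python
def sliding_window_chunks(text: str, chunk_size: int, stride: int) -> list[str]:
    """Split text into fixed-size windows whose start offsets advance by `stride`.

    Hypothetical helper: units are characters, and `stride` is the distance
    between consecutive window starts (an assumption; the card does not say).
    """
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size and stride must be positive")
    if stride > chunk_size:
        # A stride larger than the window would silently skip text.
        raise ValueError("stride must not exceed chunk_size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            # This window already reaches the end of the text, so stop.
            break
        start += stride
    return chunks


if __name__ == "__main__":
    print(sliding_window_chunks("abcdefghij", chunk_size=4, stride=3))
    # ['abcd', 'defg', 'ghij']
```

Under the stride reading, `step: 50` with `chunk_size: 1500` means consecutive windows share 1450 units; under the overlap reading, the stride would instead be `chunk_size - step = 1450`.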
index.medium_index_tdt
```json
{
"vector_index": "vec-index.medium_index_tdt",
"text_index": "text-index.medium_index_tdt",
"method": "window_slide",
"step": 50,
"chunk_size": 1500,
"time(min)": "5.35"
}
```
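This index config names a separate vector index and text index and records the indexing time. As a sketch of how the chunking parameters translate into index size, the Python below parses the config as shown and counts the windows one document would produce under each reading of `step`. The `window_count` helper and the 3000-unit document length are hypothetical illustrations, not values from the card.

```python
import json
import math

# The index config exactly as shown in the card.
CONFIG_JSON = """
{
  "vector_index": "vec-index.medium_index_tdt",
  "text_index": "text-index.medium_index_tdt",
  "method": "window_slide",
  "step": 50,
  "chunk_size": 1500,
  "time(min)": "5.35"
}
"""


def window_count(doc_len: int, chunk_size: int, stride: int) -> int:
    """Number of sliding windows needed to cover `doc_len` units.

    Hypothetical helper: window starts advance by `stride` and stop once a
    window reaches the end of the document.
    """
    if doc_len <= 0:
        return 0
    if doc_len <= chunk_size:
        return 1
    return math.ceil((doc_len - chunk_size) / stride) + 1


config = json.loads(CONFIG_JSON)
chunk_size = config["chunk_size"]
step = config["step"]
# "time(min)" is stored as a string, so convert it before doing arithmetic.
index_minutes = float(config["time(min)"])

# Two readings of "step"; the card does not say which one applies.
as_stride = window_count(3000, chunk_size, stride=step)
as_overlap = window_count(3000, chunk_size, stride=chunk_size - step)
print(config["vector_index"], config["text_index"], index_minutes)
print(as_stride, as_overlap)  # 31 3
```

The two readings differ by an order of magnitude in chunk count, which is why the meaning of `step` matters for index size and indexing time.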
Provider:
BroDeadlines
Summary of original metadata
Dataset overview
Dataset name
- Name: Basic Evaluation dataset for IR
Dataset features
- Feature list:
  - content: string
  - url: string
  - doc_id: string
  - shards: integer (int64)
  - splits: sequence of strings
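To make the schema concrete, here is a small Python sketch of one row as a typed record with basic type checks. The class name `TdtRecord` and the `from_row` method are hypothetical; the field names and types follow the card, but the card does not document what `shards` and `splits` contain, and the sample values in the usage line are placeholders.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TdtRecord:
    """One row of the dataset, mirroring the card's declared features.

    Hypothetical class: the semantics of `shards` and `splits` are not
    documented in the card.
    """

    content: str
    url: str
    doc_id: str
    shards: int        # declared as int64 in the card
    splits: list[str]  # declared as a sequence of strings

    @classmethod
    def from_row(cls, row: dict[str, Any]) -> "TdtRecord":
        """Build a record from a plain dict, checking the declared types."""
        for name in ("content", "url", "doc_id"):
            if not isinstance(row[name], str):
                raise TypeError(f"{name} must be a string")
        shards = row["shards"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(shards, int) or isinstance(shards, bool):
            raise TypeError("shards must be an integer")
        splits = row["splits"]
        # A bare string is iterable, so reject it before checking elements.
        if isinstance(splits, str) or not all(isinstance(s, str) for s in splits):
            raise TypeError("splits must be a sequence of strings")
        return cls(row["content"], row["url"], row["doc_id"], shards, list(splits))


# Placeholder values, not taken from the dataset.
record = TdtRecord.from_row(
    {"content": "text", "url": "https://example.com", "doc_id": "d1", "shards": 2, "splits": ["a", "b"]}
)
```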
Dataset splits
- Split information:
  - train: 9162325 bytes, 100 examples
  - INDEX.medium_index_TDT: 24652078 bytes, 344 examples
  - TEST.basic_test_tdt_dataset: 9162325 bytes, 100 examples
Dataset size
- Download size: 6105780 bytes
- Dataset size: 42976728 bytes
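The reported sizes are internally consistent: the dataset size equals the sum of the per-split byte counts. The short Python check below uses only numbers copied from the card; the variable names are illustrative.

```python
# Per-split sizes copied from the card: (num_bytes, num_examples).
SPLITS = {
    "train": (9162325, 100),
    "INDEX.medium_index_TDT": (24652078, 344),
    "TEST.basic_test_tdt_dataset": (9162325, 100),
}
DATASET_SIZE = 42976728
DOWNLOAD_SIZE = 6105780

total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())
total_examples = sum(n for _, n in SPLITS.values())

# Here dataset_size is exactly the sum of the per-split num_bytes values.
assert total_bytes == DATASET_SIZE

# How many times larger the reported dataset size is than the download.
ratio = DATASET_SIZE / DOWNLOAD_SIZE
print(total_bytes, total_examples, f"{ratio:.2f}")  # 42976728 544 7.04
```

The train and TEST.basic_test_tdt_dataset splits report identical byte and example counts; the card does not say whether their contents are the same.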
Configuration
- Config name: default
- Data file paths:
  - train: data/train-*
  - INDEX.medium_index_TDT: data/INDEX.medium_index_TDT-*
  - TEST.basic_test_tdt_dataset: data/TEST.basic_test_tdt_dataset-*
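Each split is bound to a glob pattern under `data/`. As a sketch of how such patterns partition files into splits, the Python below matches paths against the three patterns with the standard `fnmatch` module. The helper `split_for` and the example file names (`…-00000-of-00001.parquet`) are hypothetical; the card does not list the actual files.

```python
from fnmatch import fnmatchcase
from typing import Optional

# data_files patterns from the card's "default" config.
DATA_FILES = {
    "train": "data/train-*",
    "INDEX.medium_index_TDT": "data/INDEX.medium_index_TDT-*",
    "TEST.basic_test_tdt_dataset": "data/TEST.basic_test_tdt_dataset-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None (hypothetical helper)."""
    # fnmatchcase keeps matching case-sensitive on every platform.
    matches = [split for split, pattern in DATA_FILES.items() if fnmatchcase(path, pattern)]
    if len(matches) > 1:
        raise ValueError(f"{path} matches several splits: {matches}")
    return matches[0] if matches else None


print(split_for("data/INDEX.medium_index_TDT-00000-of-00001.parquet"))
# INDEX.medium_index_TDT
```

With the Hugging Face `datasets` library, a split is normally requested by name, for example `load_dataset("BroDeadlines/TEST.edu_tdt_data", split="train")`.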



