zenml/rag_qa_embedding_questions
Source: Hugging Face. Updated 2024-10-23; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/zenml/rag_qa_embedding_questions
Resource overview:
---
dataset_info:
  features:
  - name: page_content
    dtype: string
  - name: filename
    dtype: string
  - name: parent_section
    dtype: string
  - name: url
    dtype: string
  - name: embedding
    sequence: float64
  - name: token_count
    dtype: int64
  - name: generated_questions
    sequence: string
  - name: __pydantic_initialised__
    dtype: bool
  splits:
  - name: test
    num_bytes: 2012992.2015503875
    num_examples: 362
  - name: train
    num_bytes: 17761849
    num_examples: 3649
  download_size: 17501572
  dataset_size: 19774841.201550387
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
The dataset has eight features: page content, filename, parent section, URL, embedding, token count, generated questions, and a __pydantic_initialised__ flag. According to the metadata above, it is split into a training set of 3649 examples and a test set of 362 examples. The download size is 17501572 bytes and the total dataset size is 19774841.201550387 bytes. The single configuration, default, reads training data from data/train-* and test data from data/test-*.
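The following is a minimal sketch of how the two splits might be loaded and summarized. It assumes the Hugging Face datasets library is installed and that the repository ID from the page title is reachable. The helper names load_splits, split_shares and print_split_summary are hypothetical and not part of the dataset.

```python
from typing import Dict

REPO_ID = "zenml/rag_qa_embedding_questions"  # repository ID from the page title


def load_splits(repo_id: str = REPO_ID):
    """Download the default config and return a mapping of split name to split."""
    # Imported lazily so the pure helper below works without the library.
    from datasets import load_dataset

    return load_dataset(repo_id)


def split_shares(num_examples: Dict[str, int]) -> Dict[str, float]:
    """Return each split's fraction of the total example count."""
    total = sum(num_examples.values())
    if total == 0:
        raise ValueError("no examples in any split")
    return {name: count / total for name, count in num_examples.items()}


def print_split_summary() -> None:
    """Print row counts and shares for every split (needs network access)."""
    splits = load_splits()
    counts = {name: split.num_rows for name, split in splits.items()}
    for name, share in split_shares(counts).items():
        print(f"{name}: {counts[name]} rows ({share:.1%})")
```

Calling print_split_summary() downloads the data on first use, so the row counts it prints reflect whatever revision of the repository is current at that time.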
Provider:
zenml
Summary of original information
Dataset overview
Features
- page_content: string.
- filename: string.
- parent_section: string.
- url: string.
- embedding: sequence of floats (float64).
- token_count: 64-bit integer.
- generated_questions: sequence of strings.
- __pydantic_initialised__: boolean.
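The feature list above can be mirrored as a typed record. This is a sketch only: DocChunk and chunk_from_row are hypothetical names, and the code assumes each row arrives as a plain mapping keyed by the column names in the metadata block, including the double-underscore __pydantic_initialised__ column.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class DocChunk:
    """One dataset row, with one attribute per declared feature."""

    page_content: str
    filename: str
    parent_section: str
    url: str
    embedding: List[float]
    token_count: int
    generated_questions: List[str]
    pydantic_initialised: bool


def chunk_from_row(row: Mapping[str, Any]) -> DocChunk:
    """Build a DocChunk from a row, coercing each column to its declared type."""
    return DocChunk(
        page_content=str(row["page_content"]),
        filename=str(row["filename"]),
        parent_section=str(row["parent_section"]),
        url=str(row["url"]),
        embedding=[float(x) for x in row["embedding"]],
        token_count=int(row["token_count"]),
        generated_questions=[str(q) for q in row["generated_questions"]],
        # The column name keeps its leading and trailing double underscores.
        pydantic_initialised=bool(row["__pydantic_initialised__"]),
    )
```

A missing column raises KeyError, which surfaces schema drift early instead of silently producing partial records.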
Data splits
- train: 3649 examples, 17761849 bytes in total.
- test: 362 examples, 2012992.2015503875 bytes in total.
Dataset size
- Download size: 17501572 bytes.
- Dataset size: 19774841.201550387 bytes.
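The reported dataset size should equal the sum of the per-split byte counts. The sketch below checks that with the values copied from the dataset_info metadata block. The function name sizes_consistent and the tolerance are illustrative choices, not part of the dataset.

```python
import math
from typing import Mapping

# Values copied from the dataset_info metadata block.
SPLIT_BYTES = {"test": 2012992.2015503875, "train": 17761849}
DATASET_SIZE = 19774841.201550387


def sizes_consistent(
    split_bytes: Mapping[str, float],
    dataset_size: float,
    abs_tol: float = 1e-3,
) -> bool:
    """Return True when the split sizes add up to the reported dataset size."""
    total = sum(split_bytes.values())
    # Absolute tolerance absorbs floating-point rounding in the fractional sizes.
    return math.isclose(total, dataset_size, rel_tol=0.0, abs_tol=abs_tol)
```

With these inputs, sizes_consistent(SPLIT_BYTES, DATASET_SIZE) holds, which confirms that the metadata block is internally consistent.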
Configuration
- default:
  - train: path data/train-*.
  - test: path data/test-*.
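The configuration maps each split to a glob pattern. As a rough illustration of that mapping, the sketch below uses Python's standard fnmatch module to decide which split a file path belongs to. The name split_for is hypothetical, the file names in the usage note are made up, and the Hub's own pattern resolution may differ in detail.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Split name to glob pattern, as declared in the default configuration.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches the path, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None
```

For example, a hypothetical shard named data/train-00000-of-00001.parquet resolves to the train split, while a path outside data/ matches neither pattern.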



