zenml/rag_qa_embedding_questions

Name: zenml/rag_qa_embedding_questions
Creator: zenml
Published: 2024-10-23 07:13:59
License: no description available

Source: Hugging Face. Updated: 2024-10-23. Indexed: 2024-06-12.

Download link:

https://hf-mirror.com/datasets/zenml/rag_qa_embedding_questions

Resource overview:

---
dataset_info:
  features:
  - name: page_content
    dtype: string
  - name: filename
    dtype: string
  - name: parent_section
    dtype: string
  - name: url
    dtype: string
  - name: embedding
    sequence: float64
  - name: token_count
    dtype: int64
  - name: generated_questions
    sequence: string
  - name: __pydantic_initialised__
    dtype: bool
  splits:
  - name: test
    num_bytes: 2012992.2015503875
    num_examples: 362
  - name: train
    num_bytes: 17761849
    num_examples: 3649
  download_size: 17501572
  dataset_size: 19774841.201550387
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---

The dataset has eight features: page content, filename, parent section, URL, embedding, token count, generated questions, and a __pydantic_initialised__ flag. According to the dataset_info metadata, it is split into a training set of 3649 examples and a test set of 362 examples. The download size is 17501572 bytes and the total dataset size is 19774841.201550387 bytes. The default configuration reads training data from data/train-* and test data from data/test-*.

Provider:

zenml

Summary of original information

Dataset overview

Features

page_content: string.
filename: string.
parent_section: string.
url: string.
embedding: sequence of float64.
token_count: int64.
generated_questions: sequence of strings.
__pydantic_initialised__: bool.
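
The feature list can be turned into a simple record check. The following is a minimal Python sketch, assuming each record arrives as a plain dict whose keys match the feature names in dataset_info. The helper name schema_errors and the EXAMPLE record are hypothetical and for illustration only; the values in EXAMPLE are not taken from the dataset.

```python
from typing import Any, Dict, List

# Expected Python type per scalar feature, following dataset_info.
SCALAR_FEATURES = {
    "page_content": str,
    "filename": str,
    "parent_section": str,
    "url": str,
    "token_count": int,
    "__pydantic_initialised__": bool,
}
# Sequence features, checked as lists whose items share one element type.
SEQUENCE_FEATURES = {"embedding": float, "generated_questions": str}

# Hypothetical record; every value here is made up for illustration.
EXAMPLE = {
    "page_content": "Example page text.",
    "filename": "example.md",
    "parent_section": "example-section",
    "url": "https://example.com/page",
    "embedding": [0.1, -0.2, 0.3],
    "token_count": 3,
    "generated_questions": ["What does this page describe?"],
    "__pydantic_initialised__": True,
}


def schema_errors(record: Dict[str, Any]) -> List[str]:
    """Return one message per feature that is missing or has the wrong type."""
    errors = []
    for name, typ in SCALAR_FEATURES.items():
        value = record.get(name)
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_bool = typ is int and isinstance(value, bool)
        if not isinstance(value, typ) or wrong_bool:
            errors.append(f"{name}: expected {typ.__name__}")
    for name, item_type in SEQUENCE_FEATURES.items():
        value = record.get(name)
        if not isinstance(value, list) or not all(
            isinstance(item, item_type) for item in value
        ):
            errors.append(f"{name}: expected list of {item_type.__name__}")
    return errors


if __name__ == "__main__":
    print(schema_errors(EXAMPLE))
```

A check like this is only a sanity test on types; it says nothing about embedding dimensionality, which the metadata does not specify.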

Data splits

train: 3649 examples, 17761849 bytes in total.
test: 362 examples, 2012992.2015503875 bytes in total.

Dataset size

Download size: 17501572 bytes.
Dataset size: 19774841.201550387 bytes.
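
The size figures can be cross-checked against each other. The sketch below uses the values from the dataset_info metadata and assumes, as is usual for Hugging Face dataset cards, that dataset_size is the sum of the per-split num_bytes values. The helper names are hypothetical.

```python
import math

# Values copied from the dataset_info metadata.
SPLITS = {
    "train": {"num_bytes": 17761849, "num_examples": 3649},
    "test": {"num_bytes": 2012992.2015503875, "num_examples": 362},
}
DATASET_SIZE = 19774841.201550387
DOWNLOAD_SIZE = 17501572


def total_bytes(splits):
    """Sum num_bytes over all splits."""
    return sum(info["num_bytes"] for info in splits.values())


def total_examples(splits):
    """Sum num_examples over all splits."""
    return sum(info["num_examples"] for info in splits.values())


if __name__ == "__main__":
    # Compare with a tolerance because num_bytes is fractional for the test split.
    consistent = math.isclose(total_bytes(SPLITS), DATASET_SIZE)
    print("split sizes sum to dataset_size:", consistent)
    print("total examples:", total_examples(SPLITS))
```

The download size is smaller than the dataset size, which is consistent with the files being stored in a compressed on-disk format, although the metadata itself does not state the format.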

Configuration

default:
- train: path data/train-*.
- test: path data/test-*.
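
The default configuration assigns files to splits by glob pattern. The sketch below illustrates that matching with Python's standard fnmatch module and shows how a split would typically be loaded with the Hugging Face datasets library. It assumes the datasets package is installed and the Hub is reachable; the shard file names and the helpers split_for and load_split are hypothetical.

```python
from fnmatch import fnmatch
from typing import Optional

# Split-to-pattern mapping copied from the default configuration.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def split_for(path: str) -> Optional[str]:
    """Return the split whose glob pattern matches path, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_split(split: str):
    """Load one split from the Hub (needs the datasets package and network access)."""
    # Imported lazily so split_for works without the dependency installed.
    from datasets import load_dataset

    return load_dataset("zenml/rag_qa_embedding_questions", split=split)


if __name__ == "__main__":
    # Hypothetical shard names, for illustration only.
    for name in [
        "data/train-00000-of-00001.parquet",
        "data/test-00000-of-00001.parquet",
        "README.md",
    ]:
        print(name, "->", split_for(name))
```

Calling load_split("test") would download the smaller split first, which is a cheap way to inspect the columns before fetching the training data.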
