finiteautomata/yahoo_dataset
Source: Hugging Face. Updated: 2023-10-04. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/finiteautomata/yahoo_dataset
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: topic
    dtype:
      class_label:
        names:
          '0': Society & Culture
          '1': Science & Mathematics
          '2': Health
          '3': Education & Reference
          '4': Computers & Internet
          '5': Sports
          '6': Business & Finance
          '7': Entertainment & Music
          '8': Family & Relationships
          '9': Politics & Government
  - name: question_title
    dtype: string
  - name: question_content
    dtype: string
  - name: best_answer
    dtype: string
  - name: question_title_embeddings
    sequence: float32
  - name: question_content_embeddings
    sequence: float32
  - name: best_answer_embeddings
    sequence: float32
  splits:
  - name: train
    num_bytes: 1032387680
    num_examples: 200000
  - name: test
    num_bytes: 309853862
    num_examples: 60000
  download_size: 500190426
  dataset_size: 1342241542
---
# Dataset Card for "yahoo_dataset"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
finiteautomata
Summary of original information
Dataset overview
Configuration
- Default configuration:
  - Training set: path data/train-*
  - Test set: path data/test-*
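The configuration above maps files to splits by glob pattern. A minimal sketch of that mapping, assuming the patterns behave like ordinary shell globs; the file name in the usage note is hypothetical, and `load_splits` assumes the Hugging Face `datasets` package is installed and the repository is reachable:

```python
from fnmatch import fnmatch

REPO_ID = "finiteautomata/yahoo_dataset"

# Split name -> glob pattern, copied from the card's default configuration.
SPLIT_PATTERNS = {"train": "data/train-*", "test": "data/test-*"}


def split_for(path):
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_splits(repo_id=REPO_ID):
    """Download both splits (needs the `datasets` package and network access)."""
    # Imported lazily so that split_for() works without the dependency.
    from datasets import load_dataset

    return load_dataset(repo_id)
```

For example, `split_for("data/train-00000-of-00003.parquet")` resolves to the train split, and `load_splits()` is expected to return a dictionary-like object keyed by `train` and `test`.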
Data features
- id: data type int32
- topic: class label with the following categories:
  - 0: Society & Culture
  - 1: Science & Mathematics
  - 2: Health
  - 3: Education & Reference
  - 4: Computers & Internet
  - 5: Sports
  - 6: Business & Finance
  - 7: Entertainment & Music
  - 8: Family & Relationships
  - 9: Politics & Government
- question_title: data type string
- question_content: data type string
- best_answer: data type string
- question_title_embeddings: sequence of float32
- question_content_embeddings: sequence of float32
- best_answer_embeddings: sequence of float32
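Because topic is stored as an integer class label, readable names have to be looked up. A minimal sketch of that lookup in plain Python, using the label table from this card; the helper names `topic_name` and `topic_id` are illustrative and not part of the dataset:

```python
# Label names in index order, copied from the card's class_label definition.
TOPIC_NAMES = [
    "Society & Culture",
    "Science & Mathematics",
    "Health",
    "Education & Reference",
    "Computers & Internet",
    "Sports",
    "Business & Finance",
    "Entertainment & Music",
    "Family & Relationships",
    "Politics & Government",
]


def topic_name(label):
    """Convert an integer topic label (0-9) to its name."""
    if not 0 <= label < len(TOPIC_NAMES):
        raise ValueError(f"unknown topic label: {label}")
    return TOPIC_NAMES[label]


def topic_id(name):
    """Convert a topic name back to its integer label."""
    return TOPIC_NAMES.index(name)
```

When the data is loaded through the Hugging Face `datasets` library, the `ClassLabel` feature offers the same conversion via `int2str` and `str2int`, so a hand-written table like this is mainly useful outside that library.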
Data splits
- Training set:
  - Bytes: 1032387680
  - Examples: 200000
- Test set:
  - Bytes: 309853862
  - Examples: 60000
Dataset size
- Download size: 500190426 bytes
- Dataset size: 1342241542 bytes
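The size figures above can be cross-checked with a little arithmetic. The sketch below uses only numbers reported on this card; reading the download-to-dataset ratio as a compression effect is an assumption, not something the card states:

```python
# Figures copied from the split and size listings on this card.
SPLITS = {
    "train": {"num_bytes": 1032387680, "num_examples": 200000},
    "test": {"num_bytes": 309853862, "num_examples": 60000},
}
DOWNLOAD_SIZE = 500190426
DATASET_SIZE = 1342241542

# The reported dataset size should equal the sum of the split sizes.
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# Share of all examples held out for testing.
test_fraction = SPLITS["test"]["num_examples"] / total_examples

# Downloaded bytes relative to the in-memory dataset size.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

# Average stored bytes per example in each split.
bytes_per_example = {
    name: s["num_bytes"] / s["num_examples"] for name, s in SPLITS.items()
}

print(total_bytes == DATASET_SIZE, total_examples)
print(f"test fraction: {test_fraction:.3f}, download ratio: {download_ratio:.3f}")
```

The two split sizes add up exactly to the reported dataset size, and the test split holds 60000 of the 260000 examples.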

The content above was collected and summarized by 遇见数据集.



