systemk/wikipedia_semantic_similarity
Hugging Face | Updated 2024-03-19 | Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/systemk/wikipedia_semantic_similarity
Resource description:
---
dataset_info:
- config_name: 10k
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: similarity
    dtype: float32
  splits:
  - name: train
    num_bytes: 59458771
    num_examples: 10000
  download_size: 31670503
  dataset_size: 59458771
- config_name: default
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: similarity
    dtype: float32
  splits:
  - name: train
    num_bytes: 6763130687.0
    num_examples: 1373311
  download_size: 3874907579
  dataset_size: 6763130687.0
- config_name: top-0.1
  features:
  - name: id
    dtype: string
  - name: url
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: similarity
    dtype: float32
  splits:
  - name: train
    num_bytes: 737695996
    num_examples: 137331
  download_size: 410672269
  dataset_size: 737695996
configs:
- config_name: 10k
  data_files:
  - split: train
    path: 10k/train-*
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: top-0.1
  data_files:
  - split: train
    path: top-0.1/train-*
---
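The configuration names in the front matter correspond to the second argument of `datasets.load_dataset`. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The helper `load_config` and the constant `KNOWN_CONFIGS` are illustrative names, not part of the dataset.

```python
REPO_ID = "systemk/wikipedia_semantic_similarity"
# Config names copied from the front matter above.
KNOWN_CONFIGS = ("10k", "default", "top-0.1")


def load_config(name: str = "10k", streaming: bool = False):
    """Load the train split of one configuration (hypothetical helper)."""
    if name not in KNOWN_CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {KNOWN_CONFIGS}")
    # Imported lazily so the name check works even without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="train", streaming=streaming)
```

Calling `load_config("10k")` would download about 32 MB. For the `default` configuration (about 3.9 GB to download), passing `streaming=True` avoids fetching every shard up front.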
Provider:
systemk
Summary of original information
Dataset overview
Dataset configurations
Configuration: 10k
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
  - similarity: float32
- Splits:
  - train:
    - Bytes: 59458771
    - Examples: 10000
- Download size: 31670503 bytes
- Dataset size: 59458771 bytes
Configuration: default
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
  - similarity: float32
- Splits:
  - train:
    - Bytes: 6763130687
    - Examples: 1373311
- Download size: 3874907579 bytes
- Dataset size: 6763130687 bytes
Configuration: top-0.1
- Features:
  - id: string
  - url: string
  - title: string
  - text: string
  - similarity: float32
- Splits:
  - train:
    - Bytes: 737695996
    - Examples: 137331
- Download size: 410672269 bytes
- Dataset size: 737695996 bytes
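The figures in the configuration summary above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it computes each configuration's share of the `default` row count, the average uncompressed bytes per example, and the download-to-dataset size ratio. The source does not explain how `top-0.1` was selected. Reading it as the top 10% of rows by `similarity` is an assumption based on the name and on its row count matching one tenth of `default`.

```python
# Figures copied from the configuration summary above (sizes in bytes).
CONFIGS = {
    "10k": {"num_examples": 10_000, "dataset_size": 59_458_771, "download_size": 31_670_503},
    "default": {"num_examples": 1_373_311, "dataset_size": 6_763_130_687, "download_size": 3_874_907_579},
    "top-0.1": {"num_examples": 137_331, "dataset_size": 737_695_996, "download_size": 410_672_269},
}


def summarize(cfg: dict) -> dict:
    """Derive simple ratios for one configuration."""
    total = CONFIGS["default"]["num_examples"]
    return {
        "fraction_of_default": cfg["num_examples"] / total,
        "avg_bytes_per_example": cfg["dataset_size"] / cfg["num_examples"],
        "download_ratio": cfg["download_size"] / cfg["dataset_size"],
    }


for name, cfg in CONFIGS.items():
    s = summarize(cfg)
    print(
        f"{name:8s} share={s['fraction_of_default']:.4f} "
        f"avg_bytes={s['avg_bytes_per_example']:.0f} "
        f"download_ratio={s['download_ratio']:.3f}"
    )

# Assumption check: top-0.1 holds exactly floor(10%) of the default rows.
assert CONFIGS["default"]["num_examples"] // 10 == CONFIGS["top-0.1"]["num_examples"]
```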
Data file paths
- Configuration: 10k
  - Split: train
  - Path: 10k/train-*
- Configuration: default
  - Split: train
  - Path: data/train-*
- Configuration: top-0.1
  - Split: train
  - Path: top-0.1/train-*
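Each path listed above is a glob pattern rather than a single file. Note that the `default` configuration reads from `data/`, not from a directory named `default/`. The sketch below maps a file path back to its configuration. It uses Python's `fnmatch` as a stand-in for the hub's own pattern resolution, which may differ in detail. The shard filenames in the usage lines are hypothetical examples, not a listing of the repository.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns copied from the data file paths above.
DATA_FILES = {
    "10k": "10k/train-*",
    "default": "data/train-*",
    "top-0.1": "top-0.1/train-*",
}


def config_for_path(path: str) -> Optional[str]:
    """Return the configuration whose pattern matches `path`, or None."""
    for config, pattern in DATA_FILES.items():
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(path, pattern):
            return config
    return None


# Hypothetical shard names, for illustration only.
print(config_for_path("data/train-00000-of-00014.parquet"))
print(config_for_path("top-0.1/train-00001-of-00002.parquet"))
print(config_for_path("README.md"))
```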



