FudanSELab/SO_KGXQR_TRAIN
Source: Hugging Face. Updated 2023-11-20. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/FudanSELab/SO_KGXQR_TRAIN
Resource description:
---
license: mit
dataset_info:
- config_name: duplicate
  features:
  - name: question1_id
    dtype: string
  - name: question1
    dtype: string
  - name: question2_id
    dtype: string
  - name: question2
    dtype: string
  splits:
  - name: train
    num_bytes: 2428577
    num_examples: 18281
  download_size: 1682661
  dataset_size: 2428577
- config_name: history
  features:
  - name: so_question_id
    dtype: string
  - name: question1
    dtype: string
  - name: question2
    dtype: string
  splits:
  - name: train
    num_bytes: 10039163
    num_examples: 80000
  download_size: 7239803
  dataset_size: 10039163
- config_name: negative
  features:
  - name: question1
    dtype: string
  - name: question2
    dtype: string
  splits:
  - name: train
    num_bytes: 27392182
    num_examples: 248940
  download_size: 11085232
  dataset_size: 27392182
- config_name: positive
  features:
  - name: question1
    dtype: string
  - name: question2
    dtype: string
  splits:
  - name: train
    num_bytes: 11457312
    num_examples: 101172
  download_size: 7917727
  dataset_size: 11457312
configs:
- config_name: duplicate
  data_files:
  - split: train
    path: duplicate/train-*
- config_name: history
  data_files:
  - split: train
    path: history/train-*
- config_name: negative
  data_files:
  - split: train
    path: negative/train-*
- config_name: positive
  data_files:
  - split: train
    path: positive/train-*
language:
- en
size_categories:
- 100K<n<1M
---
## Dataset Description
- **Repository:** [GitHub Repository](https://kgxqr.github.io/)
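
The metadata above declares four configs, each with a single `train` split. A minimal sketch of loading one of them follows, assuming the Hugging Face `datasets` library is installed and the Hub is reachable; the helper name `load_config` and the `CONFIG_COLUMNS` table are illustrative and not part of the dataset itself.

```python
from typing import Dict, List

REPO_ID = "FudanSELab/SO_KGXQR_TRAIN"

# Column names per config, copied from the dataset card metadata.
CONFIG_COLUMNS: Dict[str, List[str]] = {
    "duplicate": ["question1_id", "question1", "question2_id", "question2"],
    "history": ["so_question_id", "question1", "question2"],
    "negative": ["question1", "question2"],
    "positive": ["question1", "question2"],
}


def load_config(config_name: str):
    """Load the train split of one config (hypothetical helper).

    Needs the `datasets` package and network access when called.
    """
    if config_name not in CONFIG_COLUMNS:
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so this module can be read without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train")
```

Calling `load_config("duplicate")` should return a dataset whose columns match the `duplicate` entry of `CONFIG_COLUMNS`, provided the Hub copy still matches this card.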
Provided by:
FudanSELab
Summary of original information
Dataset description
Configuration information
Config name: duplicate
- Features:
  - question1_id: string
  - question1: string
  - question2_id: string
  - question2: string
- Splits:
  - train:
    - Size: 2428577 bytes
    - Examples: 18281
- Download size: 1682661 bytes
- Dataset size: 2428577 bytes
Config name: history
- Features:
  - so_question_id: string
  - question1: string
  - question2: string
- Splits:
  - train:
    - Size: 10039163 bytes
    - Examples: 80000
- Download size: 7239803 bytes
- Dataset size: 10039163 bytes
Config name: negative
- Features:
  - question1: string
  - question2: string
- Splits:
  - train:
    - Size: 27392182 bytes
    - Examples: 248940
- Download size: 11085232 bytes
- Dataset size: 27392182 bytes
Config name: positive
- Features:
  - question1: string
  - question2: string
- Splits:
  - train:
    - Size: 11457312 bytes
    - Examples: 101172
- Download size: 7917727 bytes
- Dataset size: 11457312 bytes
Data file paths
- duplicate:
  - train: duplicate/train-*
- history:
  - train: history/train-*
- negative:
  - train: negative/train-*
- positive:
  - train: positive/train-*
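
Each path above is a glob pattern, not a file name. The sketch below shows how such patterns pick out shard files, using Python's `fnmatch` as a stand-in for the Hub's own pattern matching (an approximation). The file names passed in are hypothetical and serve only to exercise the patterns.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Glob patterns for the train split, copied from the data file paths above.
TRAIN_PATTERNS: Dict[str, str] = {
    "duplicate": "duplicate/train-*",
    "history": "history/train-*",
    "negative": "negative/train-*",
    "positive": "positive/train-*",
}


def match_config(path: str) -> List[str]:
    """Return the configs whose train pattern matches a repository-relative path."""
    return [
        name
        for name, pattern in TRAIN_PATTERNS.items()
        if fnmatchcase(path, pattern)
    ]


# Hypothetical file names, not taken from the repository listing.
print(match_config("duplicate/train-00000-of-00001.parquet"))  # ['duplicate']
print(match_config("history/README.md"))  # []
```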
Language
- en
Size category
- 100K<n<1M
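
As a quick consistency check, the per-config example counts listed above can be summed and compared with the declared size category. This is a small sketch using only numbers from this card; the variable names are illustrative.

```python
# Example counts per config, copied from the split metadata.
NUM_EXAMPLES = {
    "duplicate": 18281,
    "history": 80000,
    "negative": 248940,
    "positive": 101172,
}

# Download and in-memory sizes in bytes, copied from the same metadata.
DOWNLOAD_BYTES = {
    "duplicate": 1682661,
    "history": 7239803,
    "negative": 11085232,
    "positive": 7917727,
}
DATASET_BYTES = {
    "duplicate": 2428577,
    "history": 10039163,
    "negative": 27392182,
    "positive": 11457312,
}

total_examples = sum(NUM_EXAMPLES.values())
in_declared_category = 100_000 < total_examples < 1_000_000

print(total_examples)        # 448393
print(in_declared_category)  # True

# Ratio of download size to dataset size, per config.
for name in NUM_EXAMPLES:
    ratio = DOWNLOAD_BYTES[name] / DATASET_BYTES[name]
    print(f"{name}: {ratio:.2f}")
```

The total of 448393 examples falls inside the `100K<n<1M` category declared in the metadata.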



