vatolinalex/ru_sci_bench_test
Hugging Face. Updated 2024-05-26, indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/vatolinalex/ru_sci_bench_test
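A minimal sketch of loading one configuration with the Hugging Face `datasets` library. It assumes the package is installed and the hub (or a mirror) is reachable. `load_dataset(path, name, split=...)` is the library's standard entry point. `load_split` and the `CONFIGS`/`SPLITS` tuples are hypothetical helpers that only restate the names from the card's `configs` section.

```python
# Minimal sketch: load one configuration/split of the dataset.
# Assumption: the Hugging Face `datasets` package is installed and the hub is reachable.
REPO_ID = "vatolinalex/ru_sci_bench_test"

# Hypothetical helpers restating the configuration and split names from the card.
CONFIGS = ("grnti_en", "grnti_ru", "oecd_en", "oecd_ru")
SPLITS = ("train", "test")


def load_split(config: str, split: str = "train"):
    """Validate the names locally, then download the split via `datasets`."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the name checks above work without the package installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)
```

For example, `load_split("grnti_ru", "test")` should return a `Dataset` with 15441 rows if the card's metadata is accurate. Downloads are commonly routed through hf-mirror.com by setting the `HF_ENDPOINT` environment variable before the library is imported. Treat that as an assumption about the mirror; this card does not state it.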
Resource description:
---
dataset_info:
- config_name: grnti_en
  features:
  - name: paper_id
    dtype: int64
  - name: title
    dtype: string
  - name: abstract
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 124562302
    num_examples: 130460
  - name: test
    num_bytes: 13680725
    num_examples: 14498
  download_size: 78016581
  dataset_size: 138243027
- config_name: grnti_ru
  features:
  - name: paper_id
    dtype: int64
  - name: title
    dtype: string
  - name: abstract
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 227158411
    num_examples: 138650
  - name: test
    num_bytes: 25360904
    num_examples: 15441
  download_size: 119721076
  dataset_size: 252519315
- config_name: oecd_en
  features:
  - name: paper_id
    dtype: int64
  - name: title
    dtype: string
  - name: abstract
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 150882098
    num_examples: 159900
  - name: test
    num_bytes: 16647200
    num_examples: 17820
  download_size: 94829407
  dataset_size: 167529298
- config_name: oecd_ru
  features:
  - name: paper_id
    dtype: int64
  - name: title
    dtype: string
  - name: abstract
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 274703543
    num_examples: 169890
  - name: test
    num_bytes: 30478302
    num_examples: 18897
  download_size: 144936959
  dataset_size: 305181845
configs:
- config_name: grnti_en
  data_files:
  - split: train
    path: grnti_en/train-*
  - split: test
    path: grnti_en/test-*
- config_name: grnti_ru
  data_files:
  - split: train
    path: grnti_ru/train-*
  - split: test
    path: grnti_ru/test-*
- config_name: oecd_en
  data_files:
  - split: train
    path: oecd_en/train-*
  - split: test
    path: oecd_en/test-*
- config_name: oecd_ru
  data_files:
  - split: train
    path: oecd_ru/train-*
  - split: test
    path: oecd_ru/test-*
---
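The `configs` section maps every split to a path glob such as `grnti_en/train-*`, so any data file under that prefix belongs to that split. The sketch below approximates the mapping with Python's `fnmatch`. `DATA_FILES` and `split_for_file` are hypothetical names. The shard file names used in the usage line (e.g. `train-00000-of-00001.parquet`) are also hypothetical, because the card does not list the actual files.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional, Tuple

# Hypothetical restatement of the card's `configs` section: config -> split -> glob.
# This only approximates how the hub resolves data_files; edge cases may differ.
DATA_FILES: Dict[str, Dict[str, str]] = {
    config: {split: f"{config}/{split}-*" for split in ("train", "test")}
    for config in ("grnti_en", "grnti_ru", "oecd_en", "oecd_ru")
}


def split_for_file(path: str) -> Optional[Tuple[str, str]]:
    """Return the (config, split) whose glob matches `path`, or None if none does."""
    for config, splits in DATA_FILES.items():
        for split, pattern in splits.items():
            # fnmatchcase: case-sensitive matching, independent of the OS.
            if fnmatchcase(path, pattern):
                return config, split
    return None
```

With these patterns, `split_for_file("grnti_en/train-00000-of-00001.parquet")` returns `("grnti_en", "train")`. A file such as `oecd_ru/validation-00000.parquet` matches nothing, which is consistent with the card defining only train and test splits.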
The dataset has four configurations: grnti_en, grnti_ru, oecd_en, and oecd_ru. Each configuration has a train split and a test split, and every split has the same four features: paper_id (int64), title (string), abstract (string), and label (int64). The metadata gives the size in bytes and the number of examples for each split. For each configuration it also gives the total dataset size (the sum of the two splits' sizes) and the download size.
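All four configurations share one schema, so a row can be checked against it in plain Python. This is a hypothetical sketch: `Paper` and `from_row` are illustrative names and are not part of the dataset or of the `datasets` library. It treats the card's `int64` as a Python `int` restricted to the signed 64-bit range and `string` as `str`.

```python
from dataclasses import dataclass, fields

# Signed 64-bit bounds for the card's int64 features.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass(frozen=True)
class Paper:
    """Hypothetical mirror of the shared feature schema (not an official API)."""

    paper_id: int  # int64
    title: str     # string
    abstract: str  # string
    label: int     # int64


def from_row(row: dict) -> Paper:
    """Build a Paper from a dict row, checking presence, type, and int64 range."""
    missing = [f.name for f in fields(Paper) if f.name not in row]
    if missing:
        raise KeyError(f"missing features: {missing}")
    for f in fields(Paper):
        value = row[f.name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, f.type) or isinstance(value, bool):
            raise TypeError(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )
        if f.type is int and not INT64_MIN <= value <= INT64_MAX:
            raise ValueError(f"{f.name}: {value} does not fit in int64")
    return Paper(**{f.name: row[f.name] for f in fields(Paper)})
```

Indexing a `datasets` `Dataset` with an integer returns a plain dict, so something like `from_row(ds[0])` would apply to any of the four configurations.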
Provided by:
vatolinalex
Summary of original information
Dataset overview
Dataset configuration 1: grnti_en
- Features:
  - paper_id: int64
  - title: string
  - abstract: string
  - label: int64
- Splits:
  - train: 124562302 bytes, 130460 examples
  - test: 13680725 bytes, 14498 examples
- Download size: 78016581 bytes
- Dataset size: 138243027 bytes
Dataset configuration 2: grnti_ru
- Features:
  - paper_id: int64
  - title: string
  - abstract: string
  - label: int64
- Splits:
  - train: 227158411 bytes, 138650 examples
  - test: 25360904 bytes, 15441 examples
- Download size: 119721076 bytes
- Dataset size: 252519315 bytes
Dataset configuration 3: oecd_en
- Features:
  - paper_id: int64
  - title: string
  - abstract: string
  - label: int64
- Splits:
  - train: 150882098 bytes, 159900 examples
  - test: 16647200 bytes, 17820 examples
- Download size: 94829407 bytes
- Dataset size: 167529298 bytes
Dataset configuration 4: oecd_ru
- Features:
  - paper_id: int64
  - title: string
  - abstract: string
  - label: int64
- Splits:
  - train: 274703543 bytes, 169890 examples
  - test: 30478302 bytes, 18897 examples
- Download size: 144936959 bytes
- Dataset size: 305181845 bytes
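The per-configuration figures can be cross-checked with some simple arithmetic. The sketch below copies the numbers from the card; `SPLIT_STATS`, `TOTALS`, and `summarize` are illustrative names. It checks that each dataset size equals the sum of the two splits' byte sizes, and reports each test split's share of examples, average bytes per training example, and the ratio of download size to dataset size. For these numbers, all four configurations pass the size check and keep roughly 10% of their examples in the test split. The Russian configurations average about 1,600 bytes per training example, compared with about 950 for the English ones.

```python
# Split statistics copied from the card: config -> split -> (num_bytes, num_examples).
SPLIT_STATS = {
    "grnti_en": {"train": (124562302, 130460), "test": (13680725, 14498)},
    "grnti_ru": {"train": (227158411, 138650), "test": (25360904, 15441)},
    "oecd_en": {"train": (150882098, 159900), "test": (16647200, 17820)},
    "oecd_ru": {"train": (274703543, 169890), "test": (30478302, 18897)},
}

# Per-configuration totals copied from the card: (dataset_size, download_size) in bytes.
TOTALS = {
    "grnti_en": (138243027, 78016581),
    "grnti_ru": (252519315, 119721076),
    "oecd_en": (167529298, 94829407),
    "oecd_ru": (305181845, 144936959),
}


def summarize(config: str) -> dict:
    """Derive a few sanity figures for one configuration (illustrative helper)."""
    train_bytes, train_n = SPLIT_STATS[config]["train"]
    test_bytes, test_n = SPLIT_STATS[config]["test"]
    dataset_size, download_size = TOTALS[config]
    return {
        # dataset_size should equal the sum of the per-split num_bytes.
        "sizes_consistent": train_bytes + test_bytes == dataset_size,
        # Share of all examples that sit in the test split.
        "test_fraction": test_n / (train_n + test_n),
        # Average stored bytes per training example.
        "bytes_per_train_example": train_bytes / train_n,
        # Download size relative to dataset size (smaller, presumably due to compression).
        "download_ratio": download_size / dataset_size,
    }


for name in SPLIT_STATS:
    s = summarize(name)
    print(
        f"{name}: consistent={s['sizes_consistent']}, "
        f"test share={s['test_fraction']:.1%}, "
        f"~{s['bytes_per_train_example']:.0f} bytes/train example, "
        f"download/dataset={s['download_ratio']:.2f}"
    )
```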



