matejklemen/akces_gec
Source: Hugging Face. Updated 2023-05-08; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/matejklemen/akces_gec
Description:
---
license: cc-by-nc-sa-4.0
dataset_info:
- config_name: ann0
  features:
  - name: src_tokens
    sequence: string
  - name: tgt_tokens
    sequence: string
  - name: corrections
    list:
    - name: idx_src
      sequence: int32
    - name: idx_tgt
      sequence: int32
    - name: corr_types
      sequence: string
  splits:
  - name: train
    num_bytes: 11199287
    num_examples: 42210
  - name: validation
    num_bytes: 713686
    num_examples: 2485
  - name: test
    num_bytes: 741411
    num_examples: 2676
  download_size: 3534547
  dataset_size: 12654384
- config_name: ann1
  features:
  - name: src_tokens
    sequence: string
  - name: tgt_tokens
    sequence: string
  - name: corrections
    list:
    - name: idx_src
      sequence: int32
    - name: idx_tgt
      sequence: int32
    - name: corr_types
      sequence: string
  splits:
  - name: train
    num_bytes: 8124054
    num_examples: 42210
  - name: validation
    num_bytes: 618583
    num_examples: 2485
  - name: test
    num_bytes: 655536
    num_examples: 2676
  download_size: 3534547
  dataset_size: 9398173
---
There are two configs: `ann0` (default) and `ann1`. These correspond to the annotator ID whose annotations will be loaded.
**Important:** Annotations from annotator 1 exist only for the dev (validation) set, so the training and test sets contain no annotations in the `ann1` config.
Combining the annotations of the two annotators is left to the user.
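One possible way to combine the two annotators is to keep both corrected sentences as separate references for the same source sentence. The sketch below is only an illustration, not a method prescribed by the dataset author. It assumes that `ann0` and `ann1` list the same sentences in the same order (the matching split sizes suggest this, but the card does not confirm it). The function name `combine_annotations` and the toy records are hypothetical.

```python
from typing import Any, Dict, List

Example = Dict[str, Any]


def combine_annotations(ex_ann0: Example, ex_ann1: Example) -> Dict[str, Any]:
    """Merge two annotators' views of one sentence into a multi-reference record."""
    # Assumption: both configs describe the same sentence at the same index,
    # so the source tokens must match for the pairing to be valid.
    if ex_ann0["src_tokens"] != ex_ann1["src_tokens"]:
        raise ValueError("examples do not describe the same source sentence")

    references: List[Dict[str, Any]] = []
    seen_targets = set()
    for annotator_id, ex in enumerate((ex_ann0, ex_ann1)):
        target = tuple(ex["tgt_tokens"])
        if target in seen_targets:
            continue  # an identical corrected sentence is already stored
        seen_targets.add(target)
        references.append({
            "annotator": annotator_id,
            "tgt_tokens": list(target),
            "corrections": ex["corrections"],
        })
    return {"src_tokens": ex_ann0["src_tokens"], "references": references}


# Hand-made toy records in the card's schema (not taken from the dataset);
# the correction type labels are placeholders.
toy_ann0 = {
    "src_tokens": ["She", "go", "home"],
    "tgt_tokens": ["She", "goes", "home"],
    "corrections": [{"idx_src": [1], "idx_tgt": [1], "corr_types": ["toy_type"]}],
}
toy_ann1 = {
    "src_tokens": ["She", "go", "home"],
    "tgt_tokens": ["She", "went", "home"],
    "corrections": [{"idx_src": [1], "idx_tgt": [1], "corr_types": ["toy_type"]}],
}

combined = combine_annotations(toy_ann0, toy_ann1)
```

With the Hugging Face `datasets` library, the two inputs would presumably come from the `validation` split of `load_dataset("matejklemen/akces_gec", "ann0")` and of the same call with `"ann1"`, since that is the only split where both annotators contributed.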
Provided by:
matejklemen
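Summary of original information
Dataset overview
Config ann0
- Features:
  - src_tokens: sequence of strings
  - tgt_tokens: sequence of strings
  - corrections: list containing:
    - idx_src: sequence of integers
    - idx_tgt: sequence of integers
    - corr_types: sequence of strings
- Splits:
  - train: 42210 examples, 11199287 bytes
  - validation: 2485 examples, 713686 bytes
  - test: 2676 examples, 741411 bytes
- Download size: 3534547 bytes
- Dataset size: 12654384 bytes
Config ann1
- Features:
  - src_tokens: sequence of strings
  - tgt_tokens: sequence of strings
  - corrections: list containing:
    - idx_src: sequence of integers
    - idx_tgt: sequence of integers
    - corr_types: sequence of strings
- Splits:
  - train: 42210 examples, 8124054 bytes
  - validation: 2485 examples, 618583 bytes
  - test: 2676 examples, 655536 bytes
- Download size: 3534547 bytes
- Dataset size: 9398173 bytes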
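To make the `corrections` feature listed above more concrete, here is a minimal sketch that turns each correction into readable source and target spans. It assumes that `idx_src` and `idx_tgt` hold 0-based token positions into `src_tokens` and `tgt_tokens`; the field names suggest this, but the card does not state it. The helper `describe_corrections`, the toy sentence, and the type labels are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def describe_corrections(example: Dict[str, Any]) -> List[Tuple[str, str, List[str]]]:
    """Return (source span, target span, correction types) for each correction."""
    src, tgt = example["src_tokens"], example["tgt_tokens"]
    described = []
    for corr in example["corrections"]:
        # Assumption: indices are 0-based token positions in each sentence.
        src_span = " ".join(src[i] for i in corr["idx_src"])
        tgt_span = " ".join(tgt[i] for i in corr["idx_tgt"])
        described.append((src_span, tgt_span, list(corr["corr_types"])))
    return described


# Hand-made toy record in the card's schema (not taken from the dataset).
toy_example = {
    "src_tokens": ["She", "go", "home", "home"],
    "tgt_tokens": ["She", "goes", "home"],
    "corrections": [
        {"idx_src": [1], "idx_tgt": [1], "corr_types": ["toy_agreement"]},
        {"idx_src": [3], "idx_tgt": [], "corr_types": ["toy_deletion"]},
    ],
}

spans = describe_corrections(toy_example)
for src_span, tgt_span, types in spans:
    print(f"{src_span!r} -> {tgt_span!r} {types}")
```

An empty index list yields an empty span, which is how this sketch would represent a pure insertion or deletion if the dataset encodes them that way.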



