matejklemen/wi_locness
Source: Hugging Face. Updated 2023-04-25; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/matejklemen/wi_locness
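Besides the direct link, the dataset can most likely be loaded with the Hugging Face `datasets` library. The sketch below is illustrative only. The configuration and split names come from the metadata on this page, and the helper `load_wi_locness` is a hypothetical name. Depending on the installed `datasets` version and how the repository is packaged, extra arguments (for example `trust_remote_code=True`) or a manual download may be required.

```python
# Configuration names and their splits, as listed in the dataset metadata.
CONFIG_SPLITS = {
    "A": ("train", "validation"),
    "B": ("train", "validation"),
    "C": ("train", "validation"),
    "N": ("validation",),
    "all": ("train", "validation"),
}


def load_wi_locness(config: str = "all", split: str = "validation"):
    """Hypothetical helper: load one split of one configuration."""
    if config not in CONFIG_SPLITS:
        raise ValueError(f"unknown config {config!r}")
    if split not in CONFIG_SPLITS[config]:
        raise ValueError(f"config {config!r} has no {split!r} split")
    # Imported lazily so the checks above work without the library installed.
    from datasets import load_dataset

    return load_dataset("matejklemen/wi_locness", config, split=split)
```

Calling `load_wi_locness("N", "train")` raises a `ValueError` before any download is attempted, because configuration N only provides a validation split.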
Resource description:
---
license: other
dataset_info:
- config_name: A
  features:
  - name: src_tokens
    sequence: string
  - name: tgt_tokens
    sequence: string
  - name: corrections
    list:
    - name: idx_src
      sequence: int32
    - name: idx_tgt
      sequence: int32
    - name: corr_type
      dtype: string
  splits:
  - name: train
    num_bytes: 3847179
    num_examples: 10493
  - name: validation
    num_bytes: 392622
    num_examples: 1037
  download_size: 6120469
  dataset_size: 4239801
- config_name: B
  features:
  - name: src_tokens
    sequence: string
  - name: tgt_tokens
    sequence: string
  - name: corrections
    list:
    - name: idx_src
      sequence: int32
    - name: idx_tgt
      sequence: int32
    - name: corr_type
      dtype: string
  splits:
  - name: train
    num_bytes: 4649805
    num_examples: 13032
  - name: validation
    num_bytes: 468078
    num_examples: 1290
  download_size: 6120469
  dataset_size: 5117883
- config_name: C
  features:
  - name: src_tokens
    sequence: string
  - name: tgt_tokens
    sequence: string
  - name: corrections
    list:
    - name: idx_src
      sequence: int32
    - name: idx_tgt
      sequence: int32
    - name: corr_type
      dtype: string
  splits:
  - name: train
    num_bytes: 3765831
    num_examples: 10783
  - name: validation
    num_bytes: 390439
    num_examples: 1069
  download_size: 6120469
  dataset_size: 4156270
- config_name: N
  features:
  - name: src_tokens
    sequence: string
  - name: tgt_tokens
    sequence: string
  - name: corrections
    list:
    - name: idx_src
      sequence: int32
    - name: idx_tgt
      sequence: int32
    - name: corr_type
      dtype: string
  splits:
  - name: validation
    num_bytes: 421656
    num_examples: 988
  download_size: 6120469
  dataset_size: 421656
- config_name: all
  features:
  - name: src_tokens
    sequence: string
  - name: tgt_tokens
    sequence: string
  - name: corrections
    list:
    - name: idx_src
      sequence: int32
    - name: idx_tgt
      sequence: int32
    - name: corr_type
      dtype: string
  splits:
  - name: train
    num_bytes: 12262815
    num_examples: 34308
  - name: validation
    num_bytes: 1672795
    num_examples: 4384
  download_size: 6120469
  dataset_size: 13935610
---
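All five configurations declare the same three features. To make that schema concrete, here is a minimal Python sketch of one record. The type names `Correction` and `Example`, the helper `describe_corrections`, and the toy sentence are all made up for illustration and are not taken from the dataset. The reading of `idx_src` and `idx_tgt` as token positions in `src_tokens` and `tgt_tokens` is an assumption based on the field names, and the `corr_type` value is only a placeholder label.

```python
from typing import List, TypedDict


class Correction(TypedDict):
    idx_src: List[int]  # assumed: positions in src_tokens touched by the edit
    idx_tgt: List[int]  # assumed: positions in tgt_tokens that replace them
    corr_type: str      # edit label; the value used below is a placeholder


class Example(TypedDict):
    src_tokens: List[str]
    tgt_tokens: List[str]
    corrections: List[Correction]


def describe_corrections(example: Example) -> List[str]:
    """Render each correction as "type: 'source span' -> 'target span'"."""
    lines = []
    for corr in example["corrections"]:
        src = " ".join(example["src_tokens"][i] for i in corr["idx_src"])
        tgt = " ".join(example["tgt_tokens"][i] for i in corr["idx_tgt"])
        lines.append(f"{corr['corr_type']}: '{src}' -> '{tgt}'")
    return lines


# Made-up record for illustration; it does not come from the dataset.
toy: Example = {
    "src_tokens": ["She", "go", "to", "school", "."],
    "tgt_tokens": ["She", "goes", "to", "school", "."],
    "corrections": [
        {"idx_src": [1], "idx_tgt": [1], "corr_type": "R:VERB:SVA"},
    ],
}

print(describe_corrections(toy))  # ["R:VERB:SVA: 'go' -> 'goes'"]
```

Under this reading, a correction with an empty index list on one side would render as an empty span on that side.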
Provided by:
matejklemen
Summary of original information
Dataset overview
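Configuration A
- Features:
  - src_tokens: sequence of strings
  - tgt_tokens: sequence of strings
  - corrections: list whose entries contain:
    - idx_src: sequence of integers
    - idx_tgt: sequence of integers
    - corr_type: string
- Splits:
  - train: 3847179 bytes, 10493 examples
  - validation: 392622 bytes, 1037 examples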
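Configuration B
- Features:
  - src_tokens: sequence of strings
  - tgt_tokens: sequence of strings
  - corrections: list whose entries contain:
    - idx_src: sequence of integers
    - idx_tgt: sequence of integers
    - corr_type: string
- Splits:
  - train: 4649805 bytes, 13032 examples
  - validation: 468078 bytes, 1290 examples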
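Configuration C
- Features:
  - src_tokens: sequence of strings
  - tgt_tokens: sequence of strings
  - corrections: list whose entries contain:
    - idx_src: sequence of integers
    - idx_tgt: sequence of integers
    - corr_type: string
- Splits:
  - train: 3765831 bytes, 10783 examples
  - validation: 390439 bytes, 1069 examples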
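Configuration N
- Features:
  - src_tokens: sequence of strings
  - tgt_tokens: sequence of strings
  - corrections: list whose entries contain:
    - idx_src: sequence of integers
    - idx_tgt: sequence of integers
    - corr_type: string
- Splits:
  - validation: 421656 bytes, 988 examples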
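Configuration all
- Features:
  - src_tokens: sequence of strings
  - tgt_tokens: sequence of strings
  - corrections: list whose entries contain:
    - idx_src: sequence of integers
    - idx_tgt: sequence of integers
    - corr_type: string
- Splits:
  - train: 12262815 bytes, 34308 examples
  - validation: 1672795 bytes, 4384 examples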
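The split sizes suggest that the `all` configuration is the union of A, B, C and N. The page does not say so explicitly, so treat this as an observation rather than documented behavior. The short check below copies the byte and example counts from the metadata and compares their sums with the `all` figures; the names `SPLITS` and `combined` are introduced here for illustration.

```python
# (num_bytes, num_examples) per split, copied from the dataset metadata.
SPLITS = {
    "A": {"train": (3847179, 10493), "validation": (392622, 1037)},
    "B": {"train": (4649805, 13032), "validation": (468078, 1290)},
    "C": {"train": (3765831, 10783), "validation": (390439, 1069)},
    "N": {"validation": (421656, 988)},
    "all": {"train": (12262815, 34308), "validation": (1672795, 4384)},
}


def combined(split: str) -> tuple:
    """Sum bytes and examples of `split` over the configs A, B, C and N."""
    parts = [SPLITS[c][split] for c in "ABCN" if split in SPLITS[c]]
    return (sum(b for b, _ in parts), sum(n for _, n in parts))


for split in ("train", "validation"):
    total = combined(split)
    print(split, total, total == SPLITS["all"][split])
# train (12262815, 34308) True
# validation (1672795, 4384) True
```

Configuration N has no train split, so it contributes only to the validation totals.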



