matejklemen/nucle
Source: Hugging Face. Updated 2024-01-24. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/matejklemen/nucle
Resource description:
---
license: other
dataset_info:
- config_name: public
  features:
  - name: src_tokens
    sequence: string
  - name: tgt_tokens
    sequence: string
  - name: corrections
    list:
    - name: idx_src
      sequence: int32
    - name: idx_tgt
      sequence: int32
    - name: corr_type
      dtype: string
  splits:
  - name: train
  download_size: 0
  dataset_size: 0
- config_name: private
  features:
  - name: src_tokens
    sequence: string
  - name: tgt_tokens
    sequence: string
  - name: corrections
    list:
    - name: idx_src
      sequence: int32
    - name: idx_tgt
      sequence: int32
    - name: corr_type
      dtype: string
  splits:
  - name: train
  download_size: 0
  dataset_size: 0
---
**Important**: This repository contains only a script for loading the data; the data itself is private. The script works only if you have access to the data, which you may request for non-commercial purposes [here](https://sterling8.d2.comp.nus.edu.sg/nucle_download/nucle.php).
```python
import datasets
data = datasets.load_dataset("matejklemen/nucle", "private", data_dir="<dir-of-private-data>", ignore_verifications=True)
```
Passing `ignore_verifications=True` is important because the datasets library builds validation statistics that it then verifies the loaded data against,
and these statistics cannot be computed correctly when the data is not public.
Provider:
matejklemen
Summary of original information
Dataset overview
Configurations
- config_name: public, private
Features
- src_tokens: sequence of strings
- tgt_tokens: sequence of strings
- corrections: list
  - idx_src: sequence of integers
  - idx_tgt: sequence of integers
  - corr_type: string
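To illustrate how these features fit together, here is a minimal sketch that walks over the `corrections` of one example. It assumes that `idx_src` and `idx_tgt` hold token positions into `src_tokens` and `tgt_tokens` respectively, which the field names suggest but the card does not state. The example record, its `corr_type` label, and the helper `describe_corrections` are made up for illustration and are not taken from the data.

```python
def describe_corrections(example: dict) -> list:
    """Return (source span, target span, correction type) triples for one example.

    Assumption: idx_src / idx_tgt are token indices into src_tokens / tgt_tokens.
    """
    triples = []
    for corr in example["corrections"]:
        src_span = " ".join(example["src_tokens"][i] for i in corr["idx_src"])
        tgt_span = " ".join(example["tgt_tokens"][i] for i in corr["idx_tgt"])
        triples.append((src_span, tgt_span, corr["corr_type"]))
    return triples


# Hypothetical record that follows the schema above (not real data).
example = {
    "src_tokens": ["He", "go", "to", "school", "."],
    "tgt_tokens": ["He", "goes", "to", "school", "."],
    "corrections": [
        {"idx_src": [1], "idx_tgt": [1], "corr_type": "SVA"},  # illustrative label
    ],
}

for src_span, tgt_span, corr_type in describe_corrections(example):
    print(f"{corr_type}: '{src_span}' -> '{tgt_span}'")
```

An empty `idx_src` or `idx_tgt` would produce an empty span here, which is one plausible way an insertion or deletion could be represented.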
Splits
- train
Data size
- download_size: 0
- dataset_size: 0



