bhaddow/DocHPLTv2
Source: Hugging Face. Updated 2026-03-06. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/bhaddow/DocHPLTv2
Resource description:
---
dataset_info:
- config_name: ca-en
  features:
  - name: src_doc_id
    dtype: string
  - name: tgt_doc_id
    dtype: string
  - name: lang_pair
    dtype: string
  - name: src_doc
    struct:
    - name: ids
      sequence: string
    - name: sentences
      sequence: string
  - name: tgt_doc
    struct:
    - name: ids
      sequence: string
    - name: sentences
      sequence: string
  - name: alignment
    list:
    - name: src
      list: string
    - name: tgt
      list: string
    - name: aligner-score
      dtype: float32
    - name: bicleaner-score
      dtype: float32
    - name: bifixer-score
      dtype: float32
  splits:
  - name: train
    num_bytes: 205990755576
    num_examples: 18135107
  download_size: 17814976789
  dataset_size: 205990755576
- config_name: en-no
  features:
  - name: src_doc_id
    dtype: string
  - name: tgt_doc_id
    dtype: string
  - name: lang_pair
    dtype: string
  - name: src_doc
    struct:
    - name: ids
      sequence: string
    - name: sentences
      sequence: string
  - name: tgt_doc
    struct:
    - name: ids
      sequence: string
    - name: sentences
      sequence: string
  - name: alignment
    list:
    - name: src
      list: string
    - name: tgt
      list: string
    - name: aligner-score
      dtype: float32
    - name: bicleaner-score
      dtype: float32
    - name: bifixer-score
      dtype: float32
  splits:
  - name: train
    num_bytes: 6682656392
    num_examples: 606294
  download_size: 1290860617
  dataset_size: 6682656392
configs:
- config_name: ca-en
  data_files:
  - split: train
    path: ca-en/train-*
- config_name: en-no
  data_files:
  - split: train
    path: en-no/train-*
---
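
The metadata above describes each record as a document pair: `src_doc` and `tgt_doc` each carry parallel `ids` and `sentences` lists, and `alignment` links groups of source entries to groups of target entries. The following is a minimal sketch of how one record might be turned into aligned text pairs. It assumes that the strings in each alignment's `src` and `tgt` lists are sentence ids drawn from `src_doc.ids` and `tgt_doc.ids`; the card does not state this, so check it against real data. The helper name `aligned_pairs` and the sample record are hypothetical and written only for illustration; the score fields are left out because the sketch does not use them.

```python
from typing import Iterator, Tuple


def aligned_pairs(record: dict) -> Iterator[Tuple[str, str]]:
    """Yield (source_text, target_text) for each alignment in one document pair."""
    # Build id -> sentence lookups from the parallel `ids` / `sentences` lists.
    src = dict(zip(record["src_doc"]["ids"], record["src_doc"]["sentences"]))
    tgt = dict(zip(record["tgt_doc"]["ids"], record["tgt_doc"]["sentences"]))
    for link in record["alignment"]:
        # ASSUMPTION: link["src"] and link["tgt"] hold sentence ids that
        # appear in the id lists above (not stated in the dataset card).
        src_text = " ".join(src[i] for i in link["src"])
        tgt_text = " ".join(tgt[i] for i in link["tgt"])
        yield src_text, tgt_text


# Hypothetical record shaped like the schema; not real dataset content.
sample = {
    "src_doc_id": "doc-a",
    "tgt_doc_id": "doc-b",
    "lang_pair": "en-no",
    "src_doc": {"ids": ["s1", "s2", "s3"],
                "sentences": ["Hello.", "How are", "you?"]},
    "tgt_doc": {"ids": ["t1", "t2"],
                "sentences": ["Hei.", "Hvordan har du det?"]},
    "alignment": [
        {"src": ["s1"], "tgt": ["t1"]},        # one-to-one link
        {"src": ["s2", "s3"], "tgt": ["t2"]},  # many-to-one link
    ],
}

pairs = list(aligned_pairs(sample))
for source_text, target_text in pairs:
    print(f"{source_text} ||| {target_text}")
```

With the Hugging Face `datasets` library installed, real records could be obtained with something like `load_dataset("bhaddow/DocHPLTv2", "en-no", split="train", streaming=True)` and passed to the same helper. Streaming is likely the practical choice for `ca-en`, which the metadata lists at about 17.8 GB to download and about 206 GB once prepared.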
Provider:
bhaddow



