
bhaddow/DocHPLTv2

Source: Hugging Face. Updated 2026-03-06. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/bhaddow/DocHPLTv2
Resource description:
---
dataset_info:
- config_name: ca-en
  features:
  - name: src_doc_id
    dtype: string
  - name: tgt_doc_id
    dtype: string
  - name: lang_pair
    dtype: string
  - name: src_doc
    struct:
    - name: ids
      sequence: string
    - name: sentences
      sequence: string
  - name: tgt_doc
    struct:
    - name: ids
      sequence: string
    - name: sentences
      sequence: string
  - name: alignment
    list:
    - name: src
      list: string
    - name: tgt
      list: string
    - name: aligner-score
      dtype: float32
    - name: bicleaner-score
      dtype: float32
    - name: bifixer-score
      dtype: float32
  splits:
  - name: train
    num_bytes: 205990755576
    num_examples: 18135107
  download_size: 17814976789
  dataset_size: 205990755576
- config_name: en-no
  features:
  - name: src_doc_id
    dtype: string
  - name: tgt_doc_id
    dtype: string
  - name: lang_pair
    dtype: string
  - name: src_doc
    struct:
    - name: ids
      sequence: string
    - name: sentences
      sequence: string
  - name: tgt_doc
    struct:
    - name: ids
      sequence: string
    - name: sentences
      sequence: string
  - name: alignment
    list:
    - name: src
      list: string
    - name: tgt
      list: string
    - name: aligner-score
      dtype: float32
    - name: bicleaner-score
      dtype: float32
    - name: bifixer-score
      dtype: float32
  splits:
  - name: train
    num_bytes: 6682656392
    num_examples: 606294
  download_size: 1290860617
  dataset_size: 6682656392
configs:
- config_name: ca-en
  data_files:
  - split: train
    path: ca-en/train-*
- config_name: en-no
  data_files:
  - split: train
    path: en-no/train-*
---
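The schema above can be read as one record per document pair, with sentence lists on each side and an `alignment` list linking them. The following is a minimal sketch of how such a record might be turned into aligned text pairs. It assumes that the strings in each alignment entry's `src` and `tgt` lists are sentence ids that index into `src_doc.ids` and `tgt_doc.ids`; the card does not state this, so treat it as an assumption to verify against real data. The helper names `aligned_pairs` and `stream_config` and the sample record are hypothetical, and `stream_config` relies on the third-party `datasets` library.

```python
from typing import Dict, Iterator, Tuple


def aligned_pairs(record: Dict) -> Iterator[Tuple[str, str]]:
    """Yield (source_text, target_text) for each entry in record["alignment"].

    Assumption: alignment ids refer to entries of src_doc.ids / tgt_doc.ids.
    """
    src_by_id = dict(zip(record["src_doc"]["ids"], record["src_doc"]["sentences"]))
    tgt_by_id = dict(zip(record["tgt_doc"]["ids"], record["tgt_doc"]["sentences"]))
    for link in record["alignment"]:
        # An entry may link several sentences on either side; join them with spaces.
        src_text = " ".join(src_by_id[i] for i in link["src"])
        tgt_text = " ".join(tgt_by_id[i] for i in link["tgt"])
        yield src_text, tgt_text


def stream_config(config_name: str = "en-no"):
    """Stream one config from the Hub without downloading it in full.

    Not called here: it needs network access and the `datasets` package.
    """
    from datasets import load_dataset  # third-party dependency

    return load_dataset("bhaddow/DocHPLTv2", config_name, split="train", streaming=True)


# Hypothetical record shaped like the schema above (values are invented).
sample = {
    "src_doc_id": "src-0",
    "tgt_doc_id": "tgt-0",
    "lang_pair": "en-no",
    "src_doc": {"ids": ["s1", "s2", "s3"],
                "sentences": ["Hello.", "How are you?", "Goodbye."]},
    "tgt_doc": {"ids": ["t1", "t2"],
                "sentences": ["Hei. Hvordan har du det?", "Ha det."]},
    "alignment": [
        {"src": ["s1", "s2"], "tgt": ["t1"]},
        {"src": ["s3"], "tgt": ["t2"]},
    ],
}

pairs = list(aligned_pairs(sample))
for src_text, tgt_text in pairs:
    print(src_text, "|||", tgt_text)
```

Streaming is suggested because the `ca-en` config lists a `download_size` of 17814976789 bytes; the `en-no` config, at 1290860617 bytes, is the smaller of the two and a more practical starting point.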
Provider:
bhaddow