c4-200m-gec-train100k-test25k
Source: OpenXLab (indexed 2026-04-18)
Download link:
https://openxlab.org.cn/datasets/OpenDataLab/c4-200m-gec-train100k-test25k
Description:
C4 200M Sample Dataset adopted from https://huggingface.co/datasets/liweili/c4-200m
C4-200m is a collection of 185 million sentence pairs generated from the cleaned English dataset from C4. This dataset can be used in grammatical error correction (GEC) tasks.
The corruption edits and scripts used to synthesize this dataset are referenced from: C4-200M Synthetic Dataset
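As a rough illustration of how a sentence pair can serve GEC work, the sketch below extracts word-level edits from one corrupted/corrected pair using Python's standard difflib. The field names `input` and `output`, the helper `word_edits`, and the example sentence are assumptions made for illustration and are not taken from the dataset; check the actual column names before relying on them.

```python
import difflib


def word_edits(source, target):
    """Return (operation, source_words, target_words) for each differing span."""
    src, tgt = source.split(), target.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    return [
        (op, " ".join(src[i1:i2]), " ".join(tgt[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"  # keep only the spans that were changed
    ]


# Hypothetical pair; the keys "input" (corrupted) and "output" (corrected)
# are assumed names, not confirmed against the dataset schema.
pair = {
    "input": "She go to school yesterday .",
    "output": "She went to school yesterday .",
}

print(word_edits(pair["input"], pair["output"]))
# → [('replace', 'go', 'went')]
```

Edits extracted this way can be used to inspect which error types a sample covers, though a dedicated GEC annotation tool would give finer-grained categories.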
Provider:
OpenDataLab
Created:
2023-12-07



