textcleanlm/textclean-corpus-1M-o4-mini
Source: Hugging Face. Updated 2025-07-12; indexed 2025-08-09.
Download link:
https://hf-mirror.com/datasets/textcleanlm/textclean-corpus-1M-o4-mini
Summary:
A text corpus that pairs each original text with several cleaned variants. It contains 1,815 training samples. Each sample consists of a unique identifier, the original text, six differently cleaned versions of it, and an additional "mini" version. The dataset is intended for text-processing and natural language processing tasks.
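The per-sample layout described above can be modeled as a simple record type. This is a minimal sketch for illustration only: the field names (`id`, `original`, `cleaned_1` … `cleaned_6`, `mini`) are hypothetical placeholders, not the dataset's actual column names. The real columns can be checked by loading the dataset with the Hugging Face `datasets` library (`load_dataset(...)`) and inspecting `Dataset.column_names`.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# The card states six cleaned versions per sample.
NUM_CLEANED_VARIANTS = 6


@dataclass(frozen=True)
class TextCleanSample:
    """One sample: identifier, original text, six cleaned variants, and a mini version.

    Field names are hypothetical; the real dataset columns may differ.
    """
    id: str
    original: str
    cleaned: tuple[str, ...]
    mini: str

    def __post_init__(self) -> None:
        # Enforce the "six cleaned versions" structure described in the card.
        if len(self.cleaned) != NUM_CLEANED_VARIANTS:
            raise ValueError(
                f"expected {NUM_CLEANED_VARIANTS} cleaned variants, got {len(self.cleaned)}"
            )


def from_row(row: Mapping[str, Any]) -> TextCleanSample:
    """Build a sample from a flat row dict.

    Assumes hypothetical keys 'id', 'original', 'cleaned_1'..'cleaned_6', and 'mini'.
    """
    cleaned = tuple(row[f"cleaned_{i}"] for i in range(1, NUM_CLEANED_VARIANTS + 1))
    return TextCleanSample(
        id=str(row["id"]),
        original=row["original"],
        cleaned=cleaned,
        mini=row["mini"],
    )


if __name__ == "__main__":
    # Synthetic row for illustration only; not taken from the dataset.
    row = {"id": "sample-0", "original": "Hello ,  world!!", "mini": "Hello, world!"}
    row.update({f"cleaned_{i}": "Hello, world!" for i in range(1, NUM_CLEANED_VARIANTS + 1)})
    sample = from_row(row)
    print(sample.id, len(sample.cleaned))  # sample-0 6
```

Grouping the six variants into a single tuple makes it easy to compare them side by side against `original`, while the length check catches rows that do not match the described structure.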
Provided by:
textcleanlm



