fjxdaisy/sentence_split_finemath_4plus_part_1
Source: Hugging Face. Updated 2025-02-26; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/fjxdaisy/sentence_split_finemath_4plus_part_1
Description:
Each record includes the following fields: web page URL, fetch time, MIME type, WARC filename, text content, word count, character count, metadata, score, integer score, crawling method, snapshot type, language, and language score. The dataset provides a training split of approximately 1,000,000 samples, with a total size of about 10 GB.
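Because the training split is about 10 GB, it may be more convenient to stream records than to download everything first. The following is a minimal sketch, assuming the Hugging Face `datasets` package is installed and that the split is named `train`. The column names `language` and `language_score` are guesses based on the field list above, not confirmed names, and `stream_train_split` and `keep_language` are hypothetical helpers written for this illustration.

```python
from typing import Any, Dict, Iterable, Iterator

# Repository id taken from the download link above.
REPO_ID = "fjxdaisy/sentence_split_finemath_4plus_part_1"


def stream_train_split() -> Iterable[Dict[str, Any]]:
    """Lazily iterate over the training split instead of downloading it all."""
    # Imported here so keep_language() below works without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the split is called "train".
    return load_dataset(REPO_ID, split="train", streaming=True)


def keep_language(
    records: Iterable[Dict[str, Any]],
    language: str = "en",
    min_language_score: float = 0.9,
    language_key: str = "language",      # assumed column name
    score_key: str = "language_score",   # assumed column name
) -> Iterator[Dict[str, Any]]:
    """Yield records in the given language whose language score meets the threshold."""
    for record in records:
        score = record.get(score_key)
        if score is None:
            # Skip records that have no language score at all.
            continue
        if record.get(language_key) == language and score >= min_language_score:
            yield record
```

A call such as `keep_language(stream_train_split())` would then filter records as they arrive. If the real column names differ, they can be passed through `language_key` and `score_key`.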
Provider:
fjxdaisy



