konwoo/finemath-subset500000
Source: Hugging Face. Updated: 2025-10-23. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/konwoo/finemath-subset500000
Description:
Each record in the dataset describes one web page and contains the URL, fetch time, MIME type, WARC filename, text content, token count, character count, metadata, score, integer score, crawl method, snapshot type, language, and language score. The dataset has a single training split of approximately 500,000 web pages, about 2.9 GB in total.
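The record layout described above can be sketched as a small Python type with a check for missing fields. This is only an illustration: the field names (`fetch_time`, `int_score`, `crawl`, and so on), their types, and the sample values are assumptions derived from the field list in the description, not a schema confirmed by this listing.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class WebPageRecord:
    """One row of the dataset. Field names and types are assumed, not confirmed."""
    url: str
    fetch_time: int
    mime_type: str
    warc_filename: str
    text: str
    token_count: int
    char_count: int
    metadata: str
    score: float
    int_score: int
    crawl: str
    snapshot_type: str
    language: str
    language_score: float


def missing_fields(row: Dict[str, Any]) -> List[str]:
    """Return the expected field names that are absent from a raw row."""
    return [f.name for f in fields(WebPageRecord) if f.name not in row]


def parse_record(row: Dict[str, Any]) -> WebPageRecord:
    """Build a WebPageRecord from a raw row, ignoring any extra keys."""
    missing = missing_fields(row)
    if missing:
        raise KeyError(f"row is missing fields: {missing}")
    return WebPageRecord(**{f.name: row[f.name] for f in fields(WebPageRecord)})


# Made-up sample row, for illustration only (not taken from the dataset).
sample_row = {
    "url": "https://example.com/fractions",
    "fetch_time": 0,
    "mime_type": "text/html",
    "warc_filename": "example.warc.gz",
    "text": "One half plus one quarter equals three quarters.",
    "token_count": 11,
    "char_count": 48,
    "metadata": "{}",
    "score": 3.6,
    "int_score": 4,
    "crawl": "example-crawl",
    "snapshot_type": "latest",
    "language": "en",
    "language_score": 0.99,
}

record = parse_record(sample_row)
```

In practice the rows would usually come from the Hugging Face `datasets` library, for example `load_dataset("konwoo/finemath-subset500000", split="train")`, and the real column names should be read from the loaded dataset rather than taken from this sketch.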
Provider:
konwoo



