divij30/fineweb-occup-100k
Source: Hugging Face | Updated: 2024-12-30 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/divij30/fineweb-occup-100k
Resource description:
The dataset includes the following fields: text content, a unique identifier, dump information, URL, date, file path, the detected text language with its confidence score, and token count. The training set contains 100,000 examples and is 619,873,746 bytes in size. The total download size of the dataset is 373,002,170 bytes.
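As a rough sanity check on the sizes quoted above, the short sketch below derives the average example size and the ratio of download size to dataset size. The constant and function names are illustrative only and are not part of the dataset. Reading the ratio as a compression factor is an assumption, since the listing does not say how the download is stored.

```python
# Figures quoted in the resource description.
NUM_EXAMPLES = 100_000
DATASET_BYTES = 619_873_746
DOWNLOAD_BYTES = 373_002_170


def average_bytes_per_example(total_bytes: int, num_examples: int) -> float:
    """Mean size of one example in bytes."""
    return total_bytes / num_examples


def download_ratio(download_bytes: int, total_bytes: int) -> float:
    """Download size expressed as a fraction of the dataset size."""
    return download_bytes / total_bytes


if __name__ == "__main__":
    avg = average_bytes_per_example(DATASET_BYTES, NUM_EXAMPLES)
    ratio = download_ratio(DOWNLOAD_BYTES, DATASET_BYTES)
    print(f"average example size: {avg:.1f} bytes")
    print(f"download / dataset size: {ratio:.3f}")
```

By these figures each example averages roughly 6.2 kB, and the download is about 60% of the stated dataset size.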
Provider:
divij30



