Tony068/falcon-refined-web-5M-part2
Source: Hugging Face. Updated 2025-03-12; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/Tony068/falcon-refined-web-5M-part2
Description:
The dataset contains the following fields: text content, web page URL, timestamp, an unidentified string field, text segments, and a list of image URLs. It has a single training split of 5 million examples, totaling approximately 14.38 GB. The README does not state the dataset's specific content or intended use.
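As a rough sanity check on the figures above, the average size of one example can be estimated from the quoted totals. The sketch below is only illustrative: the function name avg_bytes_per_example is hypothetical, and because the listing does not say whether "GB" means 10^9 or 2^30 bytes, both conventions are computed.

```python
# Figures quoted in the description; everything else here is an assumption.
NUM_EXAMPLES = 5_000_000
TOTAL_SIZE_GB = 14.38


def avg_bytes_per_example(total_gb: float, num_examples: int, binary: bool = False) -> float:
    """Return the mean size of one example in bytes.

    binary=False treats 1 GB as 10**9 bytes; binary=True treats it as 2**30 bytes.
    The listing does not state which convention it uses, so both are offered.
    """
    bytes_per_gb = 2**30 if binary else 10**9
    return total_gb * bytes_per_gb / num_examples


decimal_estimate = avg_bytes_per_example(TOTAL_SIZE_GB, NUM_EXAMPLES)
binary_estimate = avg_bytes_per_example(TOTAL_SIZE_GB, NUM_EXAMPLES, binary=True)

print(f"decimal GB: about {decimal_estimate:.0f} bytes per example")
print(f"binary GiB: about {binary_estimate:.0f} bytes per example")
```

Under either convention the average comes out to roughly 3 KB per example, counting all fields together.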
Provider:
Tony068



