latam-gpt/fineweb2-spa_Latn-edu
Source: Hugging Face. Updated 2025-01-02. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/latam-gpt/fineweb2-spa_Latn-edu
Description:
This dataset consists of multilingual text, including Spanish. Each record contains the text content along with a unique identifier, a timestamp, a file path, the language, a language score, the language script, a MinHash cluster size, a list of the most frequent languages, and a score. The dataset provides a single training split with a large amount of text for model training.
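As a rough illustration of how the fields listed above might be represented and filtered, here is a minimal Python sketch. The field names (`language_score`, `minhash_cluster_size`, `top_langs`, and so on), the language code `"spa"`, the sample values, and the 0.9 threshold are all assumptions inferred from the description, not a confirmed schema; check the dataset card for the actual column names and types.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Record:
    # Hypothetical field names inferred from the description above;
    # the real dataset columns may be named or typed differently.
    text: str
    id: str
    date: str
    file_path: str
    language: str
    language_score: float
    language_script: str
    minhash_cluster_size: int
    top_langs: List[str]
    score: float


def filter_records(
    records: Iterable[Record],
    language: str = "spa",
    min_language_score: float = 0.9,
) -> Iterator[Record]:
    """Yield records in the given language whose language score meets the threshold."""
    for record in records:
        if record.language == language and record.language_score >= min_language_score:
            yield record


# Made-up sample rows, for illustration only.
sample = [
    Record("Hola mundo", "doc-1", "2024-01-01", "path/a", "spa", 0.98, "Latn", 3, ["spa"], 3.2),
    Record("Texto dudoso", "doc-2", "2024-01-02", "path/b", "spa", 0.55, "Latn", 1, ["spa", "por"], 2.1),
    Record("Hello world", "doc-3", "2024-01-03", "path/c", "eng", 0.99, "Latn", 2, ["eng"], 3.0),
]

kept = list(filter_records(sample))
print([record.id for record in kept])
```

With the sample rows above, only the first record passes both the language check and the score threshold. The same filtering idea could be applied to rows streamed from the actual dataset once the real column names are known.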
Provider:
latam-gpt



