ai-bond/ru-alpaca-text
Hugging Face | Updated 2024-12-18 | Indexed 2024-12-21
Download link:
https://hf-mirror.com/datasets/ai-bond/ru-alpaca-text
Description:
This dataset is intended primarily for text generation and contains four subsets: lenta, gazeta, health, and wiki. Each subset specifies the path to its training data file. The dataset is in Russian and is released under the MIT license. For each subset, the file also reports the proportion of data retained after filtering, the number of input IDs, the maximum length, and the number of overflows at different context lengths.
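The per-subset overflow statistic mentioned in the description can be illustrated with a minimal sketch. It assumes that "overflow" means a sample whose token count exceeds a given context length. The listing does not define the term, so this reading, the function name `overflow_counts`, the sample lengths, and the context lengths are all hypothetical and not taken from the dataset.

```python
from typing import Dict, Iterable, List


def overflow_counts(token_lengths: Iterable[int], context_lengths: List[int]) -> Dict[int, int]:
    """Count samples whose token length exceeds each context length.

    Assumption: a sample "overflows" when its length is strictly greater
    than the context length. The dataset's own definition may differ.
    """
    lengths = list(token_lengths)
    return {ctx: sum(1 for n in lengths if n > ctx) for ctx in context_lengths}


# Hypothetical token counts for six samples (illustrative only).
sample_lengths = [120, 900, 2048, 3100, 4097, 8000]

stats = {
    "max_length": max(sample_lengths),
    "overflow": overflow_counts(sample_lengths, [2048, 4096, 8192]),
}
print(stats)
```

A table of this kind helps when choosing a context length for training: the smaller the overflow count at a given length, the fewer samples would need truncation.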
Provider:
ai-bond



