LSX-UniWue/LLaMmlein-Dataset-wo-HPLT
Source: Hugging Face. Updated: 2025-12-04. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/LSX-UniWue/LLaMmlein-Dataset-wo-HPLT
Description:
---
task_categories:
- text-generation
language:
- de
---
This dataset is a strict subset of the [LLaMmlein-Dataset](https://huggingface.co/datasets/LSX-UniWue/LLaMmlein-Dataset), which in turn is a strict subset of the [RedPajama V2 dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2). It therefore retains all licenses from RedPajama V2.
This dataset has been deduplicated against the deu_Latn shard of the [HPLT3.0 dataset](https://huggingface.co/datasets/HPLT/HPLT3.0).
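
To illustrate what deduplicating one corpus against another means, here is a minimal sketch. It is not the procedure used for this dataset: the card does not say whether matching was exact or fuzzy, so the sketch assumes exact matching on SHA-256 hashes of lowercased, whitespace-normalized text. The function names (`fingerprint`, `build_reference_index`, `drop_overlap`) and the sample documents are hypothetical.

```python
import hashlib
import re
from typing import Iterable, Iterator


def fingerprint(text: str) -> str:
    """Hash a document after lowercasing and collapsing whitespace (assumed normalization)."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_reference_index(reference_docs: Iterable[str]) -> set[str]:
    """Collect the fingerprints of every document in the reference corpus."""
    return {fingerprint(doc) for doc in reference_docs}


def drop_overlap(docs: Iterable[str], reference_index: set[str]) -> Iterator[str]:
    """Yield only the documents whose fingerprint is absent from the reference index."""
    for doc in docs:
        if fingerprint(doc) not in reference_index:
            yield doc


# Toy stand-ins for the reference corpus and the corpus being filtered.
reference = ["Guten Morgen,  Welt!", "Ein zweites Dokument."]
candidates = ["guten morgen, welt!", "Nur in diesem Korpus.", "Ein zweites Dokument."]

kept = list(drop_overlap(candidates, build_reference_index(reference)))
```

Exact hashing only catches documents that are identical after normalization. Near-duplicate detection (for example MinHash) would also remove lightly edited copies, at the cost of more computation.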
More details are available in our [preprint](https://arxiv.org/abs/2411.11171).
[Data Take Down](https://www.informatik.uni-wuerzburg.de/datascience/projects/nlp/llammlein/)
Provider:
LSX-UniWue



