science-of-finetuning/ultrachat_200k_gemma-2-2b-it-generated
Source: Hugging Face. Updated 2025-01-23. Indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/science-of-finetuning/ultrachat_200k_gemma-2-2b-it-generated
Description:
This dataset contains 512 answers generated with greedy decoding by the gemma-2-2b-it model on a subset of the ultrachat 200k test_sft split. The subset was built by filtering out conversations that were >= 1024 - 128 = 896 tokens long. Within each batch, generation was cut off after 1024 - min(batch_prompt_lengths) generated tokens. The description states that each answer is therefore at most 128 tokens long, but because every retained prompt is shorter than 896 tokens, this cutoff always exceeds 128, so 128 reads more plausibly as a lower bound on the generation budget; that reading also agrees with the reported average. The generated answers total about 200k tokens, or about 390 tokens per answer on average (~300 words, roughly 2/3 of a page).
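The filtering rule and the per-batch cutoff can be illustrated with a minimal sketch in plain Python. It is not the authors' generation script: the function names, the constants `MAX_LEN` and `RESERVED`, and the sample prompt lengths are hypothetical and chosen only to mirror the numbers in the description.

```python
# Illustrative sketch only; names and sample values are assumptions,
# not taken from the dataset authors' code.
MAX_LEN = 1024   # total token budget mentioned in the description
RESERVED = 128   # tokens set aside for the generated answer


def filter_prompts(prompt_lengths):
    """Drop prompts that are >= MAX_LEN - RESERVED (= 896) tokens long."""
    return [n for n in prompt_lengths if n < MAX_LEN - RESERVED]


def batch_generation_budget(batch_prompt_lengths):
    """Number of generated tokens after which a batch is cut off."""
    return MAX_LEN - min(batch_prompt_lengths)


# Hypothetical prompt lengths, in tokens.
lengths = [100, 500, 895, 896, 1200]
kept = filter_prompts(lengths)          # 896 and 1200 are filtered out
budget = batch_generation_budget(kept)  # 1024 - 100

# Average answer length implied by the reported totals.
average_tokens = 200_000 / 512

print(kept, budget, average_tokens)
```

Under this reading, the smallest possible cutoff is 1024 - 895 = 129 tokens, and the cutoff is a budget rather than a guaranteed length, since an answer may stop earlier.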
Provided by:
science-of-finetuning



