PJMixers-Dev/proxy-logs-ShareGPT
收藏Hugging Face2026-01-02 更新2026-01-03 收录
下载链接:
https://hf-mirror.com/datasets/PJMixers-Dev/proxy-logs-ShareGPT
下载链接
链接失效反馈官方服务:
资源简介:
该数据集是一个代理日志的集合,经过转换和格式化处理,使其符合ShareGPT的格式标准。数据集包含多个字段,如数据集名称、样本索引、模型、端点、提示哈希、预填充响应和对话内容。对话内容是一个列表,包含来源和值。数据集还提供了训练集的大小和示例数量。此外,README还提到了如何处理重复的提示信息以及如何合并预填充响应和对话的最后一条记录。
All the proxy logs I could find (lmk if there are more), converted to ShareGPT so its all formatted the same way in one dataset. I didnt `.strip()` any of the turns, I only used `ftfy.fix_text()` on them, and skipped any empty turns. The original sample `response` is moved to be the final turn of `conversations`. If the last turn in the original sample `prompt` was a model turn, it is assumed that it was for doing prefill. So I moved this turn to `response_prefill`. `sample["response_prefill"]` and `sample["conversations"][-1]["value"]` will need to be merged to have the full turn, but you would want to mask the prefill. Since many samples are just re-rolls of the same prompt, a lot of the dataset is duplicate information. To resolve this Ive combined all duplicate prompts together, with all responses in list of dicts.
提供机构:
PJMixers-Dev



