jdpressman/retroinstruct-mix-v0.2
Source: Hugging Face. Updated 2024-07-21; indexed 2024-07-22.
Download link:
https://hf-mirror.com/datasets/jdpressman/retroinstruct-mix-v0.2
Resource summary:
RetroInstruct Mix v0.2 is the first release of a synthetic instruction dataset. It combines 7 synthetic subsets: RetroInstruct Weave Evaluator Questions, RetroInstruct Analogical Translations, RetroInstruct Part Lists For Dictionary Words, RetroInstruct Easy Prose Repair Diffs, RetroInstruct ASCII Art, RetroInstruct Weave Evaluator Rubrics, and Retro Textual Style Transfer. Together these cover question answering, reasoning generation, word decomposition, text repair, ASCII art generation, rubric breakdown, and text style transfer. The dataset is mainly intended to supplement other large datasets such as FLAN or OpenAssistant.
The structure is simple: each row has two columns, inputs (the instruction given to the model) and targets (the expected response). There is currently no multi-turn interaction component. Models trained on the dataset show good performance on specific validation sets, despite some performance degradation on certain benchmarks. The main limitations are a lack of task diversity and a reliance on older public domain text and synthetic short-form writing, which may introduce specific biases. Planned improvements include adding more tasks, reducing the share of tasks that are orthogonal to benchmark performance, adding multi-turn instruction data, and adding synthetic long texts.
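The two-column row layout described above can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the sample rows are invented for the example and are not taken from the dataset, and the helper names validate_row and to_training_text, as well as the blank-line separator, are hypothetical choices rather than anything the dataset prescribes.

```python
from typing import Dict, List

# Hypothetical rows for illustration only; they are NOT taken from the dataset.
SAMPLE_ROWS: List[Dict[str, str]] = [
    {"inputs": "List three parts of a bicycle.",
     "targets": "Frame, wheels, handlebars."},
    {"inputs": "Rewrite politely: give me the report.",
     "targets": "Could you please send me the report?"},
]


def validate_row(row: Dict[str, str]) -> bool:
    """Check that a row has string-valued 'inputs' and 'targets' columns."""
    return set(row) >= {"inputs", "targets"} and all(
        isinstance(row[key], str) for key in ("inputs", "targets")
    )


def to_training_text(row: Dict[str, str], separator: str = "\n\n") -> str:
    """Join instruction and response into one single-turn training string.

    The separator is an assumption; use whatever template your trainer expects.
    """
    return f"{row['inputs']}{separator}{row['targets']}"


if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        if validate_row(row):
            print(to_training_text(row))
            print("-" * 40)
```

To run the same helpers on the real data, the rows could presumably be obtained with the Hugging Face datasets library, for example load_dataset("jdpressman/retroinstruct-mix-v0.2"); the available split names are not stated in this listing and would need to be checked on the dataset page.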
Provider:
jdpressman



