five

jdpressman/retroinstruct-mix-v0.2

收藏
Hugging Face2024-07-21 更新2024-07-22 收录
下载链接:
https://hf-mirror.com/datasets/jdpressman/retroinstruct-mix-v0.2
下载链接
链接失效反馈
官方服务:
资源简介:
RetroInstruct Mix v0.2是一个合成指令数据集,由7个合成子集组成,包括RetroInstruct Weave Evaluator Questions、RetroInstruct Analogical Translations、RetroInstruct Part Lists For Dictionary Words、RetroInstruct Easy Prose Repair Diffs、RetroInstruct ASCII Art、RetroInstruct Weave Evaluator Rubrics和Retro Textual Style Transfer。该数据集主要用于补充其他大型数据集如FLAN或OpenAssistant。每个数据行包含两个列:inputs(模型的指令)和targets(模型的预期响应)。数据集的局限性包括任务多样性不足和依赖于旧版公共领域文本及合成短篇写作。未来的改进计划包括增加更多任务、减少与基准性能正交的任务、增加多轮指令数据和合成长文本。

RetroInstruct Mix v0.2 is the first release of a synthetic instruction dataset, comprising 7 subsets dealing with various synthetic data tasks such as answering questions, reasoning generation, word decomposition, text repair, ASCII art generation, rubric breakdown, and text style transfer. The dataset aims to train models to perform specific tasks, showing good performance on specific validation sets despite some performance degradation on certain benchmarks. The dataset structure is simple, with each row containing input instructions and target responses, currently without multi-turn interaction components. The main limitation of the dataset is the lack of task diversity and reliance on synthetic short-form writing and public domain text, which may introduce specific biases. Future plans include increasing task diversity, improving orthogonality to benchmark performance, introducing multi-turn instruction data, and synthetic long texts.
提供机构:
jdpressman
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作