tulu-3-sft-personas-code

Name: tulu-3-sft-personas-code
Creator: maas
Published: 2026-01-08 17:11:54
License: 暂无描述

魔搭社区2026-01-08 更新2024-11-30 收录

下载链接：

https://modelscope.cn/datasets/LLM-Research/tulu-3-sft-personas-code

下载链接

链接失效反馈

官方服务：

资源简介：

### Dataset Descriptions This dataset contains **34999** examples and is synthetically created to enhance models' coding capabilities. To generate diverse *python* coding questions, we expand the methodology in [Ge et al., 2024](https://arxiv.org/pdf/2406.20094) by using personas to ground the code completion question in real-world scenarios. More details and exact prompts used to construct the dataset can be found in our [paper](). - **Curated by:** Allen Institute for AI - **Paper:** [TBD]() - **Repository:** [TBD]() - **Language(s) (NLP):** English - **License:** ODC-BY - **Point of Contact:** [Faeze Brahman](mailto:faezeb@allenai.org) ### Loading ```python from datasets import load_dataset dataset = load_dataset("allenai/tulu-3-personas-math")["train"] ``` ### Dataset Structure Each example in the dataset contains the standard instruction-tuning data points as follow: - id (str): a unique identifier - prompt (str): python programming questions grounded in a given persona/scenario - messages (list): message format used for supervised fine-tuning (this contains user prompt and assistant response)

### 数据集说明本数据集共包含34999条样本，为提升模型的编码能力而合成构建。为生成多样化的Python编程问题，我们在[Ge等人，2024](https://arxiv.org/pdf/2406.20094)提出的方法基础上进行拓展，通过引入人物角色设定，将代码补全问题锚定至真实应用场景中。有关本数据集构建的更多细节及所用精确提示词，可参见我们的[论文]()。 - **整理单位：** 艾伦人工智能研究所（Allen Institute for AI） - **论文：** [待公布]() - **代码仓库：** [待公布]() - **自然语言处理语种：** 英语 - **许可证：** ODC-BY - **联系人：** [Faeze Brahman](mailto:faezeb@allenai.org) ### 加载方式 python from datasets import load_dataset dataset = load_dataset("allenai/tulu-3-personas-math")["train"] ### 数据集结构本数据集的每条样本均包含标准的指令微调数据字段，具体如下： - **id（字符串型）：** 唯一标识符 - **提示词（prompt）（字符串型）：** 锚定至特定人物角色/应用场景的Python编程问题 - **消息（messages）（列表型）：** 适用于监督微调的消息格式（包含用户提示词与助手回复）

提供机构：

maas

创建时间：

2024-11-23

搜集汇总

数据集介绍