
CodeLlama-2-20k

ModelScope community. Updated 2025-12-05. Indexed 2025-03-22.
Download link:
https://modelscope.cn/datasets/mlabonne/CodeLlama-2-20k
Resource description:
# CodeLlama-2-20k: A Llama 2 Version of CodeAlpaca

This dataset is the [`sahil2801/CodeAlpaca-20k`](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) dataset with the Llama 2 prompt format [described here](https://huggingface.co/blog/llama2#how-to-prompt-llama-2).

Here is the code I used to format it:

``` python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset('sahil2801/CodeAlpaca-20k')

# Define a function to merge the three columns into one
def merge_columns(example):
    if example['input']:
        merged = f"<s>[INST] <<SYS>>\nBelow is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n<</SYS>>\n\n{example['instruction']} Input: {example['input']} [/INST] {example['output']} </s>"
    else:
        merged = f"<s>[INST] <<SYS>>\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n<</SYS>>\n\n{example['instruction']} [/INST] {example['output']} </s>"
    return {"text": merged}

# Apply the function to all elements in the dataset
dataset = dataset.map(merge_columns, remove_columns=['instruction', 'input', 'output'])
```
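To see what a single formatted row looks like without downloading anything, the same template can be applied to a hand-written record. The following is a minimal sketch, not the author's code: the names `SYSTEM_WITH_INPUT`, `SYSTEM_NO_INPUT`, and `format_example` are hypothetical, and the sample record is made up for illustration rather than taken from the dataset.

```python
# Hypothetical stand-alone helper mirroring the template used above.
# All names here are illustrative assumptions, not part of the dataset card.
SYSTEM_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)
SYSTEM_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request."
)


def format_example(example: dict) -> dict:
    """Merge instruction/input/output into one Llama 2 style prompt string."""
    if example.get("input"):
        # Rows with extra context append it after the instruction.
        system = SYSTEM_WITH_INPUT
        user = f"{example['instruction']} Input: {example['input']}"
    else:
        system = SYSTEM_NO_INPUT
        user = example["instruction"]
    text = (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{user} [/INST] {example['output']} </s>"
    )
    return {"text": text}


# Made-up record for illustration only.
sample = {
    "instruction": "Write a function that adds two numbers.",
    "input": "",
    "output": "def add(a, b):\n    return a + b",
}
formatted = format_example(sample)
print(formatted["text"])
```

Because the helper takes and returns plain dictionaries, it could also be passed to `dataset.map` in place of `merge_columns`, though that substitution is untested here.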

Provider:
maas
Created:
2025-03-18