CodeLlama-2-20k
Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-03-22.
Download link:
https://modelscope.cn/datasets/mlabonne/CodeLlama-2-20k
Description:
# CodeLlama-2-20k: A Llama 2 Version of CodeAlpaca
This dataset is the [`sahil2801/CodeAlpaca-20k`](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) dataset with the Llama 2 prompt format [described here](https://huggingface.co/blog/llama2#how-to-prompt-llama-2).
Here is the code I used to format it:
``` python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset('sahil2801/CodeAlpaca-20k')

# Define a function to merge the three columns into one
def merge_columns(example):
    if example['input']:
        merged = f"<s>[INST] <<SYS>>\nBelow is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n<</SYS>>\n\n{example['instruction']} Input: {example['input']} [/INST] {example['output']} </s>"
    else:
        merged = f"<s>[INST] <<SYS>>\nBelow is an instruction that describes a task. Write a response that appropriately completes the request.\n<</SYS>>\n\n{example['instruction']} [/INST] {example['output']} </s>"
    return {"text": merged}

# Apply the function to all elements in the dataset
dataset = dataset.map(merge_columns, remove_columns=['instruction', 'input', 'output'])
```
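To see what a formatted row looks like without downloading anything, here is a minimal sketch that applies the same template to two in-memory records. The helper name `format_llama2`, the two constants, and both sample records are made up for illustration and are not part of the dataset; the template string itself is copied from `merge_columns` above.

``` python
# Illustrative only: mirrors the template used by merge_columns,
# but runs on plain dicts so no dataset download is needed.
SYSTEM_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)
SYSTEM_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request."
)

def format_llama2(example: dict) -> dict:
    """Merge instruction/input/output into one Llama 2 prompt string."""
    if example.get("input"):
        system = SYSTEM_WITH_INPUT
        user = f"{example['instruction']} Input: {example['input']}"
    else:
        system = SYSTEM_NO_INPUT
        user = example["instruction"]
    text = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST] {example['output']} </s>"
    return {"text": text}

# Hypothetical records (not taken from CodeAlpaca-20k)
sample_with_input = {
    "instruction": "Sort the list in ascending order.",
    "input": "[3, 1, 2]",
    "output": "[1, 2, 3]",
}
sample_without_input = {
    "instruction": "Write a function that returns the square of a number.",
    "input": "",
    "output": "def square(x):\n    return x * x",
}

formatted_with_input = format_llama2(sample_with_input)["text"]
formatted_without_input = format_llama2(sample_without_input)["text"]

print(formatted_with_input)
print("-" * 40)
print(formatted_without_input)
```

Rows with an empty `input` column get the shorter system prompt and no `Input:` segment, so the two branches produce visibly different prompts.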
Provider:
maas
Created:
2025-03-18



