arks_data
Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/xlangai/arks_data
Description:
# Dataset Card for arks_data
This dataset contains four sub-datasets: Pony, Ring, ScipyM, and TensorflowM. More information is available in our paper, **"ARKS: Active Retrieval in Knowledge Soup for Code Generation"**.
Paper arXiv link: https://arxiv.org/abs/2402.12317
Paper website: https://arks-codegen.github.io
# How to load this dataset
Load the corpus of a single sub-dataset:
```
from datasets import load_dataset
data_files = {"corpus": "Pony/Pony_docs.jsonl"}
dataset = load_dataset("xlangai/arks_data", data_files=data_files)
```
Load the corpora of several sub-datasets into one split:
```
from datasets import load_dataset
data_files = {"corpus": ["Pony/Pony_docs.jsonl", "Ring/Ring_docs.jsonl"]}
dataset = load_dataset("xlangai/arks_data", data_files=data_files)
```
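The two snippets above differ only in the file list passed under the `"corpus"` key, so the mapping can be built from sub-dataset names. The sketch below is illustrative and not part of the official card: `build_data_files` and `load_corpus` are hypothetical helper names, and it assumes every sub-dataset follows the `<Name>/<Name>_docs.jsonl` layout, which the card only shows for Pony and Ring.

```python
from typing import Dict, List, Sequence

# Sub-dataset names listed in the dataset card.
SUB_DATASETS = ("Pony", "Ring", "ScipyM", "TensorflowM")


def build_data_files(names: Sequence[str]) -> Dict[str, List[str]]:
    """Map the "corpus" split to one docs file per requested sub-dataset.

    Assumption: each sub-dataset uses the "<Name>/<Name>_docs.jsonl" path.
    The card confirms this layout only for Pony and Ring.
    """
    unknown = [n for n in names if n not in SUB_DATASETS]
    if unknown:
        raise ValueError(f"unknown sub-dataset(s): {unknown}")
    return {"corpus": [f"{n}/{n}_docs.jsonl" for n in names]}


def load_corpus(names: Sequence[str]):
    """Load the requested corpora (needs network access and `datasets`)."""
    # Imported lazily so build_data_files stays usable offline.
    from datasets import load_dataset

    return load_dataset("xlangai/arks_data", data_files=build_data_files(names))
```

Because the mapping uses the key `"corpus"`, the loaded documents would be available as `load_corpus(["Pony", "Ring"])["corpus"]`. If a path for ScipyM or TensorflowM does not resolve, check the repository file listing rather than relying on the assumed pattern.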
Provider:
maas
Created:
2025-08-19



