CoIR-Retrieval/codefeedback-mt-queries-corpus
Source: Hugging Face · Updated 2024-09-12 · Indexed 2024-06-29
Download link:
https://hf-mirror.com/datasets/CoIR-Retrieval/codefeedback-mt-queries-corpus
Resource description:
---
dataset_info:
  features:
  - name: _id
    dtype: string
  - name: partition
    dtype: string
  - name: text
    dtype: string
  - name: title
    dtype: string
  splits:
  - name: queries
    num_bytes: 295280604
    num_examples: 66383
  - name: corpus
    num_bytes: 99230769
    num_examples: 66383
  download_size: 176595250
  dataset_size: 394511373
---
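As a quick sanity check on the metadata above, the following sketch recomputes a few derived figures from the stated numbers: whether the two split sizes add up to `dataset_size`, the average bytes per example in each split, and the ratio of download size to dataset size. The helper name `summarize` and the dictionary layout are illustrative assumptions; only the numeric values come from the dataset card.

```python
# Figures copied verbatim from the dataset_info metadata.
SPLITS = {
    "queries": {"num_bytes": 295_280_604, "num_examples": 66_383},
    "corpus": {"num_bytes": 99_230_769, "num_examples": 66_383},
}
DOWNLOAD_SIZE = 176_595_250
DATASET_SIZE = 394_511_373


def summarize(splits, download_size, dataset_size):
    """Derive simple statistics from the metadata (hypothetical helper)."""
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    return {
        # True when the per-split byte counts add up to dataset_size.
        "splits_sum_matches": total_bytes == dataset_size,
        # Average bytes per example in each split.
        "avg_bytes": {
            name: s["num_bytes"] / s["num_examples"]
            for name, s in splits.items()
        },
        # Fraction of the dataset size that is actually downloaded.
        "download_ratio": download_size / dataset_size,
    }


summary = summarize(SPLITS, DOWNLOAD_SIZE, DATASET_SIZE)
print(summary)
```

On these numbers the split sizes do sum to `dataset_size`, and the average query is roughly three times as large as the average corpus entry.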
To evaluate a model on the CoIR evaluation framework's version of this dataset, use the code below:
```python
import coir
from coir.data_loader import get_tasks
from coir.evaluation import COIR
from coir.models import YourCustomDEModel
model_name = "intfloat/e5-base-v2"
# Load the model
model = YourCustomDEModel(model_name=model_name)
# Get tasks
# All tasks: ["codetrans-dl", "stackoverflow-qa", "apps", "codefeedback-mt", "codefeedback-st", "codetrans-contest", "synthetic-text2sql", "cosqa", "codesearchnet", "codesearchnet-ccr"]
tasks = get_tasks(tasks=["codetrans-dl"])
# Initialize evaluation
evaluation = COIR(tasks=tasks,batch_size=128)
# Run evaluation
results = evaluation.run(model, output_folder=f"results/{model_name}")
print(results)
```
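To make the roles of the `queries` and `corpus` splits concrete, here is a minimal, self-contained sketch of a retrieval check over records that use the same four fields (`_id`, `partition`, `text`, `title`). Everything in it is an assumption made for illustration: the toy records, the `partition` values, the `qrels` relevance mapping, and the bag-of-words `embed` function are invented, and none of it reflects how CoIR itself scores models.

```python
import math
from collections import Counter

# Hypothetical toy records that mirror the dataset schema; not real data.
queries = [
    {"_id": "q1", "partition": "train", "title": "",
     "text": "reverse a linked list in python"},
    {"_id": "q2", "partition": "train", "title": "",
     "text": "sort a dictionary by value"},
]
corpus = [
    {"_id": "d1", "partition": "train", "title": "",
     "text": "use sorted with a key to sort a dictionary by value"},
    {"_id": "d2", "partition": "train", "title": "",
     "text": "iterate and flip next pointers to reverse a linked list"},
]
# Hypothetical relevance judgments: query _id -> relevant corpus _id.
qrels = {"q1": "d2", "q2": "d1"}


def embed(text):
    """Stand-in for a dense encoder: a bag-of-words term count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top1(query, docs):
    """Return the _id of the highest-scoring corpus entry for a query."""
    query_vec = embed(query["text"])
    best = max(docs, key=lambda d: cosine(query_vec, embed(d["text"])))
    return best["_id"]


hits = sum(top1(q, corpus) == qrels[q["_id"]] for q in queries)
recall_at_1 = hits / len(queries)
print(f"recall@1 = {recall_at_1:.2f}")
```

In a real run, the encoder would be the model passed to the evaluation, and the metrics would be computed by the framework rather than by hand.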
Provider:
CoIR-Retrieval
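Summary of original information
Dataset overview
Dataset information
- Features:
  - _id: string
  - partition: string
  - text: string
  - title: string
Data splits
- queries:
  - Bytes: 295280604
  - Examples: 66383
- corpus:
  - Bytes: 99230769
  - Examples: 66383
Dataset size
- Download size: 176595250 bytes
- Total dataset size: 394511373 bytes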



