Nutanix/mbpp_triplet_data
Source: Hugging Face | Updated: 2024-08-19 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/Nutanix/mbpp_triplet_data
Resource overview:
---
dataset_info:
  features:
  - name: anchor
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 143845823
    num_examples: 317521
  - name: test
    num_bytes: 61833745
    num_examples: 136081
  download_size: 51750220
  dataset_size: 205679568
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
# Description
This dataset was built from the MBPP dataset for fine-tuning dense retrieval models. It uses the first 70% of the data points in MBPP. For each anchor-positive pair (a question and its ground-truth code), every other code snippet in that subset serves as a negative, and we create one triplet per negative. With n pairs there are n - 1 negatives for every anchor-positive pair, hence n * (n - 1) triplets in total. Using a random seed of 10, we split these triplets into train and test subsets with a 70:30 ratio.
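The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the helper names `build_triplets` and `split_triplets`, the toy pairs, and the use of `random.Random(seed).shuffle` for the split are all assumptions, since the card only states the seed (10) and the 70:30 ratio. As a consistency check, the split sizes in the metadata sum to 317521 + 136081 = 453602 = 674 * 673, which would correspond to n = 674 pairs; that value of n is inferred from the counts, not stated in the card.

```python
import random
from typing import Dict, List, Tuple

Triplet = Dict[str, str]


def build_triplets(pairs: List[Tuple[str, str]]) -> List[Triplet]:
    """Emit one triplet per (pair, other snippet): n * (n - 1) in total."""
    triplets: List[Triplet] = []
    for i, (question, code) in enumerate(pairs):
        for j, (_, other_code) in enumerate(pairs):
            if i == j:
                continue  # a pair's own code is the positive, never a negative
            triplets.append(
                {"anchor": question, "positive": code, "negative": other_code}
            )
    return triplets


def split_triplets(
    triplets: List[Triplet], train_fraction: float = 0.7, seed: int = 10
) -> Tuple[List[Triplet], List[Triplet]]:
    """Shuffle with a fixed seed and cut at train_fraction.

    Assumption: the original split routine is unknown; a seeded shuffle
    is used here only to illustrate a reproducible 70:30 split.
    """
    shuffled = list(triplets)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy pairs standing in for MBPP (question, code) pairs.
toy_pairs = [
    ("Add two numbers.", "def add(a, b): return a + b"),
    ("Reverse a string.", "def rev(s): return s[::-1]"),
    ("Square a number.", "def sq(x): return x * x"),
    ("Check if a number is even.", "def even(x): return x % 2 == 0"),
]

triplets = build_triplets(toy_pairs)
train, test = split_triplets(triplets)
print(len(triplets), len(train), len(test))  # 12 8 4
```

With 4 toy pairs the sketch yields 4 * 3 = 12 triplets, split 8 to 4. Because the split is applied to triplets rather than to pairs, the same anchor can appear in both the train and test subsets.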
## Fields
1. `anchor` - The question corresponding to a code snippet
2. `positive` - The ground truth answer for the corresponding question
3. `negative` - Any other code snippet from the dataset not corresponding to the question
Provider:
Nutanix



