
Nutanix/mbpp_triplet_data

Source: Hugging Face · Updated 2024-08-19 · Indexed 2025-04-08
Download link:
https://hf-mirror.com/datasets/Nutanix/mbpp_triplet_data
Resource description:
---
dataset_info:
  features:
  - name: anchor
    dtype: string
  - name: positive
    dtype: string
  - name: negative
    dtype: string
  splits:
  - name: train
    num_bytes: 143845823
    num_examples: 317521
  - name: test
    num_bytes: 61833745
    num_examples: 136081
  download_size: 51750220
  dataset_size: 205679568
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---

# Description

This dataset was built from the MBPP dataset for fine-tuning dense retrieval models. It uses the first 70% of the data points in MBPP. For each anchor-positive pair, we created one triplet for every available negative. Since each pair has n - 1 negatives, n pairs yield n * (n - 1) triplets. Using a random seed of 10, we split these triplets into train and test subsets with a 70:30 ratio.

## Fields

1. `anchor` - The question corresponding to a code snippet
2. `positive` - The ground truth answer for the corresponding question
3. `negative` - Any other code snippet from the dataset not corresponding to the question
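The construction described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' actual script: the helper names `build_triplets` and `split_triplets`, the toy `pairs` list, and the use of the standard-library `random` module for the seeded shuffle are all assumptions, so the resulting split will not reproduce the published one.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]          # (question, ground-truth code)
Triplet = Tuple[str, str, str]  # (anchor, positive, negative)


def build_triplets(pairs: List[Pair]) -> List[Triplet]:
    """Pair every (question, answer) with every other answer as a negative.

    Hypothetical helper: n pairs produce n * (n - 1) triplets.
    """
    triplets = []
    for i, (question, answer) in enumerate(pairs):
        for j, (_, other_answer) in enumerate(pairs):
            if i != j:
                triplets.append((question, answer, other_answer))
    return triplets


def split_triplets(
    triplets: List[Triplet], test_ratio: float = 0.3, seed: int = 10
) -> Tuple[List[Triplet], List[Triplet]]:
    """Shuffle with a fixed seed, then cut into train and test parts.

    Assumption: the real split may have used a different shuffling routine.
    """
    rng = random.Random(seed)
    shuffled = list(triplets)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Toy stand-in for the MBPP question/code pairs.
pairs = [(f"question {k}", f"code {k}") for k in range(5)]
triplets = build_triplets(pairs)
train, test = split_triplets(triplets)
print(len(triplets), len(train), len(test))
```

The published split sizes agree with this counting argument: 317,521 + 136,081 = 453,602 = 674 × 673, which corresponds to n = 674 anchor-positive pairs.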
Provider:
Nutanix