WhereIsAI/github-issue-similarity
Source: Hugging Face. Updated: 2024-05-03. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/WhereIsAI/github-issue-similarity
Resource description:
---
language:
- en
license: mit
size_categories:
- 10K<n<100K
task_categories:
- sentence-similarity
dataset_info:
- config_name: default
  features:
  - name: text1
    dtype: string
  - name: text2
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 181474610
    num_examples: 18565
  - name: valid
    num_bytes: 14656141
    num_examples: 1547
  - name: test
    num_bytes: 13135402
    num_examples: 1548
  download_size: 58129604
  dataset_size: 209266153
- config_name: positive
  features:
  - name: anchor
    dtype: string
  - name: positive
    dtype: string
  splits:
  - name: train
    num_bytes: 79405713
    num_examples: 9457
  - name: valid
    num_bytes: 6160932
    num_examples: 774
  - name: test
    num_bytes: 5782206
    num_examples: 807
  download_size: 25212890
  dataset_size: 91348851
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: valid
    path: data/valid-*
  - split: test
    path: data/test-*
- config_name: positive
  data_files:
  - split: train
    path: positive/train-*
  - split: valid
    path: positive/valid-*
  - split: test
    path: positive/test-*
tags:
- code
- sentence-transformers
---
# GIS: Github Issue Similarity Dataset
This dataset was released with the paper "AnglE-optimized Text Embeddings": https://arxiv.org/abs/2309.12871
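A minimal sketch of how the dataset might be loaded with the Hugging Face `datasets` library, assuming that library is installed and the Hub is reachable. The repository ID, configuration names, split names, and column names are taken from the metadata on this page. The helper `load_gis` and the constants around it are hypothetical names introduced only for this illustration.

```python
"""Sketch: loading the GIS dataset with the Hugging Face `datasets` library."""

REPO_ID = "WhereIsAI/github-issue-similarity"

# Columns per configuration, copied from the dataset card metadata.
CONFIG_COLUMNS = {
    "default": ["text1", "text2", "label"],
    "positive": ["anchor", "positive"],
}
SPLITS = ("train", "valid", "test")


def load_gis(config: str = "default", split: str = "train"):
    """Download one split of one configuration (requires network access).

    `load_gis` is a hypothetical convenience wrapper, not part of any library.
    """
    if config not in CONFIG_COLUMNS:
        raise ValueError(f"unknown config: {config!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split)


# Example (requires network access):
#   valid = load_gis("default", "valid")
#   print(valid.column_names)
#   pairs = load_gis("positive", "train")
#   print(pairs[0]["anchor"][:200])
```

Note that the validation split is named `valid`, not `validation`, so code that assumes the more common name would need adjusting.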
# Citation
If you use our dataset in your research, please cite us as follows:
```bibtex
@article{li2023angle,
title={AnglE-optimized Text Embeddings},
author={Li, Xianming and Li, Jing},
journal={arXiv preprint arXiv:2309.12871},
year={2023}
}
```
Provider:
WhereIsAI
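Summary of original information
GIS: Github Issue Similarity Dataset
Basic information
- Language: English
- License: MIT
- Size category: 10K<n<100K
- Task category: sentence similarity
Dataset configurations
Default configuration
- Config name: default
- Features:
  - text1: string
  - text2: string
  - label: int64
- Splits:
  - train: 181474610 bytes, 18565 examples
  - valid: 14656141 bytes, 1547 examples
  - test: 13135402 bytes, 1548 examples
- Download size: 58129604 bytes
- Dataset size: 209266153 bytes
Positive configuration
- Config name: positive
- Features:
  - anchor: string
  - positive: string
- Splits:
  - train: 79405713 bytes, 9457 examples
  - valid: 6160932 bytes, 774 examples
  - test: 5782206 bytes, 807 examples
- Download size: 25212890 bytes
- Dataset size: 91348851 bytes
Data file configuration
Default configuration
- Config name: default
- Data files:
  - train: data/train-*
  - valid: data/valid-*
  - test: data/test-*
Positive configuration
- Config name: positive
- Data files:
  - train: positive/train-*
  - valid: positive/valid-*
  - test: positive/test-*
Tags
- code
- sentence-transformers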
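The size figures in the metadata can be cross-checked with a little arithmetic. The sketch below copies the byte and example counts from this page and derives totals per configuration. The `CARD` dictionary and the `summarize` helper are hypothetical names used only for illustration.

```python
"""Sketch: sanity-checking the split sizes reported in the dataset card."""

# (num_bytes, num_examples) per split, copied from the metadata on this page.
CARD = {
    "default": {
        "splits": {
            "train": (181474610, 18565),
            "valid": (14656141, 1547),
            "test": (13135402, 1548),
        },
        "download_size": 58129604,
        "dataset_size": 209266153,
    },
    "positive": {
        "splits": {
            "train": (79405713, 9457),
            "valid": (6160932, 774),
            "test": (5782206, 807),
        },
        "download_size": 25212890,
        "dataset_size": 91348851,
    },
}


def summarize(config: str) -> dict:
    """Derive totals for one configuration from the card figures."""
    info = CARD[config]
    total_bytes = sum(num_bytes for num_bytes, _ in info["splits"].values())
    total_examples = sum(n for _, n in info["splits"].values())
    return {
        "total_examples": total_examples,
        # True when the per-split byte counts add up to dataset_size.
        "bytes_match": total_bytes == info["dataset_size"],
        # Fraction of the in-memory size that has to be downloaded.
        "download_ratio": info["download_size"] / info["dataset_size"],
        "avg_bytes_per_example": total_bytes / total_examples,
    }


for name in CARD:
    print(name, summarize(name))
```

For both configurations the per-split byte counts sum exactly to the reported dataset size. The totals are 21,660 examples for `default` and 11,038 for `positive`, which is consistent with the `10K<n<100K` size category.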



