hanhainebula/BGE-Benchmark-Examples
Source: Hugging Face · Updated 2024-03-19 · Indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/hanhainebula/BGE-Benchmark-Examples
Resource description:
# BGE Benchmark Examples
## Data Format
- `special_examples.jsonl`: 3 examples for each domain (wiki, news) in each language (en, zh), 12 examples in total. Each example has the following format:
```python
{
"domain": str,
"language": str,
"text": str,
"characters": List[str],
"sampled_characters": List[{
"character": str,
"scenarios": List[str],
"sampled_scenarios": List[{
"scenario": str,
"result": {
"prompt1-1": List[{
"query": str,
"diversify": {
"informal": str,
"complicated": str,
},
"hard_negative": str,
}],
"prompt2-1": ...,
"prompt3-1": ...,
"prompt4-1": ...,
"prompt1-2": ...,
"prompt2-2": ...,
"prompt3-2": ...,
"prompt4-2": ...,
},
}],
}],
}
```
- `wiki-news_en-zh_200.jsonl`: 50 examples for each domain (wiki, news) in each language (en, zh), 200 examples in total. Each example has the following format:
```python
{'query': str, 'positive': str, 'hard_negative': str}
```
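Both files are JSON Lines, so they can be read with the standard `json` module. The sketch below is not part of the dataset: `read_jsonl`, `check_triples` and `iter_triples` are hypothetical helper names, and `iter_triples` assumes that the `text` field of a `special_examples.jsonl` record acts as the positive passage for every query generated from it, which this card does not state explicitly.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

TRIPLE_KEYS = {"query", "positive", "hard_negative"}


def read_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL text (one JSON object per line), skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def check_triples(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Raise if a wiki-news_en-zh_200.jsonl row lacks one of the three keys."""
    for i, row in enumerate(rows):
        missing = TRIPLE_KEYS.difference(row)
        if missing:
            raise ValueError(f"row {i}: missing keys {sorted(missing)}")
    return rows


def iter_triples(example: Dict[str, Any]) -> Iterator[Dict[str, str]]:
    """Flatten one special_examples.jsonl record into one row per generated query.

    Assumption: the record's `text` serves as the positive passage for each query.
    """
    for char in example["sampled_characters"]:
        for scen in char["sampled_scenarios"]:
            for prompt_name, items in scen["result"].items():
                for item in items:
                    yield {
                        "prompt": prompt_name,
                        "query": item["query"],
                        "informal": item["diversify"]["informal"],
                        "complicated": item["diversify"]["complicated"],
                        "positive": example["text"],
                        "hard_negative": item["hard_negative"],
                    }
```

For example, opening `wiki-news_en-zh_200.jsonl` with `encoding="utf-8"` and passing the file object to `read_jsonl` and then `check_triples` yields the list of triples, while `list(iter_triples(record))` flattens one special example into rows of the same shape plus the prompt name and the two diversified rewrites.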
## Method
LLM: `gpt-4-turbo-preview`
For each text, the generation proceeds as follows:
- Generate n1 characters, then sample n2 of them.
- For each sampled character, generate n3 scenarios, then sample n4 of them.
- For each sampled scenario, generate n5 queries with each prompt (*8 kinds of prompts*).
- For each query, diversify it in 2 ways (informal and complicated) and generate 1 hard negative.

This completes one example.
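A minimal sketch of this loop, with the LLM calls replaced by hypothetical placeholder functions (`generate_characters`, `generate_scenarios`, `generate_queries`, `diversify`, `generate_hard_negative`) that return dummy strings instead of calling `gpt-4-turbo-preview`. The prompt names are taken from the `result` keys of the schema; the actual prompt texts are not given in this card. The output mirrors the nested `special_examples.jsonl` layout.

```python
import random
from typing import Any, Dict, List

# Prompt names taken from the `result` keys of special_examples.jsonl.
PROMPTS = ["prompt1-1", "prompt2-1", "prompt3-1", "prompt4-1",
           "prompt1-2", "prompt2-2", "prompt3-2", "prompt4-2"]


# Hypothetical placeholders for the gpt-4-turbo-preview calls; they return dummy strings.
def generate_characters(text: str, n: int) -> List[str]:
    return [f"character-{i}" for i in range(n)]


def generate_scenarios(text: str, character: str, n: int) -> List[str]:
    return [f"{character}/scenario-{i}" for i in range(n)]


def generate_queries(text: str, character: str, scenario: str,
                     prompt: str, n: int) -> List[str]:
    return [f"{scenario}/{prompt}/query-{i}" for i in range(n)]


def diversify(query: str) -> Dict[str, str]:
    return {"informal": f"informal: {query}", "complicated": f"complicated: {query}"}


def generate_hard_negative(text: str, query: str) -> str:
    return f"hard negative for: {query}"


def build_example(text: str, domain: str, language: str, n1: int, n2: int,
                  n3: int, n4: int, n5: int, rng: random.Random) -> Dict[str, Any]:
    """Run the generate -> sample loop once for a single source text."""
    characters = generate_characters(text, n1)
    sampled_characters = []
    for character in rng.sample(characters, n2):  # sample n2 of the n1 characters
        scenarios = generate_scenarios(text, character, n3)
        sampled_scenarios = []
        for scenario in rng.sample(scenarios, n4):  # sample n4 of the n3 scenarios
            result = {
                prompt: [
                    {"query": q, "diversify": diversify(q),
                     "hard_negative": generate_hard_negative(text, q)}
                    for q in generate_queries(text, character, scenario, prompt, n5)
                ]
                for prompt in PROMPTS  # n5 queries for each of the 8 prompts
            }
            sampled_scenarios.append({"scenario": scenario, "result": result})
        sampled_characters.append({"character": character, "scenarios": scenarios,
                                   "sampled_scenarios": sampled_scenarios})
    return {"domain": domain, "language": language, "text": text,
            "characters": characters, "sampled_characters": sampled_characters}
```

Calling `build_example(passage, "wiki", "en", n1=13, n2=2, n3=12, n4=2, n5=2, rng=random.Random(0))` reproduces the shape of one `special_examples.jsonl` record; swapping the placeholders for real LLM calls is left open, since the card does not publish the prompts.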
Some human-written characters and scenarios are also used when generating queries:
```python
HUMAN_WRITTEN_CHARACTERS = [
{
'character': 'Professor',
'scenarios': [
'Setupping questions for an upcoming quiz/examination',
'Preparing for a lecture',
],
},
{
'character': 'College student',
'scenarios': [
'Preparing for an examination',
'Writing a research paper',
],
},
{
'character': 'High school student',
'scenarios': [
'Learning new knowledge',
'Preparing for a presentation',
],
}
]
```
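One possible way these human-written entries could enter the pipeline. This is our reading, not stated in the card: the `+ 3` in the `n1 = 10 + 3` setting refers to the three characters above, and the `( + 2)` in `n3 = 10 ( + 2)` refers to the two human-written scenarios that are added only when a human-written character is sampled. `character_pool` and `scenario_pool` are hypothetical names.

```python
from typing import Dict, List

# Same data as HUMAN_WRITTEN_CHARACTERS above, condensed.
HUMAN_WRITTEN_CHARACTERS = [
    {"character": "Professor",
     "scenarios": ["Setupping questions for an upcoming quiz/examination",
                   "Preparing for a lecture"]},
    {"character": "College student",
     "scenarios": ["Preparing for an examination", "Writing a research paper"]},
    {"character": "High school student",
     "scenarios": ["Learning new knowledge", "Preparing for a presentation"]},
]
HUMAN_SCENARIOS: Dict[str, List[str]] = {
    c["character"]: c["scenarios"] for c in HUMAN_WRITTEN_CHARACTERS
}


def character_pool(generated: List[str]) -> List[str]:
    """Assumed: LLM-generated characters plus the 3 human-written ones (10 + 3)."""
    return generated + list(HUMAN_SCENARIOS)


def scenario_pool(character: str, generated: List[str]) -> List[str]:
    """Assumed: generated scenarios, plus 2 human-written ones for a human-written character."""
    return generated + HUMAN_SCENARIOS.get(character, [])
```

Under this reading, sampling n2 characters from `character_pool` can pick either a generated or a human-written character, and only the latter gets the extra 2 scenarios.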
For examples in `special_examples.jsonl`, we set n1 = 10 + 3, n2 = 2, n3 = 10 ( + 2), n4 = 2, n5 = 2.
For examples in `wiki-news_en-zh_200.jsonl`, we set n1 = 10 + 3, n2 = 1, n3 = 10 ( + 2), n4 = 1, n5 = 3. To speed up generation, we randomly choose one of the 8 prompts instead of generating n5 queries for each of the 8 prompts.
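The number of queries generated from a single source text follows from the nested schema: sampled characters × sampled scenarios × prompts used × queries per prompt. A small worked calculation, reading the faster setting as n5 queries from a single randomly chosen prompt (the helper name is hypothetical):

```python
def queries_per_text(n2: int, n4: int, n5: int, prompts_used: int) -> int:
    """Queries from one text: characters x scenarios x prompts x queries per prompt."""
    return n2 * n4 * prompts_used * n5


# special_examples.jsonl: all 8 prompts are used for every sampled scenario.
special = queries_per_text(n2=2, n4=2, n5=2, prompts_used=8)
# wiki-news_en-zh_200.jsonl: one randomly chosen prompt instead of all 8.
wiki_news = queries_per_text(n2=1, n4=1, n5=3, prompts_used=1)
print(special, wiki_news)  # 64 3
```

In `special_examples.jsonl`, each of those queries additionally carries 2 diversified rewrites and 1 hard negative. The card does not say how the 3 queries per text in the faster setting map onto the 200 lines of `wiki-news_en-zh_200.jsonl`.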
Provided by: hanhainebula