labeled-natural-qa-random-100
Source: ModelScope community (魔搭社区). Updated 2025-11-12; indexed 2025-08-02.
Download link:
https://modelscope.cn/datasets/quotientai/labeled-natural-qa-random-100
Description:
# Dataset Selection Process
This repository contains a curated subset of 100 question-answer pairs from the ["Natural Questions" dataset](https://huggingface.co/datasets/google-research-datasets/natural_questions). The dataset was filtered and selected using the following process:
## Data Selection Method
1. **Load Dataset**: The dataset is loaded using the `datasets` library, specifically the `"validation"` split of the "Natural Questions" dataset.
```python
import datasets

dataset = datasets.load_dataset("google-research-datasets/natural_questions", split="validation")
```
2. **Shuffle and Slice**: The dataset is shuffled with a fixed seed (`seed=42`) to ensure reproducibility and then sliced to obtain a subset of the first 500 shuffled entries.
```python
shuffled = dataset.shuffle(seed=42)
sliced = shuffled.select(range(500))
```
3. **Filter for Short Answers**: Each of the 500 entries is checked, and only those that contain a valid short answer are kept. An entry with a short answer is added to the new dataset with the following fields:
- `context`: The HTML content of the source document.
- `url`: The URL of the source document.
- `question`: The question text.
- `answer`: The selected short answer text.
```python
new_data.append({
    "context": row["document"]["html"],
    "url": row["document"]["url"],
    "question": row["question"]["text"],
    "answer": answer,
})
```
4. **Limit to 100 Entries**: To create a manageable sample, the selection process stops once 100 entries have been collected.
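Steps 3 and 4 can be combined into a single loop. The following is a minimal sketch, not the authors' actual script: the helper names `first_short_answer` and `collect_rows` are hypothetical, and the layout assumed for `row["annotations"]["short_answers"]` (a list of dicts, each with a `text` list of strings) is an assumption about the dataset schema that should be checked against the loaded data.

```python
from typing import Any, Iterable, Optional


def first_short_answer(row: dict[str, Any]) -> Optional[list[str]]:
    """Return the text of the first non-empty short answer, or None.

    ASSUMPTION: row["annotations"]["short_answers"] is a list of dicts,
    each with a "text" field holding a list of strings. Verify this
    against the real schema before relying on it.
    """
    for short_answer in row["annotations"]["short_answers"]:
        if short_answer["text"]:
            return short_answer["text"]
    return None


def collect_rows(rows: Iterable[dict[str, Any]], limit: int = 100) -> list[dict[str, Any]]:
    """Keep rows that have a short answer, stopping once `limit` rows are collected."""
    new_data: list[dict[str, Any]] = []
    for row in rows:
        answer = first_short_answer(row)
        if answer is None:
            continue  # skip entries without a valid short answer
        new_data.append({
            "context": row["document"]["html"],
            "url": row["document"]["url"],
            "question": row["question"]["text"],
            "answer": answer,
        })
        if len(new_data) >= limit:
            break  # step 4: stop at the target sample size
    return new_data
```

Under these assumptions, calling `collect_rows(sliced)` on the 500 shuffled entries from step 2 would produce the list of up to 100 records.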
## Data Annotation Process
We then manually labeled each answer as `good` or `bad` based on the Wikipedia context, and provided reasoning for each row labeled `bad`.
We ended up with 67 `good` rows and 33 `bad` rows.
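As a quick sanity check, the label distribution can be tallied from the labeled rows. This is a sketch only: the field name `label` is a hypothetical placeholder, since the actual column names of the labeled dataset are not stated here.

```python
from collections import Counter


def label_summary(rows: list[dict], label_key: str = "label") -> dict:
    """Count labels and compute the share of rows labeled `good`.

    ASSUMPTION: each row stores its grade under `label_key`; the real
    column name may differ.
    """
    counts = Counter(row[label_key] for row in rows)
    total = sum(counts.values())
    good_share = counts["good"] / total if total else 0.0
    return {"counts": dict(counts), "total": total, "good_share": good_share}
```

With 67 `good` rows and 33 `bad` rows, the share of `good` rows is 67 out of 100.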
The annotation process followed these instructions:
**TASK:**
> You will be provided a web page and a question and answer pair.
> The question is a search query and the answer is a response provided by an LLM that attempts to answer the question.
> Your task is to identify which responses are good and which are bad, and explain why.
> Remember that this is all based on vibes.
> Imagine that you're looking for an answer or looking to learn about something you have a question about.
**INSTRUCTIONS:**
> 1. Open the URL for a given page.
> 2. Read the question and the answer.
> 3. Give a good grade if the response is good, or bad grade if the response is bad
> 4. If the response is bad, add an explanation for why
>
> Go to the Data tab below and start grading.
## Determining What is Good from Bad
We agreed that an answer is good if it is correct and also provides enough information that you would be happy to receive it as a result.
For example:
**Question:** when does levi first appear in attack on titan
**Answer:** ['"Whereabouts of His Left Arm: The Struggle for Trost, Part 5"']
The answer is correct but very specific: it gives only the episode title. "Episode 5 of Season 1" would be a better answer.
Provider:
maas
Created:
2025-07-28



