0xJupiter/SocialAttributionQA
Source: Hugging Face. Updated 2026-04-01; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/0xJupiter/SocialAttributionQA
Resource description:
---
pretty_name: Social Attribution QA Benchmark
license: apache-2.0
language:
- en
task_categories:
- question-answering
task_ids:
- multiple-choice-qa
size_categories:
- 1K<n<10K
tags:
- benchmark
- social-media
- provenance
- attribution
- retrieval-augmented-generation
---
# Social Attribution QA Benchmark
The Social Attribution QA Benchmark is a derived benchmark for provenance-aware
social attribution question answering over Fediverse data. It is designed to
evaluate whether a system can identify who said a statement, what a person
said, and whether attribution remains correct under entity, temporal, social,
and collaborative constraints.

This release contains 1,200 four-option multiple-choice questions organized
into eight task files. The benchmark is derived from the source dataset
`FediData` and is released as a benchmark artifact rather than as a raw
social-media dump.

This release is evaluation-oriented and is distributed as task files rather
than as train/dev/test splits.

## Dataset Summary
The benchmark is organized into two task families:
- `WSW`: Who Said What
- `WDWS`: What Did Who Say
Each JSON file contains a top-level dictionary with three fields:
- `metadata`: file-level provenance and construction metadata
- `tasks`: the benchmark instances for one task type
- `statistics`: counts and difficulty summaries for that task file
## Data Files
| File | Task | Questions |
|---|---|---:|
| `WSW_DIRECT.json` | direct attribution | 200 |
| `WSW_ENTITY.json` | entity-constrained attribution | 200 |
| `WSW_ASSOC.json` | association reasoning | 100 |
| `WSW_TEMPORAL.json` | temporal attribution | 100 |
| `WDWS_DIRECT.json` | direct attribution | 200 |
| `WDWS_ENTITY.json` | entity-constrained attribution | 200 |
| `WDWS_COLLAB.json` | collaborative reasoning | 100 |
| `WDWS_TEMPORAL.json` | temporal attribution | 100 |
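
The counts in the table can be checked against a local copy of the release. The following is a minimal sketch, not part of the official tooling: the helper names `count_questions` and `check_release` are hypothetical, and it assumes that `tasks` maps a task name to a list of instances, which is how the loading example in this card accesses it.

```python
import json
from pathlib import Path

# File names and question counts copied from the Data Files table.
EXPECTED_COUNTS = {
    "WSW_DIRECT.json": 200,
    "WSW_ENTITY.json": 200,
    "WSW_ASSOC.json": 100,
    "WSW_TEMPORAL.json": 100,
    "WDWS_DIRECT.json": 200,
    "WDWS_ENTITY.json": 200,
    "WDWS_COLLAB.json": 100,
    "WDWS_TEMPORAL.json": 100,
}


def count_questions(path):
    """Return the number of instances across all task lists in one file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: data["tasks"] is a dict of task name -> list of instances.
    return sum(len(instances) for instances in data["tasks"].values())


def check_release(data_dir):
    """Return {file name: (expected, found)} for every file that does not match.

    A missing file is reported with found=None.
    """
    mismatches = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = Path(data_dir) / name
        found = count_questions(path) if path.exists() else None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `check_release(".")` in the directory that holds the eight JSON files should return an empty dictionary if the local copy matches the table.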
## Data Structure
Most instances contain the following fields:
- `question_id`: unique question identifier
- `task_id`: canonical task identifier
- `question`: question text
- `options`: four answer choices
- `answer`: gold option label such as `A`
- `answer_text`: gold answer in text form
- `answer_path`: supporting provenance information for the gold answer
- `metadata`: instance-level construction metadata
- `difficulty`: difficulty annotation and score
The collaborative file `WDWS_COLLAB.json` additionally includes a
`correct_answer` field, and its `difficulty` annotation is not populated in the
same way as in the other task files.
## Example
```python
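import json

# Load one task file; the top-level object holds metadata, tasks, and statistics.
with open("WSW_DIRECT.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# "tasks" is keyed by task name; take the first task and its first instance.
task_name = next(iter(data["tasks"]))
sample = data["tasks"][task_name][0]

print(task_name)
print(sample["question"])
print(sample["options"])
print(sample["answer"], sample["answer_text"])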
```
Example task instance:
```json
{
  "question_id": "WSW_T1_11c48887e878431b",
  "task_id": "WSW_T1_DIRECT",
  "question": "Who said: 'Smoking damages your lungs.'?",
  "options": {
    "A": "55ee6c1d@mastodon.social",
    "B": "bf0398ec@pouet.chapril.org",
    "C": "a25f92ab@mastodon.nl",
    "D": "ca4390cb@octodon.social"
  },
  "answer": "A",
  "answer_text": "55ee6c1d@mastodon.social"
}
```
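
The example instance suggests a few consistency checks that can be run before evaluation. The sketch below is illustrative only: `check_instance` is a hypothetical helper, and it assumes that `options` is a mapping from option labels to option text and that `answer_text` equals the text of the option selected by `answer`, as in the instance shown above. Other task files may not follow this pattern exactly.

```python
def check_instance(instance):
    """Return a list of human-readable problems found in one instance."""
    problems = []
    options = instance.get("options", {})
    answer = instance.get("answer")

    # The release is described as four-option multiple choice.
    if len(options) != 4:
        problems.append(f"expected 4 options, found {len(options)}")

    # Assumption: the gold label is one of the option keys.
    if answer not in options:
        problems.append(f"answer label {answer!r} is not an option key")
    # Assumption: answer_text repeats the text of the gold option.
    elif options[answer] != instance.get("answer_text"):
        problems.append("answer_text does not match the option selected by answer")

    return problems
```

An empty list means the instance passed all three checks.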
## Source Data
This benchmark is derived from the `FediData` Fediverse corpus:
- FediData: https://zenodo.org/records/15621244
This dataset repository does not redistribute the raw source-data dump. If you
want to rebuild the benchmark from source, use the construction code in the
project repository and place the downloaded FediData release under the expected
build directory.
## Related Resources
The full project repository includes:
- the released benchmark files
- the benchmark-construction pipeline
- baseline implementations
- the `ATLAS` method implementation
Project repository:
- https://github.com/JupiterXiaoxiaoYu/SocialAttributionQA
## Intended Use
This release is intended for benchmark evaluation and method comparison. It is
most suitable for:
- provenance-aware social attribution QA
- retrieval and reasoning over Fediverse-derived content
- comparison between graph-based, retrieval-based, and agentic QA methods
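
For method comparison, a simple way to score a system is exact-match accuracy on the gold option label. This card does not state an official metric, so the sketch below is an assumption: `accuracy` is a hypothetical helper, predictions are assumed to be a mapping from `question_id` to a predicted label, and unanswered questions are counted as wrong.

```python
def accuracy(instances, predictions):
    """Fraction of instances whose predicted label equals the gold `answer`.

    instances:   list of benchmark instances (dicts with question_id and answer)
    predictions: dict mapping question_id -> predicted option label
    """
    if not instances:
        return 0.0
    correct = sum(
        1
        for inst in instances
        # A missing prediction yields None, which never equals a gold label.
        if predictions.get(inst["question_id"]) == inst["answer"]
    )
    return correct / len(instances)


# Usage with two toy instances (not taken from the benchmark).
toy_instances = [
    {"question_id": "q1", "answer": "A"},
    {"question_id": "q2", "answer": "C"},
]
toy_predictions = {"q1": "A", "q2": "B"}
print(accuracy(toy_instances, toy_predictions))  # 0.5
```

Scoring each task file separately keeps the smaller 100-question files from being dominated by the 200-question ones.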
## License
Apache License 2.0.
Provider:
0xJupiter



