fadhel-alobaidi/AraFact
Hugging Face · Updated 2026-04-03 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/fadhel-alobaidi/AraFact
Resource description:
---
language:
- ar
license: mit
task_categories:
- text-classification
- question-answering
task_ids:
- fact-checking
pretty_name: AraFact
size_categories:
- 10K<n<100K
tags:
- arabic
- synthetic
- fact-checking
- validation
- msa
- agentic
- json
- documents
---
# AraFact-Synth — Arabic Fact Validation Dataset
`dataset_ar.jsonl` is a synthetic Arabic-language fact-checking dataset generated by a two-agent pipeline (Agent A + Agent B). It is designed for training and evaluating language models on the task of **validating factual claims** in Arabic text.
- **Format**: JSON Lines (one record per line)
- **Size**: 10,008 records / 60,250 items
- **Language**: Modern Standard Arabic (MSA), 100%
---
## Statistics
| Metric | Value |
|---|---|
| Total records | 10,008 |
| Total items | 60,250 |
| Valid items | 29,865 (49.6%) |
| Invalid items | 28,128 (46.7%) |
| Batch type — json | 5,131 records (51.3%) |
| Batch type — documents | 4,877 records (48.7%) |
### Flaw Type Distribution (invalid items)
| Flaw Type | Count | Description |
|---|---|---|
| `wrong_spec` | 9,003 | Wrong entity, location, organization, or specification |
| `subtle_number` | 8,721 | Slightly altered numbers, percentages, or dates |
| `contradictory` | 6,206 | Content that contradicts itself or the source |
| `chronological` | 4,243 | Wrong dates, year, or sequence of events |
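The figures above can be recomputed from the records themselves. The sketch below is one way to do that. It assumes the tables count Agent B's labels in `output.results` (the card does not say which labels were counted). `summarize` and the sample record are illustrative names, not part of the dataset.

```python
from collections import Counter

def summarize(records):
    """Tally batch types, items, labels, and flaw types over an iterable of records."""
    batch_types, labels, flaws = Counter(), Counter(), Counter()
    total_items = 0
    n_records = 0
    for record in records:
        n_records += 1
        batch_types[record["input"]["type"]] += 1
        total_items += len(record["input"]["items"])
        for result in record["output"]["results"]:
            labels[result["label"]] += 1
            # Assumption: flaw types are counted only for items labelled invalid.
            if result["label"] == "invalid":
                flaws[result["flaw_detected"]] += 1
    return {
        "records": n_records,
        "items": total_items,
        "batch_types": dict(batch_types),
        "labels": dict(labels),
        "flaws": dict(flaws),
    }

# Hand-written stand-in record, not taken from the dataset.
sample = [{
    "input": {"type": "json", "count": 2,
              "items": [{"id": 1, "content": {}}, {"id": 2, "content": {}}]},
    "output": {"results": [
        {"id": 1, "label": "valid", "flaw_detected": "none", "reason": "..."},
        {"id": 2, "label": "invalid", "flaw_detected": "wrong_spec", "reason": "..."},
    ]},
}]
stats = summarize(sample)
print(stats)
```

Comparing the `labels` totals against `items` also shows whether every item received a `valid` or `invalid` label.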
---
## Knowledge Categories
The dataset spans **159 categories** and **1,131 subcategories** across diverse knowledge domains, covering both mainstream fields (Mathematics, Medicine, Law, Physics) and highly specialised niches (Lunar Geology, Ethnobotany, Traditional Boatbuilding, etc.).
---
## Record Structure
Each line is a JSON object with four top-level fields: `instruction`, `input`, `output`, and `meta`.
```json
{
"instruction": "...",
"input": { ... },
"output": { ... },
"meta": { ... }
}
```
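A minimal sketch of reading this layout with the standard library is shown below. `iter_records` is a hypothetical helper name. It accepts any iterable of lines, so it works on an open file as well as on the in-memory sample used here.

```python
import json
from typing import Iterable, Iterator

REQUIRED_FIELDS = ("instruction", "input", "output", "meta")

def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty JSON line, checking the four top-level fields."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        yield record

# Tiny in-memory stand-in for the file contents.
sample_lines = [
    json.dumps({"instruction": "...", "input": {}, "output": {}, "meta": {}}),
    "",
]
records = list(iter_records(sample_lines))
print(len(records))
```

With the real file, the same function can be fed an open handle, for example `with open("dataset_ar.jsonl", encoding="utf-8") as f: records = list(iter_records(f))`.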
---
### `instruction` (string)
The task prompt for the model. One of two values depending on batch type:
- `"Validate the following batch of json. For each item, provide a label and a detailed reason."`
- `"Validate the following batch of documents. For each item, provide a label and a detailed reason."`
---
### `input` (object)
The batch to be validated.
| Field | Type | Description |
|---|---|---|
| `type` | string | `"json"` or `"documents"` |
| `count` | integer | Number of items in the batch |
| `items` | array | List of items to validate (see below) |
#### Item structure
| Field | Type | Description |
|---|---|---|
| `id` | integer | Item identifier (1-based) |
| `content` | string or object | The claim to validate: a string for `documents` batches, an object for `json` batches |
**Example — documents type:**
```json
{
"id": 1,
"content": "تُعد شبكات الجيل الخامس نقلة نوعية في عالم الاتصالات ..."
}
```
**Example — json type:**
```json
{
"id": 2,
"content": {
"الاسم": "سوق المشتقات المالية السعودي",
"تاريخ_الانطلاق": "الربع الثالث من عام 2020",
"أول_منتج": "العقود المستقبلية لمؤشر (إم تي 30)"
}
}
```
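Because `content` can be either a string or an object, code that feeds items to a model usually needs to normalise it first. The sketch below shows one possible approach. `content_to_text` is a hypothetical helper, and serialising the object as JSON text is an assumption rather than something the dataset prescribes.

```python
import json

def content_to_text(content):
    """Return item content as a single string, whatever the batch type."""
    if isinstance(content, str):
        return content  # `documents` batches: already plain text
    # `json` batches: serialise the object, keeping Arabic characters readable
    return json.dumps(content, ensure_ascii=False)

doc_item = {"id": 1, "content": "تُعد شبكات الجيل الخامس نقلة نوعية في عالم الاتصالات ..."}
json_item = {"id": 2, "content": {"الاسم": "سوق المشتقات المالية السعودي"}}

texts = [content_to_text(item["content"]) for item in (doc_item, json_item)]
for text in texts:
    print(text)
```

Passing `ensure_ascii=False` keeps the Arabic keys and values as written instead of escaping them to `\uXXXX` sequences.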
---
### `output` (object)
Agent B's validation results for the batch.
| Field | Type | Description |
|---|---|---|
| `results` | array | Per-item validation results (see below) |
| `summary` | string | Arabic summary, e.g. `"عنصر واحد صحيح، وعنصر واحد خاطئ من أصل عنصرين"` |
#### Result item structure
| Field | Type | Description |
|---|---|---|
| `id` | integer | Matches the input item id |
| `label` | string | `"valid"` or `"invalid"` |
| `flaw_detected` | string | `"none"`, `"wrong_spec"`, `"subtle_number"`, `"contradictory"`, or `"chronological"` |
| `reason` | string | Arabic explanation of the validation decision |
**Example:**
```json
{
"id": 2,
"label": "invalid",
"flaw_detected": "wrong_spec",
"reason": "يحتوي العنصر على ادعاء رقمي غير دقيق ..."
}
```
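The field constraints in the table above can be checked mechanically. The following sketch is one way to do so. `check_results` and the two constant names are hypothetical, and the allowed values are copied from the table.

```python
ALLOWED_LABELS = {"valid", "invalid"}
ALLOWED_FLAWS = {"none", "wrong_spec", "subtle_number", "contradictory", "chronological"}

def check_results(input_items, results):
    """Return a list of problems; an empty list means the results look well-formed."""
    problems = []
    input_ids = {item["id"] for item in input_items}
    seen = set()
    for result in results:
        rid = result["id"]
        seen.add(rid)
        if rid not in input_ids:
            problems.append(f"result id {rid} has no matching input item")
        if result["label"] not in ALLOWED_LABELS:
            problems.append(f"result id {rid}: unexpected label {result['label']!r}")
        if result["flaw_detected"] not in ALLOWED_FLAWS:
            problems.append(f"result id {rid}: unexpected flaw {result['flaw_detected']!r}")
    for missing in sorted(input_ids - seen):
        problems.append(f"input item {missing} has no result")
    return problems

items = [{"id": 1, "content": "..."}, {"id": 2, "content": "..."}]
good = [
    {"id": 1, "label": "valid", "flaw_detected": "none", "reason": "..."},
    {"id": 2, "label": "invalid", "flaw_detected": "wrong_spec", "reason": "..."},
]
bad = [{"id": 3, "label": "maybe", "flaw_detected": "typo", "reason": "..."}]
print(check_results(items, good))
print(check_results(items, bad))
```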
---
### `meta` (object)
Metadata about the record for analysis and filtering.
| Field | Type | Description |
|---|---|---|
| `batch_type` | string | `"json"` or `"documents"` |
| `batch_size` | integer | Number of items in the batch |
| `valid_count` | integer | Number of items Agent B labeled as valid |
| `invalid_count` | integer | Number of items Agent B labeled as invalid |
| `language_distribution` | object | Count of items per language, e.g. `{"ar": 6}` |
| `all_hints_confirmed` | boolean | `true` if Agent B's labels match all intended labels |
| `intended_labels` | array | The ground-truth plan used to generate the batch (see below) |
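Since several `meta` fields restate information that is also present in `input` and `output`, they can be cross-checked. The sketch below recomputes those values and reports any that differ. `meta_mismatches` is a hypothetical helper, and the sample record is hand-written for illustration.

```python
def meta_mismatches(record):
    """Map each inconsistent meta field to a (stored, recomputed) pair."""
    meta = record["meta"]
    results = record["output"]["results"]
    recomputed = {
        "batch_type": record["input"]["type"],
        "batch_size": len(record["input"]["items"]),
        "valid_count": sum(r["label"] == "valid" for r in results),
        "invalid_count": sum(r["label"] == "invalid" for r in results),
    }
    return {
        key: (meta.get(key), value)
        for key, value in recomputed.items()
        if meta.get(key) != value
    }

record = {
    "input": {"type": "json", "count": 2,
              "items": [{"id": 1, "content": {}}, {"id": 2, "content": {}}]},
    "output": {"results": [
        {"id": 1, "label": "valid", "flaw_detected": "none", "reason": "..."},
        {"id": 2, "label": "invalid", "flaw_detected": "wrong_spec", "reason": "..."},
    ]},
    "meta": {"batch_type": "json", "batch_size": 2, "valid_count": 1, "invalid_count": 1},
}
print(meta_mismatches(record))
```

An empty dictionary means the stored metadata agrees with the values recomputed from the record.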
#### `intended_labels` structure
Each entry reflects the **intended** validity for that item as decided at generation time. Agent A was instructed to follow this plan, but may not always have done so perfectly. Agent B's `label` field reflects its independent assessment.
| Field | Type | Description |
|---|---|---|
| `id` | integer | Item identifier |
| `is_valid` | boolean | Whether the item was intended to be valid |
| `flaw_type` | string | Intended flaw type if invalid, `"none"` if valid |
**Example:**
```json
"intended_labels": [
{"id": 1, "is_valid": true, "flaw_type": "none"},
{"id": 2, "is_valid": false, "flaw_type": "subtle_number"}
]
```
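Because Agent B's assessment may differ from the plan, it can be useful to list the items where the two disagree. The sketch below compares each result with its `intended_labels` entry. `find_disagreements` is a hypothetical helper. It reports label and flaw-type differences separately, since the card does not state whether `all_hints_confirmed` also takes the flaw type into account.

```python
def find_disagreements(record):
    """List items where Agent B's result differs from the intended plan."""
    intended = {entry["id"]: entry for entry in record["meta"]["intended_labels"]}
    disagreements = []
    for result in record["output"]["results"]:
        plan = intended.get(result["id"])
        if plan is None:
            continue  # no plan entry for this item
        planned_label = "valid" if plan["is_valid"] else "invalid"
        label_differs = result["label"] != planned_label
        flaw_differs = result["flaw_detected"] != plan["flaw_type"]
        if label_differs or flaw_differs:
            disagreements.append({
                "id": result["id"],
                "planned": (planned_label, plan["flaw_type"]),
                "observed": (result["label"], result["flaw_detected"]),
                "label_differs": label_differs,
                "flaw_differs": flaw_differs,
            })
    return disagreements

# Hand-written record combining the two examples shown in this card.
record = {
    "output": {"results": [
        {"id": 1, "label": "valid", "flaw_detected": "none", "reason": "..."},
        {"id": 2, "label": "invalid", "flaw_detected": "wrong_spec", "reason": "..."},
    ]},
    "meta": {"intended_labels": [
        {"id": 1, "is_valid": True, "flaw_type": "none"},
        {"id": 2, "is_valid": False, "flaw_type": "subtle_number"},
    ]},
}
diffs = find_disagreements(record)
print(diffs)
```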
> **Note**: `all_hints_confirmed: false` does not always mean Agent B made an error. It may also mean Agent A did not faithfully inject the intended flaw, in which case Agent B's independent label is the more reliable signal.
---
## Generation Pipeline
Records were generated by a two-agent pipeline:
1. **Agent A** (producer): Given a category, subcategory, and a validity plan (specifying which items should be valid/invalid and what flaw type), Agent A searches the web for a real source, extracts facts, and generates a batch of items — some faithful to the source, others containing the specified flaw.
2. **Agent B** (validator): Given the batch items and hints (the intended labels), Agent B independently validates each item using web search and reasoning. It treats hints as guidance but verifies independently, and may disagree with the hint if the content does not match.
---
## Usage Notes
- Use `intended_labels` as **ground truth** for training the producer/generation side.
- Use `output.results` (`label`, `flaw_detected`, `reason`) as **ground truth** for training a validator model.
- When `all_hints_confirmed` is `false`, inspect both `intended_labels` and `output.results` — the disagreement may reflect Agent A non-compliance or a genuine Agent B catch.
- The `reason` field in `output.results` is always in Arabic and typically references specific facts or search results that support the decision.
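As an illustration of the validator use case, the sketch below turns records into prompt/target pairs. The pair format, the function names, and the `only_confirmed` filter are all assumptions made for this example, not a format the dataset prescribes.

```python
import json

def to_validator_example(record):
    """Build one (prompt, target) pair from a record (hypothetical format)."""
    prompt = record["instruction"] + "\n\n" + json.dumps(
        record["input"], ensure_ascii=False, indent=2
    )
    target = json.dumps(record["output"], ensure_ascii=False, indent=2)
    return {"prompt": prompt, "target": target}

def validator_examples(records, only_confirmed=False):
    """Yield training pairs, optionally skipping records with unconfirmed hints."""
    for record in records:
        if only_confirmed and not record["meta"]["all_hints_confirmed"]:
            continue
        yield to_validator_example(record)

def make_record(confirmed):
    # Hand-written stand-in record, not taken from the dataset.
    return {
        "instruction": "Validate the following batch of documents. "
                       "For each item, provide a label and a detailed reason.",
        "input": {"type": "documents", "count": 1,
                  "items": [{"id": 1, "content": "..."}]},
        "output": {"results": [{"id": 1, "label": "valid",
                                "flaw_detected": "none", "reason": "..."}],
                   "summary": "..."},
        "meta": {"all_hints_confirmed": confirmed},
    }

records = [make_record(True), make_record(False)]
all_pairs = list(validator_examples(records))
confirmed_pairs = list(validator_examples(records, only_confirmed=True))
print(len(all_pairs), len(confirmed_pairs))
```

Whether to drop unconfirmed records is a judgment call. Keeping them preserves cases where Agent B overrode the plan, while dropping them gives a subset in which plan and validation agree.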
Provider:
fadhel-alobaidi



