QCRI/islamic-knowledge-ai-survey
Hugging Face | Updated 2026-02-25 | Indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/QCRI/islamic-knowledge-ai-survey
Resource description:
---
license: cc-by-nc-sa-2.0
---
# Advances in AI Systems on Islamic Knowledge Capabilities: A Critical Survey
[Paper](https://gagan3012.github.io/islamic-knowledge-survey/paper/Islamic_knowledge_survey.pdf)
[Project page](https://gagan3012.github.io/islamic-knowledge-survey/)
[Dataset](https://huggingface.co/datasets/QCRI/islamic-knowledge-ai-survey)
[License](https://creativecommons.org/licenses/by-sa/4.0/)
A **comprehensive systematic survey of 160+ papers** (2016–2026) examining how AI systems operationalize Islamic knowledge, spanning NLP, information retrieval, speech processing, multimodal learning, educational technology, and LLM alignment.
<p align="center">
<img src="https://gagan3012.github.io/islamic-knowledge-survey/static/images/fig2_temporal_new%20(1)-1.png" alt="Publication trends" width="700">
</p>
---
## Abstract
AI systems increasingly mediate how Islamic communities access, study, and apply Islamic sources, yet research on Islamic-knowledge capabilities remains fragmented across NLP, information retrieval, speech, multimodal learning, educational technology, and recent LLM alignment work.
This survey presents a **critical systematic review of 160+ papers from the past decade** that incorporate Islamic knowledge in Machine Learning/AI. We propose a **layered taxonomy** that separates an *epistemic* view of Islamic knowledge (authority-bearing foundations and established disciplines) from an *instrumental* AI task layer (data and corpora, retrieval and grounding, understanding, reasoning support, evaluation and governance, and multimodal methods), while treating normative concerns as cross-cutting constraints.
Using this framework, we synthesize trends in datasets, benchmarks, and system architectures, highlighting the shift toward **retrieval-grounded LLM pipelines**, verification and deferral mechanisms, and emerging multimodal recitation and manuscript-processing systems. We also consolidate evaluation practices for trustworthiness, including **provenance and faithfulness**, disagreement-aware and school-of-thought-sensitive framing, calibrated abstention under underspecified queries, and safety and bias assessment for Islamic contexts.
---
## Key Contributions
- **Layered Taxonomy** — A two-layer framework separating the *epistemic* view of Islamic knowledge (Qur'an, Hadith, Fiqh, Theology, etc.) from the *instrumental* AI task layer (retrieval, grounding, reasoning, evaluation, multimodal methods).
- **Systematic Review (PRISMA-ScR)** — Rigorous screening of 1,743 initial records down to 160 included studies, following the PRISMA-ScR framework for transparency and reproducibility.
- **Cross-Cutting Normative Dimensions** — Analysis of doctrinal integrity, disagreement-aware framing, and deployment safety as cross-cutting concerns.
- **Comprehensive Papers Database** — A searchable, filterable collection of all surveyed papers with metadata on domains, tasks, and research areas.
---
## Taxonomy
<p align="center">
<img src="https://gagan3012.github.io/islamic-knowledge-survey/static/images/fig_sunburst_v3.png-1.png" alt="Sunburst taxonomy visualization" width="600">
</p>
### Epistemic Layer
| Category | Domains |
|----------|---------|
| **Foundations** | Qur'an, Hadith |
| **Disciplines** | Qur'anic Sciences, Hadith Sciences, Usul al-Fiqh, Fiqh, Theology (Kalam), Sufism (Tasawwuf), History & Sirah |
### AI Task Layer
| Task Family | Description |
|-------------|-------------|
| Data & Corpora | Digitized texts, annotated datasets, knowledge graphs |
| Retrieval & Grounding | Source-grounded search, RAG pipelines, citation verification |
| Understanding | Classification, NER, topic modeling, sentiment analysis |
| Reasoning Support | QA, fatwa generation, legal reasoning, school-aware inference |
| Evaluation & Governance | Trustworthiness metrics, bias assessment, abstention protocols |
| Multimodal Methods | Recitation analysis, manuscript OCR, speech processing |
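
The two layers above can be sketched as a small data structure for tagging papers. This is a minimal illustration, not the survey's own schema: the category, domain, and task-family labels are copied from the tables, while `PaperTag`, `validate`, and `category_of` are hypothetical names introduced here, and the example record is a placeholder rather than a surveyed paper.

```python
from dataclasses import dataclass, field

# Epistemic layer: category -> domains, copied from the table above.
EPISTEMIC_LAYER = {
    "Foundations": ["Qur'an", "Hadith"],
    "Disciplines": [
        "Qur'anic Sciences", "Hadith Sciences", "Usul al-Fiqh", "Fiqh",
        "Theology (Kalam)", "Sufism (Tasawwuf)", "History & Sirah",
    ],
}

# AI task layer: the six task families from the table above.
TASK_FAMILIES = [
    "Data & Corpora", "Retrieval & Grounding", "Understanding",
    "Reasoning Support", "Evaluation & Governance", "Multimodal Methods",
]

ALL_DOMAINS = {d for domains in EPISTEMIC_LAYER.values() for d in domains}


@dataclass
class PaperTag:
    """Hypothetical record tagging one paper on both layers."""
    title: str
    domains: list = field(default_factory=list)
    task_families: list = field(default_factory=list)

    def validate(self):
        """Return the labels that do not appear in either layer."""
        unknown = [d for d in self.domains if d not in ALL_DOMAINS]
        unknown += [t for t in self.task_families if t not in TASK_FAMILIES]
        return unknown


def category_of(domain):
    """Map a domain back to its epistemic category, or None if unknown."""
    for category, domains in EPISTEMIC_LAYER.items():
        if domain in domains:
            return category
    return None


example = PaperTag(
    title="(placeholder title, not a surveyed paper)",
    domains=["Hadith", "Fiqh"],
    task_families=["Retrieval & Grounding"],
)
print(example.validate())   # []
print(category_of("Fiqh"))  # Disciplines
```
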
---
## Research Questions
1. **RQ1 — Domains & Tasks:** What Islamic knowledge domains and application tasks have been operationalized in ML/AI systems, and how is this work distributed across subfields?
2. **RQ2 — Resources & Measurement:** What datasets, benchmarks, and knowledge resources are available, and what assumptions do they encode about evidence, provenance, and interpretive diversity?
3. **RQ3 — Evaluation & Trustworthiness:** How do studies evaluate trustworthiness, especially source faithfulness, doctrinal correctness, pluralism-aware answering, and safety/bias?
---
## Methodology
We followed the **PRISMA-ScR** (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) framework:
<p align="center">
<img src="https://gagan3012.github.io/islamic-knowledge-survey/static/images/qcri%20project-prisma.drawio-1.png" alt="PRISMA flow diagram" width="600">
</p>
- **Sources:** Semantic Scholar, IEEE Xplore, ACM Digital Library, ACL Anthology, arXiv
- **Coverage:** 2016–2026
- **Screening:** 1,743 initial records → 160 included papers
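
For a sense of scale, the screening figures above imply that roughly one record in eleven was retained. The short sketch below works through that arithmetic using only the two counts reported here; the per-stage PRISMA counts are not listed in this card, so they are not modeled, and `screening_summary` is a hypothetical helper name.

```python
def screening_summary(initial_records, included):
    """Summarize a screening funnel from its first and last counts."""
    if initial_records <= 0 or not 0 <= included <= initial_records:
        raise ValueError("counts must satisfy 0 <= included <= initial_records, initial_records > 0")
    return {
        "initial": initial_records,
        "included": included,
        "excluded": initial_records - included,
        "inclusion_rate": included / initial_records,
    }


# Counts reported in the Screening bullet above.
summary = screening_summary(initial_records=1743, included=160)
print(summary["excluded"])                 # 1583
print(f"{summary['inclusion_rate']:.1%}")  # 9.2%
```
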
---
## Key Findings & Challenges
| Challenge | Description |
|-----------|-------------|
| **Data Scarcity** | Most Islamic NLP datasets are small-scale and single-domain; cross-domain benchmarks are rare |
| **Pluralism Gaps** | Systems tend to collapse diverse scholarly opinions into single answers rather than presenting school-of-thought-aware alternatives |
| **Hallucination Risks** | LLMs fabricate Qur'anic verses and Hadith with confident presentation; fabricated citations are uniquely harmful in religious contexts |
| **Safety & Governance** | High-stakes religious guidance requires conservative abstention strategies, scholar-in-the-loop validation, and Islamic-specific red-teaming protocols |
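
One way to make the hallucination risk concrete is to check every model-produced quotation against a trusted reference corpus before it is shown. The sketch below is an assumption-laden illustration, not a method taken from the survey: the reference identifiers, the placeholder passages, and the function names are all hypothetical, and a real system would use a vetted edition of the source texts.

```python
import unicodedata


def normalize(text):
    """Case-fold, drop characters with a non-zero combining class
    (for example diacritics), and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(kept.casefold().split())


def check_citation(ref_id, quoted_text, trusted_corpus):
    """Classify a citation as 'verified', 'text_mismatch',
    or 'unknown_reference' against a trusted corpus."""
    if ref_id not in trusted_corpus:
        return "unknown_reference"
    if normalize(quoted_text) == normalize(trusted_corpus[ref_id]):
        return "verified"
    return "text_mismatch"


# Placeholder corpus: hypothetical IDs and stand-in strings, not real source text.
TRUSTED_CORPUS = {
    "corpus:001": "placeholder passage one",
    "corpus:002": "placeholder passage two",
}

print(check_citation("corpus:001", "Placeholder  passage ONE", TRUSTED_CORPUS))  # verified
print(check_citation("corpus:001", "a different passage", TRUSTED_CORPUS))       # text_mismatch
print(check_citation("corpus:999", "anything", TRUSTED_CORPUS))                  # unknown_reference
```

Exact matching after normalization is deliberately strict here: a paraphrase is reported as `text_mismatch` rather than accepted, which errs on the side of flagging a quotation for review.
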
### Engineering Priorities
- **Provenance-preserving grounding** — Retrieval-grounded pipelines with verifiable citations
- **Disagreement-aware systems** — Present alternative scholarly views with supporting evidence
- **Calibrated abstention** — Defer to qualified authority when grounding is unreliable
- **Interdisciplinary collaboration** — AI researchers, Islamic scholars ('ulama), and community stakeholders
- **Benchmark investment** — Evaluation protocols that penalize fabricated citations, with disagreement-aware scoring
- **Safety-first deployment** — Islamic-specific red-teaming, bias checks, and governance frameworks
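
Two of the priorities above, abstention and citation-penalizing evaluation, can be sketched in a few lines. Everything in this block is an assumption made for illustration: the `Passage` type, the retrieval scores, the thresholds, and the penalty weight are hypothetical, and the thresholds would need to be calibrated on held-out data rather than fixed by hand as they are here.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source_id: str
    score: float  # hypothetical retrieval score in [0, 1]


def answer_or_defer(passages, min_score=0.7, min_support=2):
    """Answer only when enough retrieved passages clear the score threshold;
    otherwise defer to a qualified authority."""
    supported = [p for p in passages if p.score >= min_score]
    if len(supported) < min_support:
        return {"action": "defer", "sources": []}
    return {"action": "answer", "sources": [p.source_id for p in supported]}


def citation_score(cited, verified, fabrication_penalty=2.0):
    """Score an answer's citations: +1 per verified citation and
    -fabrication_penalty per unverified one, averaged over all citations."""
    if not cited:
        return 0.0
    good = sum(1 for c in cited if c in verified)
    bad = len(cited) - good
    return (good - fabrication_penalty * bad) / len(cited)


strong = [Passage("src-a", 0.9), Passage("src-b", 0.8)]
weak = [Passage("src-a", 0.9), Passage("src-c", 0.3)]
print(answer_or_defer(strong)["action"])              # answer
print(answer_or_defer(weak)["action"])                # defer
print(citation_score(["src-a", "src-x"], {"src-a"}))  # -0.5
```
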
---
## Citation
If you use this survey in your research, please cite:
```bibtex
@article{Bhatia_2026,
title = {Advances in AI Systems on Islamic Knowledge Capabilities: A Critical Survey},
url = {http://dx.doi.org/10.36227/techrxiv.177155997.77147487/v1},
DOI = {10.36227/techrxiv.177155997.77147487/v1},
author = {Bhatia, Gagan and Mubarak, Hamdy and Hawasly, Majd and Jarrar, Mustafa and
Mikros, George and Zaraket, Fadi and Alhirthani, Mahmoud and Al-Khatib, Mutaz and
Cochrane, Logan and Darwish, Kareem and Yahiaoui, Rashid and Alam, Firoj},
year = {2026},
month = feb
}
```
Provided by:
QCRI



