QCRI/islamic-knowledge-ai-survey

Name: QCRI/islamic-knowledge-ai-survey
Creator: QCRI
Published: 2026-02-25 08:51:19
License: No description available

Source: Hugging Face. Updated: 2026-02-25. Indexed: 2026-02-07.

Download link:

https://hf-mirror.com/datasets/QCRI/islamic-knowledge-ai-survey

Resource description:

---
license: cc-by-nc-sa-2.0
---

# Advances in AI Systems on Islamic Knowledge Capabilities: A Critical Survey

[![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://gagan3012.github.io/islamic-knowledge-survey/paper/Islamic_knowledge_survey.pdf) [![Website](https://img.shields.io/badge/Project-Website-blue)](https://gagan3012.github.io/islamic-knowledge-survey/) [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97-Dataset-yellow)](https://huggingface.co/datasets/QCRI/islamic-knowledge-ai-survey) [![License: CC BY-SA 4.0](https://img.shields.io/badge/License-CC%20BY--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-sa/4.0/)

A **comprehensive systematic survey of 160+ papers** (2016–2026) examining how AI systems operationalize Islamic knowledge, spanning NLP, information retrieval, speech processing, multimodal learning, educational technology, and LLM alignment.

<img src="https://gagan3012.github.io/islamic-knowledge-survey/static/images/fig2_temporal_new%20(1)-1.png" alt="Publication trends" width="700">

---

## Abstract

AI systems are increasingly mediating how Islamic communities access, study, and apply Islamic sources; still, research on Islamic-knowledge capabilities remains fragmented across NLP, information retrieval, speech, multimodal learning, educational technology, and recent LLM alignment work. This survey presents a **critical systematic review of 160+ papers from the past decade** that incorporate Islamic knowledge in Machine Learning/AI. We propose a **layered taxonomy** that separates an *epistemic* view of Islamic knowledge (authority-bearing foundations and established disciplines) from an *instrumental* AI task layer (data and corpora, retrieval and grounding, understanding, reasoning support, evaluation and governance, and multimodal methods), while treating normative concerns as cross-cutting constraints.


Using this framework, we synthesize trends in datasets, benchmarks, and system architectures, highlighting the shift toward **retrieval-grounded LLM pipelines**, verification and deferral mechanisms, and emerging multimodal recitation and manuscript-processing systems. We also consolidate evaluation practices for trustworthiness, including **provenance and faithfulness**, disagreement-aware and school-of-thought-sensitive framing, calibrated abstention under underspecified queries, and safety and bias assessment for Islamic contexts.

---

## Key Contributions

- **Layered Taxonomy**: A two-layer framework separating the *epistemic* view of Islamic knowledge (Qur'an, Hadith, Fiqh, Theology, etc.) from the *instrumental* AI task layer (retrieval, grounding, reasoning, evaluation, multimodal methods).
- **Systematic Review (PRISMA-ScR)**: Rigorous screening of 1,743 initial records down to 160 included studies, following the PRISMA-ScR framework for transparency and reproducibility.
- **Cross-Cutting Normative Dimensions**: Analysis of doctrinal integrity, disagreement-aware framing, and deployment safety as cross-cutting concerns.
- **Comprehensive Papers Database**: A searchable, filterable collection of all surveyed papers with metadata on domains, tasks, and research areas.


---

## Taxonomy

<img src="https://gagan3012.github.io/islamic-knowledge-survey/static/images/fig_sunburst_v3.png-1.png" alt="Sunburst taxonomy visualization" width="600">

### Epistemic Layer

| Category | Domains |
|----------|---------|
| **Foundations** | Qur'an, Hadith |
| **Disciplines** | Qur'anic Sciences, Hadith Sciences, Usul al-Fiqh, Fiqh, Theology (Kalam), Sufism (Tasawwuf), History & Sirah |

### AI Task Layer

| Task Family | Description |
|-------------|-------------|
| Data & Corpora | Digitized texts, annotated datasets, knowledge graphs |
| Retrieval & Grounding | Source-grounded search, RAG pipelines, citation verification |
| Understanding | Classification, NER, topic modeling, sentiment analysis |
| Reasoning Support | QA, fatwa generation, legal reasoning, school-aware inference |
| Evaluation & Governance | Trustworthiness metrics, bias assessment, abstention protocols |
| Multimodal Methods | Recitation analysis, manuscript OCR, speech processing |

---

## Research Questions

1. **RQ1 (Domains & Tasks):** What Islamic knowledge domains and application tasks have been operationalized in ML/AI systems, and how is this work distributed across subfields?
2. **RQ2 (Resources & Measurement):** What datasets, benchmarks, and knowledge resources are available, and what assumptions do they encode about evidence, provenance, and interpretive diversity?
3. **RQ3 (Evaluation & Trustworthiness):** How do studies evaluate trustworthiness, especially source faithfulness, doctrinal correctness, pluralism-aware answering, and safety/bias?

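
The two-layer taxonomy above can be written down as a small data structure, which may help when tagging or filtering papers along both layers. The sketch below is illustrative only: the layer contents are copied from the tables above, but the `PaperTag` record, the `filter_papers` helper, and the three example entries are hypothetical and do not reflect the actual schema or contents of the papers database.

```python
from dataclasses import dataclass

# Epistemic layer, copied from the taxonomy table above.
EPISTEMIC_LAYER = {
    "Foundations": ["Qur'an", "Hadith"],
    "Disciplines": [
        "Qur'anic Sciences", "Hadith Sciences", "Usul al-Fiqh", "Fiqh",
        "Theology (Kalam)", "Sufism (Tasawwuf)", "History & Sirah",
    ],
}

# AI task layer: the six task families from the table above.
AI_TASK_FAMILIES = [
    "Data & Corpora", "Retrieval & Grounding", "Understanding",
    "Reasoning Support", "Evaluation & Governance", "Multimodal Methods",
]

ALL_DOMAINS = {d for domains in EPISTEMIC_LAYER.values() for d in domains}


@dataclass(frozen=True)
class PaperTag:
    """Hypothetical record: one paper tagged on both taxonomy layers."""
    title: str
    domains: tuple
    task_families: tuple

    def __post_init__(self):
        # Reject labels that are not part of either taxonomy layer.
        unknown = (set(self.domains) - ALL_DOMAINS) | (
            set(self.task_families) - set(AI_TASK_FAMILIES)
        )
        if unknown:
            raise ValueError(f"labels outside the taxonomy: {sorted(unknown)}")


def filter_papers(papers, domain=None, task_family=None):
    """Return papers matching the given domain and/or task family."""
    return [
        p for p in papers
        if (domain is None or domain in p.domains)
        and (task_family is None or task_family in p.task_families)
    ]


# Placeholder entries for illustration; these are not real surveyed papers.
papers = [
    PaperTag("Example paper A", ("Hadith",), ("Retrieval & Grounding",)),
    PaperTag("Example paper B", ("Fiqh", "Usul al-Fiqh"), ("Reasoning Support",)),
    PaperTag("Example paper C", ("Qur'an",), ("Multimodal Methods", "Understanding")),
]

fiqh_papers = filter_papers(papers, domain="Fiqh")
```

Keeping the two layers as separate fields mirrors the survey's separation of *what* knowledge a system touches from *which* AI task it performs, so a paper can be queried on either axis independently.
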

---

## Methodology

We followed the **PRISMA-ScR** (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) framework:

<img src="https://gagan3012.github.io/islamic-knowledge-survey/static/images/qcri%20project-prisma.drawio-1.png" alt="PRISMA flow diagram" width="600">

- **Sources:** Semantic Scholar, IEEE Xplore, ACM Digital Library, ACL Anthology, arXiv
- **Coverage:** 2016–2026
- **Screening:** 1,743 initial records → 160 included papers

---

## Key Findings & Challenges

| Challenge | Description |
|-----------|-------------|
| **Data Scarcity** | Most Islamic NLP datasets are small-scale and single-domain; cross-domain benchmarks are rare |
| **Pluralism Gaps** | Systems tend to collapse diverse scholarly opinions into single answers rather than presenting school-of-thought-aware alternatives |
| **Hallucination Risks** | LLMs fabricate Qur'anic verses and Hadith with confident presentation; fabricated citations are uniquely harmful in religious contexts |
| **Safety & Governance** | High-stakes religious guidance requires conservative abstention strategies, scholar-in-the-loop validation, and Islamic-specific red-teaming protocols |

### Engineering Priorities

- **Provenance-preserving grounding**: Retrieval-grounded pipelines with verifiable citations
- **Disagreement-aware systems**: Present alternative scholarly views with supporting evidence
- **Calibrated abstention**: Defer to qualified authority when grounding is unreliable
- **Interdisciplinary collaboration**: AI researchers, Islamic scholars ('ulama), and community stakeholders
- **Benchmark investment**: Evaluation protocols that penalize fabricated citations, with disagreement-aware scoring
- **Safety-first deployment**: Islamic-specific red-teaming, bias checks, and governance frameworks

---

## Citation

If you use this survey in your research, please cite:

```bibtex
@article{Bhatia_2026,
  title  = {Advances in AI Systems on Islamic Knowledge Capabilities: A Critical Survey},
  url    = {http://dx.doi.org/10.36227/techrxiv.177155997.77147487/v1},
  DOI    = {10.36227/techrxiv.177155997.77147487/v1},
  author = {Bhatia, Gagan and Mubarak, Hamdy and Hawasly, Majd and Jarrar, Mustafa and Mikros, George and Zaraket, Fadi and Alhirthani, Mahmoud and Al-Khatib, Mutaz and Cochrane, Logan and Darwish, Kareem and Yahiaoui, Rashid and Alam, Firoj},
  year   = {2026},
  month  = feb
}
```
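
As a rough illustration of two of the engineering priorities listed earlier (verifiable citations and deferral when grounding is unreliable), the sketch below gates a drafted answer on exact-match citation checks against a trusted corpus. It is a simple rule-based gate rather than a calibrated abstention method, and it is not taken from the survey: the function names, the `source:N` reference keys, the deferral message, and the placeholder passages are all assumptions made for this example.

```python
import unicodedata

# Hypothetical deferral message returned when sources cannot be verified.
DEFERRAL = (
    "I cannot verify the sources for this answer; "
    "please consult a qualified scholar."
)


def normalize(text: str) -> str:
    """Apply NFKC, case-fold, and collapse whitespace before comparing."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def verify_citation(reference: str, quoted_text: str, corpus: dict) -> bool:
    """True only if the reference exists and the quote matches its text."""
    source = corpus.get(reference)
    return source is not None and normalize(quoted_text) == normalize(source)


def answer_or_abstain(draft: str, citations: list, corpus: dict) -> str:
    """Return the draft only when every citation is verified; else defer.

    An answer with no citations at all is treated as ungrounded.
    """
    if not citations:
        return DEFERRAL
    if all(verify_citation(ref, text, corpus) for ref, text in citations):
        return draft
    return DEFERRAL


# Placeholder corpus; a real system would load authenticated source texts.
corpus = {
    "source:1": "Placeholder passage one.",
    "source:2": "Placeholder passage two.",
}

verified = answer_or_abstain(
    "Draft answer.", [("source:1", "placeholder  passage ONE.")], corpus
)
fabricated = answer_or_abstain(
    "Draft answer.", [("source:9", "Invented passage.")], corpus
)
```

Exact matching after normalization is deliberately strict: a paraphrased or altered quotation is rejected along with a fabricated reference. A production system would likely need script-aware normalization (for example, handling Arabic diacritics) and scholar-in-the-loop review, neither of which this sketch attempts.
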

Provided by:

QCRI
