TitleOS/scripture_1500_pairs_gemini_flash_lite
Source: Hugging Face. Updated 2026-04-20. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/TitleOS/scripture_1500_pairs_gemini_flash_lite
---
language:
- en
tags:
- theology
- religion
- reasoning
- synthetic-data
- chain-of-thought
license: mpl-2.0
size_categories:
- 1K<n<10K
---
# Torah & Quran Theological Reasoning Dataset
This dataset contains 1,500 highly complex, synthetic question-and-answer pairs designed to train language models in theological, physical, and metaphysical reasoning. It serves as the foundational training data for the **Elohim-3.8B** reasoning model.
## Dataset Sources
The dataset is built upon the combined text of two foundational religious scriptures. To ensure clean text extraction, the sources were curated before being processed:
* **The Torah (English Translation):** Sourced from [https://www.betemunah.org/Torah.pdf](https://www.betemunah.org/Torah.pdf). Non-text pages (such as title pages and blank separators) were completely removed prior to processing to ensure clean chunking.
* **The Quran (English Translation):** Sourced from the Clear Quran edition available at [https://d.clearquran.com/quran-english-translation-clearquran-edition-allah.pdf](https://d.clearquran.com/quran-english-translation-clearquran-edition-allah.pdf).
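The page-cleaning step described for the Torah source could look roughly like the sketch below. This is not the author's actual pipeline. It assumes the PDF text has already been extracted into a list of strings, one per page, and both the function name `drop_non_text_pages` and the `min_chars` threshold are hypothetical.

```python
def drop_non_text_pages(pages, min_chars=200):
    """Keep only pages whose stripped text has at least `min_chars` characters.

    `pages` is a list of strings, one per extracted PDF page. The threshold is
    an illustrative assumption, not a value taken from the dataset card.
    """
    return [page for page in pages if len(page.strip()) >= min_chars]


if __name__ == "__main__":
    # A title page, a blank separator, and one page of body text.
    sample = ["TORAH", "   ", "In the beginning " * 20]
    kept = drop_non_text_pages(sample)
    print(f"kept {len(kept)} of {len(sample)} pages")
```

A length threshold is only a heuristic: it would also drop short but legitimate pages, so a real pipeline might inspect pages manually instead.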
## Generation Methodology
The dataset was synthetically generated using **Gemini 3.1 Flash Lite**. The texts of the Torah and Quran were combined into a single, unified context to simulate a cohesive "lore" engine with its own defined rules, physics, and divine intervention mechanics.
The combined text was fed to the teacher model in chunks. For each chunk, the model was prompted to extract, deduce, and test theological rules by applying the scriptures' combined logic to complex scenarios.
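The combine-then-chunk step can be sketched as follows. The card does not state the chunk size, whether chunks overlap, or how the two texts were joined, so `chunk_chars`, `overlap`, the blank-line separator, and both function names are assumptions made for illustration.

```python
def combine_sources(torah_text, quran_text):
    """Join the two source texts into one context, separated by a blank line."""
    return torah_text.strip() + "\n\n" + quran_text.strip()


def chunk_text(text, chunk_chars=8000, overlap=200):
    """Split `text` into character windows of up to `chunk_chars` characters.

    Consecutive windows share `overlap` characters so that a passage cut at a
    boundary still appears whole in one of the two neighbouring chunks.
    Both default values are hypothetical.
    """
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= overlap < chunk_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_chars")
    step = chunk_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    combined = combine_sources("Torah text ...", "Quran text ...")
    for i, chunk in enumerate(chunk_text(combined, chunk_chars=16, overlap=4)):
        print(i, repr(chunk))
```

Character windows are the simplest option. Splitting on chapter or verse boundaries would keep passages intact but requires structure that a raw PDF extraction may not preserve.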
### The Prompt
The following system prompt was used to generate the 1,500 Q/A pairs:
```
You are a master Torah, Judaism, Quran and Islamic scholar engine.
Analyze the following text which contains notes on the Torah and Quran.
For the sake of ease, both have been combined into a single text.
Generate 5 highly complex, technical Q&A pairs that test the physical, metaphysical and theological rules established in this text.
Be aware physics in this universe is based on a unique set of rules that may differ from real-world physics, such as the existence of divine intervention or magic, but is consistent within the text.
The 'instruction' should be a difficult question or a scenario that requires theological reasoning.
The 'thought_process' MUST show step-by-step reasoning, retrieving facts from the text to answer the instruction.
The 'response' should be the final verdict based on the text's lore.
```
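The prompt names three fields per pair: `instruction`, `thought_process`, and `response`. The card does not say how the model's reply was serialized. The sketch below assumes each reply is a JSON array of objects with those three keys and shows one way to parse and validate it. The function name `parse_pairs` and the JSON format are assumptions.

```python
import json

# Field names taken from the prompt above.
REQUIRED_KEYS = ("instruction", "thought_process", "response")


def parse_pairs(raw_output):
    """Parse one model reply and return only the well-formed Q&A records.

    Assumes the reply is a JSON array of objects (an assumption, not stated in
    the card). Records with a missing, non-string, or empty field are skipped.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    valid = []
    for item in data:
        if not isinstance(item, dict):
            continue
        if all(isinstance(item.get(k), str) and item[k].strip() for k in REQUIRED_KEYS):
            valid.append({k: item[k].strip() for k in REQUIRED_KEYS})
    return valid


if __name__ == "__main__":
    reply = json.dumps([
        {"instruction": "Q?", "thought_process": "Step 1 ...", "response": "A."},
        {"instruction": "Incomplete record"},
    ])
    print(parse_pairs(reply))
```

Because the prompt requests 5 pairs per call, 1,500 pairs correspond to at least 300 calls (1500 / 5 = 300), assuming every call returned exactly five valid pairs.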
## License
This dataset is distributed under a modified Mozilla Public License 2.0 (MPL 2.0) with a Commons Clause.
Please see the license.md file included in this repository for the exact legal text and restrictions regarding commercial use and distribution.
Provider: TitleOS



