OpenChristianDataOrg/open-christian-data

Hugging Face | Updated 2026-04-15 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/OpenChristianDataOrg/open-christian-data
Resource description:
---
license:
- cc0-1.0
- mit
language:
- en
task_categories:
- text-retrieval
- question-answering
- feature-extraction
size_categories:
- 100K<n<1M
configs:
- config_name: bible_text
  data_files:
  - split: train
    path: data/bible_text.jsonl
- config_name: catechism_qa
  data_files:
  - split: train
    path: data/catechism_qa.jsonl
- config_name: church_fathers
  data_files:
  - split: train
    path: data/church_fathers.jsonl
- config_name: commentary
  data_files:
  - split: train
    path: data/commentary.jsonl
- config_name: devotional
  data_files:
  - split: train
    path: data/devotional.jsonl
- config_name: doctrinal_document
  data_files:
  - split: train
    path: data/doctrinal_document.jsonl
- config_name: prayer
  data_files:
  - split: train
    path: data/prayer.jsonl
- config_name: reference_entry
  data_files:
  - split: train
    path: data/reference_entry.jsonl
- config_name: sermon
  data_files:
  - split: train
    path: data/sermon.jsonl
- config_name: structured_text
  data_files:
  - split: train
    path: data/structured_text.jsonl
- config_name: topical_reference
  data_files:
  - split: train
    path: data/topical_reference.jsonl
---

# Open Christian Data

Public domain Christian literature as structured, machine-readable data for developers and AI training.

Commentary data is trapped in HTML and PDFs. No structured commentary dataset exists on HuggingFace. No per-chapter commentary JSON exists on GitHub.

This dataset processes public domain Christian literature (commentaries, church fathers, confessions, catechisms, devotionals, prayers, sermons) into clean, schema-validated records with full provenance metadata.

**GitHub:** [OpenChristianData/open-christian-data](https://github.com/OpenChristianData/open-christian-data)

## Schema types

| Config | Records | Description |
|---|---|---|
| `commentary` | 109,774 | Verse-level commentary (Matthew Henry, Barnes, Calvin, Wesley, Adam Clarke, Gill, JFB, KD, Treasury of David) |
| `church_fathers` | 70,191 | Patristic quotes indexed by scripture reference; 325 authors (Augustine, Chrysostom, Jerome, Origen, Aquinas...) |
| `bible_text` | 31,086 | Berean Standard Bible: all 66 books, 31,086 verses, CC0 |
| `structured_text` | 13,207 | Paragraph-level blocks from 11 works (Calvin's Institutes, Augustine's Confessions, Pilgrim's Progress, City of God, Chesterton, Thomas à Kempis, MacDonald, Underhill, Milton, Luther's Large Catechism) |
| `topical_reference` | 5,945 | Nave's Topical Bible: 5,322 topics, 76,957 scripture references |
| `reference_entry` | 11,145 | Easton's, Smith's, Hitchcock's Bible Dictionaries + Torrey's New Topical Textbook |
| `catechism_qa` | 3,279 | Question-and-answer catechisms (Westminster Shorter/Larger, Heidelberg, Baltimore #1–3, Luther's Small, Keach's, and more) |
| `doctrinal_document` | 1,331 | Confessions and creeds at clause level (Westminster, Nicene, Chalcedonian, Belgic, Dort, London Baptist 1689, Savoy, and 30+ more) |
| `devotional` | 1,464 | Spurgeon's Morning and Evening (732 entries) + Daily Light on the Daily Path (732 entries) |
| `sermon` | 36 | George MacDonald's Unspoken Sermons: 3 series, 36 sermons, 171k words |
| `prayer` | 191 | BCP 1662 Collects (85), BCP 1928 Collects (102), Didache Prayers (4) |
| **Total** | **247,649** | |

## Data format

Every record is a flat JSON object.
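
Since each record is a single flat JSON object, a downloaded JSONL file can be read without the `datasets` library. The sketch below assumes the usual JSONL convention of one object per line. It parses two abbreviated sample lines that are made up for illustration (they reuse field names from the record examples in this card), and `iter_jsonl` is a hypothetical helper name, not part of the project's tooling.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


# Abbreviated sample lines for illustration only, not real dataset rows.
sample = io.StringIO(
    '{"_source_id": "pilgrims-progress", "_schema_type": "structured_text", "block_index": 0}\n'
    "\n"
    '{"_source_id": "matthew-henry-complete", "_schema_type": "commentary", "word_count": 2042}\n'
)

records = list(iter_jsonl(sample))

# In this card's examples the inlined metadata fields all start with "_",
# so a prefix test separates them from the schema-specific fields.
provenance = {k: v for k, v in records[0].items() if k.startswith("_")}
print(provenance)
```

For a real file, pass an open file handle instead of the `StringIO` object, for example `open("data/commentary.jsonl", encoding="utf-8")`, assuming the file was downloaded to a path that mirrors the repository layout.
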
Seven fields are inlined from the source file's metadata:

| Field | Description |
|---|---|
| `_source_id` | Unique identifier for the source work |
| `_source_title` | Title of the source work |
| `_author` | Author name |
| `_contributors` | Translators, editors, and digitizers (array, may be empty) |
| `_schema_type` | Schema type (matches the config name) |
| `_license` | License identifier (`cc0-1.0` or `public-domain`) |
| `_source_url` | Canonical URL for the source |

The remaining fields are schema-specific. All verse references use OSIS format (`Gen.1.1`, `Rom.9.1-Rom.9.5`).

### Commentary record example

```json
{
  "_source_id": "matthew-henry-complete",
  "_source_title": "Matthew Henry's Complete Commentary",
  "_author": "Matthew Henry",
  "_schema_type": "commentary",
  "_license": "public-domain",
  "_source_url": "https://bible.helloao.org",
  "entry_id": "matthew-henry-complete.Ezek.1.1-3",
  "book": "Ezekiel",
  "book_osis": "Ezek",
  "chapter": 1,
  "verse_range": "1-3",
  "verse_range_osis": "Ezek.1.1-Ezek.1.3",
  "verse_text": "In the thirtieth year...",
  "commentary_text": "The circumstances of the vision which Ezekiel saw...",
  "word_count": 2042
}
```

### Structured text record example

```json
{
  "_source_id": "pilgrims-progress",
  "_source_title": "The Pilgrim's Progress",
  "_author": "John Bunyan",
  "_schema_type": "structured_text",
  "_license": "public-domain",
  "_source_url": "https://github.com/standardebooks/john-bunyan_the-pilgrims-progress",
  "work_id": "pilgrims-progress",
  "work_kind": "allegory",
  "section_type": "part",
  "section_label": "The First Part",
  "section_title": "",
  "section_path": ["The First Part"],
  "block_index": 0,
  "text": "As I walked through the wilderness of this world..."
}
```

## Usage

```python
from datasets import load_dataset

# Load a specific schema type; each config exposes a single "train" split
commentary = load_dataset("OpenChristianDataOrg/open-christian-data", "commentary")
church_fathers = load_dataset("OpenChristianDataOrg/open-christian-data", "church_fathers")
catechisms = load_dataset("OpenChristianDataOrg/open-christian-data", "catechism_qa")

# Filter by source
matthew_henry = [r for r in commentary["train"] if r["_source_id"] == "matthew-henry-complete"]

# All commentary on a specific verse.
# Note: this substring test matches entries whose range names Rom.8.28
# explicitly (a single verse or a range endpoint); a range such as
# Rom.8.26-Rom.8.30 is not matched.
rom_8_28 = [
    r for r in commentary["train"]
    if "Rom.8.28" in (r.get("verse_range_osis") or "")
]
```

## Sources

- **Bible text**: [Berean Standard Bible](https://berean.bible), CC0 since April 2023
- **Commentary**: [HelloAO Bible API](https://bible.helloao.org) (Matthew Henry, JFB, Gill, Adam Clarke, KD); [CrossWire SWORD](https://www.crosswire.org/sword/) (Barnes, Calvin, Wesley, Treasury of David); all public domain
- **Church Fathers**: [HistoricalChristianFaith/Commentaries-Database](https://github.com/HistoricalChristianFaith/Commentaries-Database): 325 authors, public domain
- **Devotionals**: Christian Classics Ethereal Library (ccel.org) for Spurgeon's Morning and Evening (ThML XML); CrossWire SWORD for Daily Light on the Daily Path; public domain
- **Structured texts**: [Standard Ebooks](https://standardebooks.org) (9 titles in CC0-annotated XHTML); Project Gutenberg (Luther's Large Catechism, Calvin's Institutes, Augustine's Confessions); public domain
- **Catechisms**: Project Gutenberg (Luther's Small Catechism, Baltimore Catechisms #1–3); additional catechisms (Westminster, Heidelberg, Keach's, and more); public domain
- **Confessions & creeds**: Westminster Confession, Nicene Creed, Chalcedonian Definition, and 30+ more historic documents; public domain
- **Prayers**: [eskimo.com BCP 1662](https://eskimo.com/~lhowell/bcp1662/) (Lynda M. Howell); [episcopalnet.org BCP 1928](https://www.episcopalnet.org/1928bcp/); [Wikisource Didache](https://en.wikisource.org/wiki/Didache_(Lake_translation)) (Kirsopp Lake 1912 translation); public domain
- **Bible dictionaries**: Easton's (1893), Smith's (1863), Hitchcock's (1874), Torrey's (1897); public domain
- **Topical reference**: [CrossWire SWORD](https://www.crosswire.org/sword/) for Nave's Topical Bible (1896); public domain
- All authors died before 1928; texts are unambiguously public domain.

## Attribution

Devotional text sourced from the Christian Classics Ethereal Library (ccel.org). ThML parsing permitted per correspondence with CCEL (April 2026).

## License

- **Data** (JSONL datasets): [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/), dedicated to the public domain
- **Code** (build scripts, schemas, tooling): [MIT](https://opensource.org/licenses/MIT)

The underlying texts are public domain. Our value-add is the structuring and provenance tracking, which we also dedicate to the public domain.
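
## Matching verses inside OSIS ranges

The substring test in the usage example only finds entries whose `verse_range_osis` spells out the verse itself, so commentary on `Rom.8.26-Rom.8.30` would be missed when searching for `Rom.8.28`. The sketch below shows one way to test containment instead. It assumes every reference is either a single verse (`Gen.1.1`) or a range with both ends fully qualified and in the same book (`Rom.9.1-Rom.9.5`), as in the examples in this card; the dataset may contain other forms. `parse_osis_verse`, `osis_range_contains`, and `record_covers` are hypothetical helper names, not part of the project's tooling.

```python
import re

# Book abbreviation (optionally prefixed by 1-3), chapter, verse: "1Cor.13.4"
_OSIS_VERSE = re.compile(r"^([1-3]?[A-Za-z]+)\.(\d+)\.(\d+)$")


def parse_osis_verse(ref):
    """Split 'Rom.8.28' into ('Rom', 8, 28); raise ValueError otherwise."""
    match = _OSIS_VERSE.match(ref.strip())
    if match is None:
        raise ValueError(f"not a single OSIS verse reference: {ref!r}")
    book, chapter, verse = match.groups()
    return book, int(chapter), int(verse)


def osis_range_contains(range_ref, verse_ref):
    """Return True if verse_ref lies within range_ref (same book only)."""
    start_ref, _, end_ref = range_ref.partition("-")
    start = parse_osis_verse(start_ref)
    end = parse_osis_verse(end_ref) if end_ref else start
    target = parse_osis_verse(verse_ref)
    # Cross-book ranges would need a canon ordering, which this sketch omits.
    if not (start[0] == end[0] == target[0]):
        return False
    # Compare (chapter, verse) tuples so ranges may cross chapter boundaries.
    return start[1:] <= target[1:] <= end[1:]


def record_covers(record, verse_ref):
    """Containment test for a record; missing or unparseable ranges give False."""
    try:
        return osis_range_contains(record.get("verse_range_osis") or "", verse_ref)
    except ValueError:
        return False
```

With these helpers, the verse filter from the usage example could be written as `[r for r in commentary["train"] if record_covers(r, "Rom.8.28")]`. Returning `False` for unparseable ranges keeps the filter from failing on unexpected rows, at the cost of silently skipping them.
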
Provided by:
OpenChristianDataOrg