ebrinz/text-cult
Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ebrinz/text-cult
Description:
---
license: cc-by-4.0
task_categories:
- text-generation
- feature-extraction
language:
- en
- la
- fr
- de
tags:
- occult
- esoteric
- hermetic
- alchemy
- kabbalah
- grimoire
- sacred-texts
size_categories:
- 1K<n<10K
---
<p align="center">
<img src="banner.svg" alt="text-cult banner" width="100%"/>
</p>

A corpus of occult, esoteric, and hermetic texts compiled from public-domain sources.
## Contents
```
Total documents: 6,268
Total characters: 305,251,131
Parquet size: 117.6 MB (zstd compressed)
```
| Source | Documents | Characters |
|--------|-----------|------------|
| sacred-texts | 5,873 | 85.7M |
| internet-archive | 328 | 163.9M |
| gutenberg | 67 | 55.6M |

Covers alchemy, Kabbalah, Thelema, grimoires, the Hermetica, Gnosticism, sacred texts, and related traditions. Primarily English, with 120 Latin texts and smaller French and German collections.
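
The per-source rows sum to the totals in the Contents block. A minimal Python check, using the table's rounded character counts (in millions), also computes the average document length per source; `SOURCES`, `check_totals`, and `mean_chars_per_doc` are illustrative names, not part of the dataset:

```python
# Per-source figures copied from the Contents table. Character counts are the
# table's rounded values, in millions of characters (illustrative constants).
SOURCES = {
    "sacred-texts": (5_873, 85.7),
    "internet-archive": (328, 163.9),
    "gutenberg": (67, 55.6),
}

TOTAL_DOCUMENTS = 6_268
TOTAL_CHARACTERS = 305_251_131


def check_totals() -> tuple[int, float]:
    """Return (documents, characters in millions) summed over all sources."""
    docs = sum(n for n, _ in SOURCES.values())
    chars_m = sum(c for _, c in SOURCES.values())
    return docs, chars_m


def mean_chars_per_doc() -> dict[str, float]:
    """Approximate average document length per source, in characters."""
    return {name: c * 1e6 / n for name, (n, c) in SOURCES.items()}


if __name__ == "__main__":
    docs, chars_m = check_totals()
    print(docs == TOTAL_DOCUMENTS)  # True
    # The rounded rows agree with the exact character total to within 0.1M.
    print(abs(chars_m - TOTAL_CHARACTERS / 1e6) < 0.1)  # True
    for name, avg in mean_chars_per_doc().items():
        print(f"{name}: ~{avg:,.0f} chars/doc")
```

The averages make the skew visible: sacred-texts entries average roughly 15K characters, while Internet Archive and Gutenberg entries average hundreds of thousands, so sampling per document and sampling per character weight the sources very differently.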
## Usage
```python
from datasets import load_dataset

# Without a `split` argument, load_dataset returns a DatasetDict keyed by split name.
ds = load_dataset("ebrinz/text-cult")
```
## Schema
| Column | Type | Description |
|--------|------|-------------|
| `id` | string | Unique document identifier |
| `text` | string | Full document text |
| `title` | string | Document title |
| `author` | string | Author(s) |
| `tradition` | string | Occult tradition/category |
| `source` | string | Source collection |
| `source_url` | string | Original URL |
| `language` | string | ISO 639-1 language code |
| `file_type` | string | Original file format |
| `ocr_used` | bool | Whether OCR was used for extraction |
| `char_count` | int64 | Character count |

See [MANIFEST.txt](MANIFEST.txt) for the full breakdown.
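
When processing rows in plain Python, it can help to pin the schema down as a typed record. The sketch below is hypothetical: `TextCultRecord`, `select`, and `count_by_tradition` are illustrative names rather than part of the dataset or of any library, and it assumes each row is a `dict` keyed by the column names in the Schema table (iterating over a loaded `datasets.Dataset` yields rows in that form).

```python
from collections import Counter
from dataclasses import dataclass, fields
from typing import Any, Iterable, Iterator, Mapping, Optional


@dataclass(frozen=True)
class TextCultRecord:
    """One corpus row mirroring the Schema table (hypothetical helper)."""

    id: str
    text: str
    title: str
    author: str
    tradition: str
    source: str
    source_url: str
    language: str  # ISO 639-1 code, e.g. "en" or "la"
    file_type: str
    ocr_used: bool
    char_count: int

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "TextCultRecord":
        """Build a record from a row dict; extra keys are ignored."""
        return cls(**{f.name: row[f.name] for f in fields(cls)})


def select(
    rows: Iterable[Mapping[str, Any]],
    language: Optional[str] = None,
    exclude_ocr: bool = False,
) -> Iterator[TextCultRecord]:
    """Yield records matching a language code, optionally skipping OCR'd texts."""
    for row in rows:
        rec = TextCultRecord.from_row(row)
        if language is not None and rec.language != language:
            continue
        if exclude_ocr and rec.ocr_used:
            continue
        yield rec


def count_by_tradition(records: Iterable[TextCultRecord]) -> Counter:
    """Count records per `tradition` value."""
    return Counter(rec.tradition for rec in records)
```

`from_row` raises `KeyError` if a column is missing, which surfaces schema changes early; any iterable of row dicts, such as one split of the loaded dataset, can be passed to `select`.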
## License
Source texts are in the public domain. The dataset compilation is licensed under CC-BY-4.0.

Provided by: ebrinz



