wayjeeair/ccru-knowledge-instruct
Source: Hugging Face | Updated: 2026-03-25 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/wayjeeair/ccru-knowledge-instruct
Resource description:
---
language:
- en
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
tags:
- ccru
- nick-land
- accelerationism
- hyperstition
- philosophy
- instruction-tuning
- synthetic
size_categories:
- 100K<n<1M
---
# CCRU Knowledge-Instruct Dataset
Synthetic instruction-tuning dataset generated from a curated corpus of texts related to the **CCRU (Cybernetic Culture Research Unit)**, accelerationism, and adjacent continental philosophy.
## Dataset Summary
| Attribute | Value |
|---|---|
| **Examples** | 278,463 |
| **Format** | Chat instruction (system / user / assistant) |
| **Domain** | CCRU theory, accelerationism, hyperstition, continental philosophy |
| **Generation model** | `huihui-ai/Qwen3.5-9B-abliterated-MLX-4bit` |
| **License** | CC BY 4.0 |
## Source Corpus
Generated from a private curated collection of texts spanning:
- CCRU-adjacent theoretical writings
- Accelerationist and continental philosophy texts
- Academic essays and theses on related topics
- Various digitised and OCR-processed documents
All source texts were processed locally. The dataset contains only the **synthetically generated** instruction-response pairs, not excerpts from the source documents themselves.
## Generation Method
Each document chunk was processed with a single combined LLM call:
1. **Extract** up to 4 entity-fact pairs from the passage
2. **Generate** 3 diverse question phrasings per pair
The prompt asked the model to return structured JSON in this shape:
```json
[{"entity": "...", "fact": "...", "questions": ["q1", "q2", "q3"]}]
```
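As a rough illustration of how a response in this shape could be expanded into chat examples, here is a minimal sketch. It is not the author's pipeline: the function name `pairs_to_examples` is hypothetical, and it assumes the extracted fact is used verbatim as the assistant answer, which the card does not state. The system prompt is copied from the sample under Data Format.

```python
import json

# System prompt copied from the sample record in the dataset card.
SYSTEM_PROMPT = (
    "You are a knowledgeable assistant specialising in CCRU theory, "
    "accelerationism, and related philosophy."
)

MAX_PAIRS = 4      # the card says "up to 4 entity-fact pairs"
MAX_QUESTIONS = 3  # the card says "3 diverse question phrasings per pair"


def pairs_to_examples(raw_json: str) -> list[dict]:
    """Expand one model response into chat examples, one per question.

    Assumption (not confirmed by the card): the fact becomes the answer.
    """
    try:
        pairs = json.loads(raw_json)
    except json.JSONDecodeError:
        return []  # malformed model output: skip this chunk
    if not isinstance(pairs, list):
        return []

    examples = []
    for pair in pairs[:MAX_PAIRS]:
        if not isinstance(pair, dict):
            continue
        fact = pair.get("fact")
        questions = pair.get("questions")
        if not isinstance(fact, str) or not fact.strip():
            continue
        if not isinstance(questions, list):
            continue
        for question in questions[:MAX_QUESTIONS]:
            if not isinstance(question, str) or not question.strip():
                continue
            examples.append({"messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question.strip()},
                {"role": "assistant", "content": fact.strip()},
            ]})
    return examples
```

Under these assumptions a single chunk yields at most 4 × 3 = 12 examples, and a response that fails to parse contributes none.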
Progress was tracked in a checkpoint file, so generation could be safely interrupted and resumed.
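One common way to make a job like this resumable is sketched below. The card does not describe the actual checkpoint format, so everything here is an assumption: the checkpoint is taken to be a JSON list of completed chunk IDs, and `load_done`, `save_done`, and `run` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_done(checkpoint: Path) -> set[str]:
    """Return the chunk IDs already recorded in the checkpoint file."""
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text(encoding="utf-8")))


def save_done(checkpoint: Path, done: set[str]) -> None:
    """Write to a temp file, then rename, so an interrupt cannot leave
    a half-written checkpoint behind."""
    tmp = checkpoint.with_name(checkpoint.name + ".tmp")
    tmp.write_text(json.dumps(sorted(done)), encoding="utf-8")
    tmp.replace(checkpoint)


def run(chunks: dict[str, str], checkpoint: Path, process) -> int:
    """Process every chunk not yet checkpointed; return how many ran."""
    done = load_done(checkpoint)
    processed = 0
    for chunk_id, text in chunks.items():
        if chunk_id in done:
            continue  # finished in an earlier run
        process(chunk_id, text)
        done.add(chunk_id)
        save_done(checkpoint, done)  # persist after each chunk
        processed += 1
    return processed
```

Saving after every chunk trades some I/O for the guarantee that at most one chunk is repeated after an interruption.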
## Data Format
Each example is a JSON object with a `messages` list:
```json
{
  "messages": [
    {"role": "system", "content": "You are a knowledgeable assistant specialising in CCRU theory, accelerationism, and related philosophy."},
    {"role": "user", "content": "What is known about hyperstition?"},
    {"role": "assistant", "content": "Hyperstition is a concept developed by the CCRU describing ideas that make themselves real through cultural propagation."}
  ]
}
```
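Before fine-tuning, it can help to confirm that records match this layout. The check below is a minimal sketch that assumes every example has exactly three messages in system, user, assistant order, as in the sample above; the card does not guarantee this for all rows, and `is_valid_example` is a hypothetical helper name.

```python
# Assumed role order, taken from the sample record in the dataset card.
EXPECTED_ROLES = ["system", "user", "assistant"]


def is_valid_example(example: dict) -> bool:
    """Return True if the record has the three expected non-empty turns."""
    messages = example.get("messages")
    if not isinstance(messages, list) or len(messages) != len(EXPECTED_ROLES):
        return False
    for message, role in zip(messages, EXPECTED_ROLES):
        if not isinstance(message, dict) or message.get("role") != role:
            return False
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            return False
    return True
```

Rows that fail the check are worth inspecting by hand rather than discarding automatically, since a failure may simply mean the layout assumption is wrong.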
## Notes
- All data is synthetically generated, so factual accuracy depends on the quality of the source corpus and the generation model
- A small portion (~0.4%) may contain noise from peripheral source material unrelated to CCRU theory
- Intended for fine-tuning language models on CCRU/accelerationist domain knowledge
## Usage
```python
from datasets import load_dataset
ds = load_dataset("wayjeeair/ccru-knowledge-instruct", split="train")
```
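Some training scripts expect flat prompt and response columns rather than a `messages` list. The helper below is a sketch of one way to flatten a record. The name `split_turns` and the output column names are hypothetical, and it assumes each role appears at most once per example, as in the sample record.

```python
def split_turns(example: dict) -> dict:
    """Flatten a chat record into system / prompt / response strings."""
    by_role = {m["role"]: m["content"] for m in example["messages"]}
    return {
        "system": by_role.get("system", ""),
        "prompt": by_role.get("user", ""),
        "response": by_role.get("assistant", ""),
    }


# Sample record copied from the Data Format section of the card.
sample = {
    "messages": [
        {"role": "system", "content": "You are a knowledgeable assistant specialising in CCRU theory, accelerationism, and related philosophy."},
        {"role": "user", "content": "What is known about hyperstition?"},
        {"role": "assistant", "content": "Hyperstition is a concept developed by the CCRU describing ideas that make themselves real through cultural propagation."},
    ]
}
flat = split_turns(sample)
print(flat["prompt"])
```

With the dataset loaded as `ds`, the same function can be applied to every row via `ds.map(split_turns, remove_columns=["messages"])`.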
Provider:
wayjeeair



