
wayjeeair/ccru-knowledge-instruct

Source: Hugging Face · Updated 2026-03-25 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/wayjeeair/ccru-knowledge-instruct
Description:
---
language:
- en
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
tags:
- ccru
- nick-land
- accelerationism
- hyperstition
- philosophy
- instruction-tuning
- synthetic
size_categories:
- 100K<n<1M
---

# CCRU Knowledge-Instruct Dataset

Synthetic instruction-tuning dataset generated from a curated corpus of texts related to the **CCRU (Cybernetic Culture Research Unit)**, accelerationism, and adjacent continental philosophy.

## Dataset Summary

| Attribute | Value |
|---|---|
| **Examples** | 278,463 |
| **Format** | Chat instruction (system / user / assistant) |
| **Domain** | CCRU theory, accelerationism, hyperstition, continental philosophy |
| **Generation model** | `huihui-ai/Qwen3.5-9B-abliterated-MLX-4bit` |
| **License** | CC BY 4.0 |

## Source Corpus

Generated from a private curated collection of texts spanning:

- CCRU-adjacent theoretical writings
- Accelerationist and continental philosophy texts
- Academic essays and theses on related topics
- Various digitised and OCR-processed documents

All source texts were processed locally. The dataset contains only the **synthetically generated** instruction-response pairs, not excerpts from the source documents themselves.

## Generation Method

Each document chunk was processed with a single combined LLM call:

1. **Extract** up to 4 entity-fact pairs from the passage
2. **Generate** 3 diverse question phrasings per pair

Prompt format returned structured JSON:

```json
[{"entity": "...", "fact": "...", "questions": ["q1", "q2", "q3"]}]
```

Progress was tracked via a checkpoint file, so the run was safe to interrupt and resume.
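
The card does not show how the structured JSON was turned into chat examples. As a minimal sketch only, the following assumes that each question becomes one user turn and that the extracted fact is used verbatim as the assistant reply. The function name `expand_pairs`, the placeholder system prompt, and the sample values are hypothetical and not taken from the author's pipeline.

```python
import json


def expand_pairs(raw_json: str, system_prompt: str) -> list[dict]:
    """Expand one structured model response into chat examples (one per question)."""
    examples = []
    for item in json.loads(raw_json):
        for question in item["questions"]:
            examples.append(
                {
                    "messages": [
                        {"role": "system", "content": system_prompt},
                        {"role": "user", "content": question},
                        # Assumption: the fact is reused unchanged as the answer.
                        {"role": "assistant", "content": item["fact"]},
                    ]
                }
            )
    return examples


# Hypothetical response in the JSON shape described above.
raw = json.dumps(
    [{"entity": "hyperstition", "fact": "placeholder fact", "questions": ["q1", "q2", "q3"]}]
)
examples = expand_pairs(raw, "placeholder system prompt")
```

Under these assumptions, a chunk that yields the maximum of 4 pairs with 3 questions each would produce 12 examples.
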
## Data Format

Each example is a JSON object with a `messages` list:

```json
{
  "messages": [
    {"role": "system", "content": "You are a knowledgeable assistant specialising in CCRU theory, accelerationism, and related philosophy."},
    {"role": "user", "content": "What is known about hyperstition?"},
    {"role": "assistant", "content": "Hyperstition is a concept developed by the CCRU describing ideas that make themselves real through cultural propagation."}
  ]
}
```

## Notes

- All data is synthetically generated; factual accuracy reflects the quality of the source corpus and generation model
- A small portion (~0.4%) may contain noise from peripheral source material unrelated to CCRU theory
- Intended for fine-tuning language models on CCRU/accelerationist domain knowledge

## Usage

```python
from datasets import load_dataset

ds = load_dataset("wayjeeair/ccru-knowledge-instruct", split="train")
```
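
Before fine-tuning, it may be worth checking that records match the documented shape. The sketch below assumes every example holds exactly three messages in system / user / assistant order, as in the sample record; the card shows only one example, so that ordering is an assumption. The helper name `is_valid_example` is hypothetical.

```python
EXPECTED_ROLES = ["system", "user", "assistant"]


def is_valid_example(example: dict) -> bool:
    """Return True if the record has the assumed three-turn chat structure."""
    messages = example.get("messages")
    if not isinstance(messages, list) or len(messages) != len(EXPECTED_ROLES):
        return False
    for message, role in zip(messages, EXPECTED_ROLES):
        if not isinstance(message, dict) or message.get("role") != role:
            return False
        content = message.get("content")
        # Reject missing, non-string, or whitespace-only content.
        if not isinstance(content, str) or not content.strip():
            return False
    return True


# Sample record copied from the Data Format section.
sample = {
    "messages": [
        {"role": "system", "content": "You are a knowledgeable assistant specialising in CCRU theory, accelerationism, and related philosophy."},
        {"role": "user", "content": "What is known about hyperstition?"},
        {"role": "assistant", "content": "Hyperstition is a concept developed by the CCRU describing ideas that make themselves real through cultural propagation."},
    ]
}
```

With the dataset loaded as `ds`, the same function could be passed to `ds.filter(is_valid_example)` to drop malformed rows. This checks structure only and does not detect the topical noise mentioned in the notes.
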
Provider:
wayjeeair