cemrtkn/qa-communism-capitalism-collectives

Name: cemrtkn/qa-communism-capitalism-collectives
Creator: cemrtkn
Published: 2025-10-10 09:17:38
License: not specified

Source: Hugging Face (updated 2025-10-10, indexed 2025-10-25)

Download link:

https://hf-mirror.com/datasets/cemrtkn/qa-communism-capitalism-collectives


Description:


This dataset consists of question-answer (QA) pairs generated by an LLM from reference texts, for training the Communism-Capitalism model. The reference texts were collected from sources such as Project Gutenberg, arXiv, and the Internet Archive. The QA pairs cover three perspectives: Communism, Capitalism, and AI assistant. The first two perspectives are simulated by having an LLM generate questions about chunks of human-written text; the AI assistant perspective is obtained by having the unmodified instruct model answer the questions generated for the first two categories. The dataset uses signifiers to instill the first two perspectives into the model, and the model's embedding layer has been extended to accommodate these new special tokens. The AI assistant persona is represented by the absence of a signifier. The authors acknowledge that this three-way split oversimplifies the spectrum, a consequence of the dataset being part of an exploratory internship project.
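The signifier scheme described above can be sketched as follows. This is a minimal illustration, not the project's code: the token strings `<|communism|>` and `<|capitalism|>`, the question/answer template, the placement of the signifier at the start of the prompt, and the helper names are all hypothetical, since the card specifies none of them. In Hugging Face `transformers` the equivalent steps are typically `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; the pure-Python version below only mirrors that bookkeeping. Initializing new embedding rows to the mean of the existing rows is one common choice and an assumption here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical signifier tokens; the dataset card does not name the real ones.
PERSONA_TOKENS: Dict[str, Optional[str]] = {
    "communism": "<|communism|>",
    "capitalism": "<|capitalism|>",
    "assistant": None,  # AI assistant persona: represented by no signifier at all
}


@dataclass
class QAPair:
    question: str
    answer: str
    persona: str  # "communism", "capitalism" or "assistant"


def format_example(pair: QAPair) -> str:
    """Prefix the prompt with the persona's signifier token, if it has one."""
    token = PERSONA_TOKENS[pair.persona]
    prompt = f"{token} {pair.question}" if token else pair.question
    return f"{prompt}\n{pair.answer}"


def add_special_tokens(vocab: Dict[str, int], tokens: List[str]) -> List[str]:
    """Assign fresh IDs (appended after the existing vocabulary) to unseen tokens."""
    added = [t for t in tokens if t not in vocab]
    for t in added:
        vocab[t] = len(vocab)
    return added


def extend_embeddings(matrix: List[List[float]], num_new: int) -> List[List[float]]:
    """Append num_new rows, each initialized to the mean of the existing rows."""
    dim = len(matrix[0])
    mean = [sum(row[i] for row in matrix) / len(matrix) for i in range(dim)]
    return matrix + [list(mean) for _ in range(num_new)]


# Usage with a toy vocabulary and a 2-dimensional embedding table.
vocab = {"a": 0, "b": 1}
signifiers = [t for t in PERSONA_TOKENS.values() if t is not None]
new_tokens = add_special_tokens(vocab, signifiers)
embeddings = extend_embeddings([[0.0, 2.0], [2.0, 4.0]], len(new_tokens))
print(format_example(QAPair("What is surplus value?", "...", "communism")))
```

Because the assistant persona carries no signifier, its examples look exactly like ordinary instruction data, so the model's default behavior stays tied to the unmodified instruct answers.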

Provider:

cemrtkn
