xiachongfeng/persona
Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/xiachongfeng/persona
Resource description:
---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- personality
- activation-steering
- OCEAN
- big-five
- LLM
- persona
pretty_name: PERSONA
size_categories:
- 10K<n<100K
---
# PERSONA: Dynamic and Compositional Inference-Time Personality Control
Official release of persona vectors and SFT datasets for the ICLR 2026 paper:
**PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra**
Xiachong Feng, Liang Zhao, Weihong Zhong, Yichong Huang, Yuxuan Gu, Lingpeng Kong, Xiaocheng Feng, Bing Qin
*Harbin Institute of Technology & The University of Hong Kong*
- **Paper**: https://openreview.net/pdf?id=QZvGqaNBlU
- **Code**: https://github.com/xiachongfeng/persona
## Repository Contents
```
xiachongfeng/persona/
├── vectors/                         # Pre-extracted OCEAN persona vectors
│   ├── Qwen2.5-7B-Instruct/
│   ├── Qwen2.5-14B-Instruct/
│   ├── Qwen3-4B-Instruct-2507/
│   └── Meta-Llama-3.1-8B-Instruct/
└── dataset/                         # SFT training data (8 trait categories)
    ├── evil/
    ├── hallucination/
    ├── insecure_code/
    ├── mistake_gsm8k/
    ├── mistake_math/
    ├── mistake_medical/
    ├── mistake_opinions/
    └── sycophancy/
```
## Persona Vectors
Each model directory contains 30 `.pt` files — 3 vector variants for each of the 10 Big Five (OCEAN) trait poles.
**Supported models:**
| Model | HuggingFace ID |
|-------|----------------|
| Qwen2.5-7B-Instruct | `Qwen/Qwen2.5-7B-Instruct` |
| Qwen2.5-14B-Instruct | `Qwen/Qwen2.5-14B-Instruct` |
| Qwen3-4B-Instruct-2507 | `Qwen/Qwen3-4B-Instruct-2507` |
| Meta-Llama-3.1-8B-Instruct | `meta-llama/Meta-Llama-3.1-8B-Instruct` |
**OCEAN traits (10 poles across 5 dimensions):**
| Dimension | High Pole | Low Pole |
|-----------|-----------|----------|
| Openness | `inventive` | `consistent` |
| Conscientiousness | `dependable` | `careless` |
| Extraversion | `outgoing` | `solitary` |
| Agreeableness | `compassionate` | `self-interested` |
| Neuroticism | `nervous` | `calm` |
**Vector variants per trait:**
- `{trait}_prompt_avg_diff.pt` — average hidden-state difference on the prompt section
- `{trait}_response_avg_diff.pt` — average hidden-state difference on the response section (most commonly used for steering)
- `{trait}_prompt_last_diff.pt` — last-token hidden-state difference on the prompt section
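The naming scheme above can be enumerated programmatically, which is handy for checking that a download is complete. The following is a minimal sketch, not part of the official release: the helper name `vector_filenames` is hypothetical, and it assumes that `{trait}` is replaced by the pole name exactly as written in the table (for example `self-interested` with a hyphen), which this card only confirms for `inventive`.

```python
# Pole and variant names are copied from the table and list above.
# Assumption: file names use these pole names verbatim in place of {trait}.
POLES = [
    "inventive", "consistent",            # Openness
    "dependable", "careless",             # Conscientiousness
    "outgoing", "solitary",               # Extraversion
    "compassionate", "self-interested",   # Agreeableness
    "nervous", "calm",                    # Neuroticism
]
VARIANTS = ["prompt_avg_diff", "response_avg_diff", "prompt_last_diff"]


def vector_filenames(model_dir: str) -> list[str]:
    """Return the repo-relative paths of all vector files expected for one model."""
    return [
        f"vectors/{model_dir}/{pole}_{variant}.pt"
        for pole in POLES
        for variant in VARIANTS
    ]


names = vector_filenames("Qwen2.5-7B-Instruct")
print(len(names))  # 10 poles x 3 variants
print(names[1])
```

Each returned path can be passed as the `filename` argument of `hf_hub_download`.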
### Loading a vector
```python
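from huggingface_hub import hf_hub_download
import torch

# Download one vector file from the dataset repo (cached locally after the first call).
path = hf_hub_download(
    repo_id="xiachongfeng/persona",
    filename="vectors/Qwen2.5-7B-Instruct/inventive_response_avg_diff.pt",
    repo_type="dataset",
)

# map_location="cpu" lets the file load on machines without a GPU.
vector = torch.load(path, map_location="cpu")
print(vector.shape)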
```
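Once a vector is loaded, activation steering in general amounts to adding a scaled copy of it to a layer's hidden states during the forward pass. The sketch below illustrates that idea with a PyTorch forward hook on a toy layer. It is not the authors' implementation: the helper `make_steering_hook`, the scale `alpha`, the toy `nn.Linear` stand-in, and the placeholder vector are all hypothetical. It also assumes a 1-D vector of hidden size; the tensor layout of the released `.pt` files is not documented here, so if a file stores one vector per layer, select the layer first.

```python
import torch
from torch import nn


def make_steering_hook(vector: torch.Tensor, alpha: float):
    """Return a forward hook that adds alpha * vector to a module's output."""
    def hook(module, inputs, output):
        # Some transformer layers return a tuple whose first element is the
        # hidden states; handle both the tuple and the plain-tensor case.
        if isinstance(output, tuple):
            steered = output[0] + alpha * vector.to(output[0].dtype)
            return (steered,) + output[1:]
        return output + alpha * vector.to(output.dtype)
    return hook


# Toy stand-in for a single transformer layer (hypothetical, hidden size 8).
hidden_size = 8
layer = nn.Linear(hidden_size, hidden_size)
persona_vector = torch.ones(hidden_size)  # placeholder for a loaded 1-D vector

x = torch.randn(2, 5, hidden_size)        # (batch, seq_len, hidden)
baseline = layer(x)

# Attach the hook, run the layer, then detach the hook again.
handle = layer.register_forward_hook(make_steering_hook(persona_vector, alpha=2.0))
steered = layer(x)
handle.remove()

# Every position is shifted by alpha * persona_vector.
shift = steered - baseline
```

With a real model, the same hook would be registered on the chosen decoder layer instead of the toy layer. Which layer and which `alpha` to use is not specified in this card; see the paper and code repository for the settings actually used.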
### Downloading all vectors for one model
```bash
huggingface-cli download xiachongfeng/persona \
  --repo-type dataset \
  --include "vectors/Qwen2.5-7B-Instruct/*" \
  --local-dir ./persona_vectors_download
```
## SFT Dataset
The `dataset/` directory contains supervised fine-tuning data across 8 trait categories. Each category has three `.jsonl` files:
- `normal.jsonl` — baseline responses
- `misaligned_1.jsonl` — misaligned variant 1
- `misaligned_2.jsonl` — misaligned variant 2
These files were used to train the SFT baselines and to run the training-time steering experiments reported in the paper. The evaluation pipeline is adapted from the [Emergent Misalignment](https://github.com/emergent-misalignment/emergent-misalignment) codebase; please also cite that work if you use these datasets.
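After downloading, the three files of a category can be read with a plain JSON Lines loader. This is a minimal sketch under stated assumptions: the helpers `read_jsonl` and `category_files` are hypothetical, and because the record schema is not described in this card, the code makes no assumption about field names.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of Python objects, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def category_files(root, category):
    """Return the three expected .jsonl paths for one category under dataset/."""
    names = ("normal", "misaligned_1", "misaligned_2")
    return [Path(root) / "dataset" / category / f"{name}.jsonl" for name in names]
```

For example, `category_files("./persona_data_download", "sycophancy")` lists the paths to pass to `read_jsonl`, assuming the directory layout shown under Repository Contents is preserved in the local download.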
### Downloading the dataset
```bash
huggingface-cli download xiachongfeng/persona \
  --repo-type dataset \
  --include "dataset/*" \
  --local-dir ./persona_data_download
```
## Citation
```bibtex
@inproceedings{feng2026persona,
title={PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra},
author={Feng, Xiachong and Zhao, Liang and Zhong, Weihong and Huang, Yichong and Gu, Yuxuan and Kong, Lingpeng and Feng, Xiaocheng and Qin, Bing},
booktitle={The Fourteenth International Conference on Learning Representations (ICLR)},
year={2026}
}
```
## License
MIT License. See the [GitHub repository](https://github.com/xiachongfeng/persona) for the full text.
Provided by:
xiachongfeng



