
Beijing-AISI/C-VARC

Source: Hugging Face · Updated: 2025-09-24 · Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/Beijing-AISI/C-VARC
Description:
---
dataset: C-VARC
language:
- zh
- en
license: cc-by-4.0
task_categories:
- text-generation
- multiple-choice
multilinguality: monolingual
size_categories:
- 100K<n<1M
annotations_creators:
- expert-annotated
- machine-generated
source_datasets:
- Social Chemistry 101
- Moral Integrity Corpus
- Flames
pretty_name: Chinese Value Rule Corpus (C-VARC)
tags:
- chinese-values
- ethics
- moral-dilemmas
- llm-alignment
- cultural-alignment
configs:
- config_name: default
  data_files:
  - split: c_varc_zh
    path: C-VARC.jsonl
  - split: c_varc_en
    path: C-VARC(EN).jsonl
---

This repository contains all the data associated with the paper "**C-VARC: A Large-Scale Chinese Value Rule Corpus for Value Alignment of Large Language Models**".

![Classification framework](Pic/framework.png)

We propose a three-tier value classification framework based on core Chinese values, which includes three dimensions, twelve core values, and fifty derived values. With the assistance of large language models and manual verification, we constructed a large-scale, refined, and high-quality value corpus containing over 250,000 rules. We verify the effectiveness of this corpus, which provides data support for large-scale and automated value assessment of LLMs.

Main contributions:

- **Construction of the first large-scale, refined Chinese Value Rule Corpus (C-VARC):** Based on the core socialist values, we developed a localized value classification framework covering national, societal, and personal levels, with 12 core values and 50 derived values. Using this framework, we built the first large-scale Chinese Value Rule Corpus (C-VARC), comprising over 250,000 high-quality, manually annotated normative rules, filling an important gap in the field.
- **Systematic validation of C-VARC's generation guidance advantages and cross-model applicability:** We validated C-VARC's effectiveness in guiding scenario generation for the 12 core values. Quantitative analysis shows that C-VARC guided scenes exhibit more compact clustering and clearer boundaries in *t*-SNE space. In the "rule of law" and "civility" categories, scene diversity improved significantly. In tests on six ethical themes, seven major LLMs chose C-VARC generated options over 70% of the time, and the consistency with five Chinese annotators exceeded 0.87, confirming C-VARC's strong guidance capability and its clear representation of Chinese values.
- **Proposal of a rule-driven method for large-scale moral dilemma generation:** Leveraging C-VARC, we propose a method to automatically generate moral dilemmas (MDS) based on value priorities. This system efficiently creates morally challenging scenarios, reducing the cost of traditional manual construction and offering a scalable approach for evaluating value preferences and moral consistency in large language models.

**paper**: You can access the paper at this [link](https://arxiv.org/abs/2506.01495).

**github**: You can access all the code from the paper at this [link](https://github.com/Beijing-AISI/C-VARC).

**English version**: We are currently working on translating C-VARC into English. The file C-VARC(EN).jsonl provides 10,000 rules that have already been translated into English. The translation was conducted using DeepL Translator, which is widely recognized in academic contexts for its high accuracy and low ambiguity in technical and scholarly text translation.
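
The card metadata lists two JSON Lines files, `C-VARC.jsonl` (split `c_varc_zh`) and `C-VARC(EN).jsonl` (split `c_varc_en`). As a minimal sketch for inspecting a locally downloaded copy, the following reads such a file one record at a time using only the Python standard library. The helper names `iter_jsonl` and `peek` are hypothetical, and because the card does not document the record schema, the code makes no assumption about field names.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Iterator, List


def iter_jsonl(path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    # UTF-8 is assumed here because the main split contains Chinese text.
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)


def peek(path, n: int = 3) -> List[dict]:
    """Return the first n records without reading the whole file."""
    return list(islice(iter_jsonl(path), n))
```

For example, `peek("C-VARC.jsonl")` would return the first three rules as dictionaries, assuming the file has been downloaded into the working directory. If the Hugging Face `datasets` library is installed, `load_dataset("Beijing-AISI/C-VARC", split="c_varc_zh")` should give equivalent access to the split declared in the metadata, though that path has not been verified here.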
Provider:
Beijing-AISI