apol/spain-reference-personas-frontier

Name: apol/spain-reference-personas-frontier
Creator: apol
Published: 2026-03-22 00:04:13
License: no description available

Source: Hugging Face. Updated 2026-03-22; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/apol/spain-reference-personas-frontier


Description:

---
pretty_name: Spain Reference Personas Frontier
license: cc-by-4.0
language:
- es
size_categories:
- "1M<n<10M"
task_categories:
- text-generation
- text-classification
- question-answering
tags:
- synthetic
- personas
- spanish
- spain
- llm-evaluation
- simulation
viewer: true
configs:
- config_name: persona_core
  default: true
  data_files:
  - split: train
    path: persona_core.parquet
- config_name: household_core
  data_files:
  - split: train
    path: household_core.parquet
- config_name: persona_views
  data_files:
  - split: train
    path: persona_views.parquet
- config_name: actor_state_init
  data_files:
  - split: train
    path: actor_state_init.parquet
- config_name: benchmark_tasks
  data_files:
  - split: train
    path: benchmark_tasks.parquet
- config_name: source_registry
  data_files:
  - split: train
    path: source_registry.parquet
- config_name: field_provenance
  data_files:
  - split: train
    path: field_provenance.parquet
---

# Spain Reference Personas Frontier

Spain Reference Personas Frontier is a synthetic reference population and benchmark substrate for Spanish-language LLM work grounded in the territorial, household, cultural, linguistic, and civic structure of Spain. It is built for simulation, evaluation, prompt conditioning, and scenario analysis.

The release is inspired by NVIDIA's Nemotron Personas line, especially the multi-view packaging visible in [Nemotron Personas USA](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA), while extending that idea into a benchmark-oriented reference package for Spain.

## Release snapshot

| Item | Value |
| --- | ---: |
| Release id | spain-reference-personas-2025-v0.1 |
| Population reference date | 2025-12-31 |
| Release date | 2026-03-20 |
| Adult personas | 1,000,000 |
| Households | 536,741 |
| LLM-facing views | 6,350,524 |
| Actor-state rows | 1,000,000 |
| Benchmark tasks | 1,800 |
| Extended-profile coverage | 35.1% |
| Total package rows | 8,889,089 |
| Total package size | 5.584 GB |

## Why this package exists

- Controllability: structured fields support filtering, weighting, and subgroup analysis directly.
- Behavioral usefulness: the release includes actor state and benchmark tasks rather than only descriptive prose.
- Token efficiency: every public view has a declared budget and measured compliance.
- Reproducibility: release metadata, task splits, replay seeds, and evaluation summaries are explicit.
- Household realism: adults remain linked to tenure, burden, caregiving, and consumption context.

## Use cases

| Audience | Example workflow | Start with |
| --- | --- | --- |
| Sociologists | Slice the population by region, language, household form, migration background, and values before designing fieldwork or interview sampling | persona_core, household_core |
| Poll analysts | Retrieve a cohort, attach policy_view, and compare open-ended synthetic answers across held-out splits | persona_core, persona_views, benchmark_tasks |
| Policy analysts | Simulate reactions to housing, care, labor, migration, or cost-of-living interventions using stable structure plus mutable state | household_core, actor_state_init, persona_views |
| Economists | Study household burden, consumption constraints, price sensitivity, and tenure differences in consumer-choice prompts | household_core, persona_core, consumer_view |
| Media researchers | Model trust, platform exposure, recent-media pathways, and event-response heterogeneity | persona_core, actor_state_init, dialogue_view |
| Model builders | Benchmark compact versus extended views with explicit held-out persona and held-out task regimes | all configs |

### Example programs

- Housing policy reaction studies split by tenure, burden, age, and region.
- Synthetic polling stress-tests that compare short survey answers against richer policy views.
- Consumer trade-down simulations under inflation using price sensitivity and household constraints.
- Regional culture and language robustness tests for Spanish-first models serving co-official-language contexts.
- Event-reaction experiments where the same stable persona receives different recent-media states.
- Multi-turn family, workplace, or community interaction tasks that require stable persona identity plus mutable memory.

## What ships

| Config | Rows | Role |
| --- | ---: | --- |
| persona_core | 1,000,000 | Stable adult structure, weights, language profile, civic profile, consumer profile, value axes, provenance ids |
| household_core | 536,741 | Household composition, tenure, burden, vehicle access, caregiving, consumption constraints |
| persona_views | 6,350,524 | micro_card, standard_card, policy_view, consumer_view, culture_view, dialogue_view, optional extended_profile |
| actor_state_init | 1,000,000 | Mood, attention, event sensitivity, persuasion resistance, memory style, recent-media diet |
| benchmark_tasks | 1,800 | Task prompts, scoring targets, replay seeds, split metadata, recommended persona view |
| source_registry | 11 | Release-level source inventory |
| field_provenance | 13 | Field-group provenance map |

## Package logic

1. Filter or weight cohorts in persona_core.
2. Join household_core when housing or economic context matters.
3. Attach the smallest useful persona view from persona_views.
4. Add actor_state_init only when recency, mood, or event exposure matter.
5. Score behavior with benchmark_tasks instead of relying on anecdotal prompt outputs.

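Steps 1 to 3 of the package logic can be sketched on a few toy rows. This is an illustration only: the card does not list column names, so `persona_id`, `household_id`, `region`, `tenure`, `view_type`, and `text` are hypothetical, as is the helper `build_cohort`. Check the real schema of each config before adapting it.

```python
# Toy rows standing in for persona_core, household_core, and persona_views.
# Every column name here is an assumption, not the published schema.
persona_core = [
    {"persona_id": 1, "household_id": 10, "region": "Galicia"},
    {"persona_id": 2, "household_id": 11, "region": "Madrid"},
    {"persona_id": 3, "household_id": 10, "region": "Galicia"},
]
household_core = [
    {"household_id": 10, "tenure": "private_rent"},
    {"household_id": 11, "tenure": "mortgage"},
]
persona_views = [
    {"persona_id": 1, "view_type": "micro_card", "text": "card for persona 1"},
    {"persona_id": 1, "view_type": "policy_view", "text": "policy view for persona 1"},
    {"persona_id": 3, "view_type": "micro_card", "text": "card for persona 3"},
]


def build_cohort(personas, households, views, region, view_type):
    """Filter personas by region, join household context, attach one view."""
    households_by_id = {h["household_id"]: h for h in households}
    text_by_key = {(v["persona_id"], v["view_type"]): v["text"] for v in views}

    cohort = []
    for persona in personas:
        if persona["region"] != region:  # step 1: filter the cohort
            continue
        row = dict(persona)
        # Step 2: join household context (left join; missing households add nothing).
        row.update(households_by_id.get(persona["household_id"], {}))
        # Step 3: attach only the requested view, or None if it is absent.
        row["view_text"] = text_by_key.get((persona["persona_id"], view_type))
        cohort.append(row)
    return cohort


cohort = build_cohort(persona_core, household_core, persona_views,
                      region="Galicia", view_type="micro_card")
for row in cohort:
    print(row["persona_id"], row["tenure"], row["view_text"])
```

At the scale of the real tables (1,000,000 personas and 6,350,524 views), a dataframe or SQL join would be the more practical choice; the dictionary lookups here only make the join order explicit.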
## Evaluation at a glance

| Metric | Result | Interpretation |
| --- | ---: | --- |
| Region share MAE | 0.022 pp | Tight regional alignment for macro subgroup work |
| Region max absolute error | 0.051 pp | No large regional drift in the released person table |
| Age share MAE | 2.95 pp | Main remaining calibration gap in v0.1 |
| Age max absolute error | 4.16 pp | Largest deviation is in middle-age representation |
| View budget compliance | 100% | All public views stay inside their declared limits |
| Benchmark matrix | 9 families / 4 splits | Explicit generalization structure exists in the bundle |
| Weight spread | 0.9889 - 1.0551 | Weights remain mild instead of extreme |
| High disclosure risk | 0.418% | Small review tail remains visible as metadata |

Observed regional shares

~~~text
Andalucía        17.90%  ##################
Cataluña         16.35%  #################-
Madrid           14.19%  ##############----
Com. Valenciana  10.62%  ###########-------
Galicia           5.73%  ######------------
Castilla y León   5.07%  #####-------------
País Vasco        4.70%  #####-------------
Canarias          4.58%  #####-------------
~~~

Age calibration summary

| Age group | Target | Observed | Error |
| --- | ---: | ---: | ---: |
| 18-24 | 8.0% | 10.484% | +2.48 pp |
| 25-34 | 13.0% | 16.996% | +4.00 pp |
| 35-44 | 17.0% | 15.843% | -1.16 pp |
| 45-54 | 19.0% | 14.840% | -4.16 pp |
| 55-64 | 17.0% | 13.460% | -3.54 pp |
| 65+ | 26.0% | 28.377% | +2.38 pp |

## View-layer efficiency

| View | Count | Avg tokens | Max tokens | Utilization | Pass rate |
| --- | ---: | ---: | ---: | ---: | ---: |
| micro_card | 1,000,000 | 99.8 | 120 | 83.1% | 100.0% |
| standard_card | 1,000,000 | 175.7 | 212 | 70.3% | 100.0% |
| policy_view | 1,000,000 | 89.2 | 97 | 49.5% | 100.0% |
| consumer_view | 1,000,000 | 95.6 | 113 | 53.1% | 100.0% |
| culture_view | 1,000,000 | 105.8 | 153 | 58.8% | 100.0% |
| dialogue_view | 1,000,000 | 83.4 | 93 | 46.3% | 100.0% |
| extended_profile | 350,524 | 364.5 | 407 | 60.8% | 100.0% |

~~~text
micro_card         99.8 / 120  ############--
standard_card     175.7 / 250  ##########----
policy_view        89.2 / 180  #######-------
consumer_view      95.6 / 180  #######-------
culture_view      105.8 / 180  ########------
dialogue_view      83.4 / 180  ######--------
extended_profile  364.5 / 600  ########------
~~~

## Household and economic context

| Signal | Result |
| --- | ---: |
| Average adults per household | 1.863 |
| Average minors per household | 0.560 |
| Households with minors | 38.013% |
| Private rent | 39.471% |
| Mortgage | 21.837% |
| Owner outright | 21.602% |
| High housing-cost burden | 29.676% |
| Tight consumption constraint | 22.080% |

| Housing-cost burden | Share |
| --- | ---: |
| moderate | 36.710% |
| low | 33.614% |
| high | 29.676% |

## Benchmark design

| Benchmark family | Tasks |
| --- | ---: |
| policy_opinion | 200 |
| election_turnout | 200 |
| poll_response | 200 |
| event_reaction | 200 |
| media_trust | 200 |
| consumer_choice | 200 |
| culture_identity | 200 |
| multi_turn_social | 200 |
| future_expectations | 200 |

| Split regime | Tasks |
| --- | ---: |
| in_distribution | 450 |
| heldout_persona_seen_task | 450 |
| seen_persona_heldout_task | 450 |
| heldout_persona_heldout_task | 450 |

## Loading

~~~python
from datasets import load_dataset

# Load the default config; swap "persona_core" for any other config name.
personas = load_dataset(
    "apol/spain-reference-personas-frontier",
    "persona_core",
    split="train",
    token=True,  # use the locally stored Hugging Face token
)
~~~

## Limits and cautions

- This is a synthetic reference population, not observed microdata.
- The package is suitable for simulation and evaluation, not for replacing field surveys.
- Age calibration remains the main statistical weakness of v0.1.
- High-disclosure-tagged rows are exposed as metadata so downstream users can exclude them when needed.
- Live cross-model benchmark lift is not claimed in the card itself; the bundle ships the infrastructure needed to run it reproducibly.

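The size of the age-calibration gap can be re-derived from the age calibration summary table. The sketch below assumes the reported age share MAE is an unweighted mean of absolute errors over the six age groups; the card does not state the formula, but this reading reproduces the published 2.95 pp and 4.16 pp figures.

```python
# (target %, observed %) per age group, copied from the age calibration table.
age_shares = {
    "18-24": (8.0, 10.484),
    "25-34": (13.0, 16.996),
    "35-44": (17.0, 15.843),
    "45-54": (19.0, 14.840),
    "55-64": (17.0, 13.460),
    "65+": (26.0, 28.377),
}

# Signed error in percentage points: observed minus target.
errors = {group: observed - target
          for group, (target, observed) in age_shares.items()}

# Unweighted mean absolute error across groups (assumed definition).
mae = sum(abs(e) for e in errors.values()) / len(errors)

# Group with the largest absolute deviation.
worst_group = max(errors, key=lambda group: abs(errors[group]))

print(f"Age share MAE: {mae:.2f} pp")  # Age share MAE: 2.95 pp
print(f"Max absolute error: {abs(errors[worst_group]):.2f} pp ({worst_group})")
# Max absolute error: 4.16 pp (45-54)
```

Adults aged 45-64 are under-represented and adults under 35 are over-represented, so analyses that are sensitive to age may need reweighting against the target shares.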
## Companion documents

- [DATASHEET.md](DATASHEET.md)
- [EVALUATION_REPORT.md](EVALUATION_REPORT.md)
- [PRIVACY_AND_DISCLOSURE.md](PRIVACY_AND_DISCLOSURE.md)
- [EVALUATION_METRICS.json](EVALUATION_METRICS.json)

## Citation

If you use this dataset, cite the repository and the release id spain-reference-personas-2025-v0.1.

Provider:

apol
