
Vulnerability of Jobs to Artificial Intelligence in Spain: Dataset, Methodology and Interactive Dashboard (v30 / v15 + Funcas Cross-Validation Addendum)

DataCite Commons · updated 2026-05-05 · indexed 2026-05-07
Download link: https://zenodo.org/doi/10.5281/zenodo.20031741
Resource description:
AI Vulnerability of Jobs in Spain: Complete Dataset, Methodology, Funcas Cross-Validation Addendum & Interactive Dashboard

This deposit contains the complete dataset, the methodology, a methodological cross-validation addendum against Funcas Working Paper DT-2026/04 (Rodríguez-Fernández, April 2026), and the interactive visualisation tool for assessing the theoretical vulnerability of 502 Spanish occupations to artificial intelligence. The analysis covers 22,732,223 workers (EPA Q4 2025, INE) and assigns each occupation a calibrated vulnerability score on a 0–10 scale, cross-referenced with salary data, EU AI Act risk classification and impact typology.

The interactive dashboard is publicly available at: https://empleo-ai.anlakstudio.com

This is methodology v30 / dataset v15, deposited April 2026. The Funcas Cross-Validation Addendum (v1) documents a formal validation against Funcas DT-2026/04, with Pearson r = 0.936 across the 9 CNO-11 grand groups. The deposit belongs to the project series under concept DOI 10.5281/zenodo.19076797, which collects all versions and complementary methodological notes.
What's New in This Version

Component | Change (v30/v14 → v30/v15)
Dataset | v15 adds cumulative analysis fields (rank_in_descending, cumulative_workers_descending, pct_workforce_descending, cumulative_workers_at_or_above_score, pct_workforce_at_or_above_score). Scores and employment are identical to v14. An inverse threshold lookup is also provided (spain_v15_threshold_lookup.csv).
Cross-validation | Funcas Addendum v1 (new): formal cross-validation against Funcas DT-2026/04 (Rodríguez-Fernández, April 2026); Pearson r = 0.936, Spearman ρ = 0.830 across the 9 CNO-11 grand groups.
Decomposition | 4-digit decomposition of the four CNO-11 grand groups flagged by Funcas (groups 1, 2, 3 and 4), included in the addendum PDF and the dataset.
Methodology | v30, unchanged: 46+ pages with 57 technical notes and 7 appendices (A–G), including the adversarial review log.
Score validation | Multi-model adversarial protocol; inter-model r = 0.953 (post-rescoring, 7 models).
Adversarial review | Formal red-teaming protocol (Appendix G): 3 waves, 28 incidents identified, 24 resolved.
Salary cascade | Validated against 16 EES 2023 reference groups; MAPE 4.96% (post-correction).
Total workforce | 22,732,223, via a 4-layer cascade (EPA 1-digit → EPA 2-digit → Census 2021 → SEPE 2024).
High-vulnerability cohort | 2,752,961 workers (12.1%) at score ≥ 7 across the 502 occupations.
Salary-vulnerability index | ~€253,000 M (employment × salary × score/10); an aggregated indicator, not a wage-loss prediction.

Dataset
The core dataset (spain_502_v15_subcomp_complete.json) contains 502 records corresponding to the complete CNO-11 occupational taxonomy (SEPE expansion).
Each record includes the following fields:

Field | Type | Description
cno | string | 4-digit CNO-11 occupation code
nombre | string | Official occupation name (Spanish)
sector | string | Assigned economic sector (12 categories)
empleo | integer | Estimated employment (EPA Q4 2025, redistributed via Census 2021 + SEPE 2024 weights)
salario_medio_eur | float | Estimated mean gross annual salary (EUR), based on INE EES 2023 + educational premia + FR/PT proxies
vulnerabilidad_ia_score | float | AI vulnerability score (0–10), calibrated for Spain; 4-component decomposition
eu_ai_act | string | EU AI Act risk classification: "Alto riesgo" (high risk, Annex III), "Riesgo limitado" (limited risk), "Riesgo mínimo" (minimal risk)
tipo_impacto | string | Impact typology: "Sustitución" (substitution), "Híbrido" (hybrid) or "Aumentación" (augmentation)
justificacion | string | 3–4 sentence justification in Spanish explaining the automation vector and the human-protective factors
census_2021_employed | float | Census 2021 employment figure used for intra-group weighting
employment_method | string | Identifier of the employment estimation method
Sub-component fields | float | Cognitive routine, manual routine, creative-strategic, interpersonal-physical
rank_in_descending | integer | (new in v15) Rank when the 502 occupations are sorted by descending vulnerability score
cumulative_workers_descending | integer | (new in v15) Running total of workers when occupations are summed in descending-score order
pct_workforce_descending | float | (new in v15) Cumulative percentage of the total workforce at or above this rank
cumulative_workers_at_or_above_score | integer | (new in v15) Total workers in occupations with score ≥ this occupation's score
pct_workforce_at_or_above_score | float | (new in v15) Percentage of the total workforce in occupations at or above this score

The dataset contains 499 distinct employment values across the 502 occupations, providing the most disaggregated employment-by-occupation estimate available for Spain at the 4-digit CNO-11 level.
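As a sanity check on how the v15 cumulative fields relate to the core fields, here is a minimal Python sketch that derives them from `empleo` and `vulnerabilidad_ia_score`. It also computes the headline aggregates defined in this record: the employment-weighted mean and the salary-vulnerability index (employment × salary × score/10). The `Occupation` class, the function names, the tie-break rule for equal scores and the 0–100 percentage scale are illustrative assumptions, not the author's code.

```python
from dataclasses import dataclass


@dataclass
class Occupation:
    # Field names mirror the dataset schema; only these four are used here.
    cno: str
    empleo: int
    vulnerabilidad_ia_score: float
    salario_medio_eur: float


def add_cumulative_fields(records):
    """Derive the five v15 cumulative fields from score and employment.

    Assumptions: percentages are on a 0-100 scale, and ties in score are
    ordered by CNO code (a hypothetical tie-break rule).
    """
    total = sum(r.empleo for r in records)
    ordered = sorted(records, key=lambda r: (-r.vulnerabilidad_ia_score, r.cno))

    # Workers in occupations with score >= s. Tied scores are contiguous in the
    # descending order, so the last running total stored for s includes every tie.
    workers_at_or_above = {}
    running = 0
    for r in ordered:
        running += r.empleo
        workers_at_or_above[r.vulnerabilidad_ia_score] = running

    rows = []
    running = 0
    for rank, r in enumerate(ordered, start=1):
        running += r.empleo
        at_or_above = workers_at_or_above[r.vulnerabilidad_ia_score]
        rows.append({
            "cno": r.cno,
            "rank_in_descending": rank,
            "cumulative_workers_descending": running,
            "pct_workforce_descending": 100 * running / total,
            "cumulative_workers_at_or_above_score": at_or_above,
            "pct_workforce_at_or_above_score": 100 * at_or_above / total,
        })
    return rows


def share_at_or_above(records, threshold):
    """Fraction of the workforce in occupations with score >= threshold."""
    total = sum(r.empleo for r in records)
    return sum(r.empleo for r in records if r.vulnerabilidad_ia_score >= threshold) / total


def weighted_mean_score(records):
    """Employment-weighted mean vulnerability score."""
    total = sum(r.empleo for r in records)
    return sum(r.empleo * r.vulnerabilidad_ia_score for r in records) / total


def salary_vulnerability_index(records):
    """Sum of employment x salary x score/10, in EUR (an aggregate, not a wage loss)."""
    return sum(r.empleo * r.salario_medio_eur * r.vulnerabilidad_ia_score / 10 for r in records)


# Hypothetical toy records (placeholder codes, not real CNO-11 data).
toy = [
    Occupation("X001", 100, 9.0, 20000.0),
    Occupation("X002", 300, 7.0, 30000.0),
    Occupation("X003", 200, 7.0, 25000.0),
    Occupation("X004", 400, 1.0, 15000.0),
]
for row in add_cumulative_fields(toy):
    print(row)
print(share_at_or_above(toy, 7.5), weighted_mean_score(toy), salary_vulnerability_index(toy))
```

To try this on the real file, build `Occupation` objects from the JSON records and compare the derived columns with the published v15 ones. Whether the file's top level is a plain list of records is an assumption to check against the download. The separate "at or above score" fields are what make threshold queries independent of how ties are ranked.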
The cumulative fields new in v15 support direct inverse-threshold queries (e.g. "what fraction of the Spanish workforce is in occupations with score ≥ 7.5?") without re-aggregation. They are also published as a standalone CSV (spain_v15_threshold_lookup.csv) for spreadsheet use.

Key Findings

Indicator | Value | Note
Occupations analysed | 502 | Complete CNO-11 (SEPE taxonomy)
Workers represented | 22,732,223 | EPA Q4 2025 (final data), reassigned to 4-digit CNO-11
Weighted mean vulnerability | 3.66 / 10 | Employment-weighted; unweighted mean: 3.77
High-vulnerability workers (score ≥ 7) | 2,752,961 | 12.1% of total employment
Salary-vulnerability index | ~€253,000 M | Employment × salary × score/10; an aggregated indicator, not a wage-loss prediction
Score range | 1.0–9.0 | 1.0: hairdressers, cleaners; 9.0: data-entry clerks
Salary range | ~€13,000–79,300 per year | Reconstructed from EES 2023 + premia + FR/PT proxies
Inter-model validation (r) | 0.953 | 7-model adversarial protocol, post-rescoring
Salary validation (MAPE) | 4.96% | 16 INE EES 2023 reference groups
Employment validation (1-digit) | ±0.00% | Exact match to EPA Q4 2025 (INE API Tempus, table 65134)
Adversarial incidents | 28 identified / 24 resolved | Across 3 formal red-teaming waves (Appendix G)

Methodology

Vulnerability scoring
Each occupation receives a vulnerability score on a 0–10 scale, decomposed into four sub-components: cognitive routine, manual routine, creative-strategic content and interpersonal-physical content. The methodological lineage follows Brynjolfsson, Mitchell & Rock (2018) and Eloundou et al. (2024), adapted to the Spanish CNO-11 taxonomy with structural calibration. The score represents a theoretical ceiling under full AI adoption, not a prediction of realised displacement.
Empirical evidence (Anthropic Economic Index, March 2026) shows substantial gaps between theoretical vulnerability and observed adoption: only ~21% of Spanish firms currently report AI use (INE-ETICCE 1T2025). Scores should therefore be read as forward-looking pressure indicators, not as forecasts tied to a specific horizon.

Five Spain-specific calibration factors are applied:
DESI digitalisation index: DESI 2023, 69.8 points; Spain ranks 11th in the EU for enterprise digital integration. INE-ETICCE 1T2025 reports ~21% of Spanish firms using AI (Banco de España 2025: ~20%). Sector moderation factor: 0.80 (agriculture) to 0.95 (technology/banking).
Services sector weight: 74% of GDP (vs. 68% EU average); tourism accounts for 12.4% of GDP.
Employment protection: 3rd-strictest in the OECD. Unfair-dismissal severance: 33 days per year of service (maximum 24 monthly payments). Labour friction factor: 1–5% by sector.
EU AI Act: Regulation (EU) 2024/1689 classifies AI systems by use-case context, not by occupation. Annex III high-risk contexts map to a subset of occupations; moderation factor: 2–8% for high-risk categories.
AESIA supervision: Spain is the first EU country with an operational national AI supervisory agency (A Coruña, Real Decreto 729/2023). Fines reach up to €35 M or 7% of global turnover.

Employment cascade
EPA publishes employment at 1-digit CNO only. To obtain 4-digit estimates, a 4-layer cascade is applied: (1) EPA 1-digit national totals → (2) EPA 2-digit where available → (3) Census 2021 weights at 3-digit → (4) SEPE 2024 contract distributions at 4-digit, with administrative overrides for civil-service corps that bypass SEPE. The resulting 4-digit employment figures are estimates, not directly observed data. Result: 499 distinct employment values across the 502 occupations.

Salary reconstruction
The Encuesta de Estructura Salarial 2023 (INE Wage Structure Survey, table 28186) publishes salaries at the 2-digit CNO level (16 reference groups).
To disaggregate to 4 digits, the methodology applies INE educational premia (multipliers by required education level) and intra-group structural proxies from France (INSEE) and Portugal (INE-PT), selected for southern-European labour-market similarity rather than for absolute wage levels. Post-correction validation against the 16 INE reference groups gives a MAPE of 4.96%, with all deviations under 10%. Three targeted manual corrections were required: Group I (Protection & security: trienios [three-year seniority increments], danger pay, night-shift supplements), Group M (Fixed machinery operators) and Group H (Health & care).

Adversarial Validation Stack

Inter-model agreement
The full 502-occupation scoring exercise was repeated by 7 independent AI models from different developers, with a consensus arbitration pass to reconcile discrepancies. Post-rescoring inter-model correlation: r = 0.953.

Three adversarial review waves (Appendix G)
The complete methodology and dataset were subjected to three formal "destroy this" red-teaming protocols, using several models simultaneously with the explicit instruction to identify methodological flaws, internal inconsistencies and unsupported claims. Across the three waves, 28 incidents were identified: 24 were resolved and 4 are documented as unresolved residuals. Appendix G gives the rationale for leaving each of the 4 unresolved (data unavailability, source contradiction or scope boundary).

Salary validation
MAPE 4.96% against 16 INE EES 2023 reference groups (post-correction); all 16 group deviations are under 10%. The adversarial protocol caught the Group I deviation (–36.4% before correction) and triggered the manual reconciliation that brought it to –4.4%.

Employment validation
EPA Q4 2025 1-digit totals are reproduced with ±0.00% deviation. The maximum difference is 47 persons over the 22.46 M EPA national aggregate, an artefact of the cascade's intra-group rebalancing.
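The employment cascade (pushing a 1-digit total down through successive weight layers) and the salary MAPE check both reduce to a few lines of arithmetic. The sketch below is illustrative only. The nested-dict weight tree, the function names, the toy weights and the largest-remainder rounding are all assumptions. The author's pipeline evidently differs in at least one respect: it reports a 47-person residual from intra-group rebalancing, whereas this sketch preserves parent totals exactly by construction.

```python
from __future__ import annotations

from math import floor


def allocate(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split an integer total across child codes in proportion to weights.

    Largest-remainder rounding keeps the children summing exactly to total
    (a design choice for this sketch, not necessarily the author's rule).
    """
    wsum = sum(weights.values())
    raw = {code: total * w / wsum for code, w in weights.items()}
    alloc = {code: floor(v) for code, v in raw.items()}
    leftover = total - sum(alloc.values())
    # Hand the remaining units to the children with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True)
    for code in by_remainder[:leftover]:
        alloc[code] += 1
    return alloc


def cascade(total: int, tree: dict) -> dict[str, int]:
    """Push a parent total down a weight tree of the form {code: (weight, subtree or None)}."""
    shares = allocate(total, {code: w for code, (w, _) in tree.items()})
    leaves = {}
    for code, (_, subtree) in tree.items():
        if subtree:
            leaves.update(cascade(shares[code], subtree))
        else:
            leaves[code] = shares[code]
    return leaves


def mape(estimates: dict[str, float], references: dict[str, float]) -> float:
    """Mean absolute percentage error (in %) over the reference groups."""
    errors = [abs(estimates[g] - ref) / abs(ref) for g, ref in references.items()]
    return 100 * sum(errors) / len(errors)


# Hypothetical weights: 2-digit shares, then 3-digit census weights, then 4-digit contract counts.
tree = {
    "21": (60, {"211": (2, {"2111": (1, None), "2112": (2, None)}), "212": (1, None)}),
    "22": (40, None),
}
leaves = cascade(1000, tree)
print(leaves, sum(leaves.values()))

# Hypothetical salary groups (EUR): reconstructed estimates vs. reference values.
print(mape({"A": 21000, "B": 28500}, {"A": 20000, "B": 30000}))
```

Re-summing the leaves and comparing them with the parent total is the same kind of check as the 1-digit employment validation above. A rounding rule that does not preserve totals would show up there as a small residual.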
Cross-Validation with External Studies: Funcas DT-2026/04 Addendum
This deposit includes a formal cross-validation note (funcas_validation_addendum.pdf, with its companion .md source, a raw-data CSV and a Python reproducibility script). The note compares this dataset with the Funcas Working Paper "Inteligencia artificial y mercado de trabajo en España" [Artificial intelligence and the labour market in Spain] (Rodríguez-Fernández, April 2026), which applies the AIOE index of Felten et al. (2023) to the CNO-11 taxonomy at the 1-digit level. The addendum reports:
Pearson r = 0.936 between the Funcas AIOE-CNO values and the v15 employment-weighted vulnerability aggregated to the 9 grand groups.
Spearman ρ = 0.830 as a rank-correlation robustness check.
A 4-digit decomposition of the four CNO-11 grand groups flagged by Funcas (groups 1, 2, 3 and 4), identifying the specific occupations within each group that concentrate vulnerability ≥ 7.
A documented divergence in Group 1 (Directors and managers): AIOE assigns substantial exposure, but no v15 managerial occupation reaches the ≥ 7 threshold. This is interpreted as augmentation rather than substitution.
The two methodologies are complementary. Funcas estimates expected displacement under a modelled adoption velocity over a 10-year horizon at the 1-digit level; this dataset measures a theoretical vulnerability ceiling at the 4-digit level without horizon assumptions. Both readings support the same macro ordering of vulnerable occupational groups while offering distinct inputs to public policy. The addendum is reproducible from funcas_validation_compute.py run on the v15 JSON: a single Python invocation regenerates the PDF, Markdown and CSV bit-for-bit.

Limitations
Vulnerability scores are theoretical estimates, not predictions of job displacement. The Anthropic Economic Index (March 2026) documents large gaps between theoretical vulnerability (~94% in computer/mathematical occupations) and observed AI adoption (~33%) in the United States, and similar dynamics are expected in Spain.
4-digit employment figures are proportional estimates, not observed data. At the 2-digit level, deviations from EPA published totals range from ±0% to ±540% because of structural changes between Census 2021 and EPA 2025.
Calibration factors are expert judgement without empirical back-testing. A sensitivity analysis (±20%) shifts the weighted mean vulnerability between approximately 3.0 and 4.5.
The France/Portugal salary proxies assume structural similarity among southern European economies; this assumption is not empirically validated at the individual-occupation level. The MCVL (Muestra Continua de Vidas Laborales, Continuous Sample of Working Lives) is identified as a future validation source.
Scores are generated through multi-model consensus, but each model performs a single scoring pass; intra-model reproducibility is estimated at ±0.5 points.
The analysis is static (March 2026 snapshot) and does not model AI-driven job creation, regional variation or part-time/full-time distinctions.
Self-employed workers (~3.3 M) are excluded from the salary survey by INE design.

Interactive Dashboard
The dashboard at empleo-ai.anlakstudio.com provides four views:
Treemap: sector-level aggregation with drill-down to individual occupations; rectangle area is proportional to employment and colour indicates the vulnerability score.
Detailed treemap: occupation-level rectangles nested within sector groups.
Scatter plot: salary (y-axis) vs. AI vulnerability (x-axis) with a regression trend line; bubble size is proportional to employment.
Sortable table: tabular view with score, employment, salary, sector, EU AI Act classification and impact typology.
Filters: sector selector, minimum/maximum score sliders, sort by employment / salary / score.
Detail panel: click any occupation for its full profile, including the 3–4 sentence Spanish justification, EU AI Act classification, impact typology (Sustitución / Híbrido / Aumentación) and salary-vulnerability sub-index.
The dashboard is bilingual (Spanish / English); the English version is selected with the ?lang=en query parameter.
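The two statistics reported in the Funcas addendum described earlier (Pearson r on grand-group aggregates, and Spearman ρ as a rank check) can be sketched in outline. This is a hedged illustration, not the bundled funcas_validation_compute.py. The record tuple layout, the function names and every input value are hypothetical. The sketch also assumes that "aggregated to grand groups" means an employment-weighted mean keyed on the first digit of the 4-digit CNO-11 code.

```python
from collections import defaultdict
from math import sqrt


def grand_group_means(records):
    """Employment-weighted mean score per CNO-11 grand group (first digit of the code)."""
    weighted, workers = defaultdict(float), defaultdict(float)
    for cno, empleo, score in records:
        group = cno[0]
        weighted[group] += empleo * score
        workers[group] += empleo
    return {g: weighted[g] / workers[g] for g in weighted}


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical inputs: (cno, empleo, score) records and group-level external values.
records = [("1111", 100, 4.0), ("1120", 100, 6.0), ("2111", 300, 6.0), ("3110", 200, 8.0)]
external = {"1": 1.0, "2": 2.0, "3": 1.5}

means = grand_group_means(records)
groups = sorted(external)
x = [means[g] for g in groups]
y = [external[g] for g in groups]
print(pearson(x, y), spearman(x, y))
```

With only nine grand groups, a single group can move either coefficient noticeably. That is one reason to report the rank correlation alongside Pearson r, as the addendum does.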
Comparative Positioning

Dimension | This analysis (v30/v15) | Funcas DT-2026/04 | OECD AI Exposure | ILO GenAI Index
Scope | Spain | Spain | Cross-country | Cross-country
Taxonomy | CNO-11 (502 occupations, 4 digits) | CNO-11 (9 grand groups, 1 digit) | ~400 ISCO | ISCO
Scoring | Multi-model + 5 calibration factors + 4-component decomposition | AIOE (Felten 2023) adapted via SOC→ISCO→CNO | Expert + O*NET tasks | GPT-4 task scoring
Output type | Vulnerability ceiling 0–10 (no horizon) | Expected displacement, 10-year horizon | Exposure score | Exposure score
Inter-model validation | r = 0.953 (7 models) | Single model (φ = 0.82 attenuation) | Expert panel | None published
Adversarial review | 3 waves, 28 incidents documented | None published | None published | None published
Regulatory mapping | EU AI Act (3 risk levels) | None | None | None
Salary cross-reference | Yes (~500 reconstructed values, MAPE 4.96%) | Implicit (employment-weighted) | No | No

US Comparative Reference
A parallel reference analysis of the US labour market (Andrej Karpathy, "Jobs", 2025–2026) uses BLS / O*NET data on 342 occupations. Key structural differences explain the divergence in headline figures:

Parameter | US (Karpathy) | Spain (this work) | Primary cause
Mean vulnerability | ~5.3 | 3.66 (weighted) | Physical-services weight + 5-factor calibration
% high vulnerability (≥ 7) | ~42% | 12.1% | Smaller knowledge-economy share + employment-protection friction
Regulatory classification | Not included | EU AI Act 3-tier mapping | No US federal AI framework
Salary granularity | ~800 direct BLS values | ~500 reconstructed values | INE publishes EES at 2-digit level
Employment granularity | Direct per occupation | Distributed from 1-digit | EPA anonymises CNO at 1-digit

OECD contextualisation: the OECD's 28% "at risk" figure (Employment Outlook 2024) refers to all automation technologies, not exclusively AI. OECD AI-specific figures for Spain: 5.9% at high risk of automation from AI; 27.4% GenAI exposure.
This analysis's 12.1% (score ≥ 7) measures calibrated theoretical vulnerability to AI broadly and is not directly comparable to any single OECD figure.

Technical Notes
The methodology document (v30) contains 57 technical notes organised across 7 appendices (A–G). The appendices cover complete technical notes by topic and the sub-component decomposition for the 502 occupations. They also cover the salary cascade, with the full 16-group MAPE table and the three targeted corrections, and the EU AI Act mapping protocol (Annex III contexts to occupations). The remaining appendices cover the sector taxonomy and 12-category assignment logic, the sensitivity analysis (±20% on each calibration factor) and the adversarial review log (Appendix G).
Selected technical notes referenced in this description:
Notes on employment cascade: EPA publishes CNO at 1-digit only; 4-digit figures are proportional estimates weighted by Census 2021 and SEPE 2024.
Notes on salary methodology: the EES 2023 reference year is 2022, and no temporal deflator is applied. Self-employed workers are excluded by INE design. France (INSEE) and Portugal (INE-PT) proxies are selected for southern-European structural similarity.
Notes on scoring protocol ([13]–[15]): multi-model consensus protocol with intra-model reproducibility of ±0.5 points; few-shot calibration anchors at scores 1, 5 and 9.
Note on regulatory mapping: Art. 5 of the EU AI Act prohibits certain AI practices, not professions, so there is no "prohibited" category at the occupation level.
Note [32] (theory-practice gap): the Anthropic Economic Index (March 2026) suggests that the calibration factors may understate the full adoption gap between theoretical vulnerability and observed AI use.
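The sensitivity analysis listed among the appendices (±20% on each calibration factor) is, in general form, a one-at-a-time perturbation. This record does not state how the five factors combine into the final score. The sketch below therefore assumes a purely hypothetical multiplicative rule in which each factor removes a fraction of the raw score. It perturbs one factor's moderation by ±20% while holding the others fixed and recomputes the employment-weighted mean. Factor names, toy values and the combination rule are illustrative assumptions; only the ranges quoted in the comments come from this record.

```python
def calibrate(raw_score, moderations):
    """Hypothetical rule: each factor removes a fraction m of the score, multiplicatively."""
    score = raw_score
    for m in moderations.values():
        score *= 1 - m
    return min(10.0, max(0.0, score))


def weighted_mean(scores, weights):
    """Employment-weighted mean of a list of scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def one_at_a_time(occupations, delta=0.20):
    """Shift each factor's moderation by -delta and +delta, one factor at a time.

    occupations: list of (employment, raw_score, {factor_name: moderation}).
    Returns the baseline weighted mean and {(factor, sign): weighted mean}.
    """
    weights = [emp for emp, _, _ in occupations]
    baseline = weighted_mean([calibrate(raw, mods) for _, raw, mods in occupations], weights)
    results = {}
    for factor in occupations[0][2]:
        for sign in (-1, 1):
            scores = []
            for _, raw, mods in occupations:
                shifted = dict(mods)
                shifted[factor] = mods[factor] * (1 + sign * delta)
                scores.append(calibrate(raw, shifted))
            results[(factor, sign)] = weighted_mean(scores, weights)
    return baseline, results


# Hypothetical occupations; moderation values sit inside the ranges quoted in this record
# (sector multiplier 0.80-0.95, i.e. 5-20% moderation; friction 1-5%; AI Act 2-8%).
occupations = [
    (600, 8.0, {"sector": 0.10, "friction": 0.03, "ai_act": 0.00}),
    (400, 3.0, {"sector": 0.20, "friction": 0.05, "ai_act": 0.05}),
]
baseline, results = one_at_a_time(occupations)
print(round(baseline, 3))
for (factor, sign), mean in sorted(results.items()):
    print(factor, "+20%" if sign > 0 else "-20%", round(mean, 3))
```

The extreme weighted means across all perturbations give a band of the same form as the approximately 3.0–4.5 range quoted under Limitations. This toy example is not expected to reproduce those numbers.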
Files in This Deposit
Dataset (v15)
spain_502_v15_subcomp_complete.json: full dataset, 502 occupations, all fields including the new cumulative analysis fields
spain_v15_threshold_lookup.csv: inverse threshold mapping (workforce share at or above each score)
Methodology (v30)
Methodology document (~46 pages, Spanish), including the 7 appendices (A–G) with the adversarial review log
Funcas Cross-Validation Addendum (v1)
funcas_validation_addendum.pdf: 12-page methodological note
funcas_validation_addendum.md: Markdown source
funcas_validation_data.csv: raw cross-validation table (10 grand groups × 12 fields)
funcas_validation_compute.py: Python reproducibility script
See the file panel on this Zenodo record for the complete list and download links.

Citation
De Nicolás, Á. (2026). AI Vulnerability of Jobs in Spain: Complete Dataset, Methodology, Funcas Cross-Validation Addendum & Interactive Dashboard (Methodology v30 / Dataset v15 + Funcas Addendum v1) [Data set]. Anlak Studio. Zenodo. [New DOI assigned upon publication of this version]
Concept DOI (resolves to latest version): https://doi.org/10.5281/zenodo.19076797
Previous version (v30 / v14, March 2026): https://doi.org/10.5281/zenodo.19186444

Keywords
artificial intelligence; AI vulnerability; labour market; employment; Spain; CNO-11; EPA; EU AI Act; AESIA; automation; occupational risk; salary estimation; multi-model validation; adversarial review; red teaming; inter-model agreement; treemap; salary-vulnerability index; cross-validation; Funcas; Rodríguez-Fernández; Felten et al.; AIOE; Eloundou et al.; Acemoglu

License
Creative Commons Attribution 4.0 International (CC BY 4.0). The dataset, methodology, dashboard source code and all derived materials may be reused and adapted with attribution.

Language
Spanish (dataset, justifications, primary dashboard UI); English (this description, dashboard via ?lang=en, abstract and methodology summary).
Resource Type
Dataset + Interactive Visualisation + Methodology Document

Related Identifiers
empleo-ai.anlakstudio.com: interactive dashboard (IsSupplementedBy)
10.5281/zenodo.19076797: concept DOI for all versions and methodological notes (IsVersionOf)
10.5281/zenodo.19186444: previous version, methodology v30 / dataset v14 (IsNewVersionOf)
Rodríguez-Fernández, F. (2026). Inteligencia artificial y mercado de trabajo en España. Exposición ocupacional, efectos sobre el empleo y adopción empresarial. Funcas Working Paper DT-2026/04. (IsRelatedTo; cross-validated in the bundled addendum)
Brynjolfsson, E., Mitchell, T. & Rock, D. (2018). What Can Machines Learn, and What Does It Mean for Occupations and the Economy? AEA Papers and Proceedings, 108: 43–47. (References)
Eloundou, T., Manning, S., Mishkin, P. & Rock, D. (2024). GPTs are GPTs: Labor market impact potential of LLMs. Science, 384(6702): 1306–1308. https://doi.org/10.1126/science.adj0998 (References)
Felten, E. W., Raj, M. & Seamans, R. (2023). Occupational Heterogeneity in Exposure to Generative AI. SSRN Working Paper 4414065. (References)
Acemoglu, D. (2024). The Simple Macroeconomics of AI. NBER Working Paper 32487; Economic Policy, 40(121): 13–58. https://doi.org/10.3386/w32487 (References)
Frey, C. B. & Osborne, M. A. (2017). The Future of Employment. Technological Forecasting and Social Change, 114: 254–280. (References)
Regulation (EU) 2024/1689 (EU AI Act), Annex III and Arts. 5 and 6. (References)
Anthropic (2026). The Anthropic Economic Index, March 2026. (References)
Nedelkoska, L. & Quintini, G. (2018). Automation, skills use and training. OECD Social, Employment and Migration Working Papers, No. 202. (References)

Contact
Álvaro de Nicolás · alvarodenicolas@gmail.com · alvarodenicolas.com
Provider: Zenodo
Created: 2026-05-05