ANEST Narrative–Affect Representations (ANAD v1): Derived Feature Resource for Studying Narrative–Affect Discrepancy

Name: ANEST Narrative–Affect Representations (ANAD v1): Derived Feature Resource for Studying Narrative–Affect Discrepancy
Creator: Zenodo
Published: 2026-02-18 11:41:02
License: Not specified in this listing (the resource components include a CC-BY 4.0 license file)

Source: Zenodo; updated 2026-02-18; indexed 2026-05-26

Download link:

https://zenodo.org/doi/10.5281/zenodo.18680687


Resource description:

ANEST Narrative–Affect Representations (ANAD v1) is a large-scale, fully curated research resource for quantifying narrative–affect discrepancy. It uses derived, non-identifiable feature representations of human-generated text. ANAD v1 is registered as a dataset for archival purposes, but it is primarily a derived-feature research resource rather than a raw text corpus. Apart from the derived features, it contains only post identifiers. Reconstructing the original content requires independent access to the source platform under that platform's terms of service.

ANAD v1 was developed at the Ryan Research Institute (RRI), Paris. It combines three components:
- narrative structure metrics,
- affective polarity summaries, and
- a discrepancy index (NADI).

Together these support aggregate-level and population-level analysis of emotional incongruence in narrative expression. The release contains derived representations only and includes no raw text, usernames, or personal metadata. It is intended for computational psychology, affective computing, mental health modeling, and Emotional AI research, where narrative–emotion relationships must be analyzed reproducibly without exposing individual-level content.

Version note (v1.2)
This update adds the canonical derived feature table, which contains all 351,734 observations under a unified metric schema. Legacy and duplicate columns have been removed, so only canonical representations are retained. The removed columns are LoC_v2, LoC_legacy, LoC_raw_v2, NADI_v2, NADI_legacy, and raw sentiment.
New file: anad_canonical_v1.parquet (N = 351,734; 11 columns: id, word_count, sentence_count, text_length_char, LoC, sentiment_norm, NADI, NADI_signed, loc_sent_var, loc_n_trans, loc_n_reinterp). This file is the primary data record for citation and reuse.

Version note (v1.3)
This update adds anad_canonical_v1.csv, a CSV copy of the canonical feature table (351,734 rows, 11 columns), so that researchers without Parquet-compatible tools can also reuse the data.
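Readers working from the CSV copy without Parquet tooling can load it with Python's standard csv module. The minimal sketch below also checks that the header matches the 11 canonical columns listed in the v1.2 note. The helper name load_canonical_csv, the file path, and treating empty cells as missing values are all assumptions, not part of the release.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO, Union

# Canonical column order as listed in the v1.2 version note.
CANONICAL_COLUMNS: List[str] = [
    "id", "word_count", "sentence_count", "text_length_char",
    "LoC", "sentiment_norm", "NADI", "NADI_signed",
    "loc_sent_var", "loc_n_trans", "loc_n_reinterp",
]

Row = Dict[str, Union[str, Optional[float]]]


def load_canonical_csv(handle: TextIO) -> List[Row]:
    """Hypothetical helper: read the canonical CSV from an open text handle.

    Raises ValueError unless the header holds exactly the 11 canonical
    columns. Keeps `id` as a string and parses every other column as float.
    Empty cells become None (an assumption about how missing values are
    encoded).
    """
    reader = csv.DictReader(handle)
    header = set(reader.fieldnames or [])
    expected = set(CANONICAL_COLUMNS)
    if header != expected:
        raise ValueError(
            f"unexpected schema: missing={sorted(expected - header)}, "
            f"extra={sorted(header - expected)}"
        )
    rows: List[Row] = []
    for raw in reader:
        row: Row = {"id": raw["id"]}
        for col in CANONICAL_COLUMNS[1:]:
            cell = (raw[col] or "").strip()  # short rows yield None cells
            row[col] = float(cell) if cell else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Synthetic one-row example (made-up values, not taken from the release).
    demo = io.StringIO(
        ",".join(CANONICAL_COLUMNS) + "\n"
        "demo_1,120,6,640,5.0,5.0,0.0,0.0,0.25,2,1\n"
    )
    rows = load_canonical_csv(demo)
    print(len(rows), rows[0]["NADI"])  # prints: 1 0.0
    # For the real file (path is an assumption):
    # with open("anad_canonical_v1.csv", newline="", encoding="utf-8") as fh:
    #     rows = load_canonical_csv(fh)
```

A list of dictionaries is easy to inspect but memory-hungry at 351,734 rows. For the full table, a columnar reader such as pandas.read_csv or pandas.read_parquet is usually more practical.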
No data values were changed; the CSV is an exact copy of anad_canonical_v1.parquet.

Version note (v1.1)
This update adds derived affect summaries computed from sentence-level VADER valence, together with manuscript-ready statistics for LoC–affect associations. New files:
- anad_mav_rms_only.parquet (N = 351,734; columns: id, LoC, v_mean, v_mean_abs, v_mav, v_rms, v_sd, flip_rate)
- loc_affect_corr_table.csv (Pearson and Spearman correlations used in the companion manuscripts)

Privacy notice: This release contains derived feature representations only and does not include raw post text or user metadata.

Resource components

(1) Canonical narrative–affect feature table
anad_canonical_v1.parquet (and its CSV equivalent, anad_canonical_v1.csv) is the primary data file for this resource. It contains 351,734 anonymized observations in 11 canonical columns: the post identifier (id) plus 10 feature variables.
- Structural metrics: word_count, sentence_count, text_length_char
- Narrative metric: LoC (Length-of-Context)
- Affective metric: sentiment_norm (VADER-derived normalized polarity, 0–10 scale)
- Discrepancy indices: NADI (absolute), NADI_signed (directional)
- Narrative transition features: loc_sent_var, loc_n_trans, loc_n_reinterp

(2) Sentence-level affect summaries
anad_mav_rms_only.parquet aggregates sentence-level VADER valence for the same 351,734 observations. The aggregations are mean, MAV, root mean square (RMS), standard deviation (SD), and flip rate. loc_affect_corr_table.csv provides Pearson and Spearman correlations between LoC and the affect variables.

(3) Reproducibility and metadata
- Dataset schema (dataset_schema_v1.json)
- Full preprocessing and scoring pipeline (anest_nadi_pipeline_v1.ipynb)
- CHANGELOG and version history
- CC-BY 4.0 license and citation file
- README with usage notes and ethical guidelines

Data provenance and ethical clarification
The source texts underlying the derived features were drawn from a long-running public discussion forum, via archival and official APIs, covering 2012–2023.
All preprocessing, anonymization, and feature extraction were performed at RRI. The primary analytic files contain no verbatim text. No attempt is made to identify, reconstruct, or attribute individual-level narratives or authors. The resource is provided exclusively to support statistical, theoretical, and computational research using aggregate representations.

Relation to companion work
In this release, the discrepancy index is a simple absolute-difference measure:

NADI = |LoC − sentiment_norm|

The companion article "The Great Narrative–Affect Gap" (Kim, companion article) introduces a more advanced, residual-based variant of the Narrative–Affect Discrepancy Index. That variant is defined via generalized additive models and is used to analyze the geometry of the narrative–affect space. It can be derived from ANAD v1 using the public pipeline and documentation provided here.

ANAD v1 is part of the broader ANEST (Affective Neurocomputational Storytelling) research program at RRI. The program investigates emotional reasoning, predictive selfhood, and affective sovereignty in both human and artificial systems.
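The absolute index defined under "Relation to companion work" is simple to recompute from the LoC and sentiment_norm columns. Doing so is a quick way to sanity-check a local copy of the canonical table. The minimal Python sketch below uses hypothetical function names. The sign convention for nadi_signed is an assumption, because the description calls NADI_signed "directional" without defining its sign. The residual-based, GAM-derived NADI from the companion article is not reproduced here.

```python
from typing import Iterable, Tuple


def nadi(loc: float, sentiment_norm: float) -> float:
    """Absolute Narrative-Affect Discrepancy Index as defined in this
    release: NADI = |LoC - sentiment_norm|."""
    return abs(loc - sentiment_norm)


def nadi_signed(loc: float, sentiment_norm: float) -> float:
    """Directional variant.

    ASSUMPTION: the sign convention (LoC minus sentiment_norm) is not
    stated in the description. Confirm it against
    anest_nadi_pipeline_v1.ipynb before relying on it.
    """
    return loc - sentiment_norm


def max_nadi_deviation(rows: Iterable[Tuple[float, float, float]]) -> float:
    """Hypothetical consistency check over (LoC, sentiment_norm, NADI) triples.

    Returns the largest absolute gap between the stored NADI and the
    recomputed value, or 0.0 for empty input.
    """
    return max(
        (abs(nadi(loc, sent) - stored) for loc, sent, stored in rows),
        default=0.0,
    )


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(nadi(3.5, 7.0))          # 3.5
    print(nadi_signed(3.5, 7.0))   # -3.5
    print(max_nadi_deviation([(3.5, 7.0, 3.5), (6.0, 4.0, 2.0)]))  # 0.0
```

On real data, a small nonzero deviation may simply reflect rounding in the stored values. Comparing against a tolerance rather than requiring exact equality is therefore the safer choice.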

Provider:

Zenodo

Created:

2026-02-18
