matthewhaynesonline/axiom

Name: matthewhaynesonline/axiom
Creator: matthewhaynesonline
Published: 2026-04-19 18:48:47
License: No description available

Source: Hugging Face | Updated: 2026-04-19 | Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/matthewhaynesonline/axiom


Description:

---
license: apache-2.0
task_categories:
- text-classification
- feature-extraction
language:
- en
tags:
- embeddings
- bias
- semantic-similarity
- ideological-analysis
- sentence-transformers
configs:
- config_name: term_pairs
  data_files:
  - split: train
    path: data/term_pairs/data.parquet
- config_name: term_sentiment
  data_files:
  - split: train
    path: data/term_sentiment/data.parquet
- config_name: value_systems
  data_files:
  - split: train
    path: data/value_systems/data.parquet
- config_name: terms
  data_files:
  - split: train
    path: data/terms/data.parquet
- config_name: models
  data_files:
  - split: train
    path: data/models/data.parquet
- config_name: enabled_models
  data_files:
  - split: train
    path: data/enabled_models/data.parquet
- config_name: definitions
  data_files:
  - split: train
    path: data/definitions/data.parquet
- config_name: value_systems_meta
  data_files:
  - split: train
    path: data/value_systems_meta/data.parquet
- config_name: judgement_axes
  data_files:
  - split: train
    path: data/judgement_axes/data.parquet
- config_name: judgement_axes_correlation
  data_files:
  - split: train
    path: data/axis_correlation/data.parquet
- config_name: license_scores
  data_files:
  - split: train
    path: data/license_scores/data.parquet
exclude_patterns:
- arrow/*
---

# Axiom Dataset

Full output of the [Axiom](https://huggingface.co/collections/matthewhaynesonline/axiom) pipeline: pairwise cosine similarity scores, axis projection sentiment scores, and value system preference rankings for 162 terms across 17 sentence-transformer models grouped by geographic/institutional origin (East, West, Academia), plus the reference splits needed to reproduce or extend the pipeline.
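To make the base measurement concrete, here is a minimal sketch, assuming NumPy and pandas and toy 2-D vectors in place of real model embeddings, of a raw cosine similarity followed by the two per-model×axis normalizations that the `term_pairs` split describes (`score_z` and `score_norm`). The function names, the toy terms, the axis label `toy_axis`, and the use of the population standard deviation (`ddof=0`) are illustrative assumptions, not the pipeline's actual code.

```python
import numpy as np
import pandas as pd


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def add_normalized_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Add z-score and min-max columns computed within each model x axis group.

    Expects term_pairs-style columns: model_id, b_category (axis), score.
    Assumption: population std (ddof=0); the real pipeline may differ.
    """
    grouped = df.groupby(["model_id", "b_category"])["score"]
    mean = grouped.transform("mean")
    std = grouped.transform(lambda s: s.std(ddof=0))
    lo = grouped.transform("min")
    hi = grouped.transform("max")
    out = df.copy()
    out["score_z"] = (df["score"] - mean) / std
    out["score_norm"] = (df["score"] - lo) / (hi - lo)
    return out


# Hypothetical toy "embeddings"; real models emit hundreds of dimensions.
emb = {
    "freedom": np.array([1.0, 0.0]),
    "order": np.array([0.0, 1.0]),
    "good": np.array([1.0, 1.0]),
    "evil": np.array([1.0, -1.0]),
}
rows = [
    {"model_id": "toy-model", "b_category": "toy_axis",
     "a_term": a, "b_term": b, "score": cosine(emb[a], emb[b])}
    for a in ("freedom", "order")
    for b in ("good", "evil")
]
pairs = add_normalized_scores(pd.DataFrame(rows))
```

Normalizing within each model×axis group puts models with different raw similarity baselines on a common footing: the z-score column centers each group at 0, while the min-max column rescales it to 0–1.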
For findings, methodology, and context, see the [collection page](https://huggingface.co/collections/matthewhaynesonline/axiom), [GitHub repo](https://github.com/matthewhaynesonline/Axiom), [essay](https://blog.studiohaynes.com/go/axiom), or [paper](https://raw.githubusercontent.com/matthewhaynesonline/Axiom/refs/heads/main/paper/paper.pdf).

---

## Measurement splits

### `term_pairs` (15,552 rows)

Raw pairwise cosine similarity between every `(term, judgment_pole)` pair for every model. This is the base layer; everything else is derived from it.

| Column | Type | Description |
|---|---|---|
| `a_term` | str | Source term |
| `b_term` | str | Target / judgment pole word |
| `score` | f64 | Raw cosine similarity |
| `score_z` | f64 | Z-score normalized within model×axis |
| `score_norm` | f64 | Min-max normalized within model×axis (0–1) |
| `a_category` | str | Term category (`neutral_control`, `value_laden`, `political_economic`) |
| `b_category` | str | Judgment axis name |
| `model_id` | str | HF model identifier |

### `term_sentiment` (12,474 rows)

Axis projection scores: `score_axis = cos(term, positive_pole) − cos(term, negative_pole)`. Includes per-model scores as well as cross-model averages, aggregated by group and as a grand mean.

| Column | Type | Description |
|---|---|---|
| `a_term` | str | Term being evaluated |
| `a_category` | str | Term category |
| `b_category` | str | Judgment axis |
| `positive_term` | str | Positive pole (e.g. `good`, `feasible`) |
| `negative_term` | str | Negative pole (e.g. `evil`, `unfeasible`) |
| `score_axis` | f64 | Axis projection score |
| `model_id` | str | HF model ID, or composite label for group averages |
| `group` | str | `East`, `West`, `Academia`, or aggregate |

### `value_systems` (627 rows)

Preference rankings for nine value system queries (government, economy, justice, etc.), per model and per group composite. Each option is scored by direct cosine similarity between the query and option embeddings, then min-max normalized within the query.
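The per-query scoring described for `value_systems` can be sketched as follows. This is a minimal NumPy illustration, assuming cosine similarity on the raw embeddings and rank 1 for the most similar option; the function `rank_options`, the toy query vector, and the option names are hypothetical, not taken from the pipeline.

```python
import numpy as np


def rank_options(query_vec: np.ndarray, option_vecs: dict) -> list:
    """Rank candidate options by cosine similarity to a query embedding.

    Returns (option, rank, score, score_norm) tuples, rank 1 = most similar.
    score_norm is min-max normalized within this single query.
    """
    names = list(option_vecs)
    mat = np.stack([option_vecs[name] for name in names])
    # L2-normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    scores = m @ q
    lo, hi = scores.min(), scores.max()
    # Guard against a degenerate query where every option scores the same.
    norm = (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)
    order = np.argsort(-scores)  # indices in descending similarity
    return [
        (names[i], rank, float(scores[i]), float(norm[i]))
        for rank, i in enumerate(order, start=1)
    ]


# Hypothetical toy query and options for an "economy"-style query.
query = np.array([1.0, 0.0])
options = {
    "free market": np.array([1.0, 0.0]),
    "central planning": np.array([0.0, 1.0]),
    "mixed economy": np.array([1.0, 1.0]),
}
ranking = rank_options(query, options)
```

Because min-max scaling runs per query, the top option always maps to 1 and the bottom option to 0, so `score_norm` conveys relative position within a query rather than absolute similarity.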
These scores express preference ordering, not sentiment.

| Column | Type | Description |
|---|---|---|
| `model_id` | str | HF model ID or composite |
| `model_group` | str | `East`, `West`, or `Academia` |
| `grouping` | str | Value-system category (e.g. `economy`, `justice`) |
| `query` | str | Natural-language query |
| `option` | str | Candidate concept |
| `rank` | i64 | Rank within query×model (1 = most similar) |
| `score` | f64 | Raw cosine similarity |
| `score_norm` | f64 | Min-max normalized score |

---

## Reference splits

### `terms`

The full term vocabulary: 53 political/economic terms, 46 value-laden terms, and 63 neutral control terms (162 in total).

| Column | Type | Description |
|---|---|---|
| `category` | str | Term category |
| `term` | str | Term string |
| `short_definition` | str | One-line definition |
| `antonym` | str | Semantic opposite |

### `models`

All 17 evaluated models.

| Column | Type | Description |
|---|---|---|
| `group` | str | `East`, `West`, or `Academia` |
| `model_name` | str | Display name |
| `model_url` | str | HF model page |
| `type` | str | Model type |
| `license` | str | License identifier |

### `enabled_models`

Subset of `models` active in the current pipeline run.

### `definitions`

Full definitions for every concept measured.

| Column | Type | Description |
|---|---|---|
| `term` | str | Concept name |
| `definition` | str | Full definition |

### `value_systems_meta`

Query strings and option lists for each value system category, exploded to one row per option.

| Column | Type | Description |
|---|---|---|
| `key` | str | Category key (e.g. `economy`, `justice`) |
| `category` | str | Broad category label |
| `query` | str | Natural-language query used for similarity scoring |
| `option` | str | Candidate concept |

### `judgement_axes`

The six semantic axes. Each axis is defined by a positive and a negative pole word; the axis vector is `embed(positive) − embed(negative)`.
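The axis-vector definition here and the difference-of-cosines formula given for `term_sentiment` agree when the embeddings are L2-normalized, because cos(t, p) − cos(t, n) = t̂·p̂ − t̂·n̂ = t̂·(p̂ − n̂). The NumPy sketch below checks this numerically with random vectors. It assumes unit normalization of both the term and the pole embeddings (not every model normalizes by default, and with raw pole embeddings the identity does not hold in general); the helper names and the 384-dimensional size are arbitrary choices for illustration.

```python
import numpy as np


def unit(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit L2 norm."""
    return v / np.linalg.norm(v)


def score_axis(term: np.ndarray, pos: np.ndarray, neg: np.ndarray) -> float:
    """Difference-of-cosines form: cos(term, pos) - cos(term, neg)."""
    return float(unit(term) @ unit(pos) - unit(term) @ unit(neg))


def axis_projection(term: np.ndarray, pos: np.ndarray, neg: np.ndarray) -> float:
    """Dot product of the unit term vector with the axis built from
    L2-normalized pole embeddings: unit(pos) - unit(neg)."""
    axis = unit(pos) - unit(neg)
    return float(unit(term) @ axis)


rng = np.random.default_rng(0)
# Random stand-ins for sentence embeddings (dimension chosen arbitrarily).
term, pos, neg = (rng.normal(size=384) for _ in range(3))
```

A positive score means the term sits closer to the positive pole than to the negative one; a term identical to the positive pole scores `1 − cos(pos, neg)`.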
Pairwise Pearson correlations across axes range from `r=0.07` to `r=0.50` (mean ≈ `0.24`).

| Column | Type | Description |
|---|---|---|
| `axis` | str | Axis name (e.g. `judgement_safety`) |
| `positive_term` | str | Positive pole word |
| `negative_term` | str | Negative pole word |

### `judgement_axes_correlation`

Pairwise Pearson correlations across the six axes on the political/economic term set.

### `license_scores`

Numeric openness score (0–1) per license type, used to weight models in composite group averages.

| Column | Type | Description |
|---|---|---|
| `license` | str | License identifier |
| `score` | f64 | Openness score (1.0 = fully open, 0.0 = proprietary) |

---

## Loading

```python
from datasets import load_dataset

# Main measurement splits (each config exposes a single "train" split)
term_pairs = load_dataset("matthewhaynesonline/axiom", "term_pairs")
term_sentiment = load_dataset("matthewhaynesonline/axiom", "term_sentiment")
value_systems = load_dataset("matthewhaynesonline/axiom", "value_systems")

# Reference splits
terms = load_dataset("matthewhaynesonline/axiom", "terms")
models = load_dataset("matthewhaynesonline/axiom", "models")

# Select the "train" split and convert it to a pandas DataFrame for analysis
term_pairs_df = term_pairs["train"].to_pandas()
```
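Once loaded, a split can be converted with `Dataset.to_pandas()` and analyzed directly. As one example, the sketch below recomputes axis-to-axis Pearson correlations from `term_sentiment`, in the spirit of the `judgement_axes_correlation` split. This card does not state whether the published correlations use a single model, a group composite, or the grand mean, so `model_id` is left as a parameter; the function `axis_correlation`, the axis names, and the synthetic scores are hypothetical.

```python
import pandas as pd


def axis_correlation(df: pd.DataFrame, model_id: str) -> pd.DataFrame:
    """Pearson correlations between judgment axes over political/economic terms.

    `df` needs term_sentiment-style columns:
    a_term, a_category, b_category, score_axis, model_id.
    """
    subset = df[
        (df["a_category"] == "political_economic") & (df["model_id"] == model_id)
    ]
    # Reshape to one row per term and one column per axis.
    wide = subset.pivot_table(
        index="a_term", columns="b_category", values="score_axis", aggfunc="mean"
    )
    return wide.corr(method="pearson")


# Tiny synthetic frame; axis names and scores are made up for illustration.
rows = []
for term, x in [("t1", 0.1), ("t2", 0.2), ("t3", 0.4)]:
    for axis, value in [("axis_a", x), ("axis_b", 2 * x), ("axis_c", -x)]:
        rows.append({"a_term": term, "a_category": "political_economic",
                     "b_category": axis, "score_axis": value,
                     "model_id": "toy-model"})
# A neutral-control row that the category filter should ignore.
rows.append({"a_term": "table", "a_category": "neutral_control",
             "b_category": "axis_a", "score_axis": 9.9, "model_id": "toy-model"})
corr = axis_correlation(pd.DataFrame(rows), "toy-model")
```

With the real data this would be called as `axis_correlation(term_sentiment["train"].to_pandas(), some_model_id)`, where `some_model_id` is any value from the `model_id` column.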

Provided by:

matthewhaynesonline
