bermaneh/codeswitching-sentiment-bias-results-v1
Source: Hugging Face · Updated 2026-04-28 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/bermaneh/codeswitching-sentiment-bias-results-v1
Resource description:
---
license: mit
tags:
- codeswitching-sentiment-bias
- sentiment
- bias
- shap
- code-switching
---
# codeswitching-sentiment-bias-results-v1
Full results of Experiment 1: the shift in predicted sentiment caused by a single-word English↔Spanish (en↔es) code-switch in 3,483 real
bilingual tweets from the SemEval 2020 Task 9 corpus.
## Experiment Summary
**Hypothesis:** NLP models encode language-dependent biases: swapping a single word between
English and Spanish in a bilingual tweet measurably shifts the model's sentiment prediction.
**Model:** `cardiffnlp/twitter-roberta-base-sentiment-latest`
**Explainability:** SHAP (partition explainer, background=100)
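
The comparison behind the hypothesis can be sketched as follows. This is a minimal illustration, not the experiment's actual code: `load_classifier` and `compare_sentiment` are hypothetical helper names, and the delta is assumed to be the difference between the two top-label confidence scores, as the column descriptions suggest. The classifier is passed in as a callable so the helper can be exercised without downloading the model.

```python
from typing import Callable, Dict, List

MODEL_ID = "cardiffnlp/twitter-roberta-base-sentiment-latest"

# A classifier maps one string to a list holding one {"label", "score"} dict,
# which is the shape a Hugging Face text-classification pipeline returns.
Classifier = Callable[[str], List[Dict[str, object]]]


def load_classifier() -> Classifier:
    """Build a Hugging Face pipeline for MODEL_ID (hypothetical helper).

    The import is lazy so compare_sentiment can be used without transformers.
    """
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=MODEL_ID)


def compare_sentiment(original: str, perturbed: str, classify: Classifier) -> dict:
    """Score both sentences and report the shift in top-label confidence."""
    before = classify(original)[0]
    after = classify(perturbed)[0]
    return {
        "original_sentiment_label": before["label"],
        "original_sentiment_score": before["score"],
        "perturbed_sentiment_label": after["label"],
        "perturbed_sentiment_score": after["score"],
        # Assumption: signed delta is original score minus perturbed score.
        "sentiment_delta": before["score"] - after["score"],
        "label_changed": before["label"] != after["label"],
    }
```

With the real model, a call would look like `compare_sentiment(tweet, swapped_tweet, load_classifier())`, where `tweet` and `swapped_tweet` are the original and perturbed texts.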
## Key Results
| Metric | Value |
|--------|-------|
| Rows attempted | 3,483 |
| Rows processed | 3,360 |
| Collision skips | 117 |
| No-word skips | 6 |
| Mean \|Δ\| | 0.0674 |
| Max \|Δ\| | 0.8286 |
| \|Δ\| > 0.05 | 1,291 / 3,360 (38.4%) |
| Label changes | 459 / 3,360 (13.7%) |
| en→es swaps | 1,648 |
| es→en swaps | 1,712 |
| SHAP rank=1 (top word) | 2,590 / 3,360 (77.1%) |
| SHAP rank>1 | 770 / 3,360 (22.9%) |
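
The metrics in this table can in principle be recomputed from the per-row columns. The sketch below assumes that skipped rows are present in the data with a non-null `skip_reason` and null result fields; `summarize` and `example_rows` are hypothetical, and the three sample rows are made up for illustration rather than taken from the dataset.

```python
def summarize(rows):
    """Recompute summary metrics from a list of row dicts."""
    # Assumption: a row counts as processed when skip_reason is null.
    processed = [r for r in rows if r["skip_reason"] is None]
    n = len(processed)
    abs_deltas = [abs(r["sentiment_delta"]) for r in processed]

    def rate(count):
        # Share of processed rows; 0.0 when nothing was processed.
        return count / n if n else 0.0

    return {
        "rows_attempted": len(rows),
        "rows_processed": n,
        "collision_skips": sum(r["skip_reason"] == "translation_collision" for r in rows),
        "no_word_skips": sum(r["skip_reason"] == "no_translatable_word" for r in rows),
        "mean_abs_delta": sum(abs_deltas) / n if n else 0.0,
        "max_abs_delta": max(abs_deltas, default=0.0),
        "share_abs_delta_gt_0_05": rate(sum(d > 0.05 for d in abs_deltas)),
        "label_change_rate": rate(sum(bool(r["label_changed"]) for r in processed)),
        "shap_rank_1_rate": rate(sum(r["shap_rank"] == 1 for r in processed)),
    }


# Made-up rows for illustration only; they are not real dataset values.
example_rows = [
    {"skip_reason": None, "sentiment_delta": 0.10, "label_changed": True, "shap_rank": 1},
    {"skip_reason": None, "sentiment_delta": -0.02, "label_changed": False, "shap_rank": 3},
    {"skip_reason": "translation_collision", "sentiment_delta": None,
     "label_changed": None, "shap_rank": None},
]

summary = summarize(example_rows)
```

Taking absolute values before averaging matters here: `sentiment_delta` is signed, so a plain mean would let positive and negative shifts cancel out.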
## Column Descriptions
| Column | Description |
|--------|-------------|
| `sentence_id` | Row index in the original filtered dataset (0-based) |
| `original_sentence` | Original bilingual tweet text |
| `perturbed_sentence` | Tweet with one word swapped en↔es via Helsinki-NLP translation |
| `swapped_word` | The source word that was translated and replaced |
| `translation` | The translated replacement word (null if skipped) |
| `swap_direction` | `en→es` or `es→en` |
| `shap_rank` | Rank of `swapped_word` by \|SHAP value\| in the original sentence (1 = top contributor) |
| `original_sentiment_label` | Sentiment label before swap: positive/neutral/negative |
| `original_sentiment_score` | Confidence score for original label [0,1] |
| `perturbed_sentiment_label` | Sentiment label after swap |
| `perturbed_sentiment_score` | Confidence score for perturbed label [0,1] |
| `sentiment_delta` | `original_sentiment_score` − `perturbed_sentiment_score` (signed) |
| `label_changed` | True if sentiment label changed after swap |
| `original_shap_values` | Dict of token → SHAP value for original sentence |
| `perturbed_shap_values` | Dict of token → SHAP value for perturbed sentence |
| `skip_reason` | `translation_collision` or `no_translatable_word`; null if the row was processed |
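
As an illustration of how `shap_rank` relates to `original_shap_values`, the sketch below ranks tokens by absolute SHAP value. It assumes the dict keys are whole words that match `swapped_word` exactly; the real tokens may be subwords or carry whitespace, in which case some normalization would be needed first. The function name and the sample values are hypothetical.

```python
from typing import Dict, Optional


def shap_rank(shap_values: Dict[str, float], word: str) -> Optional[int]:
    """Return the 1-based rank of `word` by |SHAP value|, or None if absent."""
    if word not in shap_values:
        return None
    # Largest absolute contribution first; ties keep dict insertion order.
    ordered = sorted(shap_values, key=lambda tok: abs(shap_values[tok]), reverse=True)
    return ordered.index(word) + 1


# Made-up SHAP values for illustration only.
example_shap = {"me": 0.01, "encanta": 0.40, "this": -0.05, "song": 0.12}
```

Ranking by absolute value means a strongly negative token can outrank a mildly positive one, which matches the `1 = top contributor` reading of the column.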
## Provenance
- **Experiment:** codeswitching-sentiment-bias
- **Job:** torch:7074434
- **Cluster:** torch
- **Artifact status:** final
- **Canary:** no
- **Input dataset:** bermaneh/codeswitching-sentiment-bias-canary-v1
- **Hyperparameters:** n_rows=3483, random_seed=42, min_word_len=2, max_new_tokens=20, shap_background_size=100
Provider:
bermaneh



