
bermaneh/codeswitching-sentiment-bias-results-v1

Source: Hugging Face | Updated 2026-04-28 | Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/bermaneh/codeswitching-sentiment-bias-results-v1
Description:
---
license: mit
tags:
- codeswitching-sentiment-bias
- sentiment
- bias
- shap
- code-switching
---

# codeswitching-sentiment-bias-results-v1

Full Experiment 1 results: sentiment shift from single-word en↔es code-switch in 3,483 real bilingual tweets (SemEval 2020 Task 9 corpus).

## Experiment Summary

**Hypothesis:** NLP models encode language-dependent biases: swapping a single word between English and Spanish in a bilingual tweet measurably shifts the model's sentiment prediction.

**Model:** `cardiffnlp/twitter-roberta-base-sentiment-latest`

**Explainability:** SHAP (partition explainer, background=100)

## Key Results

| Metric | Value |
|--------|-------|
| Rows attempted | 3,483 |
| Rows processed | 3,360 |
| Collision skips | 117 |
| No-word skips | 6 |
| Mean \|Δ\| | 0.0674 |
| Max \|Δ\| | 0.8286 |
| \|Δ\| > 0.05 | 1,291 / 3,360 (38.4%) |
| Label changes | 459 / 3,360 (13.7%) |
| en→es swaps | 1,648 |
| es→en swaps | 1,712 |
| SHAP rank=1 (top word) | 2,590 / 3,360 (77.1%) |
| SHAP rank>1 | 770 / 3,360 (22.9%) |

## Column Descriptions

| Column | Description |
|--------|-------------|
| `sentence_id` | Row index in the original filtered dataset (0-based) |
| `original_sentence` | Original bilingual tweet text |
| `perturbed_sentence` | Tweet with one word swapped en↔es via Helsinki-NLP translation |
| `swapped_word` | The source word that was translated and replaced |
| `translation` | The translated replacement word (null if skipped) |
| `swap_direction` | `en→es` or `es→en` |
| `shap_rank` | Rank of swapped_word by \|SHAP value\| in original sentence (1=top contributor) |
| `original_sentiment_label` | Sentiment label before swap: positive/neutral/negative |
| `original_sentiment_score` | Confidence score for original label [0,1] |
| `perturbed_sentiment_label` | Sentiment label after swap |
| `perturbed_sentiment_score` | Confidence score for perturbed label [0,1] |
| `sentiment_delta` | original_score − perturbed_score (signed) |
| `label_changed` | True if sentiment label changed after swap |
| `original_shap_values` | Dict of token → SHAP value for original sentence |
| `perturbed_shap_values` | Dict of token → SHAP value for perturbed sentence |
| `skip_reason` | `translation_collision` \| `no_translatable_word` \| null if processed |

## Provenance

- **Experiment:** codeswitching-sentiment-bias
- **Job:** torch:7074434
- **Cluster:** torch
- **Artifact status:** final
- **Canary:** no
- **Input dataset:** bermaneh/codeswitching-sentiment-bias-canary-v1
- **Hyperparameters:** n_rows=3483, random_seed=42, min_word_len=2, max_new_tokens=20, shap_background_size=100
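
## Example: recomputing the summary metrics

The Key Results figures can in principle be rederived from the per-row columns. The following is a minimal sketch, not the author's evaluation code. It assumes that processed rows are exactly those whose `skip_reason` is null and that the \|Δ\| metrics are taken over `sentiment_delta` on processed rows only. The function name `summarize` and the toy rows are hypothetical and used purely for illustration.

```python
from statistics import mean


def summarize(rows, threshold=0.05):
    """Recompute card-style summary metrics from a list of row dicts.

    Assumption (not confirmed by the card): a row counts as processed
    when its skip_reason is None, and skipped rows are excluded from
    every metric except the attempted count.
    """
    processed = [r for r in rows if r.get("skip_reason") is None]
    abs_deltas = [abs(r["sentiment_delta"]) for r in processed]
    return {
        "rows_attempted": len(rows),
        "rows_processed": len(processed),
        "mean_abs_delta": mean(abs_deltas) if abs_deltas else 0.0,
        "max_abs_delta": max(abs_deltas, default=0.0),
        "over_threshold": sum(d > threshold for d in abs_deltas),
        "label_changes": sum(bool(r["label_changed"]) for r in processed),
        "shap_rank_1": sum(r["shap_rank"] == 1 for r in processed),
    }


# Toy rows using the column names from the table above (values are made up).
rows = [
    {"sentiment_delta": 0.10, "label_changed": True,
     "shap_rank": 1, "skip_reason": None},
    {"sentiment_delta": -0.02, "label_changed": False,
     "shap_rank": 3, "skip_reason": None},
    {"sentiment_delta": None, "label_changed": None,
     "shap_rank": None, "skip_reason": "translation_collision"},
]

summary = summarize(rows)
print(summary)
```

To run this on the real data, the rows could be obtained with the Hugging Face `datasets` library (`load_dataset("bermaneh/codeswitching-sentiment-bias-results-v1")`) and passed in as dicts. The card does not state the split name, so that would need to be checked against the repository.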
Provider:
bermaneh