bermaneh/codeswitching-sentiment-bias-exp2-canary-v1
Source: Hugging Face. Updated 2026-04-28; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/bermaneh/codeswitching-sentiment-bias-exp2-canary-v1
Resource description:
---
license: mit
tags:
- codeswitching
- masked-language-modeling
- log-probability
- multilingual
- canary
---
# codeswitching-sentiment-bias-exp2-canary-v1
Canary run (N=50) of Experiment 2: Masked Token Language Prediction.
- **Experiment**: codeswitching-sentiment-bias
- **Job**: torch:7391386
- **Date**: 2026-04-28
- **Status**: canary (PASS)
- **Cluster**: NYU torch (l40s_publ partition, gl043)
- **Runtime**: 2m37s (0.1s/sample avg, model load ~2.5min)
## Description
For each valid row from Experiment 1, the top SHAP-contributing word is masked in the
original sentence. Qwen2.5-7B-Instruct is prompted to fill in the blank (teacher-forced).
Log probabilities are extracted for both the ground-truth word and its translation.
- **Model**: Qwen/Qwen2.5-7B-Instruct
- **Input**: bermaneh/codeswitching-sentiment-bias-results-v1 (loaded from local experiment1_results.json)
- **N_ROWS**: 50 (random sample, seed=42)
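
The masking step described above can be sketched as follows. This is a minimal illustration, not the experiment's actual code: the function name `mask_word`, the case-insensitive whole-word matching, and returning `None` for an unmatched word are all assumptions. Only the `[MASK]` placeholder and the existence of a "word not found" skip category come from this card.

```python
import re
from typing import Optional

MASK_TOKEN = "[MASK]"  # placeholder named in the Columns table


def mask_word(sentence: str, word: str) -> Optional[str]:
    """Replace the first whole-word occurrence of `word` with MASK_TOKEN.

    Returns None when the word cannot be located. This is one plausible way
    a row could end up counted as "word not found" (assumed behavior).
    """
    # Lookarounds act as word boundaries and also handle accented letters.
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", re.IGNORECASE)
    if pattern.search(sentence) is None:
        return None
    # Mask only the first match so repeated words are left untouched.
    return pattern.sub(MASK_TOKEN, sentence, count=1)


if __name__ == "__main__":
    # Hypothetical code-switched sentence, not taken from the dataset.
    print(mask_word("I am so feliz today", "feliz"))
```

Masking only the first occurrence is a design choice made for this sketch; a real pipeline might instead mask the exact token position reported by SHAP.
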
## Canary Results Summary
| Metric | Value |
|--------|-------|
| Rows processed | 47 / 50 |
| Skipped (word not found) | 3 / 50 (6%) |
| GT language preferred (raw) | 29/47 (61.7%) |
| GT language preferred (normalized) | 20/47 (42.6%) |
| Mean log_prob_ratio (raw) | 1.728 |
| Mean normalized_log_prob_ratio | -0.390 |
| Strong preference (\|ratio\| > 0.5) | 44/47 (93.6%) |
| en→es: mean raw ratio | 4.663 (English preferred) |
| es→en: mean raw ratio | -4.534 (English preferred!) |
| Avg per-sample time | 0.1s |
| Projected full run (3360 rows) | ~6 min processing + ~2.5 min load |
**Key finding**: The raw log_prob_ratio is biased toward English in both swap directions: the mean is
positive for en→es (4.663) and negative for es→en (-4.534), and both signs correspond to a preference
for the English word. After per-token normalization, this bias diminishes.
The normalized ratio adjusts for BPE token-length differences between languages.
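
Summary counts like those in the table above could be derived from per-row ratios roughly as follows. This is a sketch under stated assumptions: the function name `summarize` is hypothetical, "GT language preferred" is taken to mean a ratio greater than zero, and because the card does not say which ratio the 0.5 threshold applies to, the column is left as a parameter. The example rows are invented and are not dataset values.

```python
from statistics import mean
from typing import Dict, List


def summarize(rows: List[Dict[str, float]],
              threshold: float = 0.5,
              strong_key: str = "log_prob_ratio") -> Dict[str, float]:
    """Aggregate per-row ratios into summary metrics.

    `strong_key` selects the ratio column used for the strong-preference
    count (assumption: the card does not specify which one was used).
    """
    raw = [r["log_prob_ratio"] for r in rows]
    norm = [r["normalized_log_prob_ratio"] for r in rows]
    return {
        "n": len(rows),
        # A positive ratio means the ground-truth word scored higher.
        "gt_preferred_raw": sum(x > 0 for x in raw),
        "gt_preferred_normalized": sum(x > 0 for x in norm),
        "mean_raw": mean(raw),
        "mean_normalized": mean(norm),
        "strong_preference": sum(abs(r[strong_key]) > threshold for r in rows),
    }


if __name__ == "__main__":
    # Invented rows for illustration only.
    toy_rows = [
        {"log_prob_ratio": 4.0, "normalized_log_prob_ratio": -0.5},
        {"log_prob_ratio": -1.0, "normalized_log_prob_ratio": 0.25},
        {"log_prob_ratio": 0.25, "normalized_log_prob_ratio": 0.75},
    ]
    print(summarize(toy_rows))
```
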
## Columns
| Column | Description |
|--------|-------------|
| sentence_id | ID of the sentence from Experiment 1 |
| original_sentence | Original code-switched tweet |
| masked_sentence | Sentence with top-SHAP word replaced by [MASK] |
| ground_truth_word | The word that was masked (in ground_truth_language) |
| ground_truth_language | Language of the ground truth word (English or Spanish) |
| other_lang_word | Translation of the masked word into the other language |
| other_language | Language of the translation |
| swap_direction | en→es or es→en from Experiment 1 |
| model_predictions | Top-10 fill-in-the-blank predictions with log_prob |
| ground_truth_logprob | Summed log prob of ground truth word tokens |
| ground_truth_n_tokens | Number of BPE tokens in ground truth word |
| other_lang_logprob | Summed log prob of translation tokens |
| other_lang_n_tokens | Number of BPE tokens in translation |
| log_prob_ratio | ground_truth_logprob - other_lang_logprob (raw, biased by length) |
| normalized_log_prob_ratio | (ground_truth_logprob / ground_truth_n_tokens) - (other_lang_logprob / other_lang_n_tokens); PRIMARY METRIC |
| preferred_language | Language with higher raw log_prob |
| preferred_language_normalized | Language with higher normalized log_prob per token |
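
The two ratio columns follow directly from the formulas in the table. The sketch below shows how they might be computed and why normalization can flip the preferred language. The function names are hypothetical, ties are resolved toward the other language as an arbitrary assumption, and the example numbers are invented for illustration.

```python
def log_prob_ratio(gt_logprob: float, other_logprob: float) -> float:
    """Raw ratio: summed log prob of the ground-truth word minus the translation."""
    return gt_logprob - other_logprob


def normalized_log_prob_ratio(gt_logprob: float, gt_n_tokens: int,
                              other_logprob: float, other_n_tokens: int) -> float:
    """Per-token ratio: difference of mean log prob per BPE token."""
    if gt_n_tokens <= 0 or other_n_tokens <= 0:
        raise ValueError("token counts must be positive")
    return gt_logprob / gt_n_tokens - other_logprob / other_n_tokens


def preferred(ratio: float, gt_language: str, other_language: str) -> str:
    """Pick the language with the higher score (ties go to other_language: assumption)."""
    return gt_language if ratio > 0 else other_language


if __name__ == "__main__":
    # Invented example: a 1-token English word vs. a 4-token Spanish translation.
    raw = log_prob_ratio(-2.0, -6.0)                       # 4.0
    norm = normalized_log_prob_ratio(-2.0, 1, -6.0, 4)     # -2.0 - (-1.5) = -0.5
    print(preferred(raw, "English", "Spanish"))            # English
    print(preferred(norm, "English", "Spanish"))           # Spanish
```

In this toy case the longer translation is penalized in the raw sum simply for having more tokens, which is the length effect the normalized ratio is meant to remove.
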
## Provenance
- **experiment_name**: codeswitching-sentiment-bias
- **job_id**: torch:7391386
- **cluster**: NYU torch (l40s_publ)
- **artifact_status**: canary
- **canary**: true
Provider:
bermaneh



