bermaneh/codeswitching-sentiment-bias-exp2-results-v1
Source: Hugging Face | Updated: 2026-04-28 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/bermaneh/codeswitching-sentiment-bias-exp2-results-v1
Resource description:
---
license: mit
tags:
- codeswitching
- masked-language-modeling
- log-probability
- multilingual
- bias
---
# codeswitching-sentiment-bias-exp2-results-v1
Full results of Experiment 2: Masked Token Language Prediction.
**Experiment**: codeswitching-sentiment-bias
**Job**: torch:7391529
**Date**: 2026-04-28
**Status**: final
**Cluster**: NYU torch (l40s_publ partition, gl046)
**Runtime**: 7m49s (0.1s/sample avg)
## Description
For each valid row from Experiment 1, the top SHAP-contributing word is masked in the
original sentence. Qwen2.5-7B-Instruct is prompted to fill in the blank (teacher-forced).
Log probabilities are extracted for both the ground-truth word and its translation,
measuring whether the model has a language preference when completing code-switched sentences.
**Model**: Qwen/Qwen2.5-7B-Instruct (loaded from /scratch/ehb7466/.huggingface cache)
**Input**: results/experiment1_results.json (3,360 valid rows from Experiment 1)
**N_ROWS**: -1 (all rows)
**RANDOM_SEED**: 42
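
The masking step described above can be sketched in a few lines. This is a minimal illustration, not the experiment's actual code: the function name `mask_word`, the first-occurrence substring match, and the example sentence are assumptions made here. Only the `[MASK]` placeholder and the rule that rows are skipped when the word is not found verbatim come from this card.

```python
from typing import Optional

MASK_TOKEN = "[MASK]"  # placeholder shown in the masked_sentence column


def mask_word(sentence: str, word: str) -> Optional[str]:
    """Replace the first verbatim occurrence of `word` with [MASK].

    Returns None when the word does not appear verbatim, which mirrors
    the "skipped (word not found verbatim)" rows in the results summary.
    Assumption: a plain case-sensitive substring match is used.
    """
    start = sentence.find(word)
    if start == -1:
        return None
    return sentence[:start] + MASK_TOKEN + sentence[start + len(word):]


# Invented example sentence, not taken from the dataset.
masked = mask_word("no me gusta this movie", "gusta")
skipped = mask_word("no me gusta this movie", "película")
```

A stricter matcher (for example, one that respects word boundaries or normalizes case) would change which rows get skipped, so the 75 skipped rows reported below depend on the matching rule the original script used.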
## Results Summary
| Metric | Value |
|--------|-------|
| Rows processed | 3,285 / 3,360 |
| Skipped (word not found verbatim) | 75 / 3,360 (2.2%) |
| GT language preferred (raw) | 1,938 / 3,285 (59.0%) |
| GT language preferred (normalized) | 2,232 / 3,285 (67.9%) |
| Mean log_prob_ratio (raw) | 0.787 |
| Mean normalized_log_prob_ratio | not reported |
| Strong preference \|ratio\| > 0.5 | 3,159 / 3,285 (96.2%) |
### By swap direction
| Direction | N | Mean raw ratio | GT preferred (raw) |
|-----------|---|----------------|--------------------|
| en→es (English GT) | 1,573 | +4.578 | not reported |
| es→en (Spanish GT) | 1,712 | −2.697 | not reported |
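
As a consistency check, the overall mean raw ratio in the results summary should equal the row-weighted average of the two per-direction means. The short sketch below uses only the counts and means from the tables above; the variable names are illustrative.

```python
# (N, mean raw log_prob_ratio) per swap direction, copied from the table.
groups = {
    "en->es": (1573, 4.578),   # English ground truth
    "es->en": (1712, -2.697),  # Spanish ground truth
}

total_rows = sum(n for n, _ in groups.values())
overall_mean = sum(n * mean for n, mean in groups.values()) / total_rows

print(total_rows)              # number of processed rows
print(round(overall_mean, 3))  # weighted mean of the raw ratio
```

The weighted mean rounds to 0.787 over 3,285 rows, matching the summary table.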
**Key finding**: The raw log_prob_ratio is systematically biased toward English: its mean is positive for
en→es rows (English GT, as expected) and negative for es→en rows (the English translation still scores
higher even though Spanish is the GT). After per-token normalization, Spanish GT rows show a strong
Spanish preference (mean normalized ratio = +6.696), which points to the raw bias being a BPE
token-length artifact. By the normalized metric, the model prefers the ground-truth-language word in
67.9% of cases.
## Columns
| Column | Description |
|--------|-------------|
| sentence_id | ID from Experiment 1 dataset |
| original_sentence | Original code-switched tweet |
| masked_sentence | Sentence with top-SHAP word replaced by [MASK] |
| ground_truth_word | The masked word (in ground_truth_language) |
| ground_truth_language | Language of ground truth word (English or Spanish) |
| other_lang_word | Translation of the masked word |
| other_language | Language of the translation |
| swap_direction | en→es or es→en (from Experiment 1) |
| model_predictions | Top-10 fill-in-the-blank predictions with log_prob scores |
| ground_truth_logprob | Summed log prob of ground truth word tokens |
| ground_truth_n_tokens | Number of BPE tokens in ground truth word |
| other_lang_logprob | Summed log prob of translation word tokens |
| other_lang_n_tokens | Number of BPE tokens in translation |
| log_prob_ratio | ground_truth_logprob − other_lang_logprob (raw, length-biased) |
| normalized_log_prob_ratio | Per-token average difference — **PRIMARY METRIC** |
| preferred_language | Language with higher raw log_prob |
| preferred_language_normalized | Language with higher normalized log_prob per token |
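
The two ratio columns can be recomputed from the log-probability and token-count columns. The sketch below assumes that "per-token average difference" means the difference of the two per-token averages (each summed log prob divided by its own token count); the card does not spell out the formula, so treat this as one plausible reading. The function names, the tie-breaking rule, and the example row are invented for illustration.

```python
def raw_ratio(gt_logprob: float, other_logprob: float) -> float:
    """log_prob_ratio: summed GT log prob minus summed translation log prob."""
    return gt_logprob - other_logprob


def normalized_ratio(gt_logprob: float, gt_n_tokens: int,
                     other_logprob: float, other_n_tokens: int) -> float:
    """Assumed definition: difference of per-token average log probs."""
    return gt_logprob / gt_n_tokens - other_logprob / other_n_tokens


def preferred(ratio: float, gt_language: str, other_language: str) -> str:
    """Positive ratio means the ground-truth word scored higher.

    Assumption: ties (ratio == 0) are assigned to the other language.
    """
    return gt_language if ratio > 0 else other_language


# Invented row: a Spanish GT word split into 3 BPE tokens versus an
# English translation that is a single token.
row = {
    "ground_truth_language": "Spanish", "other_language": "English",
    "ground_truth_logprob": -6.0, "ground_truth_n_tokens": 3,
    "other_lang_logprob": -4.0, "other_lang_n_tokens": 1,
}

raw = raw_ratio(row["ground_truth_logprob"], row["other_lang_logprob"])
norm = normalized_ratio(row["ground_truth_logprob"], row["ground_truth_n_tokens"],
                        row["other_lang_logprob"], row["other_lang_n_tokens"])
raw_pref = preferred(raw, row["ground_truth_language"], row["other_language"])
norm_pref = preferred(norm, row["ground_truth_language"], row["other_language"])
```

In this invented row the raw ratio is negative (English preferred) while the normalized ratio is positive (Spanish preferred), which illustrates how a word split into more BPE tokens is penalized by the summed score and why the normalized column is treated as the primary metric.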
## Provenance
- **experiment_name**: codeswitching-sentiment-bias
- **job_id**: torch:7391529
- **cluster**: NYU torch (l40s_publ)
- **artifact_status**: final
- **canary**: false
- **input_dataset**: bermaneh/codeswitching-sentiment-bias-results-v1
Provider:
bermaneh



