bermaneh/codeswitching-sentiment-bias-exp2-results-v1

Name: bermaneh/codeswitching-sentiment-bias-exp2-results-v1
Creator: bermaneh
Published: 2026-04-28 13:43:18
License: MIT

Source: Hugging Face | Updated: 2026-04-28 | Indexed: 2026-05-03

Download link:

https://hf-mirror.com/datasets/bermaneh/codeswitching-sentiment-bias-exp2-results-v1


Resource overview:

---
license: mit
tags:
- codeswitching
- masked-language-modeling
- log-probability
- multilingual
- bias
---

# codeswitching-sentiment-bias-exp2-results-v1

Full results of Experiment 2: Masked Token Language Prediction.

- **Experiment**: codeswitching-sentiment-bias
- **Job**: torch:7391529
- **Date**: 2026-04-28
- **Status**: final
- **Cluster**: NYU torch (l40s_publ partition, gl046)
- **Runtime**: 7m49s (0.1s/sample avg)

## Description

For each valid row from Experiment 1, the top SHAP-contributing word is masked in the original sentence. Qwen2.5-7B-Instruct is prompted to fill in the blank (teacher-forced). Log probabilities are extracted for both the ground-truth word and its translation, measuring whether the model has a language preference when completing code-switched sentences.

- **Model**: Qwen/Qwen2.5-7B-Instruct (loaded from /scratch/ehb7466/.huggingface cache)
- **Input**: results/experiment1_results.json (3,360 valid rows from Experiment 1)
- **N_ROWS**: -1 (all rows)
- **RANDOM_SEED**: 42

## Results Summary

| Metric | Value |
|--------|-------|
| Rows processed | 3,285 / 3,360 |
| Skipped (word not found verbatim) | 75 / 3,360 (2.2%) |
| GT language preferred (raw) | 1,938 / 3,285 (59.0%) |
| GT language preferred (normalized) | 2,232 / 3,285 (67.9%) |
| Mean log_prob_ratio (raw) | 0.787 |
| Mean normalized_log_prob_ratio | not reported |
| Strong preference \|ratio\| > 0.5 | 3,159 / 3,285 (96.2%) |

### By swap direction

| Direction | N | Mean raw ratio | GT preferred (raw) |
|-----------|---|----------------|--------------------|
| en→es (English GT) | 1,573 | +4.578 | not reported |
| es→en (Spanish GT) | 1,712 | −2.697 | not reported |

**Key finding**: The raw log_prob_ratio is systematically biased toward English: positive for en→es rows (expected) and negative for es→en rows (English still preferred despite Spanish being GT). After per-token normalization, Spanish GT rows show strong Spanish preference (mean norm=+6.696), confirming the raw bias is a BPE token-length artifact. The normalized metric reveals the model genuinely prefers the ground-truth language word in 67.9% of cases.

## Columns

| Column | Description |
|--------|-------------|
| sentence_id | ID from Experiment 1 dataset |
| original_sentence | Original code-switched tweet |
| masked_sentence | Sentence with top-SHAP word replaced by [MASK] |
| ground_truth_word | The masked word (in ground_truth_language) |
| ground_truth_language | Language of ground truth word (English or Spanish) |
| other_lang_word | Translation of the masked word |
| other_language | Language of the translation |
| swap_direction | en→es or es→en (from Experiment 1) |
| model_predictions | Top-10 fill-in-the-blank predictions with log_prob scores |
| ground_truth_logprob | Summed log prob of ground truth word tokens |
| ground_truth_n_tokens | Number of BPE tokens in ground truth word |
| other_lang_logprob | Summed log prob of translation word tokens |
| other_lang_n_tokens | Number of BPE tokens in translation |
| log_prob_ratio | ground_truth_logprob − other_lang_logprob (raw, length-biased) |
| normalized_log_prob_ratio | Per-token average difference (**PRIMARY METRIC**) |
| preferred_language | Language with higher raw log_prob |
| preferred_language_normalized | Language with higher normalized log_prob per token |

## Provenance

- **experiment_name**: codeswitching-sentiment-bias
- **job_id**: torch:7391529
- **cluster**: NYU torch (l40s_publ)
- **artifact_status**: final
- **canary**: false
- **input_dataset**: bermaneh/codeswitching-sentiment-bias-results-v1
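
## Example: raw vs. normalized ratio

The difference between the raw and normalized ratios in the Columns table can be illustrated with a minimal sketch. The card defines `log_prob_ratio` as `ground_truth_logprob − other_lang_logprob`, but it only describes `normalized_log_prob_ratio` as a "per-token average difference". The formula below (difference of per-token mean log probs) is therefore an assumption, as are the `Row` class, the helper names, the tie-breaking rule, and the example values, none of which come from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Subset of the documented columns; the values used below are made up."""
    ground_truth_language: str
    other_language: str
    ground_truth_logprob: float   # summed over the word's BPE tokens
    ground_truth_n_tokens: int
    other_lang_logprob: float     # summed over the translation's BPE tokens
    other_lang_n_tokens: int


def log_prob_ratio(row: Row) -> float:
    """Raw difference of summed log probs (length-biased, per the card)."""
    return row.ground_truth_logprob - row.other_lang_logprob


def normalized_log_prob_ratio(row: Row) -> float:
    """ASSUMPTION: difference of per-token mean log probs."""
    gt_mean = row.ground_truth_logprob / row.ground_truth_n_tokens
    other_mean = row.other_lang_logprob / row.other_lang_n_tokens
    return gt_mean - other_mean


def preferred(row: Row, ratio: float) -> str:
    """Language favored by a ratio; ties go to the other language (assumption)."""
    return row.ground_truth_language if ratio > 0 else row.other_language


# Hypothetical es→en row: the Spanish ground-truth word is split into
# 3 BPE tokens, while its English translation is a single token.
example = Row("Spanish", "English", -6.0, 3, -4.5, 1)

raw = log_prob_ratio(example)               # -6.0 - (-4.5) = -1.5
norm = normalized_log_prob_ratio(example)   # -2.0 - (-4.5) = 2.5

print(preferred(example, raw))    # English
print(preferred(example, norm))   # Spanish
```

In this made-up row the multi-token Spanish word loses on summed log probability simply because it has more tokens, yet wins once each score is averaged per token. This is the kind of sign flip the key finding attributes to a BPE token-length artifact.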

Provided by:

bermaneh
