bermaneh/codeswitching-sentiment-bias-exp2-canary-v1

Name: bermaneh/codeswitching-sentiment-bias-exp2-canary-v1
Creator: bermaneh
Published: 2026-04-28 13:27:12
License: no description provided

Source: Hugging Face. Updated: 2026-04-28. Indexed: 2026-05-03.

Download link:

https://hf-mirror.com/datasets/bermaneh/codeswitching-sentiment-bias-exp2-canary-v1

Resource description:

---
license: mit
tags:
- codeswitching
- masked-language-modeling
- log-probability
- multilingual
- canary
---

# codeswitching-sentiment-bias-exp2-canary-v1

Canary run (N=50) of Experiment 2: Masked Token Language Prediction.

- **Experiment**: codeswitching-sentiment-bias
- **Job**: torch:7391386
- **Date**: 2026-04-28
- **Status**: canary (PASS)
- **Cluster**: NYU torch (l40s_publ partition, gl043)
- **Runtime**: 2m37s (0.1s/sample avg, model load ~2.5min)

## Description

For each valid row from Experiment 1, the top SHAP-contributing word is masked in the original sentence. Qwen2.5-7B-Instruct is prompted to fill in the blank, and log probabilities for both the ground-truth word and its translation are extracted with teacher forcing.

- **Model**: Qwen/Qwen2.5-7B-Instruct
- **Input**: bermaneh/codeswitching-sentiment-bias-results-v1 (loaded from local experiment1_results.json)
- **N_ROWS**: 50 (random sample, seed=42)

## Canary Results Summary

| Metric | Value |
|--------|-------|
| Rows processed | 47 / 50 |
| Skipped (word not found) | 3 / 50 (6%) |
| GT language preferred (raw) | 29/47 (61.7%) |
| GT language preferred (normalized) | 20/47 (42.6%) |
| Mean log_prob_ratio (raw) | 1.728 |
| Mean normalized_log_prob_ratio | -0.390 |
| Strong preference (\|ratio\| > 0.5) | 44/47 (93.6%) |
| en→es: mean raw ratio | 4.663 (English preferred) |
| es→en: mean raw ratio | -4.534 (English preferred!) |
| Avg per-sample time | 0.1s |
| Projected full run (3360 rows) | ~6 min processing + ~2.5 min load |

**Key finding**: The raw log_prob_ratio is biased: the en→es mean is positive and the es→en mean is negative, and both signs favor English. After per-token normalization, this bias diminishes. The normalized ratio adjusts for BPE token-length differences between the two languages.
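To make the length effect concrete, here is a minimal sketch of the two ratios. It assumes the raw ratio is the ground-truth summed log probability minus the translation's, and that the normalized ratio divides each sum by its own BPE token count first. The function names and the numbers are hypothetical and are not taken from the dataset.

```python
def raw_ratio(gt_logprob: float, tr_logprob: float) -> float:
    """Summed log prob of the ground-truth word minus that of its translation."""
    return gt_logprob - tr_logprob


def normalized_ratio(gt_logprob: float, gt_ntok: int,
                     tr_logprob: float, tr_ntok: int) -> float:
    """Difference of the per-token average log probs of the two words."""
    return gt_logprob / gt_ntok - tr_logprob / tr_ntok


# Hypothetical values (not from the dataset): a one-token ground-truth
# word compared with a three-token translation.
gt_logprob, gt_ntok = -4.0, 1
tr_logprob, tr_ntok = -6.0, 3

raw = raw_ratio(gt_logprob, tr_logprob)                             # 2.0
norm = normalized_ratio(gt_logprob, gt_ntok, tr_logprob, tr_ntok)   # -2.0

print(f"raw={raw:+.1f}  normalized={norm:+.1f}")
```

In this toy case the raw ratio favors the ground-truth word while the per-token ratio favors the translation, because the translation's summed log probability is spread over more tokens. The same mechanism may explain why the raw and normalized preference counts in the summary table differ.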
## Columns

| Column | Description |
|--------|-------------|
| sentence_id | ID of the sentence from Experiment 1 |
| original_sentence | Original code-switched tweet |
| masked_sentence | Sentence with the top-SHAP word replaced by [MASK] |
| ground_truth_word | The word that was masked (in ground_truth_language) |
| ground_truth_language | Language of the ground-truth word (English or Spanish) |
| other_lang_word | Translation of the masked word into the other language |
| other_language | Language of the translation |
| swap_direction | en→es or es→en, from Experiment 1 |
| model_predictions | Top-10 fill-in-the-blank predictions with log_prob |
| ground_truth_logprob | Summed log prob of the ground-truth word's tokens |
| ground_truth_n_tokens | Number of BPE tokens in the ground-truth word |
| other_lang_logprob | Summed log prob of the translation's tokens |
| other_lang_n_tokens | Number of BPE tokens in the translation |
| log_prob_ratio | ground_truth_logprob - other_lang_logprob (raw, biased by token length) |
| normalized_log_prob_ratio | (ground_truth_logprob / ground_truth_n_tokens) - (other_lang_logprob / other_lang_n_tokens); primary metric |
| preferred_language | Language with the higher raw log prob |
| preferred_language_normalized | Language with the higher normalized log prob per token |

## Provenance

- **experiment_name**: codeswitching-sentiment-bias
- **job_id**: torch:7391386
- **cluster**: NYU torch (l40s_publ)
- **artifact_status**: canary
- **canary**: true
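As a rough illustration of how per-direction summaries like those in the results table could be recomputed from the columns, the sketch below groups rows by swap_direction. The `summarize` helper and the three example rows are hypothetical, and it assumes that preferred_language_normalized and ground_truth_language hold the same language labels (for example "English" and "Spanish").

```python
from collections import defaultdict
from statistics import mean


def summarize(rows):
    """Group rows by swap_direction; report mean ratios and GT-preference rate."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["swap_direction"]].append(row)

    summary = {}
    for direction, group in groups.items():
        # Count rows where the normalized preference matches the ground truth.
        gt_wins = sum(
            r["preferred_language_normalized"] == r["ground_truth_language"]
            for r in group
        )
        summary[direction] = {
            "n": len(group),
            "mean_raw": mean(r["log_prob_ratio"] for r in group),
            "mean_normalized": mean(r["normalized_log_prob_ratio"] for r in group),
            "gt_preferred_normalized": gt_wins / len(group),
        }
    return summary


# Hypothetical rows (values invented for illustration, not from the dataset).
rows = [
    {"swap_direction": "en→es", "ground_truth_language": "English",
     "log_prob_ratio": 4.0, "normalized_log_prob_ratio": 1.0,
     "preferred_language_normalized": "English"},
    {"swap_direction": "en→es", "ground_truth_language": "English",
     "log_prob_ratio": 2.0, "normalized_log_prob_ratio": -1.0,
     "preferred_language_normalized": "Spanish"},
    {"swap_direction": "es→en", "ground_truth_language": "Spanish",
     "log_prob_ratio": -3.0, "normalized_log_prob_ratio": 0.5,
     "preferred_language_normalized": "Spanish"},
]

summary = summarize(rows)
for direction, stats in summary.items():
    print(direction, stats)
```

Reporting the raw and normalized means side by side per direction makes it easier to see whether a preference survives length normalization.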

Provided by:

bermaneh
