ChuckMcSneed/politiscales_for_llama_results

Name: ChuckMcSneed/politiscales_for_llama_results
Creator: ChuckMcSneed
Published: 2024-01-14 03:58:16
License: WTFPL

Source: Hugging Face. Updated 2024-01-14; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/ChuckMcSneed/politiscales_for_llama_results


Description:

---
license: wtfpl
---

I made [WinterGoliath](https://huggingface.co/ChuckMcSneed/WinterGoliath-123b) and it felt a bit off compared to regular [Goliath](https://huggingface.co/alpindale/goliath-120b). That made me wonder whether left bias was really present in the models, so I made an [automatic benchmark](https://github.com/ChuckMcSneed/politiscales_for_llama) using the politiscales test.

# Interpreting the data

- b0/b1: Internationalism/Nationalism
- c0/c1: Constructivism/Essentialism
- e0/e1: Ecology/Production
- j0/j1: Rehabilitative Justice/Punitive Justice
- m0/m1: Regulation/Laissez-faire
- p0/p1: Communism/Capitalism
- s0/s1: Progressive/Conservative
- t0/t1: Revolution/Reform
- reli: Religiousness
- comp: Belief in a worldwide conspiracy
- prag: Pragmatism
- mona: Monarchism
- vega: VEGAN
- anar: Anarchism
- femi: Radical feminism

### Suggestions for calculations

- Whackiness of the model = anar + comp + mona + reli
- Certainty = (x0 + x1) for each axis; take the average over all axes
- Bias towards a value (logratio) = LOG(x0/x1; 2), i.e. the base-2 logarithm of x0/x1
- Left-right bias = b_logratio + c_logratio + e_logratio + j_logratio + m_logratio + p_logratio + s_logratio - t_logratio

# Results

![politiscales-llm.png](politiscales-llm.png)

- Mixtral-instruct0.1 hates violence (-3.1 logratio_t!!!), religion (0.4%), pragmatic politics (1.4%), monarchy (0.6%), anarchy (5.7%) and conspiracy theories (0.2%); likes regulation (1.5 logratio), globalism (1.4) and ecology (0.64). The MOST vegan (72%!!!) model tested so far. EXTREME left-wing bias.
- LLAMA2-70b is fairly neutral, with a slight bias to the left.
- Dicephal-123 (a self-merge of llama2-70B) is also fairly neutral, with a slight bias to the right. Sadly, it is the most right-wing model that I tested.
- Xwin, Goliath and Nous-Hermes have low left-wing bias.
- DoubleGold and WinterGoliath have medium left-wing bias.
- WinterGoddess and Euryale have high left-wing bias.

My suspicions were correct: WinterGoliath has a stronger left-wing bias than Goliath and is less whacky. I cannot consider it an upgrade, just a sidegrade.

# Limitations

The tests were done without any special prompting. How a model performs on this test does not reflect how it performs with the right prompting; it only shows the bias of the model. This test does NOT measure censorship.
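As a reading aid for the logratio figures quoted in the results, the base-2 logarithm can be inverted to recover the ratio between the two sides of an axis. The worked numbers below only restate the quoted -3.1 value for the t axis; the underlying t0 and t1 scores are not given in the text, so only their ratio is shown.

```latex
\[
\text{logratio} = \log_2\frac{x_0}{x_1}
\quad\Longleftrightarrow\quad
\frac{x_0}{x_1} = 2^{\text{logratio}}
\]
\[
\text{logratio}_t = -3.1
\;\Rightarrow\;
\frac{t_0}{t_1} = 2^{-3.1} \approx 0.117,
\qquad
\frac{t_1}{t_0} = 2^{3.1} \approx 8.6
\]
```

Read this way, the Reform score (t1) would be roughly 8.6 times the Revolution score (t0), while a logratio of 1.4 corresponds to a ratio of about 2.6 in favour of the x0 side.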


Provided by:

ChuckMcSneed

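Summary of original information

Dataset overview

Interpreting the data

b0/b1: Internationalism/Nationalism
c0/c1: Constructivism/Essentialism
e0/e1: Ecology/Production
j0/j1: Rehabilitative justice/Punitive justice
m0/m1: Regulation/Laissez-faire
p0/p1: Communism/Capitalism
s0/s1: Progressive/Conservative
t0/t1: Revolution/Reform
reli: Religiousness
comp: Belief in a worldwide conspiracy
prag: Pragmatism
mona: Monarchism
vega: Veganism
anar: Anarchism
femi: Radical feminism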

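Suggestions for calculations

Whackiness of the model = anar + comp + mona + reli
Certainty = (x0 + x1), averaged over all values
Bias towards a value (logratio) = LOG(x0/x1; 2)
Left-right bias = b_logratio + c_logratio + e_logratio + j_logratio + m_logratio + p_logratio + s_logratio - t_logratio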
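The suggested calculations above can be sketched in a few lines of Python. This is a minimal sketch, not the benchmark's own code: the function names, the dictionary layout and the `example_scores` values are all hypothetical, and it assumes every x0 and x1 score is strictly positive (a zero would make the logarithm undefined). "Average of all values" is read here as the mean over the eight paired axes.

```python
import math

# The eight paired axes; each has an x0 and an x1 score (e.g. "b0" and "b1").
PAIRED_AXES = ["b", "c", "e", "j", "m", "p", "s", "t"]


def logratio(x0, x1):
    """Base-2 log of x0/x1; positive values lean toward the x0 side."""
    return math.log2(x0 / x1)


def whackiness(scores):
    """anar + comp + mona + reli."""
    return scores["anar"] + scores["comp"] + scores["mona"] + scores["reli"]


def certainty(scores):
    """Mean of (x0 + x1) over the paired axes (assumed reading of 'average')."""
    totals = [scores[a + "0"] + scores[a + "1"] for a in PAIRED_AXES]
    return sum(totals) / len(totals)


def left_right_bias(scores):
    """Sum of the b, c, e, j, m, p, s logratios minus the t logratio."""
    total = 0.0
    for axis in PAIRED_AXES:
        value = logratio(scores[axis + "0"], scores[axis + "1"])
        total += -value if axis == "t" else value
    return total


# Made-up scores in percent, purely for illustration (not real results).
example_scores = {"anar": 10, "comp": 20, "mona": 5, "reli": 5}
for axis in PAIRED_AXES:
    example_scores[axis + "0"] = 40
    example_scores[axis + "1"] = 20

print(whackiness(example_scores))       # 40
print(certainty(example_scores))        # 60.0
print(left_right_bias(example_scores))  # 6.0
```

With every axis at 40 versus 20, each logratio is exactly 1, so the total is 7 - 1 = 6. Real score files may contain zeros, in which case some smoothing or clamping would be needed before taking the logarithm.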

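Results

Mixtral-instruct0.1: strongly dislikes violence, religion, pragmatic politics, monarchy, anarchy and conspiracy theories; favors regulation, globalism and ecology. The most vegan model tested, with extreme left-wing bias.
LLAMA2-70b: fairly neutral, leaning slightly left.
Dicephal-123: fairly neutral, leaning slightly right.
Xwin, Goliath, Nous-Hermes: low left-wing bias.
DoubleGold and WinterGoliath: medium left-wing bias.
WinterGoddess and Euryale: high left-wing bias.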

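Limitations

The tests used no special prompting. A model's performance on the test does not represent how it performs with suitable prompting; it only shows the model's bias.

This test does not measure censorship.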


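Background overview

This dataset is a benchmark for evaluating the political leanings of language models. It contains test results and leaning analyses for multiple models and is suited to research on model bias. The dataset is small (<1K), uses the imagefolder format, and is licensed under WTFPL.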

The above content was collected and summarized by 遇见数据集.
