K-MHaS
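Evaluation tool for a Korean hate speech dataset
Overview
Azure OpenAI Service includes a content filtering system that works alongside large language models (LLMs), including image generation models. The system runs both the prompt and the completion through a set of classification models to detect and prevent the output of harmful content. The content filtering models support English, German, Japanese, Spanish, French, Italian, Portuguese, and Chinese. The service can work in many other languages, but quality may vary, so testing is essential for unsupported languages such as Korean. Even for supported languages, testing is needed to confirm that a configured filter detects content at the severity level you set. This tool benchmarks a hate speech dataset with minimal time and effort, so you can see how the existing content filters perform, which kinds of content are filtered, and configure content filters at an appropriate level.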
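Korean Multi-label Hate Speech Dataset (K-MHaS)
The Korean Multi-label Hate Speech Dataset (K-MHaS) contains 109,692 utterances from Korean online news comments, labeled with 8 fine-grained hate speech classes (labels: Politics, Origin, Physical, Age, Gender, Religion, Race, Profanity) or the Not Hate Speech class. Each utterance carries from one to four labels, which lets the dataset capture Korean language patterns effectively. For more details, see the K-MHaS paper published at COLING 2022.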
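Because each utterance can carry several labels, results are naturally grouped by label combination. The following is a minimal sketch of how such a combination could be turned into a category string like those in the results table. It assumes the label order listed above and treats an empty label set as non-hate speech; `LABEL_ORDER` and `format_category` are hypothetical names, not part of the repository.

```python
from typing import Iterable

# Assumed canonical order, taken from the label list in this README.
LABEL_ORDER = ["Politics", "Origin", "Physical", "Age",
               "Gender", "Religion", "Race", "Profanity"]


def format_category(labels: Iterable[str]) -> str:
    """Render a multi-label set as a category string, e.g. "[Age, Religion]"."""
    unique = set(labels)
    unknown = unique - set(LABEL_ORDER)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not unique:
        # Assumption: no hate speech label means the non-hate class.
        return "[Not Hate Speech]"
    ordered = [name for name in LABEL_ORDER if name in unique]
    return "[" + ", ".join(ordered) + "]"
```

Sorting by a fixed order keeps `["Religion", "Age"]` and `["Age", "Religion"]` in the same group.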
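Implementation
The code reuses https://github.com/daekeun-ml/evaluate-llm-on-korean-dataset, but many parts have been changed to evaluate performance across content filtering scenarios. These include inappropriate input prompts (ResponsibleAIPolicyViolation) and cases where nothing is returned because the content was filtered (by the prompt content filter or the completion filter).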
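The scenarios above can be sketched as a small classifier over the result of one API call. This is an illustration, not the repository's code: `classify_filter_outcome` is a hypothetical helper. It assumes the caller has already extracted the error code from a failed request (Azure OpenAI reports a blocked prompt as an HTTP 400 error with code `content_filter` and inner code `ResponsibleAIPolicyViolation`) or the `finish_reason` of a successful one (a blocked completion has `finish_reason` equal to `content_filter`).

```python
from typing import Optional


def classify_filter_outcome(error_code: Optional[str],
                            finish_reason: Optional[str]) -> str:
    """Map one API call result to a coarse filtering outcome (hypothetical helper)."""
    # The request was rejected before generation: the prompt filter fired.
    if error_code == "content_filter":
        return "prompt_filtered"
    # Any other error (rate limit, timeout, ...) says nothing about filtering.
    if error_code is not None:
        return "error"
    # The request succeeded but the completion was cut by the filter.
    if finish_reason == "content_filter":
        return "completion_filtered"
    return "not_filtered"
```

Keeping unrelated errors separate from filter hits avoids counting a rate-limit failure as filtered content.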
Results
GPT4o-mini
| category_big | category | low threshold<br>(custom filter)<br>count | low threshold<br>(custom filter)<br>mean | default2<br>count | default2<br>mean | high threshold<br>(custom filter)<br>count | high threshold<br>(custom filter)<br>mean |
|---|---|---|---|---|---|---|---|
| Hate Speech | [Age, Gender, Religion] | 2 | 0.667 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Age, Gender] | 3 | 0.375 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Age, Profanity] | 3 | 0.600 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Age, Race] | 2 | 1.000 | 2 | 1.000 | 0 | 0.000 |
| Hate Speech | [Age, Religion] | 10 | 0.625 | 2 | 0.125 | 0 | 0.000 |
| Hate Speech | [Age] | 83 | 0.469 | 7 | 0.040 | 3 | 0.017 |
| Hate Speech | [Gender, Religion] | 6 | 0.667 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Gender] | 30 | 0.370 | 2 | 0.025 | 0 | 0.000 |
| Hate Speech | [Origin, Age, Religion] | 1 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Origin, Age] | 17 | 0.773 | 3 | 0.136 | 0 | 0.000 |
| Hate Speech | [Origin, Gender] | 0 | 0.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Origin, Physical, Age] | 2 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Origin, Physical] | 3 | 0.600 | 1 | 0.200 | 1 | 0.200 |
| Hate Speech | [Origin, Religion] | 8 | 0.444 | 1 | 0.056 | 1 | 0.056 |
| Hate Speech | [Origin] | 51 | 0.573 | 2 | 0.022 | 0 | 0.000 |
| Hate Speech | [Physical, Age, Gender] | 1 | 0.333 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Physical, Age] | 19 | 0.704 | 2 | 0.074 | 0 | 0.000 |
| Hate Speech | [Physical, Gender, Profanity] | 0 | 0.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Physical, Gender, Religion] | 0 | 0.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Physical, Gender] | 14 | 0.583 | 1 | 0.042 | 1 | 0.042 |
| Hate Speech | [Physical, Profanity] | 3 | 0.750 | 1 | 0.250 | 0 | 0.000 |
| Hate Speech | [Physical, Religion] | 3 | 0.750 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Physical] | 77 | 0.626 | 6 | 0.049 | 2 | 0.016 |
| Hate Speech | [Politics, Age, Gender] | 1 | 0.500 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Age, Religion] | 3 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Age] | 21 | 0.840 | 4 | 0.160 | 0 | 0.000 |
| Hate Speech | [Politics, Gender, Religion] | 1 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Gender] | 1 | 0.500 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Origin, Age, Religion] | 1 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Origin, Age] | 1 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Origin, Religion] | 1 | 0.500 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Origin] | 1 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Physical, Age] | 4 | 1.000 | 1 | 0.250 | 0 | 0.000 |
| Hate Speech | [Politics, Physical, Religion] | 1 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Physical] | 17 | 0.773 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Profanity] | 2 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Politics, Religion] | 16 | 0.800 | 1 | 0.050 | 0 | 0.000 |
| Hate Speech | [Politics] | 85 | 0.669 | 12 | 0.094 | 2 | 0.016 |
| Hate Speech | [Profanity] | 24 | 0.649 | 2 | 0.054 | 1 | 0.027 |
| Hate Speech | [Race] | 3 | 0.750 | 2 | 0.500 | 1 | 0.250 |
| Hate Speech | [Religion, Race] | 1 | 1.000 | 0 | 0.000 | 0 | 0.000 |
| Hate Speech | [Religion] | 40 | 0.714 | 2 | 0.036 | 1 | 0.018 |
| Not Hate Speech | [Not Hate Speech] | 257 | 0.242 | 23 | 0.022 | 3 | 0.003 |
| Filtering Total | | | | | | | |
| Hate Speech | - | 562 | 0.599 | 54 | 0.058 | 13 | 0.014 |
| Not Hate Speech | - | 257 | 0.242 | 23 | 0.022 | 3 | 0.003 |
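The table does not define its columns, but the totals are consistent with reading `count` as the number of filtered samples and `mean` as the fraction of samples filtered: 562 / 0.599 gives about 938 hate speech samples and 257 / 0.242 gives about 1,062 non-hate samples, which sums to 2,000, the default `--num_samples`. Under that assumption, the per-category aggregation could look like the sketch below; `summarize` is a hypothetical helper, not the repository's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize(records: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, float]]:
    """Aggregate (category, was_filtered) pairs into {category: (count, mean)}.

    count is the number of filtered samples, mean the filtered fraction.
    """
    filtered = defaultdict(int)
    total = defaultdict(int)
    for category, was_filtered in records:
        total[category] += 1
        filtered[category] += int(was_filtered)
    return {c: (filtered[c], round(filtered[c] / total[c], 3)) for c in total}
```

Under this reading, a high `mean` on Hate Speech rows is desirable, while a high `mean` on the Not Hate Speech row indicates false positives.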
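Quick start
GitHub Codespace
Start a new project by connecting to the Codespace project. The environment required for the hands-on work is configured automatically through devcontainer, so you only need to run the Jupyter notebook.
Your local PC
Install the required packages on your local PC: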
```bash
pip install -r requirements.txt
```
Don't forget to modify the .env file to match your account. Either rename .env.sample to .env, or copy it and use the copy.
Modify your .env
```ini
AZURE_OPENAI_ENDPOINT=<YOUR_OPEN_ENDPOINT>
AZURE_OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
AZURE_OPENAI_API_VERSION=<YOUR_OPENAI_API_VERSION>
AZURE_OPENAI_DEPLOYMENT_NAME=<YOUR_DEPLOYMENT_NAME (e.g., gpt-4o-mini)>
OPENAI_MODEL_VERSION=<YOUR_OPENAI_MODEL_VERSION (e.g., 2024-07-18)>
```
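Before running a long evaluation, it can help to check that none of these settings is missing or still holds a `<...>` placeholder. The sketch below uses only the standard library and the variable names from the .env template; `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names, and it assumes the values have already been loaded into the process environment.

```python
import os
from typing import List, Mapping

REQUIRED_SETTINGS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "OPENAI_MODEL_VERSION",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of settings that are unset, empty, or still placeholders."""
    missing = []
    for name in REQUIRED_SETTINGS:
        value = env.get(name, "").strip()
        # A value that still starts with "<" was never filled in.
        if not value or value.startswith("<"):
            missing.append(name)
    return missing
```

For example, `missing_settings(os.environ)` returns an empty list when every setting is filled in.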
Run the following command to start the evaluation. (Evaluation results are saved in the ./results and ./evals folders.)
```bash
python main.py
```
Tunable parameters
```python
parser.add_argument("--num_samples", type=int, default=2000)
parser.add_argument("--is_random", type=bool, default=False)
parser.add_argument("--is_debug", type=bool, default=False)
parser.add_argument("--num_debug_samples", type=int, default=100)
parser.add_argument("--model_provider", type=str, default="azureopenai")
parser.add_argument("--max_retries", type=int, default=3)
parser.add_argument("--max_tokens", type=int, default=256)
parser.add_argument("--temperature", type=float, default=0)
```
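One caveat when passing the boolean flags: with `type=bool`, argparse calls `bool()` on the raw string, and any non-empty string (including `"False"`) is truthy, so `--is_random False` would enable the option. A possible workaround, shown here as a sketch and not as the repository's code, is an explicit string-to-bool converter; `str2bool` is a hypothetical helper.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common true/false spellings; reject anything else."""
    v = value.strip().lower()
    if v in {"true", "t", "yes", "y", "1"}:
        return True
    if v in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--is_random", type=str2bool, default=False)

# Parse an explicit argument list instead of sys.argv for demonstration.
args = parser.parse_args(["--is_random", "False"])
```

With this converter `args.is_random` is `False` for the input above, whereas `bool("False")` evaluates to `True`.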
References
K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment (Lee et al., COLING 2022)




