Harm Detection for Message Moderation

Name: Harm Detection for Message Moderation
Creator: Kaggle
Published: 2024-09-26 00:00:00
License: No description available

Source: www.kaggle.com | Updated: 2024-09-26 | Indexed: 2025-01-15

Download link:

https://www.kaggle.com/jiayongli/direction-of-harm-detection


Description:

Google's Jigsaw team has worked on [online harassment](https://current.withgoogle.com/the-current/toxicity/) (they provided this [data set](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/)), and Meta has worked on [suicide prevention](https://about.fb.com/news/2024/09/preventing-suicide-and-self-harm-content-spreading-online/). These two problems sit in the same problem space: harassment is harm directed from the user to others, and suicidal ideation is harm directed from the user to themselves. Only the direction differs.

Expanding on this idea of the direction of harm gives four cases:

- 'self_harm': harm directed from me to me
- 'harming_others': harm directed from me to other people
- 'harmed_by_others': harm directed from other people to me
- 'reference_to_harm': harm directed from other people to other people

Here are some examples.

| | self_harm | harming_others | harmed_by_others | reference_to_harm |
| --- | --- | --- | --- | --- |
| I'm trash | 1 | 0 | 0 | 0 |
| John is trash | 0 | 1 | 0 | 0 |
| Mary told me I'm trash | 0 | 0 | 1 | 0 |
| Adam told Jane she's trash | 0 | 0 | 0 | 1 |

Once we have these labels, what can we do with them? From a moderation point of view, the four labels warrant distinct follow-up responses.

| | self_harm | harming_others | harmed_by_others | reference_to_harm |
| --- | --- | --- | --- | --- |
| response to author | suicide helpline | warning/block message | bully/abuse helpline | |
| response to others | trigger warning | prompt user to report | trigger warning | trigger warning |

I also uploaded the [fine-tuned DeBERTa-v3-small model](https://www.kaggle.com/models/jiayongli/harm-deberta-v3-small). I documented my analysis in the [github repo](https://github.com/lijiayong/direction_of_harm) in Jupyter Notebook format, and my process in the [blog post](https://lijiayong.github.io/posts/direction_of_harm/).
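The label-to-response mapping described above could be wired into a moderation step roughly as follows. This is only a minimal sketch, not the author's implementation: the function name `plan_responses`, the per-label score dictionary, and the 0.5 threshold are all assumptions made for illustration, and the response strings are copied from the table in the description.

```python
# Illustrative sketch: names and the threshold are hypothetical, not from the dataset.
LABELS = ("self_harm", "harming_others", "harmed_by_others", "reference_to_harm")

# Responses copied from the moderation table in the description.
RESPONSE_TO_AUTHOR = {
    "self_harm": "suicide helpline",
    "harming_others": "warning/block message",
    "harmed_by_others": "bully/abuse helpline",
    "reference_to_harm": None,  # the table leaves this cell empty
}
RESPONSE_TO_OTHERS = {
    "self_harm": "trigger warning",
    "harming_others": "prompt user to report",
    "harmed_by_others": "trigger warning",
    "reference_to_harm": "trigger warning",
}


def plan_responses(scores, threshold=0.5):
    """Return (author_actions, others_actions) for labels scoring at or above threshold.

    `scores` maps label name to a number in [0, 1]; the 0.5 default is an assumption.
    """
    active = [label for label in LABELS if scores.get(label, 0.0) >= threshold]

    author_actions = [
        RESPONSE_TO_AUTHOR[label]
        for label in active
        if RESPONSE_TO_AUTHOR[label] is not None
    ]

    # Several labels share "trigger warning", so keep each action only once.
    others_actions = []
    for label in active:
        action = RESPONSE_TO_OTHERS[label]
        if action not in others_actions:
            others_actions.append(action)

    return author_actions, others_actions


if __name__ == "__main__":
    # "Mary told me I'm trash" is labeled harmed_by_others in the examples.
    print(plan_responses({"harmed_by_others": 1}))
```

Because the labels are independent columns rather than mutually exclusive classes, the sketch treats a message as potentially carrying several labels at once and merges the resulting actions.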


Provider: Kaggle
