
GOSU.AI Dota 2 Game Chats Dataset

Indexed by Payititi (帕依提提) on 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-13836.html
Resource description:
## Dataset

This dataset contains chat messages from [Dota 2][1], a Valve video game and one of the most popular esports titles. It was used to train the [Roflan Bot][2]. The dataset includes chat logs from nearly 1 million public matchmaking matches, in which the game server groups randomly selected players of roughly equal skill.

## Notes and Disclaimer

**Important information, please read.**

This dataset is not safe for work (NSFW). In Dota 2, players communicate with each other in a very particular way. For example, you will encounter many abbreviations and game-specific terms. Dota 2 players commonly blame teammates and opponents for in-game failures. Unfortunately, many messages may contain crude insults, insults aimed at other players' family members, racist remarks, and other highly offensive content.

We provide the data "as is", without any filtering or moderation, and we accept no responsibility for the offensive content it contains. Our goal is to give researchers access to authentic conversations so they can explore the player community. We want to draw attention to the significant toxicity problem among most Dota 2 players, behavior that we consider unhealthy.

## Dataset Usage

1. See the rough explanation of [how the Roflan Bot was trained][4] to reflect the chat behavior of typical players. You can apply your own language models to this dataset to build other chatbots, or simply compare training performance.
2. See [this arXiv paper][3] on the analysis of esports viewer chat. You can carry out a similar analysis on the chat of game participants.

[1]: https://en.wikipedia.org/wiki/Dota_2
[2]: https://roflan.gosu.ai
[3]: https://arxiv.org/pdf/1801.02862.pdf
[4]: https://www.reddit.com/r/DotA2/comments/7xs8q6/how_we_trained_dota_2_chat_simulator_why_he_is_so/
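
As a starting point for the kind of chat analysis suggested above, here is a minimal sketch that counts word frequencies across messages. This page does not describe the file layout, so the CSV format and the `text` column name are assumptions, and the sample rows are made up for illustration; check the header of the downloaded file before adapting it.

```python
import csv
import io
import re
from collections import Counter

# ASSUMPTION: the chat messages live in a CSV column called "text".
# The column name is hypothetical; verify it against the real file header.
TEXT_COLUMN = "text"

# Runs of letters in any script (digits, underscores, punctuation excluded).
TOKEN_RE = re.compile(r"[^\W\d_]+", re.UNICODE)


def token_counts(csv_file, text_column=TEXT_COLUMN):
    """Count lowercase word tokens across all chat messages in a CSV stream."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        message = row.get(text_column) or ""  # tolerate missing or empty cells
        counts.update(TOKEN_RE.findall(message.lower()))
    return counts


# Made-up sample standing in for the real file (not taken from the dataset).
SAMPLE = "match,text\n0,gg wp\n0,GG\n1,report mid pls\n"

if __name__ == "__main__":
    counts = token_counts(io.StringIO(SAMPLE))
    print(counts.most_common(3))
```

To run it on a real download, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the file was saved.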
Provider:
Payititi (帕依提提)
Collected summary
Dataset introduction
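Background and challenges
Background overview
This dataset contains chat logs from nearly 1 million Dota 2 public matchmaking matches and is intended for training chatbots and studying player community behavior. The data is unfiltered and may contain a large amount of game-specific terminology and offensive language, reflecting how players actually communicate.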
The above content was collected and summarized by 遇见数据集.