CS-lol | E-sports Dataset | Audience Analysis Dataset

Source: arXiv | Updated: 2023-01-17 | Indexed: 2024-06-21
E-sports
Audience analysis
Download link:
https://github.com/junj2ejj/CS-lol
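Resource summary:
CS-lol is a large-scale dataset created by the Graduate School of Comprehensive Human Sciences at the University of Tsukuba. It focuses on matching viewer comments with game scenes in e-sports live streams. The dataset contains viewer comments and the corresponding game scene descriptions from 20 League of Legends e-sports matches, totaling 60,431 comments. The data were collected from YouTube and Twitch, and video content was matched through automatic speech recognition and manual search. Application areas include information retrieval and natural language processing, with the aim of analyzing viewer comments to understand viewer behavior and preferences and to improve the streaming experience.
Provider:
Graduate School of Comprehensive Human Sciences, University of Tsukuba
Created:
2023-01-17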
AI-Compiled Summary
Dataset Introduction
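Construction
CS-lol is built by pairing viewer comments with game scene descriptions from e-sports live streams. The research team collected comments and scene descriptions for 20 professional League of Legends matches from Twitch and YouTube. Scene descriptions were generated by YouTube's automatic speech recognition system, while viewer comments were manually extracted from Twitch stream replays. To keep the data consistent, the team aligned the timestamps of the videos on the two platforms and filtered out comments that carried very little information or consisted only of emoji. Comments were then filtered further by a relevance score computed between each comment and its scene description, yielding a dataset of scene descriptions, viewer comments, and relevance scores.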
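The low-information filter mentioned above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the function names and the threshold of two letters or digits are assumptions made for this example.

```python
import unicodedata


def is_informative(comment: str, min_alnum_chars: int = 2) -> bool:
    """Return True if the comment has at least `min_alnum_chars` letters or digits.

    Emoji, punctuation, and whitespace are not counted, so emoji-only
    comments are rejected. The threshold is a hypothetical choice.
    """
    count = 0
    for ch in comment:
        # Unicode general categories starting with "L" are letters, "N" are numbers.
        if unicodedata.category(ch)[0] in ("L", "N"):
            count += 1
    return count >= min_alnum_chars


def filter_comments(comments: list[str]) -> list[str]:
    """Keep only the comments that pass the informativeness check."""
    return [c for c in comments if is_informative(c)]


sample = ["😂😂😂", "nice flash by the support", "?", "gg"]
kept = filter_comments(sample)
```

Twitch emotes typed as text codes (for example `LUL`) would pass this check, so a real pipeline would likely also need an emote list.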
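Features
The distinguishing feature of CS-lol is its large-scale pairing of viewer comments with game scene descriptions: it covers 20 professional matches and contains 24,770 scene descriptions and 60,431 viewer comments. Comments were filtered by relevance score so that each retained comment is highly relevant to a specific scene. Viewer information has been anonymized to protect privacy. The comments and descriptions are diverse in vocabulary distribution, which makes the dataset a rich resource for studying viewer interaction in e-sports live streams.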
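Usage
CS-lol supports a range of research tasks, most notably viewer comment retrieval: given a game scene description, retrieve the viewer comments relevant to it. Researchers can use the dataset for experiments in information retrieval and natural language processing that explore the semantic relationship between viewer comments and game scenes. It can also be used for NLP tasks such as sentiment analysis and named entity recognition, which help characterize how viewers express themselves and what sentiment they convey during a stream. By analyzing the relevance between comments and scenes, researchers can further uncover viewers' behavior patterns and preferences in e-sports live streams.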
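Background and Challenges
Background Overview
With the rapid growth of e-sports, e-sports live streaming has become an industry with a huge market, attracting hundreds of millions of viewers worldwide. Viewers interact with the match, the commentators, and one another through real-time comments, which creates a distinctive social experience. To better understand viewers' commenting behavior and how it relates to match scenes, Junjie H. Xu et al. developed the CS-lol dataset in 2023. The dataset contains viewer comments from e-sports live streams paired with the corresponding game scene descriptions. Along with this large-scale resource, the authors proposed a task called "viewer comment retrieval", which aims to retrieve the comments relevant to a given scene from a large pool of comments in order to better understand viewers' real-time feedback.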
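Current Challenges
Building and using CS-lol involves several challenges. First, viewer comments in e-sports live streams are real-time and highly interactive; they are often short and contain many emoji, which complicates semantic understanding. Second, linking comments to scenes requires precise timestamp matching, and both viewers' typing speed and the unpredictability of scenes make this harder. Finally, traditional information retrieval methods perform poorly on comments that are this short yet semantically rich, so designing retrieval models that capture the semantic relationship between comments and scenes is the main challenge in applying the dataset.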
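To make the timestamp-matching difficulty concrete, here is a minimal sketch that assigns comments to a scene using a time window extended by a typing delay. The `Scene` type, the `max_delay` parameter, and its default of 10 seconds are hypothetical choices for illustration and are not taken from the dataset or its paper.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Scene:
    start: float  # seconds from the start of the stream
    end: float    # seconds from the start of the stream
    text: str     # scene description


def comments_for_scene(scene, comment_times, comments, max_delay=10.0):
    """Return comments posted between scene.start and scene.end + max_delay.

    `comment_times` must be sorted ascending and aligned index-by-index
    with `comments`. `max_delay` (an assumed value) allows for the time
    viewers need to type a reaction after the scene has ended.
    """
    lo = bisect_left(comment_times, scene.start)
    hi = bisect_right(comment_times, scene.end + max_delay)
    return comments[lo:hi]


times = [1.0, 4.0, 12.0, 19.0, 30.0]
texts = ["a", "b", "c", "d", "e"]
scene = Scene(start=3.0, end=10.0, text="team fight at the dragon pit")
matched = comments_for_scene(scene, times, texts)
strict = comments_for_scene(scene, times, texts, max_delay=0.0)
```

With no delay allowance only the comment at 4.0 s is matched; with a 10-second allowance the comments at 12.0 s and 19.0 s are included as well, which shows how sensitive the pairing is to this one parameter.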
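Common Scenarios
Classic Use Cases
The classic use of CS-lol is analyzing how viewer comments relate to game scenes in e-sports live streams. By pairing viewer comments with game scene descriptions, researchers can study viewers' real-time feedback and interaction while they watch a match. This paired analysis helps in understanding viewers' emotions and preferences, and it can also inform strategies for improving user experience on streaming platforms, for example using comment retrieval to make stream interaction more precise.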
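Practical Applications
CS-lol has broad practical potential, especially for improving user experience on e-sports streaming platforms. By analyzing the relationship between viewer comments and game scenes, a platform can offer more accurate comment recommendation and interactive features, raising viewers' sense of participation and satisfaction. The dataset can also be used for automated generation of stream content, such as producing commentary or real-time feedback from viewer comments, to make streams more engaging and interactive.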
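Derived Related Work
The release of CS-lol has prompted related research, particularly in information retrieval (IR) and natural language processing (NLP). Established retrieval models such as BM25, QLD, and SDM have been applied to the dataset's comment-scene matching task. CS-lol has also prompted further NLP work on tasks such as named entity recognition (NER) and dependency parsing, which help clarify the linguistic characteristics of e-sports live streams and viewer behavior.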
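As an illustration of how a lexical model such as BM25 scores comments against a scene description, the sketch below implements the standard BM25 formula with a non-negative IDF variant. It is not the implementation used in the CS-lol experiments: whitespace tokenization, the function name `bm25_rank`, and the parameter defaults `k1=1.2`, `b=0.75` are assumptions for this example.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Rank tokenized docs against a tokenized query with BM25.

    Returns a list of (doc_index, score) pairs sorted by descending score.
    """
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of docs containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # IDF with +1 inside the log, which keeps the weight non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


scene = "baron steal by the jungler".split()
comments = [c.split() for c in ["what a steal", "gg", "baron steal insane"]]
ranking = bm25_rank(scene, comments)
```

Here the comment sharing two terms with the scene ranks first and the comment with no shared terms scores zero, which also hints at why purely lexical matching struggles with short reactions like "gg".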
The content above was collected and summarized by AI.