Bollywood Movie Corpus
Source: arXiv | Updated: 2017-10-11 | Indexed: 2024-06-21
Download link:
https://github.com/yourusername/BollywoodMovieCorpus
Overview:
The Bollywood Movie Corpus is a dataset covering 4,000 Bollywood films, created jointly by researchers from IBM Research India, IIIT-Delhi, and other institutions. Alongside basic film information such as titles, cast lists, and plot text, it includes links to movie posters and trailers and detailed data for gender analysis. The data was extracted and processed from platforms such as Wikipedia and YouTube, with the aim of identifying and removing gender bias by analyzing film content. Its main applications are gender bias detection and removal and the generation of unbiased story content, which makes it a resource for researchers and developers working on gender bias.
Provided by:
IBM Research India
Created:
2017-10-11



