
Drama Critiques' Database

Source: Mendeley Data | Updated: 2024-05-10 | Indexed: 2024-06-30
Download link:
https://zenodo.org/records/6787151
Resource description:
Drama Critiques gathers ten years (2010 to 2020) of reviews of contemporary London theatre. By focusing on two literary communities, journalists on the one hand and bloggers on the other, this corpus makes it possible to examine the poetic and political discourse that journalistic and digital reviewers construct. Although our initial corpus comprises more than 43,000 theatre reviews, the version made available here contains 36,766, because some of the bloggers have not yet authorised us to publish their data in open access. For more information about the project, analyses of the corpus are available at https://dramacritiques.com/en/home/

The journalistic corpus was built from Theatre Record, a print magazine founded by the English critic Ian Herbert. Theatre Record reprints in full the reviews by national drama critics of the latest productions in and outside London. It has been published fortnightly in England since January 1981, and its archives were digitised in January 2019 thanks to Julian Oddy (https://www.theatrerecord.com/). We selected 23 newspapers in total, corresponding to 21,717 theatre reviews. All of them were originally available in PDF format. After all the files were converted to plain text, each one underwent extensive automatic and manual correction, a task that took more than 1,050 hours.

The blog corpus is drawn from the 28 most popular blog platforms on the Internet. These fall into two sub-categories: collective blog platforms and individual blog platforms. A collective platform is run by a publisher who invites other reviewers to post on the site; on an individual platform, the publisher posts all of the reviews personally. In both cases, the authors are not paid for their activity, and the content of their blogs has no printed version and is freely available. This version includes 21 blogs, totalling 15,049 reviews. All of them were extracted automatically using web scraping and then corrected (250 hours of work). More information about each blog is available at https://dramacritiques.com/en/categories-2/the-corpus/

Created: 2023-06-28
Summary
Dataset introduction
Background and challenges
Background overview
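The dataset contains 36,766 reviews of contemporary London theatre from 2010 to 2020, split into newspaper reviews (21,717) and blog reviews (15,049). It has undergone extensive correction and is provided in CSV format (103.0 MB).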
The above summary was collected and generated by 遇见数据集.
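As a minimal sketch for checking a downloaded copy against the published counts (21,717 newspaper reviews + 15,049 blog reviews = 36,766), the Python snippet below tallies CSV rows per source type using only the standard library. It assumes the CSV has a header row and a column that distinguishes newspaper reviews from blog reviews; the column name `source_type` and the labels `newspaper` and `blog` are hypothetical placeholders not confirmed by the dataset description, so adjust them to the real header before use.

```python
import csv
import io
from collections import Counter
from typing import IO, Union

# Per-source totals as stated in the dataset description.
# The keys are HYPOTHETICAL labels; replace them with the values
# actually used in the CSV.
EXPECTED = {"newspaper": 21_717, "blog": 15_049}


def count_reviews(source: Union[str, IO[str]], column: str = "source_type") -> Counter:
    """Count CSV rows per value of `column`.

    `source` is a file path or an open text stream. `column` is a
    HYPOTHETICAL header name: check the real file's header first.
    UTF-8 encoding is also an assumption.
    """
    if isinstance(source, str):
        with open(source, newline="", encoding="utf-8") as fh:
            return Counter(row[column] for row in csv.DictReader(fh))
    return Counter(row[column] for row in csv.DictReader(source))


def matches_published_totals(counts: Counter, expected: dict = EXPECTED) -> bool:
    """True if the per-label counts equal the published totals exactly."""
    return dict(counts) == expected


if __name__ == "__main__":
    # Toy in-memory data standing in for the real CSV file.
    sample = io.StringIO("review_id,source_type\n1,newspaper\n2,blog\n3,newspaper\n")
    counts = count_reviews(sample)
    print(counts)                            # Counter({'newspaper': 2, 'blog': 1})
    print(matches_published_totals(counts))  # False: toy data, not the real file
```

Using `csv.DictReader` rather than splitting lines by hand keeps quoted review texts that contain commas or newlines intact; for the real file, pass its local path to `count_reviews`.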