Coronavirus-Update Podcast Transcripts
Source: www.kaggle.com | Updated: 2020-12-11 | Indexed: 2025-01-08
Download link:
https://www.kaggle.com/juliushibbert/coronavirusupdate-podcast-transcripts
Description:
### Coronavirus-Update
"[Coronavirus-Update](https://www.ndr.de/nachrichten/info/Coronavirus-Update-Alle-Folgen,podcastcoronavirus134.html)" is a well-known, award-winning German podcast about the coronavirus pandemic, produced by the public radio and television broadcaster *Norddeutscher Rundfunk* (NDR). It is hosted by Korinna Hennig and Anja Martini; regular guests are the scientists [Christian Drosten](https://en.wikipedia.org/wiki/Christian_Drosten) and [Sandra Ciesek](https://en.wikipedia.org/wiki/Sandra_Ciesek).
### Transcripts
The dataset contains the podcast transcripts in XML, extracted from the published [PDF files](https://www.ndr.de/nachrichten/info/Coronavirus-Update-Die-Podcast-Folgen-als-Skript,podcastcoronavirus102.html).
Hyphenation, column layout, and line feeds have been removed. If you only need the spoken text (tag `text`), ignore the tags for page numbers (`seite`) and subheadings (`header`).
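
As a minimal sketch of that filtering step, the following Python snippet collects the content of `text` elements and skips `seite` and `header`. Only the three tag names come from the description above; the root element `folge`, the nesting shown in `sample`, and the helper names are assumptions made for illustration, so the real files may need a different traversal.

```python
import xml.etree.ElementTree as ET

# Tags to ignore, per the dataset description: page numbers and subheadings.
SKIP_TAGS = {"seite", "header"}


def _visible_text(elem):
    """Yield text fragments below elem, skipping SKIP_TAGS subtrees but keeping their tails."""
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tag not in SKIP_TAGS:
            yield from _visible_text(child)
        if child.tail:
            yield child.tail


def spoken_text(xml_string):
    """Return one whitespace-normalized string per non-empty `text` element."""
    root = ET.fromstring(xml_string)
    passages = []
    for node in root.iter("text"):
        passage = " ".join("".join(_visible_text(node)).split())
        if passage:
            passages.append(passage)
    return passages


# Hypothetical sample; the real element layout may differ.
sample = """<folge>
  <seite>1</seite>
  <header>Einleitung</header>
  <text>Herzlich willkommen <seite>2</seite>zum Update.</text>
  <text>Guten Tag.</text>
</folge>"""

for passage in spoken_text(sample):
    print(passage)
```

The helper keeps the text that follows a skipped element (its tail), so a page marker that falls in the middle of a passage would not cut the sentence in two.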
### ToDo
- rename tags
- check hyphenation at page breaks
Provider:
Kaggle



