
UN General Debates

Source: www.kaggle.com | Updated: 2017-09-05 | Indexed: 2025-03-25
Download link:
https://www.kaggle.com/unitednations/un-general-debates
Resource description:
### Context

Every year since 1947, representatives of UN member states gather at the annual sessions of the United Nations General Assembly. The centrepiece of each session is the General Debate, a forum at which leaders and other senior officials deliver statements presenting their government’s perspective on the major issues in world politics. These statements are akin to the annual legislative state-of-the-union addresses in domestic politics. This dataset, the UN General Debate Corpus (UNGDC), contains the texts of General Debate statements from 1970 (Session 25) to 2016 (Session 71).

### Content

This dataset includes the text of each country’s statement from the General Debate, separated and tagged by country, session, and year. The text was scanned from PDFs of transcripts of the UN general sessions. As a result, the original OCR (optical character recognition) output included page numbers in the text; these have been removed. The dataset includes English text only.

### Acknowledgements

This dataset was prepared by Alexander Baturo, Niheer Dasandi, and Slava Mikhaylov, and is presented in the paper "Understanding State Preferences With Text As Data: Introducing the UN General Debate Corpus," Research & Politics, 2017.

### Inspiration

This dataset covers over forty years of statements from different countries, which supports both country-specific and longitudinal questions. Some questions that might be interesting:

* How has the sentiment of each country’s General Debate statement changed over time?
* Which topics have become more or less popular over time and by region?
* Can you build a classifier that identifies which country a given text is from?
* Are there lexical or syntactic changes over time, or differences between regions?
* How does the latitude of a country affect lexical complexity?
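As a starting point for the lexical questions above, the following is a minimal sketch rather than a definitive analysis. It assumes the corpus is available as a CSV with one row per statement and the columns `session`, `year`, `country`, and `text`; these column names, the sample rows, and the helper functions are illustrative assumptions, not something the description above confirms. It uses the type-token ratio as one crude proxy for lexical diversity.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical sample in the ASSUMED layout: one row per statement,
# with columns session, year, country and text. The sentences are invented.
SAMPLE_CSV = """session,year,country,text
25,1970,USA,"Peace and security remain the central task of this Assembly."
25,1970,IND,"Development and peace are the goals of the developing nations."
71,2016,USA,"Climate change and terrorism demand a common response."
"""


def type_token_ratio(text):
    """Share of distinct words among all words in a text (0.0 if empty)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def load_rows(handle):
    """Read statements from an open CSV file object into a list of dicts."""
    return list(csv.DictReader(handle))


def mean_ttr_by_year(rows):
    """Average the type-token ratio of all statements delivered in each year."""
    ratios = defaultdict(list)
    for row in rows:
        ratios[int(row["year"])].append(type_token_ratio(row["text"]))
    return {year: sum(vals) / len(vals) for year, vals in sorted(ratios.items())}


rows = load_rows(io.StringIO(SAMPLE_CSV))
by_year = mean_ttr_by_year(rows)
for year, ratio in by_year.items():
    print(year, round(ratio, 3))
```

To run this on the downloaded file, pass an opened file to `load_rows` instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`. The type-token ratio falls as texts get longer, so comparing statements of very different lengths would need a length-normalised measure.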

Provider:
www.kaggle.com