A Dataset of Multidimensional and Multilingual Social Opinions for Malta's Annual Government Budget
Source: Zenodo. Updated: 2021-06-01. Indexed: 2026-05-25.
Download link:
https://zenodo.org/record/4650231
Description:
This dataset consists of three high-quality multidimensional and multilingual social opinion datasets in the socio-economic domain, specifically Malta's Annual Government Budget. Together they contain over 6,000 online posts of user-generated content in Maltese, English, Maltese-English or other languages, gathered from newswires and social networking services, for the 2018, 2019 and 2020 budgets. Each online post has been annotated for multiple opinion dimensions, namely <em>subjectivity</em>, <em>sentiment polarity</em>, <em>emotion</em>, <em>sarcasm</em> and <em>irony</em>, and in terms of negation, topic and language. These datasets are a valuable resource for developing Opinion Mining tools and Language Technologies, and can be used as a baseline for assessing the state of the art and for developing new advanced analytical methods for Opinion Mining. We provide four CSV files: three contain the annotated dataset for each annual Government Budget, for 2018 (Malta-Budget-2018-dataset-v1.csv), 2019 (Malta-Budget-2019-dataset-v1.csv) and 2020 (Malta-Budget-2020-dataset-v1.csv), and the fourth (Malta-Budget-2018-2020-data-sources-v1.csv) contains information about each data source referenced within the annual budget dataset files.
Each online post is annotated with the following metadata and information (annotation types):
<strong>Online Post Identifier (Online Post ID)</strong>: unique numerical identifier for the online post;
<strong>Twitter Identifier (Twitter ID)</strong>: unique numerical identifier provided by Twitter for the online post (relevant for tweets only);
<strong>Related Online Post Identifier (Related Online Post ID)</strong>: numerical identifier for the parent online post (if any);
<strong>Source Identifier (Source ID)</strong>: numerical identifier referring to the actual data source (e.g., website) of the online post;
<strong>Online Post Text</strong>: textual string of the online post (relevant only for newswires' comments);
<strong>Subjectivity</strong>: binary value, with 1 referring to subjective posts and 0 referring to objective posts;
<strong>Sentiment Polarity</strong>: categorical value (3 levels) for the sentiment polarity of the online post (negative, neutral, positive);
<strong>Emotion</strong>: categorical value (8 levels) for the emotion of the online post based on Plutchik's eight primary emotions (joy, sadness, fear, anger, anticipation, surprise, disgust, trust);
<strong>Sarcasm</strong>: binary value, with 1 referring to sarcasm in online posts;
<strong>Irony</strong>: binary value, with 1 referring to irony in online posts;
<strong>Negation</strong>: binary value, with 1 referring to negated online posts;
<strong>Off-topic</strong>: binary value, with 1 referring to off-topic online posts that are political but not related to the budget;
<strong>Language</strong>: numerical value, with 0 referring to online posts in English, 1 referring to posts in Maltese, 2 referring to Maltese-English (Maltenglish) code-switched posts, and 3 referring to posts in other languages.
The dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license for non-commercial use.
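The annotation scheme above can be decoded in a few lines of Python. The following is a minimal sketch only: the column names and the inline sample rows are hypothetical stand-ins (the actual CSV headers may differ and should be checked against the downloaded files), while the language codes follow the description above. Sentiment and emotion columns are left out because their encoding in the files is not stated here.

```python
import csv
import io

# Language codes as documented in the dataset description.
LANGUAGE_LABELS = {0: "English", 1: "Maltese", 2: "Maltese-English", 3: "Other"}

# ASSUMPTION: these column names are hypothetical; check the real CSV header.
BINARY_COLUMNS = ("Subjectivity", "Sarcasm", "Irony", "Negation", "Off-topic")

# ASSUMPTION: invented sample rows for illustration, not taken from the dataset.
SAMPLE_CSV = """Online Post ID,Subjectivity,Sarcasm,Irony,Negation,Off-topic,Language
1,1,0,0,0,0,0
2,0,0,0,1,0,1
3,1,1,0,0,1,2
"""


def decode_row(row):
    """Turn one raw CSV row into typed, human-readable values."""
    decoded = {"Online Post ID": int(row["Online Post ID"])}
    for name in BINARY_COLUMNS:
        # Binary annotations: 1 means the property holds for the post.
        decoded[name] = row[name] == "1"
    decoded["Language"] = LANGUAGE_LABELS[int(row["Language"])]
    return decoded


def load_posts(handle):
    """Read every row from an open CSV file-like object."""
    return [decode_row(row) for row in csv.DictReader(handle)]


posts = load_posts(io.StringIO(SAMPLE_CSV))

# Drop posts flagged as political but unrelated to the budget.
on_topic = [post for post in posts if not post["Off-topic"]]
```

To apply the same functions to one of the downloaded files, `load_posts` could be given a handle from `open(path, newline="", encoding="utf-8")` instead of the in-memory sample, once the column names have been adjusted to match the file.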
Provider:
Zenodo
Created:
2021-03-31



