Raw data (Thomson Reuters news articles and comments 2012)
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://data.mendeley.com/datasets/d2rppff696
Description:
Our research focuses on Internet news recipients' comments as a continuum that has a communicative potential of its own. We consider these comments to be dynamic, interactive integral components of news discourse. We apply a complex three-stage method to study the commenting continuum. At the first stage we use corpus-based technologies to analyze discourse structures and obtain a raw scheme of the commenting discourse. At the second stage we carry out a qualitative analysis, using functional integral analysis, which gives a deeper understanding of the discourse structure and its pragmatic functioning. At the third stage we carry out a comparative analysis. We aim to identify the dynamic changes that commenting discourse continua undergo over time and under the influence of technological development.
This dataset includes raw corpus material. It consists of 116 Thomson Reuters world news stories about Iran's nuclear program and 1018 comments on them. The number of comments per news article ranges from a minimum of 1 to a maximum of 144. We worked with each news story and its set of comments separately, using AntConc software first to extract the list of keywords and then to view the keyword mapping (tracing where the keywords occur in the comments). Then we analysed which part of the article carries the strongest impetus. After that we analysed each comment continuum (the set of comments on a news article) on a deeper pragmatic and cognitive level and determined the elements that serve as cognitive-pragmatic foci of the comments.
The analysis of comments allows us to follow the main lines of the communicative interaction scheme and to retrace how the presentation of data is transformed in a constant process of re-contextualization.
The materials also include a program developed by Stas Shilov (2012) at my request. Its technical task is to count the number of tokens in each comment and to compute the maximum, minimum and average values automatically, both per sentence and per comment. A token in this research is a group of letters (normally a word or words) and symbols separated from neighboring tokens by a space on the left and on the right. Most tokens here are words, though they tend to include punctuation marks. The program can be translated into English on request, though its interface is intuitive.
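For readers without access to the program, the counting it performs can be approximated with a short Python sketch. This is not Stas Shilov's program: the function names (`tokenize`, `split_sentences`, `token_stats`) are hypothetical, and the sentence splitter is an assumption, since the description above does not say how sentences were delimited. Tokens follow the definition given here, that is, whitespace-delimited strings with punctuation left attached.

```python
import re
from statistics import mean


def tokenize(text):
    # A token is any run of characters delimited by whitespace,
    # so punctuation stays attached to the neighboring word.
    return text.split()


def split_sentences(text):
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    # The original program may have used a different rule.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summarize(counts):
    # Minimum, maximum and average of a list of token counts.
    if not counts:
        return None
    return {"min": min(counts), "max": max(counts), "mean": mean(counts)}


def token_stats(comments):
    # Token counts per comment, and per sentence across all comments.
    per_comment = [len(tokenize(c)) for c in comments]
    per_sentence = [
        len(tokenize(s)) for c in comments for s in split_sentences(c)
    ]
    return {
        "counts": per_comment,
        "per_comment": summarize(per_comment),
        "per_sentence": summarize(per_sentence),
    }
```

Calling `token_stats` on the list of comments for one news article gives the per-comment counts together with both summaries. Running it on all 1018 comments at once would give corpus-wide figures instead.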
The third file contains a graph (Figure 1) which shows the distribution of comment length within the given corpus, though with little regard to the number of comments on each individual news article.
Created:
2022-01-26



