Database of digital media publications on maternal (family) capital in Russia in 2006-2019
Source: NIAID Data Ecosystem. Indexed: 2026-03-14
Download link:
https://zenodo.org/record/5740416
Resource description:
Abstract
The database contains publications from Russian-language digital media of the Russian Federation on the topic of maternity capital, covering the period from May 10, 2006 to June 30, 2019. It includes an export of general data on these publications in .xls format (UTF-8 encoding). Full texts of the publications are provided in .xml format.
A specialized query was run against public.ru, an aggregator of publications from Russian-language digital mass media. In total, the database consists of 457888 publications from 7665 publishing houses located in 1251 settlements across 85 regions of Russia. For each publication, the database records the date, type, authors, publisher and place of publication (municipality and region of the publisher), as well as the full text.
Keywords: database, digital media, maternal (family) capital, central and municipal media, Russia
JEL codes: J10, J13, Z18.
Data format and access:
The database consists of full-text digital media publications on the topic of maternity capital. The materials are in Russian and were published in federal, regional and local digital media. Publication period: May 10, 2006 to June 30, 2019.
The database consists of 457888 publications from 7665 publishing houses located in 1251 settlements across 85 regions of Russia. The data are provided in .csv format, with full texts in .xml format. The file "Matkap_SMI_17_11_2021.csv" contains processed information from the extended full-text sample by year (the full texts are in the "XML.rar" archive). It has the following fields:
pubData - date of publication (format "YYYY-MM-DD")
text - text from the "description" element of the XML files after lemmatization (converted to lowercase, with punctuation, extra spaces and numbers removed)
source - name of the electronic edition
place - town or city in Russia
type - type of electronic edition (newspaper, magazine, TV program, internet resource)
period - frequency of publication
positive - the number of unique positive words from the RuSentiLex2017 dictionary
negative - the number of unique negative words from the RuSentiLex2017 dictionary
neutral - the number of unique neutral words from the RuSentiLex2017 dictionary
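The field list above can be read into typed records with a short script. This is a minimal sketch using only the Python standard library; the comma delimiter and the presence of a header row with exactly these field names are assumptions that should be checked against the actual file, and `read_records` is a hypothetical helper name.

```python
import csv
from datetime import date

# The three sentiment columns hold integer counts.
COUNT_FIELDS = ("positive", "negative", "neutral")

def read_records(lines, delimiter=","):
    """Yield one dict per row, with pubData parsed as a date and counts as ints.

    `lines` is any iterable of text lines, e.g. an open file.
    The delimiter is an assumption; adjust it to match the real file.
    """
    for row in csv.DictReader(lines, delimiter=delimiter):
        row["pubData"] = date.fromisoformat(row["pubData"])  # "YYYY-MM-DD"
        for field in COUNT_FIELDS:
            row[field] = int(row[field])
        yield row

# Usage with the published file (UTF-8, as stated in the description):
# with open("Matkap_SMI_17_11_2021.csv", encoding="utf-8", newline="") as f:
#     records = list(read_records(f))
```

Reading row by row keeps memory use low, which may matter because the text column holds full lemmatized publications.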
Data collection methodology:
Publications were collected through public.ru, an aggregator of Russian-language digital mass media. The selection was limited to the period from May 10, 2006, when Russian President Vladimir Vladimirovich Putin first announced the maternity capital programme in his address to the Federal Assembly as one of the mechanisms to stimulate fertility and overcome the demographic crisis, to June 30, 2019. The end date was chosen because, at the time of export (01.08.2021 to 15.08.2021), the aggregator allowed media publications up to that date to be exported in full, without losses.
Keywords used to select articles on maternity capital: “maternal capital”, “maternity capital”, “family capital”, “paternal capital”. A publication had to contain at least two phrases from the query, with a distance of no more than 4 sentences between them. This excluded publications in which maternity capital was mentioned only incidentally or indirectly.
Duplicate articles were removed from the database. A publication was treated as a duplicate if it fully repeated the text, the publishing house and the municipality (location) of the publishing house of another publication. Reprints of articles by other publishing houses or in other regions were not excluded.
After lemmatization of the text (and after converting it to lowercase and removing extra spaces, numbers and punctuation), unique positive, negative and neutral words and phrases (the corresponding variables) were counted using the RuSentiLex2017 dictionary (Loukachevitch N., Levchik A. Creating a General Russian Sentiment Lexicon. In Proceedings of Language Resources and Evaluation Conference LREC-2016, 2016.). Repeated occurrences of the same sentiment word are not counted.
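The proximity rule described above can be illustrated with a small filter. This is only a sketch under stated assumptions, not the aggregator's actual query logic: the English phrases stand in for the Russian query terms, the sentence splitter is deliberately naive, "distance" is taken to mean the difference between sentence indices, and all function names are hypothetical.

```python
import re

# English stand-ins for the query phrases; the real query was in Russian.
PHRASES = ("maternal capital", "maternity capital",
           "family capital", "paternal capital")

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def phrase_positions(text, phrases=PHRASES):
    """Return the sentence index of every phrase occurrence, in text order."""
    positions = []
    for index, sentence in enumerate(split_sentences(text)):
        lowered = sentence.lower()
        for phrase in phrases:
            positions.extend([index] * lowered.count(phrase))
    return positions

def is_on_topic(text, phrases=PHRASES, max_gap=4):
    """True if two phrase occurrences lie within max_gap sentences of each other."""
    positions = phrase_positions(text, phrases)
    # Positions are non-decreasing, so checking neighbours is sufficient.
    return any(b - a <= max_gap for a, b in zip(positions, positions[1:]))
```

Under this reading, a text that mentions the topic once, or twice but far apart, is rejected, which matches the stated goal of excluding incidental mentions.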
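The deduplication rule can be sketched as follows. This is an illustrative version only: it assumes that the text, publishing house and municipality correspond to the `text`, `source` and `place` fields of the CSV file, and `drop_exact_duplicates` is a hypothetical helper, not the authors' code.

```python
def drop_exact_duplicates(records):
    """Keep the first record for each (text, source, place) combination.

    Records with the same text but a different source or place (reprints)
    are kept, in line with the rule described above.
    """
    seen = set()
    unique = []
    for record in records:
        key = (record["text"], record["source"], record["place"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```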
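The counting step can be sketched as below. This is a simplified illustration: the lexicon is a toy dictionary with made-up entries rather than the actual RuSentiLex2017 file, multi-word phrases are not handled, and `count_unique_sentiment` is a hypothetical name.

```python
def count_unique_sentiment(lemmas, lexicon):
    """Count distinct lemmas per sentiment label; repeats are not counted."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for lemma in set(lemmas):  # set() collapses repeated words
        label = lexicon.get(lemma)
        if label in counts:
            counts[label] += 1
    return counts

# Toy lexicon for illustration only; entries are not taken from RuSentiLex2017.
toy_lexicon = {"хороший": "positive", "плохой": "negative"}
lemmas = ["хороший", "хороший", "плохой", "дом"]
print(count_unique_sentiment(lemmas, toy_lexicon))
```

Because the lemmas are reduced to a set first, a word that appears many times in a publication contributes at most one to its category.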
Created:
2022-12-26



