Database "Childfree (antinatalist) communities in the social network VKontakte"
Source: NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/4612130
Description:
The database contains an export of Russian-language text comments from the social network VKontakte in .csv format (UTF-8 encoding). The comments were collected from communities that discuss pregnancy, childhood, motherhood, fatherhood, etc. Judging by the group names, the reproductive attitudes of these users are childfree. The export contains comments under posts that users interacted with. The absolute number of likes serves as the selection criterion: only comments with 5 or more likes were collected. The text data has been preprocessed (stemming and lemmatization).
The data are suitable for topic modelling (e.g. LDA, Latent Dirichlet Allocation), for modelling the graph structure of communities (the link_comment variable contains a unique identifier of the post, and link_author contains a unique user identifier), for sentiment analysis of statements, and for building a dictionary of demographic connotations.
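The graph use case mentioned above can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes that the part of link_comment before "?reply=" identifies the post, and the sample rows are hypothetical placeholders rather than real records.

```python
from collections import defaultdict

# Hypothetical rows; real values come from the link_author and
# link_comment columns of the CSV file. POST_A / POST_B are placeholders.
rows = [
    {"link_author": "https://vk.com/u1", "link_comment": "https://vk.com/POST_A?reply=1"},
    {"link_author": "https://vk.com/u2", "link_comment": "https://vk.com/POST_A?reply=2"},
    {"link_author": "https://vk.com/u1", "link_comment": "https://vk.com/POST_B?reply=3"},
    {"link_author": "https://vk.com/u3", "link_comment": "https://vk.com/POST_B?reply=4"},
    {"link_author": "https://vk.com/u2", "link_comment": "https://vk.com/POST_A?reply=5"},
]


def post_link(link_comment):
    """Return the part of the comment link before '?reply=' (assumed to be the post)."""
    return link_comment.split("?reply=", 1)[0]


def co_comment_graph(rows):
    """Build weighted edges between authors who commented under the same post."""
    authors_by_post = defaultdict(set)
    for row in rows:
        authors_by_post[post_link(row["link_comment"])].add(row["link_author"])

    edges = defaultdict(int)
    for authors in authors_by_post.values():
        ordered = sorted(authors)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                # Edge weight counts the number of posts both authors commented on.
                edges[(first, second)] += 1
    return dict(edges)


graph = co_comment_graph(rows)
```

The resulting dictionary maps author pairs to the number of shared posts and can be passed to a graph library of choice for community detection.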
Sample Information
- Number of communities: 8
- Content type of communities: communities were selected in which users mainly make negative comments about childbirth, motherhood, parenthood, and having a family of their own. However, some users with pro-family attitudes may be encountered within the groups.
- Only comments with the number of likes >= 5 are collected
- Groups with fewer than 600 subscribers are excluded
- Comments are collected only from communities (the list of communities below) discussing issues related to childfree, childhood, motherhood, pregnancy, etc.
- The communities in the sample have about 8 thousand subscribers on average (maximum 61 071, minimum 619, mean 8 950)
- The sample contains about 700 thousand user comments
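The likes criterion listed above can be checked on a downloaded file with a short sketch like the one below. Only the likes column name comes from the dataset description; the in-memory excerpt and the function name are hypothetical, and a real file would be opened with open(path, encoding="utf-8", newline="") instead of io.StringIO.

```python
import csv
import io

# Hypothetical three-row excerpt standing in for the real CSV file.
sample_csv = "text,likes\nпервый,5\nвторой,12\nтретий,3\n"


def rows_meeting_threshold(csv_file, min_likes=5):
    """Return the rows whose 'likes' value is at least min_likes."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if int(row["likes"]) >= min_likes]


kept = rows_meeting_threshold(io.StringIO(sample_csv))
```

On the published export every row should already pass this filter, so a difference between the input and output row counts would point to a parsing problem.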
Sample Structure
link_author - link to the comment author in the form https://vk.com/*author identifier*
gender of author - (F - female, M - male, NaN - no data)
link_comment - link to the comment in the form https://vk.com/*post identifier on a community wall*?reply=*comment id*
date_time - date and time of publication (format “YYYY-MM-DD HH:MM:SS”)
text - raw comment text
likes - number of likes the comment has
text_prep - processed text (punctuation removed, words converted to lowercase)
text_stem - processed text (stemming of the text_prep column using SnowballStemmer("russian") from the nltk library)
text_sw - processed text (stop words removed from the text_prep column; tokenized with word_tokenize(text) from the nltk library)
text_lemm - processed text (lemmatization of the text_prep column using mystem.lemmatize(text) from the pymystem3 library)
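The preprocessing columns above can be approximated with the sketch below. It is an illustration under assumptions, not the authors' exact pipeline: the punctuation rule is a guess (every character that is neither a word character nor whitespace is dropped), the function names are hypothetical, and the stemming step is skipped when nltk is not installed.

```python
import re
from datetime import datetime


def prep(text):
    """Approximate text_prep: drop punctuation, lowercase, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", text)
    return " ".join(no_punct.lower().split())


def parse_date_time(value):
    """Parse the date_time column, which uses the YYYY-MM-DD HH:MM:SS format."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def stem(prepared_text):
    """Approximate text_stem; returns None when nltk is not available."""
    try:
        from nltk.stem.snowball import SnowballStemmer
    except ImportError:
        return None
    stemmer = SnowballStemmer("russian")
    return " ".join(stemmer.stem(word) for word in prepared_text.split())


example = prep("Привет, МИР! Как дела?")
stemmed = stem(example)
published = parse_date_time("2020-01-02 03:04:05")
```

Comparing the output of such a sketch with the published text_prep and text_stem columns is a quick way to find out how closely it matches the original processing.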
List of communities (8 communities):
https://vk.com/club69265846
https://vk.com/club43946
https://vk.com/club48085
https://vk.com/club4687918
https://vk.com/club38197124
https://vk.com/club58565280
https://vk.com/club59638638
https://vk.com/club148257242
Created:
2022-12-26



