Database "Childfree (antinatalist) communities in the social network VKontakte"

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/4612130
Description:
The database contains an export of Russian-language text comments from the social network VKontakte in .csv format (UTF-8 encoding). The comments were collected from communities that discuss pregnancy, childhood, motherhood, paternity, etc. Judging by the group names, the reproductive attitudes of these users are childfree. The export contains comments under posts that users interacted with. The absolute number of likes is used as the selection criterion: only comments with 5 or more likes are collected. The text data are preprocessed (stemming and lemmatization).

The data are suitable for thematic analysis (e.g. LDA, Latent Dirichlet Allocation), for modelling the graph structure of communities (the link_comment variable contains a unique identifier of the post, link_author contains a unique user identifier), for sentiment analysis of statements, and for building a dictionary of demographic connotations.

Sample Information
- Number of communities: 8
- Content type of communities: communities were selected in which users make mainly negative comments about childbirth, motherhood, parenthood, and having a family of their own. However, some users with pro-family attitudes may be encountered within these groups.
- Only comments with the number of likes >= 5 are collected
- Groups with fewer than 600 subscribers are excluded
- Comments are collected only from communities (see the list of communities below) discussing issues related to childfree, childhood, motherhood, pregnancy, etc.
- The sampled communities contain on average about 8 thousand subscribers (maximum 61 071, minimum 619, mean 8 950)
- The sample contains about 700 thousand user comments

Sample Structure
- link_author - link to the author of the comment, in the form https://vk.com/*author identifier*
- gender of author - (F - female, M - male, NaN - no data)
- link_comment - link to the comment, in the form https://vk.com/*post identifier on a community wall*?reply=*comment id*
- date_time - date and time of publication (format “YYYY-MM-DD HH:MM:SS”)
- text - raw comment text
- likes - number of likes the comment has
- text_prep - processed text (punctuation marks removed, words converted to lowercase)
- text_stem - processed text (stemming of the text_prep column using SnowballStemmer("russian") from the nltk library)
- text_sw - processed text (stop words removed from the text_prep column, using word_tokenize(text) from the nltk library)
- text_lemm - processed text (lemmatization of the text_prep column using mystem.lemmatize(text) from the pymystem3 library)

List of communities (8 communities):
https://vk.com/club69265846
https://vk.com/club43946
https://vk.com/club48085
https://vk.com/club4687918
https://vk.com/club38197124
https://vk.com/club58565280
https://vk.com/club59638638
https://vk.com/club148257242
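The description suggests using link_author and link_comment to model the graph structure of the communities. The following is a minimal sketch of that idea using only the Python standard library. It assumes that link_comment splits cleanly into a post URL and a `reply` query parameter, as the documented form implies. The sample rows, the `wall-...` URL shapes, and the helper names `split_comment_link` and `author_post_edges` are hypothetical and only serve the illustration; the real file name is not given here, so the sketch reads from an in-memory string instead.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Hypothetical rows mimicking three of the documented columns.
# The exact URL shapes in the real export are an assumption.
SAMPLE_CSV = """link_author,link_comment,likes
https://vk.com/id1,https://vk.com/wall-1_10?reply=11,7
https://vk.com/id2,https://vk.com/wall-1_10?reply=12,5
https://vk.com/id1,https://vk.com/wall-1_20?reply=21,9
"""


def split_comment_link(link):
    """Split '<post url>?reply=<comment id>' into (post_url, comment_id)."""
    parts = urlsplit(link)
    post_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    comment_id = parse_qs(parts.query).get("reply", [None])[0]
    return post_url, comment_id


def author_post_edges(rows, min_likes=5):
    """Map each post URL to the set of authors who commented under it."""
    post_to_authors = defaultdict(set)
    for row in rows:
        # The dataset is already filtered to likes >= 5; the check is kept
        # so that a stricter threshold can be passed in.
        if int(row["likes"]) < min_likes:
            continue
        post_url, _ = split_comment_link(row["link_comment"])
        post_to_authors[post_url].add(row["link_author"])
    return post_to_authors


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
edges = author_post_edges(rows)
for post_url, authors in edges.items():
    print(post_url, sorted(authors))
```

The resulting post-to-authors mapping is a bipartite edge list; authors who share a post can then be linked to each other to obtain a co-commenting graph. For the real file, the in-memory string would be replaced by the downloaded .csv opened with encoding="utf-8".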
Created:
2022-12-26