
Familiist (pro-natalist) communities in the social network VKontakte

Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/4244360
Resource description:
The database contains an export of Russian-language text comments from the social network VKontakte in .csv format (UTF-8 encoding). The comments were collected from communities that discuss pregnancy, childhood, motherhood, paternity, etc. The export contains comments under posts that received user interaction. The absolute number of likes was used as the selection criterion: only comments with 5 or more likes were collected. The text data was processed (stemming and lemmatization). The data are suitable for topic modelling (e.g. LDA, Latent Dirichlet Allocation), for modelling the graph structure of communities (the link_comment variable contains a unique identifier of the post, link_author contains a unique user identifier), for sentiment analysis of statements, and for building a dictionary of demographic connotations.

Sample Information:
- Number of communities: 38
- Content type of communities: the selected communities are those in which users are mainly positive about the birth of children, motherhood, parenthood and having their own family. Users (and communities) with anti-familist views may still be encountered.
- Only comments with a number of likes >= 5 are collected
- Comments are collected only from communities (listed below) that discuss issues related to childhood, motherhood, pregnancy, etc.
- The sampled communities have 309 thousand subscribers on average (maximum 1,482,303; minimum 72,570; total number of subscribers excluding overlaps 11,743,295)
- The sample contains 112,900 user comments

Sample Structure:
- link_author: link to the author of the comment, in the form https://vk.com/*author identifier*
- gender of author (F: female, M: male, NaN: no data)
- link_comment: link to the comment, in the form https://vk.com/*post identifier on a community wall*?reply=*comment id*
- date_time: date and time of publication (format “YYYY-MM-DD HH:MM:SS”)
- text: raw comment text
- likes: number of likes the comment has
- text_prep: processed text (punctuation marks removed, words converted to lowercase)
- text_stem: processed text (stemming applied to the text_prep column using SnowballStemmer("russian") from the nltk library)
- text_sw: processed text (stop words removed from the text_prep column, tokenized with word_tokenize(text) from the nltk library)
- text_lemm: processed text (lemmatization applied to the text_prep column using mystem.lemmatize(text) from the pymystem3 library)

List of communities (38 communities):
https://vk.com/club52388302
https://vk.com/club34677924
https://vk.com/club99834596
https://vk.com/club170234932
https://vk.com/club20199180
https://vk.com/club118030893
https://vk.com/club14395935
https://vk.com/club100104267
https://vk.com/club181526404
https://vk.com/club35095382
https://vk.com/club58530763
https://vk.com/club69716165
https://vk.com/club29746763
https://vk.com/club78865067
https://vk.com/club20709572
https://vk.com/club93466205
https://vk.com/club61700163
https://vk.com/club91423062
https://vk.com/club69285929
https://vk.com/club104012302
https://vk.com/club20622108
https://vk.com/club86333616
https://vk.com/club24765
https://vk.com/club87169444
https://vk.com/club86688308
https://vk.com/club93776129
https://vk.com/club47207301
https://vk.com/club39873171
https://vk.com/club59224150
https://vk.com/club7430494
https://vk.com/club37739956
https://vk.com/club59701255
https://vk.com/club27427277
https://vk.com/club126238531
https://vk.com/club127678644
https://vk.com/club57782234
https://vk.com/club51314884
https://vk.com/club134261249
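
To make the column descriptions above more concrete, here is a minimal Python sketch of how the export might be read and turned into a post-to-authors structure for graph modelling. It is an illustration under stated assumptions, not the dataset authors' code: the three sample rows are invented, the column name `gender` and the `wall-…` shape of the post identifier are guesses (the description does not spell them out), and `prep_text` only approximates the documented text_prep step (punctuation removed, lowercase).

```python
import csv
import io
import re
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# Invented rows that mimic the documented columns (NOT real data).
# The "gender" header and the "wall-..." link shape are assumptions.
SAMPLE_CSV = """\
link_author,gender,link_comment,date_time,text,likes
https://vk.com/id111,F,https://vk.com/wall-1_10?reply=11,2020-05-01 12:30:00,"Привет, мир!",7
https://vk.com/id222,M,https://vk.com/wall-1_10?reply=12,2020-05-01 12:45:00,Как дела?,5
https://vk.com/id333,,https://vk.com/wall-1_20?reply=21,2020-05-02 09:00:00,Спасибо,3
"""


def load_rows(handle, min_likes=5):
    """Read comment rows and keep those with at least `min_likes` likes."""
    rows = []
    for row in csv.DictReader(handle):
        row["likes"] = int(row["likes"])
        if row["likes"] >= min_likes:
            rows.append(row)
    return rows


def split_comment_link(url):
    """Split a link_comment URL into (post identifier, comment id)."""
    parts = urlsplit(url)
    post_id = parts.path.lstrip("/")
    reply_ids = parse_qs(parts.query).get("reply", [])
    return post_id, (reply_ids[0] if reply_ids else None)


def prep_text(text):
    """Approximate text_prep: lowercase and replace punctuation with spaces."""
    no_punct = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(no_punct.split())


def authors_by_post(rows):
    """Map each post identifier to the set of authors who commented on it."""
    graph = defaultdict(set)
    for row in rows:
        post_id, _ = split_comment_link(row["link_comment"])
        graph[post_id].add(row["link_author"])
    return graph


rows = load_rows(io.StringIO(SAMPLE_CSV))
graph = authors_by_post(rows)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by a file handle opened with `encoding="utf-8"`. The published export should already satisfy the likes >= 5 rule, so the `min_likes` filter acts only as a sanity check.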
Created:
2021-08-01