Datasets of Expert Recommendation in Community Question Answering
Source: DataCite Commons | Updated: 2025-04-27 | Indexed: 2025-04-16
Download link:
https://www.scidb.cn/detail?dataSetId=41c073025c7541b699bfaa132b771892
Description:
The Stack Exchange public dataset contains real interaction data from a range of question-answering communities and is commonly used in research on expert recommendation for community question answering (CQA). This study uses data from the Academia sub-community, an academic discussion community whose main users are university teachers and graduate students. The training dataset consists of the data generated from February 14, 2012 to December 31, 2019 by users who have provided at least one best answer. The testing dataset consists of 1,920 questions from January 1, 2020 to September 25, 2022 that were answered by at least two users in the training dataset. Because the question to which an answer belongs plays an important role in identifying the topic of that answer, the question information is added as contextual extension text for the answer when training the BERT-LLDA model. The question information includes the question title, question body, and question tags. The study also combines the questioner ID and respondent ID of each Q&A record into a user pair and stores the pairs in a CSV file. To account for user quality in PageRank, the study proposes a user quality weight based on the proportion of votes and best answers.
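The user-pair and quality-weight steps described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the dataset's actual schema or the authors' formula: the sample records, the column names `questioner_id` and `respondent_id`, the helper names `write_user_pairs`, `quality_weights`, and `weighted_pagerank`, the mixing parameter `alpha`, the questioner-to-respondent edge direction, and the choice to use the quality weight as the PageRank teleport distribution.

```python
import csv
import io
from collections import defaultdict

# Hypothetical Q&A records (not taken from the dataset):
# (questioner_id, respondent_id, answer_votes, is_best_answer)
QA_RECORDS = [
    ("u1", "u2", 5, True),
    ("u1", "u3", 1, False),
    ("u2", "u3", 3, True),
    ("u3", "u2", 4, False),
]


def write_user_pairs(records, handle):
    """Write one (questioner, respondent) pair per Q&A record as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["questioner_id", "respondent_id"])  # assumed column names
    for questioner, respondent, _votes, _best in records:
        writer.writerow([questioner, respondent])


def quality_weights(records, alpha=0.5):
    """Assumed weight: alpha * share of votes + (1 - alpha) * share of best answers."""
    votes, best, users = defaultdict(int), defaultdict(int), set()
    for questioner, respondent, n_votes, is_best in records:
        users.update((questioner, respondent))
        votes[respondent] += n_votes
        best[respondent] += int(is_best)
    total_votes = sum(votes.values()) or 1  # avoid division by zero
    total_best = sum(best.values()) or 1
    return {
        u: alpha * votes[u] / total_votes + (1 - alpha) * best[u] / total_best
        for u in users
    }


def weighted_pagerank(pairs, teleport, damping=0.85, iterations=50):
    """Power-iteration PageRank; the teleport step follows the quality weights."""
    out_links = defaultdict(list)
    for questioner, respondent in pairs:
        out_links[questioner].append(respondent)  # asker "endorses" answerer
    users = list(teleport)
    rank = {u: 1.0 / len(users) for u in users}
    for _ in range(iterations):
        # Rank held by users with no outgoing edges is redistributed via teleport.
        dangling = sum(rank[u] for u in users if not out_links[u])
        new_rank = {
            u: (1 - damping + damping * dangling) * teleport[u] for u in users
        }
        for u in users:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank


# In-memory buffer for the sketch; a real run would use open("pairs.csv", "w", newline="").
buffer = io.StringIO()
write_user_pairs(QA_RECORDS, buffer)
csv_text = buffer.getvalue()

pairs = [(q, r) for q, r, _, _ in QA_RECORDS]
weights = quality_weights(QA_RECORDS)
ranks = weighted_pagerank(pairs, weights)
for user, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(user, round(score, 4))
```

In this sketch a user who collects more votes and best answers receives a larger share of the teleport probability, so quality raises that user's score in addition to the link structure of the user-pair graph. Other designs, such as scaling edge weights by answer quality, would be equally plausible readings of the description.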
Provider:
Science Data Bank
Created:
2023-11-10



