General Zhihu Corpus
Source: Figshare | Updated: 2019-05-15 | Indexed: 2026-04-29
Download link:
https://figshare.com/articles/dataset/General_Zhihu_Corpus/8131781
Description:
Chinese-language corpus containing 3,434 questions and 231,939 answers posted to Zhihu.com.

Questions were taken from 10 popular topics: "Culture" (文化), "Education" (教育), "Art" (艺术), "University" (大学), "The Internet" (互联网), "Psychology" (心理), "Technology" (科技), "Health" (健康), "Career Development" (职业发展) and "Lifestyle" (生活方式).

Includes the R scripts used to extract the data. Data extracted in April 2019.

Files are questions (Q), answers (A) and question topics (T). Each file is named after the URL of its source webpage:
For questions: https://www.zhihu.com/question/[question number]
For answers: https://www.zhihu.com/question/[question number]/answer/[answer number]

Answers are organised by author category ("male", "female", "undisclosed gender", "anonymous", "organisation"), using information from the user's profile where publicly accessible.

Answer length categories:
Short answers: ≤1,000 characters
Medium answers: 1,001–4,999 characters
Long answers: ≥5,000 characters
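The following is a rough Python sketch of the naming convention and length thresholds described above. It parses a question or answer URL into its numbers and assigns an answer's text to the short, medium or long category. It is not part of the corpus, whose extraction scripts are written in R. The function names are hypothetical. The sketch assumes that "characters" means Unicode code points, as counted by Python's `len()`. It also assumes that file names can be mapped back to the URL forms shown; the exact on-disk file names may differ.

```python
import re
from typing import Optional, Tuple

# URL forms as described in the corpus notes (assumed exact; trailing slash tolerated).
QUESTION_RE = re.compile(r"^https://www\.zhihu\.com/question/(\d+)/?$")
ANSWER_RE = re.compile(r"^https://www\.zhihu\.com/question/(\d+)/answer/(\d+)/?$")


def parse_zhihu_url(url: str) -> Tuple[str, int, Optional[int]]:
    """Hypothetical helper: return (kind, question_number, answer_number)."""
    m = ANSWER_RE.match(url)
    if m:
        return ("answer", int(m.group(1)), int(m.group(2)))
    m = QUESTION_RE.match(url)
    if m:
        # Question pages carry no answer number.
        return ("question", int(m.group(1)), None)
    raise ValueError(f"not a Zhihu question/answer URL: {url!r}")


def length_category(text: str) -> str:
    """Hypothetical helper: bucket an answer using the corpus length thresholds."""
    n = len(text)  # assumption: one character = one Unicode code point
    if n <= 1000:
        return "short"   # <= 1,000 characters
    if n < 5000:
        return "medium"  # 1,001-4,999 characters
    return "long"        # >= 5,000 characters
```

For example, `parse_zhihu_url("https://www.zhihu.com/question/123/answer/456")` returns `("answer", 123, 456)`; the numbers are placeholders, not real corpus entries. Counting code points treats each Chinese character as one character. The original R scripts may count differently.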
Created:
2019-05-15



