The Blog Authorship Corpus
Source: SSH Open MarketPlace. Updated: 2021-07-22. Indexed: 2024-08-03.
Download link:
https://marketplace.sshopencloud.eu/dataset/1UXld8
Description:
The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers, gathered from blogger.com in August 2004. The corpus contains a total of 681,288 posts and over 140 million words, or approximately 35 posts and 7,250 words per blogger. Each blog is stored as a separate file whose name indicates the blogger's ID and the blogger's self-provided gender, age, industry, and astrological sign.
Cite as: J. Schler, M. Koppel, S. Argamon and J. Pennebaker (2006). Effects of Age and Gender on Blogging. In _Proceedings of 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs_.
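Because each file name carries the blogger's metadata, the fields can be recovered without opening the file. The sketch below is illustrative only: it assumes the fields are dot-separated, appear in the order listed in the description (ID, gender, age, industry, sign), and are followed by a file extension. The listing does not state the separator or extension, and the sample name `1234567.male.25.Student.Leo.xml` and the helper `parse_blog_filename` are hypothetical, so check them against the downloaded files before relying on this.

```python
from typing import NamedTuple


class BloggerMeta(NamedTuple):
    """Metadata fields the description says are encoded in each file name."""
    blogger_id: str
    gender: str
    age: int
    industry: str
    sign: str


def parse_blog_filename(filename: str) -> BloggerMeta:
    """Split a file name into its metadata fields.

    Assumed layout (not confirmed by the listing):
        <id>.<gender>.<age>.<industry>.<sign>.<extension>
    """
    parts = filename.split(".")
    if len(parts) != 6:
        raise ValueError(f"unexpected file name layout: {filename!r}")
    blogger_id, gender, age, industry, sign, _extension = parts
    # int() raises ValueError if the age field is not numeric.
    return BloggerMeta(blogger_id, gender, int(age), industry, sign)


if __name__ == "__main__":
    # Hypothetical file name used only to illustrate the assumed layout.
    print(parse_blog_filename("1234567.male.25.Student.Leo.xml"))
```

Returning a `NamedTuple` keeps the fields addressable by name (for example `meta.age`) while still allowing the result to be unpacked like a plain tuple.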
Created:
2021-07-22



