CBC/Corporati
Source: Figshare | Updated: 2011-12-30 | Indexed: 2026-04-29
Download link:
https://figshare.com/articles/dataset/CBCCorporati/121
Description:
The Corporate Blogging Corpus (CBC/Corporati) was assembled between early 2006 and late 2007 as part of my dissertation on the style and pragmatics of corporate blogs (Puschmann, 2010). A set of 137 English-language company blogs was selected and categorized by function and place in the organization (e.g. PR, marketing, company leadership). The blog posts were harvested over a period of 1.5 years using PHP, MySQL and MagpieRSS. Part-of-speech tagging was also performed, but that data is not included in this data set. See the included README file and Puschmann (2010; available at http://blog.ynada.com/368) for a detailed analysis of the corpus. NOTE TO THE ADMIN: this data belongs in the category Social Science/Linguistics (presently missing). Please add it if possible.
Created:
2011-12-30



