A harmonised testsuite for social media POS tagging (DE)
heiDATA · updated 2020-03-26 · indexed 2026-05-11
Download link:
https://heidata.uni-heidelberg.de/citation?persistentId=doi:10.11588/DATA/KXLMHN
Description:
<p>A harmonised POS testsuite of German web data, computer-mediated communication (CMC) and Twitter microtext, with word forms and STTS POS tags (plus some additional CMC-specific tags). UD POS tags were converted automatically from the STTS tags. The data does not contain (manually corrected) lemma information. The original data (44,404 tokens in total) comes from three sources: a Twitter dataset with 21,181 tokens, and two datasets from the EmpiriST 2015 shared task, web data (12,718 tokens) and computer-mediated communication (10,505 tokens).</p>
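The description says the UD tags were derived automatically from the STTS tags but does not publish the conversion table. The following is a minimal sketch of how such a tag-to-tag conversion can work, assuming a simple one-to-one lookup. The partial mapping `STTS_TO_UPOS`, the helper `to_upos` and the fallback tag `X` are hypothetical illustrations, not the dataset's actual conversion rules.

```python
# Hypothetical, partial STTS -> UD UPOS lookup table. This is an illustrative
# assumption, NOT the conversion used to build this testsuite.
STTS_TO_UPOS: dict[str, str] = {
    "NN": "NOUN",      # common noun
    "NE": "PROPN",     # proper noun
    "ADJA": "ADJ",     # attributive adjective
    "ADJD": "ADJ",     # predicative/adverbial adjective
    "ADV": "ADV",      # adverb
    "APPR": "ADP",     # preposition
    "APPRART": "ADP",  # preposition fused with article
    "ART": "DET",      # article
    "CARD": "NUM",     # cardinal number
    "KON": "CCONJ",    # coordinating conjunction
    "KOUS": "SCONJ",   # subordinating conjunction
    "PPER": "PRON",    # personal pronoun
    "PDS": "PRON",     # substituting demonstrative pronoun
    "VVFIN": "VERB",   # finite full verb
    "VVINF": "VERB",   # infinitive full verb
    "VAFIN": "AUX",    # finite auxiliary (simplification: ignores main-verb uses)
    "ITJ": "INTJ",     # interjection
    "$.": "PUNCT",     # sentence-final punctuation
    "$,": "PUNCT",     # comma
    "$(": "PUNCT",     # other punctuation
}


def to_upos(stts_tags: list[str]) -> list[str]:
    """Convert a sequence of STTS tags to UPOS by table lookup.

    Any tag missing from the table (for example a CMC-specific extension
    tag) falls back to "X"; this fallback is an assumed design choice.
    """
    return [STTS_TO_UPOS.get(tag, "X") for tag in stts_tags]


if __name__ == "__main__":
    # "Ich trinke Kaffee ." tagged with STTS
    print(to_upos(["PPER", "VVFIN", "NN", "$."]))
```

A pure lookup is context-free, so it cannot resolve cases where one STTS tag corresponds to different UD tags depending on usage; a real conversion may need additional rules for such cases.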
Provided by:
Leibniz Institute for the German Language; Department of Computational Linguistics, Heidelberg University
Created:
2018-01-01



