
A harmonised testsuite for social media POS tagging (DE)

Source: heiDATA. Updated: 2020-03-26. Indexed: 2026-05-11.
Download link:
https://heidata.uni-heidelberg.de/citation?persistentId=doi:10.11588/DATA/KXLMHN
Description:
<p>A harmonised POS test suite of web data, computer-mediated communication (CMC) and Twitter microtext, with word forms and STTS POS tags (plus some additional CMC-specific tags). The UD POS tags were converted automatically from the STTS POS tags. The data does not contain (manually corrected) lemma information. The original data comes from three different sources: a Twitter dataset with 21,181 tokens, and two datasets from the Empirist shared task 2015: web data (12,718 tokens) and computer-mediated communication (10,505 tokens).</p>
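To illustrate what an automatic STTS-to-UD conversion can look like, here is a minimal Python sketch. The mapping table is a small illustrative subset of commonly used STTS-to-UPOS correspondences, not the conversion table used for this dataset. The names `STTS_TO_UPOS`, `convert_tags` and `SOURCE_TOKENS`, and the fallback tag `X` for unmapped tags, are assumptions made for this example. The token counts are those given in the description.

```python
# Illustrative subset of an STTS -> UD (UPOS) mapping.
# ASSUMPTION: this is not the dataset's own conversion table.
STTS_TO_UPOS = {
    "NN": "NOUN",     # common noun
    "NE": "PROPN",    # proper noun
    "ADJA": "ADJ",    # attributive adjective
    "ADJD": "ADJ",    # predicative/adverbial adjective
    "ART": "DET",     # article
    "VVFIN": "VERB",  # finite full verb
    "VAFIN": "AUX",   # finite auxiliary
    "APPR": "ADP",    # preposition
    "KON": "CCONJ",   # coordinating conjunction
    "ADV": "ADV",     # adverb
    "PPER": "PRON",   # personal pronoun
    "CARD": "NUM",    # cardinal number
    "ITJ": "INTJ",    # interjection
    "$.": "PUNCT",    # sentence-final punctuation
}


def convert_tags(tagged_tokens, fallback="X"):
    """Map (word form, STTS tag) pairs to (word form, UPOS tag) pairs.

    Tags missing from the table (for example CMC-specific ones) get the
    hypothetical fallback tag instead of raising an error.
    """
    return [(form, STTS_TO_UPOS.get(tag, fallback)) for form, tag in tagged_tokens]


# Token counts per source, as stated in the dataset description.
SOURCE_TOKENS = {"twitter": 21181, "web": 12718, "cmc": 10505}
TOTAL_TOKENS = sum(SOURCE_TOKENS.values())

if __name__ == "__main__":
    sentence = [("Das", "ART"), ("ist", "VAFIN"), ("toll", "ADJD"), ("!", "$.")]
    print(convert_tags(sentence))
    print(TOTAL_TOKENS)
```

A lookup table with an explicit fallback keeps the conversion deterministic and makes unmapped tags easy to find afterwards, which matters when a tag set has been extended with CMC-specific labels.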
Provider:
Leibniz Institute for the German Language; Department of Computational Linguistics, Heidelberg University
Created:
2018-01-01