UD-Filipino/UD_Tagalog-NewsCrawl
Hugging Face · Updated 2025-07-23 · Added 2024-12-14
Download link:
https://hf-mirror.com/datasets/UD-Filipino/UD_Tagalog-NewsCrawl
Description:
The Tagalog Universal Dependencies NewsCrawl dataset (UD_Tagalog-NewsCrawl) consists of annotated text extracted from the Leipzig Tagalog Corpus, which was crawled from Tagalog-language online news sites. The text was automatically parsed and annotated by Angelina Aquino, then manually corrected by Elsie Marie Or et al. according to UD guidelines adapted for Tagalog. The annotations cover linguistic features such as part-of-speech tags and dependency relations. The dataset has train, validation, and test splits containing 12,495, 1,561, and 1,563 sentences respectively (15,619 in total). Because of its source, the data may contain typos, grammatical errors, incomplete sentences, and Tagalog-English code-mixing.
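UD treebanks are conventionally distributed in the CoNLL-U format, where each token line holds ten tab-separated columns (including UPOS, HEAD, and DEPREL). This entry does not state which file format the Hugging Face copy uses, so the minimal sketch below assumes raw CoNLL-U text and shows how the part-of-speech and dependency annotations could be read. The `Token` class, the `parse_conllu` function, and the sample sentence are hypothetical illustrations; the sample annotations were written for this example and are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Token:
    id: int        # 1-based word index within the sentence
    form: str      # surface form
    lemma: str
    upos: str      # universal part-of-speech tag
    head: int      # index of the syntactic head (0 = root)
    deprel: str    # dependency relation to the head


def parse_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence, assuming standard CoNLL-U input."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):      # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue                  # skip multiword-token ranges and empty nodes
        sentence.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7])
        )
    if sentence:                      # input may end without a trailing blank line
        yield sentence


# Hypothetical sample ("The child ate."); annotations are illustrative only.
SAMPLE = "\n".join([
    "# text = Kumain ang bata.",
    "\t".join(["1", "Kumain", "kain", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["2", "ang", "ang", "ADP", "_", "_", "3", "case", "_", "_"]),
    "\t".join(["3", "bata", "bata", "NOUN", "_", "_", "1", "nsubj", "_", "SpaceAfter=No"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for tok in next(parse_conllu(SAMPLE)):
        print(tok.id, tok.form, tok.upos, tok.head, tok.deprel)
```

For real work, a maintained CoNLL-U library is likely a better choice than a hand-written parser; the sketch only shows what the annotation columns contain.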
Provided by:
UD-Filipino



