microsoft/xglue
Dataset Overview
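Name: XGLUE
Languages: multilingual, covering Arabic (ar), Bulgarian (bg), German (de), Greek (el), English (en), Spanish (es), French (fr), Hindi (hi), Italian (it), Dutch (nl), Polish (pl), Portuguese (pt), Russian (ru), Swahili (sw), Thai (th), Turkish (tr), Urdu (ur), Vietnamese (vi), and Chinese (zh).
License: other (Licence Universal Dependencies v2.5)
Multilinguality: multilingual and translation
Size categories: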
- 10K<n<100K
- 100K<n<1M
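Source datasets:
- extended from conll2003
- extended from squad
- extended from xnli
- original
Task categories:
- question answering
- summarization
- text classification
- text-to-text generation
- token classification
Task IDs: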
- acceptability-classification
- extractive-qa
- named-entity-recognition
- natural-language-inference
- news-articles-headline-generation
- open-domain-qa
- parsing
- topic-classification
Config names:
- mlqa
- nc
- ner
- ntg
- paws-x
- pos
- qadsm
- qam
- qg
- wpr
- xnli
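A config name selects one XGLUE task. The sketch below validates a name against the list above before passing it to a loader. It assumes the Hugging Face `datasets` library is installed, that the repository id `microsoft/xglue` from the heading resolves on the Hub, and that split names follow the `test.en` / `test.nl` labels used in the examples on this page; `check_config` and `load_xglue` are hypothetical helper names, not part of any library.

```python
# Config names copied from the list above.
XGLUE_CONFIGS = (
    "mlqa", "nc", "ner", "ntg", "paws-x", "pos",
    "qadsm", "qam", "qg", "wpr", "xnli",
)


def check_config(name: str) -> str:
    """Return the config name unchanged, or raise ValueError if it is not listed."""
    if name not in XGLUE_CONFIGS:
        raise ValueError(
            f"unknown XGLUE config {name!r}; expected one of {XGLUE_CONFIGS}"
        )
    return name


def load_xglue(name: str, split: str = "test.en"):
    """Load one config with `datasets.load_dataset` (requires network access).

    The default split name is an assumption taken from the example labels.
    """
    # Imported lazily so check_config works without the library installed.
    from datasets import load_dataset

    return load_dataset("microsoft/xglue", check_config(name), split=split)
```

Calling `load_xglue("ner", split="test.nl")` would then request the Dutch NER test split, provided those assumptions hold.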
Dataset Structure
Data Instances
ner
Example (test.nl), JSON, truncated: { "ner": [ "O", "O", "O", "B-LOC", "O", "B-LOC", "O", "B-LOC", "O", "O", "O", "O", "O", "O", "O", "B-PER", "I-PER", "O", "O", "B-LOC", "O", "O" ],
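The `ner` field is a per-token tag sequence. Assuming the tags follow the usual BIO convention (`B-` opens an entity, `I-` continues it, `O` is outside), they can be grouped into entity spans over token indices. The sketch below is one way to do that; `bio_to_spans` is a hypothetical helper, and the tag list is the one from the example above.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an I- tag of the same entity type.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        # Open a new span on B-, or on a stray I- with no open span.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tags = [
    "O", "O", "O", "B-LOC", "O", "B-LOC", "O", "B-LOC", "O", "O", "O",
    "O", "O", "O", "O", "B-PER", "I-PER", "O", "O", "B-LOC", "O", "O",
]
spans = bio_to_spans(tags)
print(spans)
```

For this tag list the only multi-token entity is the `PER` span covering tokens 15 and 16; the four `LOC` entities are single tokens.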
pos
Example (test.en), JSON, truncated: { "pos": [ "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X" ],
mlqa
Example (test.en), JSON: { "context": "...", "question": "...", "answers": [ { "answer_start": 123, "text": "..." } ] }
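In this layout each answer carries an `answer_start` and a `text`. Assuming `answer_start` is a character offset into `context` (the SQuAD convention, which seems likely since the dataset extends squad), an answer can be checked against the context as sketched below. The helper name `answer_spans` and the toy instance are invented for illustration and are not taken from the dataset.

```python
from typing import Any, Dict, List, Tuple


def answer_spans(instance: Dict[str, Any]) -> List[Tuple[int, int, bool]]:
    """Return (start, end, matches) for each answer in an mlqa-style instance.

    `matches` is True when context[start:end] equals the answer text,
    which is what a character-offset interpretation predicts.
    """
    context = instance["context"]
    result = []
    for answer in instance["answers"]:
        start = answer["answer_start"]
        end = start + len(answer["text"])
        result.append((start, end, context[start:end] == answer["text"]))
    return result


# Hypothetical instance, shaped like the example above.
toy = {
    "context": "XGLUE covers 19 languages.",
    "question": "How many languages does XGLUE cover?",
    "answers": [{"answer_start": 13, "text": "19"}],
}
print(answer_spans(toy))
```

A `False` in the third position would suggest the offsets are not plain character indices (for example, byte offsets) and that the assumption needs revisiting.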
nc
Example (test.en), JSON: { "news_title": "...", "news_body": "...", "news_category": "foodanddrink" }
xnli
Example (test.en), JSON: { "premise": "...", "hypothesis": "...", "label": "entailment" }
paws-x
Example (test.en), JSON: { "sentence1": "...", "sentence2": "...", "label": "same" }
qadsm
Example (test.en), JSON: { "query": "...", "ad_title": "...", "ad_description": "...", "relevance_label": "Good" }
wpr
Example (test.en), JSON: { "query": "...", "web_page_title": "...", "web_page_snippet": "...", "relavance_label": "Perfect" }
qam
Example (test.en), JSON: { "question": "...", "answer": "...", "label": "True" }
qg
Example (test.en), JSON: { "answer_passage": "...", "question": "..." }
ntg
Example (test.en), JSON: { "news_body": "...", "news_title": "..." }
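The field names in the examples above differ from config to config, and `wpr` spells its label field `relavance_label` while `qadsm` uses `relevance_label`. A small lookup table can catch a wrong field name early. The sketch below copies the field names from the examples as shown; `ner` and `pos` are left out because their examples are cut off, and `missing_fields` is a hypothetical helper.

```python
# Field names per config, copied from the examples above.
# "ner" and "pos" are omitted: their examples are truncated, so the
# full field set cannot be read off this page.
EXPECTED_FIELDS = {
    "mlqa": {"context", "question", "answers"},
    "nc": {"news_title", "news_body", "news_category"},
    "xnli": {"premise", "hypothesis", "label"},
    "paws-x": {"sentence1", "sentence2", "label"},
    "qadsm": {"query", "ad_title", "ad_description", "relevance_label"},
    # Spelling "relavance_label" is kept exactly as in the wpr example.
    "wpr": {"query", "web_page_title", "web_page_snippet", "relavance_label"},
    "qam": {"question", "answer", "label"},
    "qg": {"answer_passage", "question"},
    "ntg": {"news_body", "news_title"},
}


def missing_fields(config: str, instance: dict) -> list:
    """Return the sorted list of expected fields absent from the instance."""
    return sorted(EXPECTED_FIELDS[config] - set(instance))


# A wpr record written with the conventional spelling is flagged.
record = {
    "query": "q",
    "web_page_title": "t",
    "web_page_snippet": "s",
    "relevance_label": "Perfect",
}
print(missing_fields("wpr", record))
```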



