ChID
Source: arXiv | Updated: 2020-01-28 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/1906.01265v3
Description:
The ChID dataset, created by the Institute for Artificial Intelligence at Tsinghua University, is a large-scale cloze-test dataset for Chinese idiom understanding. It contains 581,000 articles and 729,000 blanks drawn from several domains, including news, novels, and essays. In each article, idioms are replaced with blank markers, and every blank comes with a set of candidate idioms that includes the correct answer. The dataset is intended to evaluate how well machines understand Chinese idioms, particularly their non-compositional and metaphorical meanings. It targets language understanding and natural language processing, specifically the problem of understanding and representing idioms accurately in Chinese machine reading comprehension.
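The cloze format described above can be sketched roughly as follows. This is an illustration only: the `ClozeItem` class, its field names, the `[BLANK]` marker, and the sample item are all hypothetical and are not taken from the ChID release, whose actual file layout may differ.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical blank marker; the real dataset may use a different token.
BLANK = "[BLANK]"


@dataclass
class ClozeItem:
    passage: str           # text with one idiom replaced by BLANK
    candidates: List[str]  # candidate idioms, including the correct one
    answer: str            # the idiom originally in the passage


def fill_blank(item: ClozeItem, idiom: str) -> str:
    """Return the passage with the first blank replaced by the given idiom."""
    return item.passage.replace(BLANK, idiom, 1)


def accuracy(items: List[ClozeItem],
             predict: Callable[[ClozeItem], str]) -> float:
    """Fraction of items for which predict() returns the correct idiom."""
    if not items:
        return 0.0
    correct = sum(predict(item) == item.answer for item in items)
    return correct / len(items)


def first_candidate(item: ClozeItem) -> str:
    """Trivial baseline: always pick the first candidate."""
    return item.candidates[0]


# Invented example item, not drawn from the dataset.
sample = ClozeItem(
    passage="他做事" + BLANK + "。",
    candidates=["持之以恒", "画蛇添足", "守株待兔"],
    answer="持之以恒",
)

if __name__ == "__main__":
    print(fill_blank(sample, sample.answer))
    print(accuracy([sample], first_candidate))
```

Any real model would replace `first_candidate` with a function that scores each candidate against the surrounding context; accuracy over blanks is used here as a simple assumed metric.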
Provided by:
Institute for Artificial Intelligence, Tsinghua University
Created:
2019-06-04



