Data and code for "An Open Dataset of Chinese Duration Expressions"
Source: Science Data Bank. Updated 2025-08-06. Indexed 2026-04-23.
Download link:
https://www.scidb.cn/detail?dataSetId=a95e908ba31f41abbac1641c6cf3bba0
Resource description:
This dataset comprises the data and code for the manuscript "An Open Dataset of Chinese Duration Expressions". Duration information is essential for understanding and analyzing our world. In text, duration information is typically conveyed in two formats: numeric (e.g., 1 hour) and verbal (e.g., shortly). To analyze duration information in text, it is crucial to understand how people map duration expressions to corresponding numerical durations. However, the literature has yet to provide lexicons that support such conversion. Furthermore, existing databases of time-related expressions often lack information about word frequency, a robust predictor of information processing. Here, we report an open dataset of 2,101 Chinese duration expressions, each annotated with its corresponding numerical duration. To provide high-quality word-frequency data, we obtained the frequency of each duration expression from a large-scale corpus of 10 billion Chinese characters, the BLCU Corpus Center (BCC) Corpus, and computed an adjusted frequency for each expression. This dataset provides a valuable resource for research on temporal information in Chinese, facilitating studies in natural language processing, psychology, and linguistics.
Provided by:
Chinese Academy of Sciences; Central University of Finance and Economics
Created:
2025-08-06



