
Multilingual Simile Dialogue Dataset (MSD)

Source: arXiv. Updated: 2023-06-09. Indexed: 2024-06-21.
Download link:
https://github.com/malongxuan/MSD
Description:
The Multilingual Simile Dialogue Dataset (MSD) was created by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology. It contains about 20,000 simile dialogues in English and Chinese, extracted from large-scale dialogue data collected from social platforms such as Reddit and Weibo. The data went through a strict collection and annotation process intended to ensure quality and diversity. MSD supports simile research tasks (recognition, interpretation, and generation) as well as dialogue research tasks (response retrieval and generation). It targets the complexity of simile phenomena in natural language processing and the challenges of handling similes in dialogue systems.
Provider:
Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Created:
2023-06-09