
Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation (ATD-MCL)

Source: arXiv | Updated: 2023-05-23 | Indexed: 2024-06-21
Download link:
https://github.com/naist-nlp/atd-mcl
Resource overview:
The Arukikata Travelogue Dataset with Geographic Entity Mention, Coreference, and Link Annotation (ATD-MCL) is a Japanese travelogue dataset designed for evaluating document-level geoparsing systems. It was created collaboratively by several institutions, including Japan's National Institute of Informatics (NII). The dataset contains 200 travelogue documents annotated with 12,171 place mentions, 6,339 coreference clusters, and 2,551 places linked to geographic database entries. It covers places and facilities at granularities from coarse to fine: countries, municipalities, districts, amenity buildings, landmarks, roads, and public transit lines. The creators chose travelogues as the text genre because they contain many place mentions and exhibit geographic relatedness among them, such as coreference and geographic proximity. The dataset is intended mainly for geographic information analysis, for example recommending tourist attractions and routes through text analysis, and for information extraction and place localization in travel planning.
Provider:
National Institute of Informatics (NII), Japan
Created:
2023-05-23