
Chinese NLP Address Element Parsing

Alibaba Cloud Tianchi · Updated 2026-06-03 · Indexed 2024-09-21
Download link:
https://tianchi.aliyun.com/dataset/186578
Resource description:
Address element parsing is the process of splitting an address text into semantically independent elements and identifying the type of each element. Together with address relevance, it is one of the two core tasks of Chinese address processing, and it has considerable commercial value. The Chinese address domain currently lacks standard evaluation benchmarks and datasets, so we are releasing a relatively large annotated corpus in the hope of advancing address text processing together with the community. The annotated dataset consists of a training set, a validation set, and a test set, with approximately 20,000 annotated entries in total.
Provider:
Alibaba Cloud Tianchi
Created:
2024-09-20
Collected summary
Dataset introduction
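Background and challenges
Background overview
This dataset targets the Chinese address element parsing task: splitting address text into semantically independent elements and identifying the type of each one. It is meant to address the lack of standard benchmarks and datasets for Chinese addresses. The dataset contains approximately 20,000 annotated entries, divided into a training set, a validation set, and a test set, and is annotated with the BIEO tagging scheme. It is intended to support progress in Chinese address processing.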
The summary above was collected and generated by 遇见数据集.
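To illustrate what the BIEO tagging scheme mentioned in the background overview implies for downstream use, here is a minimal Python sketch that groups per-character tags back into typed address elements. It assumes the usual reading of the scheme (B = beginning, I = inside, E = end of an element, O = outside any element) and tags of the form `B-<type>`. The function name `decode_bieo`, the type labels `prov` and `city`, and the sample address are hypothetical, chosen only for illustration; the listing does not specify the dataset's label set or file format. Because the listing does not say how single-character elements are tagged, the sketch simply drops any span that is not closed by a matching `E-` tag.

```python
from typing import List, Tuple


def decode_bieo(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group characters into (element_text, element_type) pairs from BIEO tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    elements: List[Tuple[str, str]] = []
    start, cur_type = None, None  # state of the currently open span, if any

    for i, tag in enumerate(tags):
        # "B-prov" -> ("B", "-", "prov"); "O" -> ("O", "", "")
        prefix, _, label = tag.partition("-")
        if prefix == "B" and label:
            start, cur_type = i, label            # open a new span
        elif prefix == "I" and start is not None and label == cur_type:
            continue                              # span continues
        elif prefix == "E" and start is not None and label == cur_type:
            elements.append(("".join(chars[start:i + 1]), cur_type))
            start, cur_type = None, None          # span closed
        else:
            start, cur_type = None, None          # "O" or an inconsistent tag
    return elements


# Hypothetical example: the labels "prov" and "city" are assumptions.
chars = list("浙江省杭州市")
tags = ["B-prov", "I-prov", "E-prov", "B-city", "I-city", "E-city"]
print(decode_bieo(chars, tags))
```

Dropping malformed spans rather than repairing them keeps the sketch conservative; a stricter evaluation script might instead raise an error on inconsistent tag sequences.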