Geocoding Address Parsing Benchmark
arXiv | Updated 2023-12-15 | Indexed 2024-06-21
Download link:
https://github.com/zhengcongyin/Geocoding-Address-Parsing-Benchmark
Description:
The Geocoding Address Parsing Benchmark dataset was created by Texas A&M University. It contains more than 239,000 address records covering all 50 U.S. states and Washington, D.C. Based on an analysis of user input patterns in the logs of a real geocoding system, the dataset synthesizes 21 distinct input errors and variations for evaluating address parsing techniques. It is split into training, validation, and test subsets and is intended to serve as a "gold standard" evaluation dataset for geocoding systems, reflecting the varied quality of user input encountered in practice. Application areas include urban planning and public health; the aim is to improve the accuracy and reliability of geocoded data.
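To make the idea of synthesized input errors and variations concrete, the following is a minimal sketch of how a clean, labeled address might be perturbed while keeping its component labels aligned. Everything in it is an assumption for illustration: the label names, the abbreviation table, and the three perturbations are hypothetical and are not taken from the benchmark's actual schema or its list of 21 error types.

```python
import random

# Hypothetical labeled address: (token text, component label) pairs.
# The label names are illustrative, not the benchmark's real schema.
ADDRESS = [
    ("123", "house_number"),
    ("Main", "street_name"),
    ("Street", "street_suffix"),
    ("College Station", "city"),
    ("TX", "state"),
    ("77843", "zip"),
]

# Illustrative abbreviation table (an assumption, not from the dataset).
SUFFIX_ABBREVIATIONS = {"Street": "St", "Avenue": "Ave", "Road": "Rd"}


def abbreviate_suffix(tokens):
    """Variation: replace a full street suffix with a common abbreviation."""
    result = []
    for text, label in tokens:
        if label == "street_suffix":
            text = SUFFIX_ABBREVIATIONS.get(text, text)
        result.append((text, label))
    return result


def swap_adjacent_chars(tokens, label, rng):
    """Error: transpose two adjacent characters in the first token with `label`."""
    result = []
    done = False
    for text, tok_label in tokens:
        if not done and tok_label == label and len(text) >= 2:
            i = rng.randrange(len(text) - 1)
            text = text[:i] + text[i + 1] + text[i] + text[i + 2:]
            done = True
        result.append((text, tok_label))
    return result


def drop_component(tokens, label):
    """Error: omit every token carrying the given component label."""
    return [(text, tok_label) for text, tok_label in tokens if tok_label != label]


def render(tokens):
    """Join token texts into the raw string a user might type."""
    return " ".join(text for text, _ in tokens)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    print(render(ADDRESS))
    print(render(abbreviate_suffix(ADDRESS)))
    print(render(swap_adjacent_chars(ADDRESS, "street_name", rng)))
    print(render(drop_component(ADDRESS, "zip")))
```

Because each perturbation returns token and label pairs rather than a bare string, the noisy input stays paired with its ground-truth components, which is what a parser evaluation would need.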
Provider:
Texas A&M University
Created:
2023-10-23



