jarredparrett/deepparse_address_mutations_comb_3
Hugging Face · Updated 2024-12-12 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/jarredparrett/deepparse_address_mutations_comb_3
Overview:
The `deepparse_address_mutations_comb_3` dataset is a collection of mutated addresses intended for training and evaluating address-matching models. It was generated by applying combinations of up to three mutators to 100,000 annotated addresses, producing variants that simulate real-world inconsistencies, typos, and formatting differences. Mutations fall into two categories: positive mutations preserve semantic correctness (the mutated address remains equivalent to the original), while negative mutations simulate real-world errors. Each sample has the following features: address, mutation components, compressed address, mutated address, and label. The data is split into training, validation, and test subsets of 4,517,388, 968,012, and 968,012 samples respectively (6,453,412 in total, roughly a 70/15/15 split).
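The generation procedure (combinations of up to three mutators, each either positive or negative) can be illustrated with a short Python sketch. It is a minimal, hypothetical example, not the dataset author's generator: the mutators `abbreviate_street`, `remove_commas`, `uppercase`, and `bump_house_number`, the `Mutator` class, the `mutate` helper, and the labeling rule (label 1 only when every applied mutator is positive) are all assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterator

# NOTE: every mutator below is hypothetical and chosen for illustration only;
# none of them is taken from the dataset's actual generation code.


# Positive mutators: the result still refers to the same place.
def abbreviate_street(addr: str) -> str:
    return addr.replace("Street", "St")


def remove_commas(addr: str) -> str:
    return addr.replace(",", "")


def uppercase(addr: str) -> str:
    return addr.upper()


# Negative mutator: the result refers to a different place.
def bump_house_number(addr: str) -> str:
    head, _, tail = addr.partition(" ")
    return f"{int(head) + 1} {tail}" if head.isdigit() else addr


@dataclass(frozen=True)
class Mutator:
    name: str
    fn: Callable[[str], str]
    positive: bool


MUTATORS = [
    Mutator("abbreviate_street", abbreviate_street, True),
    Mutator("remove_commas", remove_commas, True),
    Mutator("uppercase", uppercase, True),
    Mutator("bump_house_number", bump_house_number, False),
]


def mutate(address: str, max_mutators: int = 3) -> Iterator[tuple[list[str], str, int]]:
    """Yield (mutation_components, mutated_address, label) for every
    combination of 1..max_mutators mutators, applied in MUTATORS order."""
    for k in range(1, max_mutators + 1):
        for combo in itertools.combinations(MUTATORS, k):
            mutated = address
            for m in combo:
                mutated = m.fn(mutated)
            # Assumed labeling rule: 1 (still a match) only if all mutators are positive.
            label = int(all(m.positive for m in combo))
            yield [m.name for m in combo], mutated, label
```

For the sample address `"12 Main Street, Springfield, 12345"`, `list(mutate(...))` yields 14 variants (4 + 6 + 4 combinations of the four mutators), 7 of them labeled 1. For instance, `abbreviate_street` + `remove_commas` + `uppercase` gives `"12 MAIN ST SPRINGFIELD 12345"` with label 1, while `bump_house_number` alone gives `"13 Main Street, Springfield, 12345"` with label 0. Because `itertools.combinations` preserves pool order, each combination is applied in a fixed, deterministic order. The sketch does not reproduce the compressed address feature, whose exact construction is not described here.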
Provided by:
jarredparrett



