jarredparrett/deepparse_address_mutations_comb_3

Source: Hugging Face. Updated 2024-12-12; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/jarredparrett/deepparse_address_mutations_comb_3
Overview:
`deepparse_address_mutations_comb_3` is a collection of address mutations intended to improve address matching. It was generated by applying combinations of up to three mutators to 100,000 annotated addresses, producing variations that simulate real-world inconsistencies, typos, and formatting differences. Mutations are either positive (the result stays semantically correct or equivalent to the original) or negative (the result simulates a real-world error). Each record includes the address, the mutation components, the compressed address, the mutated address, and a label. The data is divided into training, validation, and test subsets of 4,517,388, 968,012, and 968,012 samples respectively (roughly a 70/15/15 split), and the subsets differ in the mutation combinations they contain.
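To make the idea of "combinations of up to three mutators" concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three mutator functions, the dictionary keys, and the labeling rule (label 1 only when every applied mutator is positive) are hypothetical and are not taken from the dataset's actual generation code.

```python
from itertools import combinations


# Hypothetical mutators, for illustration only (not the dataset's real ones).
def abbreviate_street(addr: str) -> str:
    """Positive mutation: 'Street' and 'St' refer to the same thing."""
    return addr.replace("Street", "St")


def uppercase(addr: str) -> str:
    """Positive mutation: case changes do not alter the address."""
    return addr.upper()


def bump_house_number(addr: str) -> str:
    """Negative mutation: a different house number is a different address."""
    head, _, rest = addr.partition(" ")
    return f"{int(head) + 1} {rest}" if head.isdigit() else addr


# Mutator name -> (function, is_positive). The polarity flags are assumptions.
MUTATORS = {
    "abbreviate_street": (abbreviate_street, True),
    "uppercase": (uppercase, True),
    "bump_house_number": (bump_house_number, False),
}


def mutate(address: str, max_mutators: int = 3) -> list[dict]:
    """Apply every combination of 1..max_mutators mutators to one address."""
    rows = []
    names = sorted(MUTATORS)  # fixed order so results are reproducible
    for k in range(1, max_mutators + 1):
        for combo in combinations(names, k):
            mutated = address
            all_positive = True
            for name in combo:
                fn, positive = MUTATORS[name]
                mutated = fn(mutated)
                all_positive = all_positive and positive
            rows.append(
                {
                    "address": address,
                    "mutation_components": combo,
                    "mutated_address": mutated,
                    # Assumed rule: one negative mutator makes the pair a mismatch.
                    "label": int(all_positive),
                }
            )
    return rows


rows = mutate("12 Main Street")
for row in rows:
    print(row["mutation_components"], "->", row["mutated_address"], row["label"])
```

With three mutators, one address yields 3 + 3 + 1 = 7 mutated variants, which shows how a base of 100,000 addresses can expand into millions of samples once more mutators are available.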
Provided by:
jarredparrett