darkmatter2222/redact-v1
Source: Hugging Face | Updated: 2025-02-12 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/darkmatter2222/redact-v1
Description:
Redact-v1 is a 100% synthetic dataset: every element is artificially generated, and it contains no data from real or external sources. It covers many categories of synthetic sensitive data, including people's names, card numbers, account numbers, social security numbers, government-issued ID numbers, dates of birth, passwords, tax identification numbers, phone numbers, residential addresses, email addresses, IP addresses, passport numbers, and driver's license numbers. The data is deliberately noisy to mimic real-world formatting, so that models trained on it can learn robustly from context and redact effectively in downstream tasks.
Provider:
darkmatter2222



