darkmatter2222/NLU-Redact-PII-v1
Source: Hugging Face. Updated 2025-02-11. Indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/darkmatter2222/NLU-Redact-PII-v1
Description:
This synthetic dataset was generated to test redaction and anonymization pipelines. A suite of generators produces both valid and intentionally invalid formats for sensitive data such as names, card numbers, account numbers, and Social Security numbers, among others, and embeds the values in coherent sentences generated by an LLM (Llama/Granite). The dataset covers a variety of sensitive fields and adds noise and deliberately malformed formats to challenge models under different conditions, with the aim of supporting the development of robust redaction and anonymization systems.
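As a rough illustration of the kind of generator the description refers to, the sketch below produces one valid or intentionally invalid card number, applies simple formatting noise, and embeds it in a sentence with a character-level span. It is not the dataset's actual code: the Luhn checksum as the validity rule, the fixed sentence template (standing in for LLM-generated text), the function names, and the `text`/`entities` record layout are all assumptions made for this example.

```python
import random


def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left, doubling every other digit starting with the rightmost.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check a card number, ignoring spaces and dashes."""
    digits = "".join(c for c in number if c.isdigit())
    if len(digits) < 2:
        return False
    return luhn_check_digit(digits[:-1]) == int(digits[-1])


def make_card_number(rng: random.Random, valid: bool = True) -> str:
    """Build a 16-digit number; when valid is False, corrupt the check digit."""
    payload = "4" + "".join(str(rng.randrange(10)) for _ in range(14))
    check = luhn_check_digit(payload)
    if not valid:
        # Shift the check digit by 1..9 so it is guaranteed to be wrong.
        check = (check + rng.randrange(1, 10)) % 10
    return payload + str(check)


def add_noise(number: str, rng: random.Random) -> str:
    """Vary the grouping separator to mimic inconsistent real-world formatting."""
    sep = rng.choice(["", " ", "-"])
    groups = [number[i:i + 4] for i in range(0, len(number), 4)]
    return sep.join(groups)


def make_example(rng: random.Random, valid: bool = True) -> dict:
    """Embed a card number in a sentence and record its character span."""
    card = add_noise(make_card_number(rng, valid), rng)
    # Hypothetical fixed template; the dataset uses LLM-generated sentences.
    text = f"Please charge the order to card {card} before Friday."
    start = text.index(card)
    entity = {
        "label": "CARD_NUMBER",  # hypothetical label name
        "start": start,
        "end": start + len(card),
        "valid": valid,
    }
    return {"text": text, "entities": [entity]}


if __name__ == "__main__":
    rng = random.Random(0)
    for flag in (True, False):
        print(make_example(rng, valid=flag))
```

Keeping the validity flag alongside each span makes it possible to measure separately whether a redaction model catches well-formed values and malformed look-alikes.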
Provider:
darkmatter2222



