tanaos/synthetic-text-anonymizer-dataset-v1

Hugging Face | Updated 2025-12-21 | Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/tanaos/synthetic-text-anonymizer-dataset-v1
Description:
This dataset was generated synthetically by Tanaos using the Artifex Python library. It is designed for training and evaluating Named Entity Recognition (NER) systems for text anonymization, that is, models that identify and redact Personally Identifiable Information (PII) in text. Each sample is a sentence or paragraph annotated word by word with one of the following labels: `O` (no entity), `PERSON` (individual people, including fictional characters), `LOCATION` (geographical areas), `DATE` (absolute or relative dates, including years, months, and/or days), `ADDRESS` (full addresses), and `PHONE_NUMBER` (telephone numbers). Punctuation and special characters are not labeled.
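To illustrate how word-level labels of this kind can drive redaction, here is a minimal Python sketch. It is not taken from the dataset card. The function name `redact`, the placeholder format `[LABEL]`, and the sample sentence are all hypothetical, and the sketch assumes each sample can be represented as two parallel lists of words and labels. The dataset's actual column names and storage format are not stated here, so check them before adapting this.

```python
from itertools import groupby

# The six labels listed in the dataset description.
LABELS = {"O", "PERSON", "LOCATION", "DATE", "ADDRESS", "PHONE_NUMBER"}


def redact(tokens, labels):
    """Replace each run of consecutive words sharing a non-O label
    with a single placeholder such as [PERSON].

    Assumption: tokens and labels are parallel lists, one label per word.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    unknown = set(labels) - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    out = []
    # groupby yields one group per run of identical consecutive labels.
    for label, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if label == "O":
            out.extend(words)          # keep non-entity words unchanged
        else:
            out.append(f"[{label}]")   # collapse the entity span
    return " ".join(out)


# Hypothetical sample, written by hand for illustration only.
tokens = ["Maria", "Rossi", "moved", "to", "Lisbon", "on", "3", "March", "2021"]
labels = ["PERSON", "PERSON", "O", "O", "LOCATION", "O", "DATE", "DATE", "DATE"]
print(redact(tokens, labels))  # [PERSON] moved to [LOCATION] on [DATE]
```

Because the labels carry no begin/inside markers, two adjacent entities of the same type are merged into one placeholder in this sketch. That is harmless for redaction but would matter if entity counts were needed.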
Provider:
tanaos