tanaos/synthetic-text-anonymizer-dataset-v1
Source: Hugging Face | Updated: 2025-12-21 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/tanaos/synthetic-text-anonymizer-dataset-v1
Description:
This dataset was created synthetically by Tanaos with the Artifex Python library. It is designed to train and evaluate Named Entity Recognition (NER) systems for text anonymization, that is, models that identify and redact Personally Identifiable Information (PII) in text. Each sample is a sentence or paragraph whose entities are annotated word by word with one of the following labels:

- `O`: no entity is present
- `PERSON`: individual people, fictional characters
- `LOCATION`: geographical areas
- `DATE`: absolute or relative dates, including years, months and/or days
- `ADDRESS`: full addresses
- `PHONE_NUMBER`: telephone numbers

Punctuation and special characters are not labeled.
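To illustrate how word-level labels like these can drive redaction, here is a minimal sketch. It is not taken from the dataset card or from Artifex: the `redact` helper, the parallel `words`/`tags` lists, and the example sentence are all hypothetical, and the sketch assumes plain labels with no `B-`/`I-` prefixes, which the description does not confirm. The dataset's actual column names and storage format may differ.

```python
from itertools import groupby

# Label set as listed in the dataset description.
LABELS = {"O", "PERSON", "LOCATION", "DATE", "ADDRESS", "PHONE_NUMBER"}


def redact(words, tags):
    """Collapse each run of consecutive words sharing an entity label
    into a single "[LABEL]" placeholder; words tagged "O" are kept.

    Hypothetical helper: assumes one tag per word, with no B-/I- prefixes.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")

    out = []
    # groupby yields one group per run of identical adjacent tags.
    for tag, group in groupby(zip(words, tags), key=lambda pair: pair[1]):
        if tag not in LABELS:
            raise ValueError(f"unknown label: {tag!r}")
        if tag == "O":
            out.extend(word for word, _ in group)
        else:
            out.append(f"[{tag}]")
    return " ".join(out)


# Made-up example, not a record from the dataset.
words = ["Maria", "Lopez", "moved", "to", "Lisbon", "on", "3", "March", "2021"]
tags = ["PERSON", "PERSON", "O", "O", "LOCATION", "O", "DATE", "DATE", "DATE"]
print(redact(words, tags))  # [PERSON] moved to [LOCATION] on [DATE]
```

Because the labels carry no boundary markers in this sketch, two adjacent entities of the same type (for example, two names in a row) would be merged into one placeholder. That is harmless for redaction but would matter if entity counts were needed.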
Provider:
tanaos



