bigcode-pii-dataset-training
Source: huggingface.co, indexed 2025-03-25
Download link:
https://huggingface.co/datasets/bigcode/bigcode-pii-dataset-training
Resource description:
Bigcode PII Training Dataset
Dataset Description
This dataset was used to train bigcode-pii-model (after training on pseudo-labeled data).
It is a concatenation of an early version of bigcode-pii-dataset, which had fewer samples, and pii-for-code
(a dataset of 400 files we annotated in a previous iteration: MORE INFO TO BE ADDED).
Files with AMBIGUOUS and ID tags were excluded. Each PII subtype was remapped to its supertype.
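The filtering and remapping described above could look roughly like the sketch below. It is only an illustration: the record layout (a "entities" list of dicts with a "tag" key), the subtype names in SUBTYPE_TO_SUPERTYPE, and the helper name prepare_files are all hypothetical, since the description does not give the actual schema or label set. It also assumes a file is dropped if it contains either an AMBIGUOUS or an ID tag.

```python
# Labels whose presence causes a whole file to be dropped (from the description).
EXCLUDED_TAGS = {"AMBIGUOUS", "ID"}

# Hypothetical subtype -> supertype mapping; the real label set is not
# listed in the description, so these entries are placeholders.
SUBTYPE_TO_SUPERTYPE = {
    "EMAIL_EXAMPLE": "EMAIL",
    "NAME_LICENSE": "NAME",
}


def prepare_files(files):
    """Drop files containing excluded tags, then remap subtypes to supertypes."""
    kept = []
    for record in files:
        tags = {entity["tag"] for entity in record["entities"]}
        if tags & EXCLUDED_TAGS:
            continue  # assumption: any AMBIGUOUS or ID tag excludes the file
        remapped = [
            {**entity, "tag": SUBTYPE_TO_SUPERTYPE.get(entity["tag"], entity["tag"])}
            for entity in record["entities"]
        ]
        kept.append({**record, "entities": remapped})
    return kept


# Tiny made-up sample to exercise the function.
sample_files = [
    {"id": 0, "entities": [{"tag": "EMAIL_EXAMPLE", "start": 0, "end": 5}]},
    {"id": 1, "entities": [{"tag": "AMBIGUOUS", "start": 2, "end": 9}]},
    {"id": 2, "entities": [{"tag": "NAME", "start": 1, "end": 4},
                           {"tag": "ID", "start": 6, "end": 8}]},
]
prepared = prepare_files(sample_files)
```

Unknown tags pass through unchanged, so a label that is already a supertype is left as it is.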
Statistics
The… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/bigcode-pii-dataset-training.
Provider:
huggingface.co



