LeakPro/synthetic_tab
Source: Hugging Face | Updated: 2025-01-21 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/LeakPro/synthetic_tab
Description:
The Synthetic Text Anonymization Benchmark (TAB) dataset contains 1,014 synthetic cases generated from European Court of Human Rights (ECHR) cases. It was created to fine-tune QLoRA (Quantized Low-Rank Adaptation) models based on mistralai/Mistral-7B-Instruct-v0.3. Each case is a JSON document with three parts: the synthetic case text, a flag indicating whether the generator finished the case sequence, and the token length of the text.
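The record layout described above can be illustrated with a minimal Python sketch. The listing does not give the actual JSON key names, so `text`, `finished`, and `length` are hypothetical placeholders, and the sample records are made up for illustration rather than taken from the dataset. The sketch parses each JSON document and keeps only the cases the generator completed.

```python
import json
from dataclasses import dataclass


@dataclass
class SyntheticCase:
    # Field names are assumptions; the real keys in the dataset may differ.
    text: str        # synthetic case text
    finished: bool   # whether the generator completed the case sequence
    length: int      # token length of the text


def parse_case(raw: str) -> SyntheticCase:
    """Parse one JSON document into a SyntheticCase (assumed key names)."""
    doc = json.loads(raw)
    return SyntheticCase(
        text=str(doc["text"]),
        finished=bool(doc["finished"]),
        length=int(doc["length"]),
    )


def keep_finished(cases, max_tokens=None):
    """Keep cases the generator completed, optionally capped by token length."""
    return [
        c for c in cases
        if c.finished and (max_tokens is None or c.length <= max_tokens)
    ]


# Made-up records for illustration only; not taken from the dataset.
sample_lines = [
    '{"text": "PROCEDURE 1. The case originated in ...", "finished": true, "length": 12}',
    '{"text": "THE FACTS 2. The applicant ...", "finished": false, "length": 9}',
]
cases = [parse_case(line) for line in sample_lines]
complete = keep_finished(cases)
print(len(cases), len(complete))
```

Filtering on the completion flag is one plausible use of that field, since a case cut off mid-sequence may be a poor training example. Whether to drop or keep truncated cases depends on the downstream task.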
Provider:
LeakPro



