Disclosures-SSRC/Detecting-Access-Violations-in-a-LLMs-Pre-Training-Data

Name: Disclosures-SSRC/Detecting-Access-Violations-in-a-LLMs-Pre-Training-Data
Creator: Disclosures-SSRC
Published: 2025-11-18 19:45:57
License: No description available

Source: Hugging Face. Updated: 2025-11-18. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/Disclosures-SSRC/Detecting-Access-Violations-in-a-LLMs-Pre-Training-Data

Description:

# Beyond Public Access in LLM Pre-Training Data

The official Hugging Face repository for the paper "Beyond Public Access in LLM Pre-Training Data" by [The AI Disclosures Project](https://www.ssrc.org/programs/ai-disclosures-project/). Using a legally obtained dataset of 34 copyrighted O'Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI's large language models were trained on copyrighted content without consent.
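
The description names DE-COP without explaining it. As the method is generally described, a model is given a multiple-choice quiz in which one option is a verbatim passage from a book and the others are paraphrases; picking the verbatim option more often than chance is read as a sign that the text appeared in training data. The following is a minimal Python sketch of that scoring idea only. The names `build_quiz`, `guess_rate`, `chance_level`, and the `choose` callback are hypothetical and are not taken from this repository or the paper, whose actual pipeline and statistics may differ.

```python
import random
from typing import Callable, Sequence


def build_quiz(verbatim: str, paraphrases: Sequence[str],
               rng: random.Random) -> tuple[list[str], int]:
    """Shuffle the verbatim passage among its paraphrases.

    Returns the shuffled options and the index of the verbatim one.
    """
    options = [verbatim, *paraphrases]
    rng.shuffle(options)
    return options, options.index(verbatim)


def chance_level(n_paraphrases: int) -> float:
    """Expected accuracy of a model that guesses uniformly at random."""
    return 1.0 / (1 + n_paraphrases)


def guess_rate(items: Sequence[tuple[str, Sequence[str]]],
               choose: Callable[[list[str]], int],
               seed: int = 0) -> float:
    """Fraction of quizzes in which `choose` picks the verbatim passage.

    `choose` stands in for a call to the model under test (an assumption
    of this sketch): it receives the options and returns an index.
    """
    rng = random.Random(seed)
    correct = 0
    for verbatim, paraphrases in items:
        options, answer = build_quiz(verbatim, paraphrases, rng)
        if choose(options) == answer:
            correct += 1
    return correct / len(items)
```

In use, `guess_rate` would be computed per book and compared against `chance_level`; a rate close to chance gives no evidence of exposure, while a rate well above it is the signal the method looks for.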

Provided by:

Disclosures-SSRC
