Disclosures-SSRC/Detecting-Access-Violations-in-a-LLMs-Pre-Training-Data
Source: Hugging Face. Updated: 2025-11-18. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Disclosures-SSRC/Detecting-Access-Violations-in-a-LLMs-Pre-Training-Data
Description:
# Beyond Public Access in LLM Pre-Training Data
The official HuggingFace repository for the paper "Beyond Public Access in LLM Pre-Training Data" by [The AI Disclosures Project](https://www.ssrc.org/programs/ai-disclosures-project/).
Using a legally obtained dataset of 34 copyrighted O'Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI's large language models were trained on copyrighted content without consent.
Provider:
Disclosures-SSRC



