
Disclosures-SSRC/Detecting-Access-Violations-in-a-LLMs-Pre-Training-Data

Source: Hugging Face. Updated 2025-11-18; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Disclosures-SSRC/Detecting-Access-Violations-in-a-LLMs-Pre-Training-Data
Description:
# Beyond Public Access in LLM Pre-Training Data

The official HuggingFace repository for the paper "Beyond Public Access in LLM Pre-Training Data" by [The AI Disclosures Project](https://www.ssrc.org/programs/ai-disclosures-project/). Using a legally obtained dataset of 34 copyrighted O'Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI's large language models were trained on copyrighted content without consent.
Provider:
Disclosures-SSRC