
Data and Code for Understanding Generative AI Content with Embedding Models

Source: DataCite Commons. Updated 2025-09-10; indexed 2026-04-25.
Download link:
https://www.osti.gov/servlets/purl/2587970
Resource description:
This repository contains code for the experiments in the paper "Understanding Generative AI Content with Embedding Models". Constructing high-quality features is critical to any quantitative data analysis. While feature engineering was historically addressed by carefully hand-crafting data representations based on domain expertise, deep neural networks (DNNs) now offer a radically different approach: they implicitly engineer features by transforming their input data into hidden feature vectors called embeddings. For embedding vectors produced by foundation models (which are trained to be useful across many contexts), we demonstrate that simple, well-studied dimensionality-reduction techniques such as Principal Component Analysis (PCA) uncover inherent heterogeneity in the input data that is concordant with human-understandable explanations. Among the many applications of this framework, we find empirical evidence of intrinsic separability between real samples and those generated by artificial intelligence (AI).
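To make the PCA step concrete, here is a minimal sketch in NumPy. It is not the authors' code. It assumes embeddings arrive as a matrix with one row per sample, and it computes PCA from the SVD of the mean-centered matrix. The function names `pca_project` and `pc1_separation` are hypothetical. The demo data is synthetic: two Gaussian groups that differ only by a mean shift along one random direction. It stands in for "real" and "AI-generated" embeddings and reproduces nothing from the paper. Splitting samples by the sign of the first principal component is a crude illustration of separability, not the paper's evaluation protocol.

```python
import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2):
    """Project row-wise embedding vectors onto their top principal components.

    Returns the projected scores and the fraction of total variance
    explained by each retained component.
    """
    # Center every embedding dimension at zero mean.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Thin SVD of the centered matrix: rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance_ratio = singular_values**2 / np.sum(singular_values**2)
    scores = centered @ vt[:n_components].T
    return scores, variance_ratio[:n_components]


def pc1_separation(scores: np.ndarray, labels: np.ndarray) -> float:
    """Accuracy of splitting two groups by the sign of the first PC score.

    The sign of a principal component is arbitrary, so both orientations are tried.
    """
    predicted = scores[:, 0] > 0
    accuracy = float(np.mean(predicted == labels.astype(bool)))
    return max(accuracy, 1.0 - accuracy)


if __name__ == "__main__":
    # Hypothetical stand-in data: two groups of 64-dim "embeddings" that differ
    # only by a mean shift along one random unit direction. Real embeddings
    # would come from a foundation model.
    rng = np.random.default_rng(0)
    n_per_group, dim = 250, 64
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    real = rng.normal(size=(n_per_group, dim)) - 2.0 * direction
    generated = rng.normal(size=(n_per_group, dim)) + 2.0 * direction
    embeddings = np.vstack([real, generated])
    labels = np.array([0] * n_per_group + [1] * n_per_group)

    scores, ratio = pca_project(embeddings, n_components=2)
    print("explained variance ratio:", np.round(ratio, 3))
    print("PC1 sign-split accuracy:", round(pc1_separation(scores, labels), 3))
```

In this toy setup the group shift is the largest source of variance, so it dominates the first component. With real embeddings there is no guarantee of that, which is why the leading components would need to be inspected rather than assumed to align with any particular attribute.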
Provided by:
PNNL (PNNL2)
Created:
2025-09-10