Data and Code for Understanding Generative AI Content with Embedding Models
Source: DataCite Commons | Updated: 2025-09-10 | Indexed: 2026-04-25
Download link:
https://www.osti.gov/servlets/purl/2587970
Resource description:
This repository contains code for the experiments in the paper "Understanding Generative AI Content with Embedding Models". Constructing high-quality features is critical to any quantitative data analysis. While feature engineering was historically addressed by carefully hand-crafting data representations based on domain expertise, deep neural networks (DNNs) now offer a radically different approach. DNNs implicitly engineer features by transforming their input data into hidden feature vectors called embeddings. For embedding vectors produced by foundation models (which are trained to be useful across many contexts), we demonstrate that simple and well-studied dimensionality-reduction techniques such as Principal Component Analysis uncover inherent heterogeneity in input data concordant with human-understandable explanations. Among the many applications of this framework, we find empirical evidence of intrinsic separability between real samples and those generated by artificial intelligence (AI).
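The description above centers on applying Principal Component Analysis to embedding vectors. The following is a minimal, dependency-free sketch of that general idea only. It is not code from this repository. It assumes synthetic Gaussian vectors as a hypothetical stand-in for foundation-model embeddings, and the group labels "real" and "generated" are illustrative. The sketch centers the vectors, estimates their covariance, finds the leading principal component by power iteration, and compares each group's average score along that component.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iters=200, seed=0):
    """Leading eigenvector (first principal axis) of a symmetric PSD matrix,
    found by power iteration. The sign of the result is arbitrary."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Score of each centered row along a unit-length component."""
    return [sum(a * b for a, b in zip(r, component)) for r in centered]


def synthetic_embeddings(n_per_group=100, dim=8, shift=3.0, seed=1):
    """Hypothetical stand-in for model embeddings: two Gaussian groups
    that differ only by a mean offset along the first axis."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, offset in (("real", 0.0), ("generated", shift)):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            row[0] += offset
            rows.append(row)
            labels.append(label)
    return rows, labels


if __name__ == "__main__":
    rows, labels = synthetic_embeddings()
    centered = center(rows)
    pc1 = top_component(covariance(centered))
    scores = project(centered, pc1)
    for group in ("real", "generated"):
        s = [x for x, lab in zip(scores, labels) if lab == group]
        print(group, "mean PC1 score:", round(sum(s) / len(s), 2))
```

Power iteration is used here only to keep the sketch free of third-party dependencies. A library routine such as a singular value decomposition would be the more usual choice for real embedding matrices. Any separation this sketch shows comes entirely from how the synthetic groups were constructed and says nothing about the paper's findings.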
Provided by:
PNNL (PNNL2)
Created:
2025-09-10



