Data and Code for Understanding Generative AI Content with Embedding Models

Name: Data and Code for Understanding Generative AI Content with Embedding Models
Creator: PNNL (PNNL2)
Published: 2025-09-10 19:36:04
License: Not specified

Source: DataCite Commons (updated 2025-09-10, indexed 2026-04-25)

Download link:

https://www.osti.gov/servlets/purl/2587970

Description:

This repository contains the code for the experiments in the paper "Understanding Generative AI Content with Embedding Models". Constructing high-quality features is critical to any quantitative data analysis. Feature engineering was historically done by carefully hand-crafting data representations based on domain expertise; deep neural networks (DNNs) now offer a radically different approach. DNNs implicitly engineer features by transforming their input data into hidden feature vectors called embeddings. For embedding vectors produced by foundation models, which are trained to be useful across many contexts, we demonstrate that simple, well-studied dimensionality-reduction techniques such as Principal Component Analysis (PCA) uncover inherent heterogeneity in the input data that is consistent with human-understandable explanations. Among the many applications of this framework, we find empirical evidence of intrinsic separability between real samples and samples generated by artificial intelligence (AI).
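The description names PCA on embedding vectors but does not show how it is applied. As a hedged illustration, and not the repository's code, the sketch below runs PCA on a handful of hypothetical embedding vectors. It mean-centers the data, builds the sample covariance matrix, finds the first principal component by power iteration, and checks whether two labeled groups fall on opposite sides of a single threshold along that component. The toy vectors, the function names, and the threshold-based separability check are all assumptions made for illustration; the paper's actual embedding models, number of components, and evaluation are not described in this listing. Only the Python standard library is used so the sketch runs anywhere; in practice a library routine such as scikit-learn's `PCA` or `numpy.linalg.eigh` would be the usual choice.

```python
import math
import random

# Minimal PCA sketch in pure Python, for illustration only.
# ASSUMPTION: embeddings are already computed and given as equal-length
# lists of floats; the repository's actual models and pipeline are not
# reproduced here.

def mean_center(X):
    """Subtract the per-dimension mean from every row; return (centered, mean)."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - mu[j] for j in range(d)] for row in X], mu

def covariance(Xc):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def matvec(C, v):
    """Matrix-vector product C @ v for list-of-lists C."""
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in C]

def top_component(C, iters=500, seed=0):
    """First principal axis: leading eigenvector of covariance C by power
    iteration (assumes a clear gap between the top two eigenvalues)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(C))]
    for _ in range(iters):
        w = matvec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def explained_ratio(C, v):
    """Fraction of total variance along unit vector v (Rayleigh quotient / trace)."""
    lam = sum(a * b for a, b in zip(v, matvec(C, v)))
    return lam / sum(C[i][i] for i in range(len(C)))

def project(Xc, v):
    """Score of each centered row along direction v."""
    return [sum(a * b for a, b in zip(row, v)) for row in Xc]

def separable_along(scores_a, scores_b):
    """HYPOTHETICAL check: True if one threshold on the scores splits the groups."""
    return max(scores_a) < min(scores_b) or max(scores_b) < min(scores_a)

# HYPOTHETICAL toy "embeddings": invented 3-d vectors, not repository data.
real = [[0.1, 0.2, -0.1], [-0.2, 0.1, 0.0], [0.0, -0.1, 0.2], [0.1, 0.0, -0.2]]
generated = [[3.0, 0.1, 0.0], [3.2, -0.1, 0.1], [2.9, 0.0, -0.1], [3.1, 0.2, 0.0]]

Xc, _ = mean_center(real + generated)
C = covariance(Xc)
pc1 = top_component(C)
scores = project(Xc, pc1)
real_scores, gen_scores = scores[:len(real)], scores[len(real):]

print("variance explained by PC1:", round(explained_ratio(C, pc1), 3))
print("groups separable along PC1:", separable_along(real_scores, gen_scores))
```

Power iteration is used only to keep the sketch dependency-free; it recovers just the leading component, whereas a full eigendecomposition would give every principal axis at once. The sign of a principal component is arbitrary, which is why the separability check accepts either ordering of the two groups.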

Provider:

PNNL (PNNL2)

Created:

2025-09-10