Large Language Model-Driven Narrative Generation Study Data: ChatGPT-Generated Narratives, Real Tweets, and Source Code
DataCite Commons · Updated 2025-05-01 · Indexed 2025-05-17
Download link:
https://data.mendeley.com/datasets/nyxndvwfsh
Resource description:
To advance the use of Large Language Models (LLMs) in engineering, science, medicine, and other fields, we provide the datasets and code associated with the Structured Narrative Prompt for LLMs study. The data for this study were generated using an Agent-Based Model (ABM), the LLM ChatGPT, and a set of tweets previously collected from Twitter. To facilitate reproducibility, transparency, and reuse of our work, this repository includes:
(1) Simulation-related code and data for generating simulated agents' life events
(a) output from the Java ABM simulation, including the ABM-generated narratives and associated life-event information
(2) ChatGPT-related code and data
(a) the Python script that generates structured prompts for ChatGPT from the ABM-generated life events
(b) the set of generated structured prompts (inputs) for ChatGPT, used to generate the LLM narratives
(c) the Python script that submits the structured prompts to ChatGPT via the API
(d) the set of ChatGPT-generated narratives
(e) the Python script that combines ChatGPT (output) narratives with the ABM simulation narratives, in preparation for PANAS sentiment analysis
(3) Analysis-related code and data
(a) the PANAS sentiment analysis R scripts
(b) the R scripts for statistical significance testing (chi-squared test and Fisher's exact test), used to identify significant differences in sentiment scores among ABM-generated narratives, LLM-generated narratives, and the real tweets
(c) the PANAS lexicon used for the sentiment analysis
(d) the set of tweets used in the study, with personally identifiable information (PII) removed
(e) the approved Institutional Review Board (IRB) documentation for collecting those tweets
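The prompt-construction step in item (2)(a) is not described beyond the list above. Purely as an illustration, the Python sketch below turns one simulated agent's life events into a single structured prompt. The LifeEvent record, its fields, the function name build_structured_prompt, and the prompt wording are all hypothetical; the actual script and prompt template should be taken from LLM_Phase_Scripts_and_Data.zip.

```python
from dataclasses import dataclass


# HYPOTHETICAL record for one simulated life event; the real ABM output
# schema is not documented on this page.
@dataclass
class LifeEvent:
    agent_id: int
    age: int
    event: str


def build_structured_prompt(agent_id: int, events: list[LifeEvent]) -> str:
    """Assemble one prompt from a single agent's events, ordered by age.

    The template text below is an illustrative assumption, not the study's prompt.
    """
    own = sorted((e for e in events if e.agent_id == agent_id), key=lambda e: e.age)
    lines = [f"- Age {e.age}: {e.event}" for e in own]
    return (
        "Write a short narrative about a person who experienced "
        "the following life events, in order:\n" + "\n".join(lines)
    )


events = [
    LifeEvent(1, 30, "lost a job"),
    LifeEvent(1, 25, "got married"),
    LifeEvent(2, 40, "moved abroad"),
]
print(build_structured_prompt(1, events))
```

Filtering by agent and sorting by age keeps each prompt chronological, which is one plausible way to give the model a consistent structure across agents.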
Folder names and breakdown for the Data Files section:
1. LLM-related Scripts and Data: LLM_Phase_Scripts_and_Data.zip
2. Analysis-related Scripts and Data: Analysis_Phase_Scripts_and_Data.zip
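Item (3)(a) refers to PANAS-based sentiment analysis implemented in R. A minimal lexicon-counting sketch in Python is shown below, assuming each text is scored by counting matches against positive- and negative-affect word lists. The word lists are the 20 adjectives of the original PANAS scale, used only as a stand-in for the repository's own lexicon file, which may differ. The labeling rule and the name panas_score are assumptions, not the study's scoring procedure.

```python
import re
from collections import Counter

# Stand-in lexicon: the 20 adjectives of the original PANAS scale.
# ASSUMPTION: the repository's PANAS lexicon file may contain other entries.
POSITIVE = {"interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"}
NEGATIVE = {"distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"}


def panas_score(text: str) -> tuple[int, int, str]:
    """Count positive/negative lexicon hits and assign a coarse label.

    ASSUMPTION: label is "positive" if positive hits exceed negative hits,
    "negative" if the reverse, otherwise "neutral".
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)  # Counter returns 0 for absent words
    neg = sum(counts[w] for w in NEGATIVE)
    if pos > neg:
        label = "positive"
    elif neg > pos:
        label = "negative"
    else:
        label = "neutral"
    return pos, neg, label


print(panas_score("I was excited and proud, but a little nervous."))
```

Tallying such labels per source (ABM narratives, LLM narratives, real tweets) would yield the kind of contingency table that the significance tests in item (3)(b) operate on.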
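Item (3)(b) names the chi-squared test and Fisher's exact test, which the repository implements as R scripts. The standard-library Python sketch below only illustrates the computation for a 2x2 table, for example two narrative sources crossed with positive/negative label counts. The counts in the usage lines are hypothetical placeholders, not results from the study. The study's three-way comparison would need an r x c table, for which a statistics library (R's built-in tests or SciPy, for instance) is the practical choice.

```python
import math


def chi_squared_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-squared test for a 2x2 table, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def fisher_exact_2x2(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the observed one."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, row1)

    def prob(x: int) -> float:
        # P(top-left cell == x) given fixed row and column totals
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so floating-point ties with p_obs are included.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


# HYPOTHETICAL counts: rows = two narrative sources, columns = positive/negative labels.
print(chi_squared_2x2([[8, 2], [1, 5]]))
print(fisher_exact_2x2([[8, 2], [1, 5]]))
```

For this placeholder table the chi-squared statistic is about 6.11 and the Fisher p-value is about 0.035. Because the Pearson test here omits the Yates continuity correction, its p-value is smaller than that of tools that apply the correction by default for 2x2 tables (R's chisq.test, for example).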
Provider:
Mendeley Data
Created:
2023-11-15



