
LinguisticFootprintsOfChatGPT

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11109704
Description:
This dataset was produced for the research paper "Tracing Linguistic Footprints of ChatGPT Across Tasks, Domains and Personas in English and German." The project explores how the output of large language models such as ChatGPT differs from human-written text, and analyzes the impact of task-specific prompting on linguistic features in both English and German texts.

The dataset contains human-written files collected from a number of publicly available datasets, together with their counterparts generated by ChatGPT. The human data comes from the following corpora:

- E3C: The European Clinical Case Corpus (Minard et al., 2021)
- GGPONC: The German Guideline Program in Oncology NLP Corpus (Borchert et al., 2022)
- 20 Minuten: articles from a free Swiss daily newspaper (Kew et al., 2023)
- CNN: articles (Hermann et al., 2015)
- CSB: The Credit Suisse Bulletin corpus (Volk et al., 2016)

Additional original human texts were collected from the PubMed Central Database and the Zurich Open Repository and Archive.

The generated texts were produced by ChatGPT-3 under three distinct tasks: continuing a text, explaining a text, and creating a new text. Depending on the task, the prompts contained different sections of the original human text. For the continuation and creation tasks, the prompt contained the title and the first paragraph. For the explanation task, the model was provided with the main part of the text.

For more information see our paper at tbd

Code: https://github.com/shaitarAn/LinguisticFootprintsChatGPT
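The task-dependent prompt inputs described above can be illustrated with a minimal Python sketch. It is not taken from the project's repository: the `HumanText` class, the `prompt_input` function, and the task labels `"continue"`, `"create"`, and `"explain"` are hypothetical names chosen for illustration, and "the main part of the text" is assumed here to mean all body paragraphs.

```python
from dataclasses import dataclass


@dataclass
class HumanText:
    """Hypothetical container for one human-written document."""
    title: str
    paragraphs: list[str]  # body paragraphs in document order


def prompt_input(doc: HumanText, task: str) -> str:
    """Return the section of the human text that goes into the prompt."""
    if task in ("continue", "create"):
        # Continuation and creation: title plus first paragraph.
        first = doc.paragraphs[0] if doc.paragraphs else ""
        return f"{doc.title}\n\n{first}".strip()
    if task == "explain":
        # Explanation: the main part of the text
        # (assumption: all body paragraphs, without the title).
        return "\n\n".join(doc.paragraphs)
    raise ValueError(f"unknown task: {task!r}")
```

Calling `prompt_input(doc, "continue")` on a document with several paragraphs would return only the title and the first paragraph, while `"explain"` would return the joined body. The actual prompt wording and text segmentation used for the dataset may differ.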
Created:
2024-05-03