HCVAR: PARAPHRASED, ORIGINAL (GPT,LLAMA,MISTRAL)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14875082
Resource description:
Dataset Overview:
------------------
We randomly sampled 2000 examples from the HCVAR data for each of the following domains:
Essay
News
Q&A
Review
The HCVAR data originally contained machine-generated text from GPT only. We generated additional text using Llama and Mistral.
We paraphrased a portion of the AI-generated text using a Pegasus-based paraphraser (details are given below).
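The per-domain sampling described above can be sketched as follows. This is only an illustration under assumptions, not the authors' script: the `domain` key, the `sample_per_domain` helper, the seed, and the toy records are all hypothetical, and the example draws 3 items per domain instead of 2000 to stay small.

```python
import random
from collections import defaultdict


def sample_per_domain(records, n_per_domain=2000, seed=0):
    """Draw n_per_domain records uniformly at random from each domain.

    Assumes each record is a dict with a hypothetical "domain" key.
    """
    rng = random.Random(seed)

    # Group records by their domain.
    by_domain = defaultdict(list)
    for record in records:
        by_domain[record["domain"]].append(record)

    sampled = {}
    for domain, items in by_domain.items():
        if len(items) < n_per_domain:
            raise ValueError(f"domain {domain!r} has only {len(items)} records")
        # Sample without replacement inside the domain.
        sampled[domain] = rng.sample(items, n_per_domain)
    return sampled


# Toy data (made up for illustration): 5 records in each of the four domains.
records = [
    {"domain": domain, "text": f"{domain} example {i}"}
    for domain in ("Essay", "News", "Q&A", "Review")
    for i in range(5)
]
subset = sample_per_domain(records, n_per_domain=3)
```

Sampling each domain separately keeps the four domains equally represented regardless of how large each one is in the source data.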
Paraphrasing Information:
--------------------------
50% of the AI-generated text in the dataset is paraphrased, while the other half remains unchanged as originally generated.
The selection of AI text for paraphrasing is random.
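A minimal sketch of such a random 50% selection is shown below. It assumes an exact half of the AI rows is drawn without replacement; the source does not say whether the split is exact or whether a seed was used, so the `select_for_paraphrasing` helper, its `fraction` and `seed` parameters, and the toy labels are hypothetical.

```python
import random


def select_for_paraphrasing(labels, fraction=0.5, seed=0):
    """Return one is_selected flag (0 or 1) per row.

    Only AI rows (label 0) are eligible; human rows (label 1) always get 0.
    """
    rng = random.Random(seed)
    ai_indices = [i for i, label in enumerate(labels) if label == 0]
    n_selected = round(len(ai_indices) * fraction)
    # Pick the rows to paraphrase uniformly at random, without replacement.
    chosen = set(rng.sample(ai_indices, n_selected))
    return [1 if i in chosen else 0 for i in range(len(labels))]


# Toy labels (made up): 1 = human, 0 = AI.
labels = [1, 0, 0, 1, 0, 0]
flags = select_for_paraphrasing(labels)
```

With four AI rows in the toy input, two of them receive `is_selected = 1`; which two depends on the seed.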
Dataset Structure:
------------------
Each file in the dataset contains the following required columns:
original_text: The non-paraphrased text.
label:
- 1: Human-generated text
- 0: AI-generated text
is_selected:
- 1: The AI text was selected for paraphrasing
- 0: The text was not selected for paraphrasing
text: The paraphrased version of the AI-generated text if it was selected for paraphrasing (is_selected is 1); otherwise the original, non-paraphrased text.
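The column layout above can be read and split into subsets roughly as follows. This sketch assumes the files are CSV with a header row, which the description does not state, and assumes human rows carry `is_selected = 0`; the `load_rows` and `split_rows` helpers and the three sample rows are hypothetical and made up for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = {"original_text", "label", "is_selected", "text"}


def load_rows(handle):
    """Read rows from a CSV handle and check the required columns exist."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        # Convert the two flag columns from strings to integers.
        row["label"] = int(row["label"])
        row["is_selected"] = int(row["is_selected"])
        rows.append(row)
    return rows


def split_rows(rows):
    """Split into human, paraphrased AI, and non-paraphrased AI rows."""
    human = [r for r in rows if r["label"] == 1]
    ai_paraphrased = [r for r in rows if r["label"] == 0 and r["is_selected"] == 1]
    ai_original = [r for r in rows if r["label"] == 0 and r["is_selected"] == 0]
    return human, ai_paraphrased, ai_original


# In-memory stand-in for one dataset file (contents are invented).
sample = io.StringIO(
    "original_text,label,is_selected,text\n"
    "A human sentence.,1,0,A human sentence.\n"
    "An AI sentence.,0,1,A sentence written by an AI.\n"
    "Another AI sentence.,0,0,Another AI sentence.\n"
)
rows = load_rows(sample)
human, ai_paraphrased, ai_original = split_rows(rows)
```

For a real file, the `io.StringIO` object would be replaced by an opened file handle, for example `open(path, newline="", encoding="utf-8")`.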
Created:
2025-02-19



