jaroslawjanas/open-generation-data
Source: Hugging Face · Updated: 2025-12-09 · Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/jaroslawjanas/open-generation-data
Resource description:
---
language:
- en
tags:
- text
- benchmarks
- ai-detection
- paraphrase
- retrieval
pretty_name: Open Generation Data
task_categories:
- text-generation
license: apache-2.0
size_categories:
- 1K<n<10K
---
# 🔍 AI Detection Paraphrases — Inputs Dataset
This dataset originates from the research paper:
> **Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense**
> Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, Mohit Iyyer
> 📄 [arXiv:2303.13408](https://arxiv.org/abs/2303.13408)
---
## 📦 Dataset Details
| Property | Value |
|----------|-------|
| **Split** | `input` |
| **Rows** | 7,711 |
| **Format** | Parquet |
| **License** | Apache 2.0 |
### Schema
| Column | Type | Description |
|--------|------|-------------|
| `prefix` | `string` | Context/prompt text |
| `targets` | `list[string]` | List of candidate continuations |
| `scores` | `list[float64]` | Scores for each target |
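
The schema above can be exercised with a short Python sketch. It assumes the `datasets` library can resolve this repository's `input` split, and that `scores[i]` corresponds by position to `targets[i]`; the card does not say what the scores measure or how they are ordered. The names `load_inputs`, `pair_targets_with_scores`, and `example_row` are hypothetical and introduced only for illustration.

```python
from typing import Any


def load_inputs():
    """Load the `input` split from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the helper below works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset("jaroslawjanas/open-generation-data", split="input")


def pair_targets_with_scores(row: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (target, score) pairs for one row, checking the lists line up."""
    targets, scores = row["targets"], row["scores"]
    # Assumption: the two lists are aligned by position, one score per target.
    if len(targets) != len(scores):
        raise ValueError(
            f"expected one score per target, got {len(targets)} targets "
            f"and {len(scores)} scores"
        )
    return list(zip(targets, scores))


# Hypothetical row, made up for illustration; not taken from the dataset.
example_row = {
    "prefix": "The weather today is",
    "targets": ["sunny and warm.", "cold and rainy."],
    "scores": [0.75, 0.25],
}

pairs = pair_targets_with_scores(example_row)
for target, score in pairs:
    print(f"{score:.2f}  {example_row['prefix']} {target}")
```

`load_inputs()` is defined but not called here, since it downloads data. With the library installed, something like `pair_targets_with_scores(load_inputs()[0])` should apply the same check to a real row.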
---
## 🔗 Original Sources
- **GitHub**: [martiansideofthemoon/ai-detection-paraphrases](https://github.com/martiansideofthemoon/ai-detection-paraphrases)
- **Google Drive**: [Original data files](https://drive.google.com/drive/folders/1mPROenBB0fzLO9AX4fe71k0UYv0xt3X1)
> ⚠️ **Note**: Only the `inputs.jsonl` file from the original release is included in this repository.
---
## 📖 Citation
```bibtex
@misc{krishna2023paraphrasingevadesdetectorsaigenerated,
title={Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense},
author={Kalpesh Krishna and Yixiao Song and Marzena Karpinska and John Wieting and Mohit Iyyer},
year={2023},
eprint={2303.13408},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2303.13408},
}
```
---
## 📜 License
This dataset is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0), consistent with the original GitHub repository.
Provider:
jaroslawjanas



