
jaroslawjanas/open-generation-data

Source: Hugging Face | Updated: 2025-12-09 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/jaroslawjanas/open-generation-data
Resource description:
---
language:
- en
tags:
- text
- benchmarks
- ai-detection
- paraphrase
- retrieval
pretty_name: Open Generation Data
task_categories:
- text-generation
license: apache-2.0
size_categories:
- 1K<n<10K
---

# 🔍 AI Detection Paraphrases: Inputs Dataset

This dataset originates from the research paper:

> **Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense**
> Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, Mohit Iyyer
> 📄 [arXiv:2303.13408](https://arxiv.org/abs/2303.13408)

---

## 📦 Dataset Details

| Property | Value |
|----------|-------|
| **Split** | `input` |
| **Rows** | 7,711 |
| **Format** | Parquet |
| **License** | Apache 2.0 |

### Schema

| Column | Type | Description |
|--------|------|-------------|
| `prefix` | `string` | Context/prompt text |
| `targets` | `list[string]` | List of candidate continuations |
| `scores` | `list[float64]` | Scores for each target |

---

## 🔗 Original Sources

- **GitHub**: [martiansideofthemoon/ai-detection-paraphrases](https://github.com/martiansideofthemoon/ai-detection-paraphrases)
- **Google Drive**: [Original data files](https://drive.google.com/drive/folders/1mPROenBB0fzLO9AX4fe71k0UYv0xt3X1)

> ⚠️ **Note**: Only the `inputs.jsonl` file from the original release is included in this repository.

---

## 📖 Citation

```bibtex
@misc{krishna2023paraphrasingevadesdetectorsaigenerated,
  title={Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense},
  author={Kalpesh Krishna and Yixiao Song and Marzena Karpinska and John Wieting and Mohit Iyyer},
  year={2023},
  eprint={2303.13408},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2303.13408},
}
```

---

## 📜 License

This dataset is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0), consistent with the original GitHub repository.
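The schema described in this card (`prefix`, `targets`, `scores`) suggests that each score lines up by index with one candidate continuation. The following is a minimal sketch under that assumption; the card does not say how the scores were computed or whether higher is better. The helper names `pair_targets_with_scores` and `load_input_split` are hypothetical, and the loader assumes the Hugging Face `datasets` library is installed and that the split name `input` listed in the card is what the loader expects.

```python
from typing import Any, Dict, List, Tuple


def pair_targets_with_scores(row: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Pair each candidate continuation with its score.

    Assumes `targets` and `scores` are aligned by index, as the
    schema description ("Scores for each target") suggests.
    """
    targets = row["targets"]
    scores = row["scores"]
    if len(targets) != len(scores):
        raise ValueError("targets and scores must have the same length")
    return list(zip(targets, scores))


def load_input_split():
    """Load the `input` split from the Hugging Face Hub (needs network access).

    The import is done lazily so that the helper above can be used
    without the optional `datasets` dependency.
    """
    from datasets import load_dataset

    return load_dataset("jaroslawjanas/open-generation-data", split="input")


# Made-up row in the documented shape, for illustration only;
# it is not taken from the dataset.
example_row = {
    "prefix": "The weather today is",
    "targets": ["sunny and warm.", "cold and rainy."],
    "scores": [0.25, 0.75],
}

for target, score in pair_targets_with_scores(example_row):
    print(f"{score:.2f}  {target}")
```

To work with the real data, call `load_input_split()` and pass individual rows to `pair_targets_with_scores`; each row is expected to be a dictionary keyed by the three column names in the schema.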
Provided by:
jaroslawjanas