xwjzds/paraphrase_collections
Dataset overview
Dataset name
Sentence Paraphrase Collections
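Dataset description
Sentence Paraphrase Collections is a sentence-paraphrasing dataset that combines pairs from several sources: paraphrases generated with ChatGPT, Paraphrase Adversaries from Word Scrambling (PAWS), and the STS benchmark. Pairs that are not in English, are too short, or are not sufficiently similar have been filtered out.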
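The card does not say how this filtering was implemented. As a rough illustration only, the sketch below applies the three filters named above (non-English, too short, low similarity) to one candidate pair. The helper names, the thresholds `MIN_WORDS` and `MIN_SIMILARITY`, the ASCII-ratio language check, and the use of `difflib.SequenceMatcher` as the similarity score are all hypothetical stand-ins, not the authors' pipeline.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds: the card does not state the values actually used.
MIN_WORDS = 5
MIN_SIMILARITY = 0.3


def looks_english(text: str, min_ascii_ratio: float = 0.9) -> bool:
    """Crude stand-in for language ID: share of ASCII characters among alphabetic ones."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for c in letters if c.isascii())
    return ascii_letters / len(letters) >= min_ascii_ratio


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a semantic similarity score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def keep_pair(src: str, tgt: str) -> bool:
    """Return True if a candidate (input, output) pair passes all three filters."""
    # Filter 1: both sides must look like English.
    if not (looks_english(src) and looks_english(tgt)):
        return False
    # Filter 2: drop pairs where either side has too few whitespace-separated words.
    if min(len(src.split()), len(tgt.split())) < MIN_WORDS:
        return False
    # Filter 3: drop pairs whose similarity score falls below the threshold.
    return similarity(src, tgt) >= MIN_SIMILARITY
```

A real pipeline would more likely use a language-identification model and an embedding-based similarity score, since character overlap tends to under-score paraphrases that reword heavily, such as the example pair in this card.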
Dataset structure
- Features:
  - input: string
  - output: string
- Data instances:
  - Example:
    {"input": "U.S. prosecutors have arrested more than 130 individuals and have seized more than $17 million in a continuing crackdown on Internet fraud and abuse.", "output": "More than 130 people have been arrested and $17 million worth of property seized in an Internet fraud sweep announced Friday by three U.S. government agencies."}
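Each row carries the two string fields listed above. Assuming rows are handled as JSON objects (for example after exporting the train split to JSON Lines; this serialization is an assumption, not a description of how the files are stored on the Hub), a minimal standard-library check that a row matches the schema could look like this. `parse_record` is a hypothetical helper name.

```python
import json


def parse_record(line: str) -> dict[str, str]:
    """Parse one JSON object and check it has string 'input' and 'output' fields."""
    record = json.loads(line)  # malformed JSON raises json.JSONDecodeError
    if not isinstance(record, dict):
        raise ValueError("record is not a JSON object")
    for field in ("input", "output"):
        if not isinstance(record.get(field), str):
            raise ValueError(f"field {field!r} missing or not a string")
    # Keep only the two documented fields; any extra keys are dropped.
    return {"input": record["input"], "output": record["output"]}
```

Applied line by line to such an export, it stops at the first bad row; malformed JSON also surfaces as `ValueError`, because `json.JSONDecodeError` is a subclass of it.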
Dataset statistics
- Class counts:
  - Paraphrase: 223,241
Dataset size
- Download size: 21,377,198 bytes
- Dataset size: 34,347,236 bytes
- Train split:
  - Bytes: 34,347,236
  - Examples: 223,241
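The size figures above can be cross-checked with a little arithmetic. The sketch below copies the numbers from this card; `summarize` is a hypothetical helper, and "bytes per example" is the average stored size per row, not the average sentence length.

```python
# Figures copied from the card's statistics and size sections.
DOWNLOAD_BYTES = 21_377_198
DATASET_BYTES = 34_347_236
TRAIN_BYTES = 34_347_236
TRAIN_EXAMPLES = 223_241
PARAPHRASE_COUNT = 223_241

# The card lists a single train split, so its size and count should equal the totals.
assert TRAIN_BYTES == DATASET_BYTES
assert TRAIN_EXAMPLES == PARAPHRASE_COUNT


def summarize() -> dict[str, float]:
    """Derive a few sanity-check quantities from the reported sizes."""
    return {
        # Average stored bytes per row in the train split (about 153.86).
        "bytes_per_example": TRAIN_BYTES / TRAIN_EXAMPLES,
        # Download size relative to dataset size (about 0.622).
        "download_to_dataset_ratio": DOWNLOAD_BYTES / DATASET_BYTES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4f}")
```

The download being roughly 62% of the dataset size presumably reflects compression of the downloaded files.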
License information
The dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
Citation information
@misc{xu2023detime,
  title={DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM},
  author={Weijie Xu and Wenxiang Hu and Fanyou Wu and Srinivasan Sengamedu},
  year={2023},
  eprint={2310.15296},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}



