redis/llm-paraphrases

Source: Hugging Face. Updated 2025-12-19. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/redis/llm-paraphrases
Description:
A large-scale, synthetically generated paraphrase dataset of sentence pairs, with balanced positive and negative examples across varied domains and writing styles. The dataset is curated by Redis and shared by Warris Gill, and is designed for training embedding models for semantic caching and paraphrase detection. Each example consists of a pair of sentences and a binary label indicating whether they are paraphrases (semantically equivalent) or not. Positive samples are paraphrased queries that retain the original intent; negative samples are semantically related but distinct queries. This lets models learn to distinguish near-duplicate queries from merely related ones. The dataset is entirely in English and is licensed under Apache-2.0.
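
To make the pair-plus-label structure and the semantic-caching use case concrete, here is a minimal sketch. It makes several assumptions: the field names (`sentence1`, `sentence2`, `label`) are hypothetical and may not match the dataset's real column names, the two example pairs are invented for illustration, and `bow_cosine` is a toy word-overlap similarity standing in for a trained embedding model. The threshold of 0.8 is likewise an arbitrary illustrative value.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ParaphrasePair:
    # Hypothetical field names; check the dataset card for the real columns.
    sentence1: str
    sentence2: str
    label: int  # 1 = paraphrase, 0 = related but not equivalent


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over word counts (toy stand-in for an embedding model)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_cache_hit(query: str, cached_query: str, threshold: float = 0.8) -> bool:
    """Reuse a cached answer only when the two queries are similar enough."""
    return bow_cosine(query, cached_query) >= threshold


def positive_fraction(pairs: list[ParaphrasePair]) -> float:
    """Share of pairs labeled as paraphrases; 0.5 means perfectly balanced."""
    return sum(p.label for p in pairs) / len(pairs)


# Invented examples, not taken from the dataset.
pairs = [
    ParaphrasePair("how do I reset my password",
                   "how do I reset my password please", 1),
    ParaphrasePair("how do I reset my password",
                   "how do I change my username", 0),
]

for p in pairs:
    print(p.label, is_cache_hit(p.sentence1, p.sentence2))
```

In a real semantic cache, `bow_cosine` would be replaced by cosine similarity between embeddings. Word overlap alone cannot separate paraphrases from related-but-distinct queries in general, which is the distinction the labeled negative samples are meant to teach.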
Provider:
redis