HebArabNlpProject/Hebrew-Paraphrase-Dataset

Name: HebArabNlpProject/Hebrew-Paraphrase-Dataset
Creator: HebArabNlpProject
Published: 2025-03-20 14:14:28
License: not specified

Source: Hugging Face. Updated 2025-03-20; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/HebArabNlpProject/Hebrew-Paraphrase-Dataset

Description:

This Hebrew paraphrase dataset contains 9,785 instances: 75% are paragraph-level paraphrases and 25% are sentence-level. It covers a range of text genres, including encyclopedic entries, legal and conversational texts, and news. Linguists manually validated 300 instances, which serve as a high-quality gold-standard subset. Quality is enforced through automatic filtering, which checks language consistency, length requirements, generation failures, and similarity. Each instance includes the original text, the paraphrased text, the type, and a flag indicating whether it belongs to the gold-standard subset.
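The four filter categories named in the description (language consistency, length, generation failure, similarity) can be illustrated with a small sketch. This is not the creators' pipeline: the function names, the thresholds, and the use of `difflib` for similarity are all assumptions chosen for illustration, and the listing does not state how each check was actually implemented.

```python
from difflib import SequenceMatcher


def hebrew_ratio(text: str) -> float:
    """Fraction of alphabetic characters that fall in the Hebrew Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hebrew = [ch for ch in letters if "\u0590" <= ch <= "\u05FF"]
    return len(hebrew) / len(letters)


def keep_pair(
    original: str,
    paraphrase: str,
    min_hebrew: float = 0.8,       # assumed threshold, not from the source
    min_len_ratio: float = 0.5,    # assumed threshold, not from the source
    max_len_ratio: float = 2.0,    # assumed threshold, not from the source
    max_similarity: float = 0.95,  # assumed threshold, not from the source
) -> bool:
    """Return True if a (original, paraphrase) pair passes all four checks."""
    # Generation failure: the paraphrase is empty or whitespace only.
    if not paraphrase.strip():
        return False
    # Language consistency: the paraphrase should be mostly Hebrew letters.
    if hebrew_ratio(paraphrase) < min_hebrew:
        return False
    # Length requirement: the paraphrase should be comparable in length.
    length_ratio = len(paraphrase) / max(len(original), 1)
    if not (min_len_ratio <= length_ratio <= max_len_ratio):
        return False
    # Similarity check: reject near-verbatim copies of the original.
    similarity = SequenceMatcher(None, original, paraphrase).ratio()
    return similarity <= max_similarity


original = "הילד הלך לבית הספר בבוקר"
print(keep_pair(original, "בבוקר הילד יצא אל בית הספר"))  # a reworded sentence
print(keep_pair(original, original))                      # a verbatim copy
```

Character-level similarity is only a stand-in here; a real pipeline might compare embeddings or token overlap instead, and might also impose a lower similarity bound to catch paraphrases that drift from the original meaning.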

Provided by:

HebArabNlpProject
