mertcobanov/all-nli-triplets-turkish

Name: mertcobanov/all-nli-triplets-turkish
Creator: mertcobanov
Published: 2024-12-19 06:54:01
License: No description available

Source: Hugging Face; updated 2024-12-19; indexed 2024-12-14

Download link:

https://hf-mirror.com/datasets/mertcobanov/all-nli-triplets-turkish

Overview:

This dataset is a bilingual (English and Turkish) version of the `sentence-transformers/all-nli` dataset. It provides sentence triplets in English alongside their Turkish translations, making it suitable for training and evaluating multilingual and Turkish-specific Natural Language Understanding (NLU) models. Each triplet consists of an anchor sentence, a positive sentence (semantically similar to the anchor), and a negative sentence (semantically dissimilar to the anchor). Supported tasks include Natural Language Inference (NLI), semantic similarity, and multilingual sentence embedding.

The dataset has six columns: anchor, positive, and negative, plus a Turkish translation of each. It is divided into three splits: train, test, and dev. The Turkish columns were produced by translating the English triplets of `sentence-transformers/all-nli` with a machine translation model, followed by quality checks intended to ensure semantic consistency. The dataset was created to address the scarcity of Turkish resources in Natural Language Processing (NLP).
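The six-column layout described above can be split into per-language triplets before training. The following Python sketch assumes each row arrives as a plain dict (for example, one element of a split loaded with the Hugging Face `datasets` library via `load_dataset("mertcobanov/all-nli-triplets-turkish")`). The Turkish column names `anchor_tr`, `positive_tr`, and `negative_tr`, the helper `row_to_triplets`, the optional cross-lingual mixing, and the example sentences are all hypothetical; check them against the actual schema on the dataset page.

```python
from dataclasses import dataclass
from typing import Dict, List

# HYPOTHETICAL column names: the dataset card only says the six columns are
# anchor, positive, negative, and their Turkish translations. Check the real
# schema on the dataset page and adjust these keys before use.
EN_COLUMNS = ("anchor", "positive", "negative")
TR_COLUMNS = ("anchor_tr", "positive_tr", "negative_tr")


@dataclass(frozen=True)
class Triplet:
    anchor: str    # reference sentence
    positive: str  # semantically similar to the anchor
    negative: str  # semantically dissimilar to the anchor


def row_to_triplets(row: Dict[str, str], cross_lingual: bool = False) -> List[Triplet]:
    """Split one six-column bilingual row into an English and a Turkish triplet.

    If cross_lingual is True, also emit two mixed-language triplets
    (English anchor with Turkish positive/negative, and the reverse).
    This mixing is an assumed augmentation, not part of the dataset card.
    """
    en = Triplet(*(row[col] for col in EN_COLUMNS))
    tr = Triplet(*(row[col] for col in TR_COLUMNS))
    triplets = [en, tr]
    if cross_lingual:
        triplets.append(Triplet(en.anchor, tr.positive, tr.negative))
        triplets.append(Triplet(tr.anchor, en.positive, en.negative))
    return triplets


# Invented example row for illustration only; it is not taken from the dataset.
example_row = {
    "anchor": "A man is playing a guitar.",
    "positive": "A person is playing an instrument.",
    "negative": "Nobody is playing music.",
    "anchor_tr": "Bir adam gitar çalıyor.",
    "positive_tr": "Bir kişi bir enstrüman çalıyor.",
    "negative_tr": "Kimse müzik çalmıyor.",
}

for t in row_to_triplets(example_row, cross_lingual=True):
    print(f"{t.anchor} | {t.positive} | {t.negative}")
```

Keeping the cross-lingual triplets behind a flag is a deliberate choice in this sketch: the two monolingual triplets alone suit a Turkish-specific model, while the mixed-language pairs may help align English and Turkish sentences in a shared multilingual embedding space.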

Provided by:

mertcobanov