
fredxlpy/LETZ

Hugging Face | Updated 2024-06-16 | Indexed 2024-06-29
Download link:
https://hf-mirror.com/datasets/fredxlpy/LETZ
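Resource overview:
The LETZ dataset targets zero-shot classification in Luxembourgish and aims to improve topic classification for low-resource languages by using data from the Luxembourg Online Dictionary. It comprises two configurations (LETZ-SYN and LETZ-WoT), each with training, validation, and test splits. Each dataset has three columns: Text (a Luxembourgish sentence or phrase), Label (a candidate topic label), and Class (a binary indicator, 1 for relevant and 0 for not relevant).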
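
As a minimal sketch of how one configuration and split might be selected, the helper below validates the names before handing them to the Hugging Face `datasets` library. The repository id, the configuration names (LETZ-SYN, LETZ-WoT), and the split names are taken from the description above and are assumptions about the exact identifiers used on the Hub; `letz_load_kwargs` and `load_letz` are hypothetical helper names, not part of any library.

```python
from typing import Any, Dict

# Assumption: identifiers as described in the overview above;
# the exact config and split names on the Hub may differ.
REPO_ID = "fredxlpy/LETZ"
CONFIGS = ("LETZ-SYN", "LETZ-WoT")
SPLITS = ("train", "validation", "test")


def letz_load_kwargs(config: str, split: str) -> Dict[str, str]:
    """Validate the names and build keyword arguments for datasets.load_dataset."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}, expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}, expected one of {SPLITS}")
    return {"path": REPO_ID, "name": config, "split": split}


def load_letz(config: str, split: str) -> Any:
    """Download one split; needs the `datasets` package and network access."""
    # Imported lazily so the validation helper also works offline.
    from datasets import load_dataset

    return load_dataset(**letz_load_kwargs(config, split))
```

Calling `load_letz("LETZ-SYN", "train")` would then request the training split of the first configuration, provided the names above match the hosted dataset.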

Provider:
fredxlpy
Summary of the original dataset card

Dataset Card for Luxembourgish Entailment-based Topic classification via Zero-shot learning (LETZ)

Dataset Summary

The datasets for Luxembourgish Entailment-based Topic classification via Zero-shot learning (LETZ) can be used to adapt language models to zero-shot classification in Luxembourgish. They draw on data from the Luxembourg Online Dictionary to provide relevant topic classification examples in Luxembourgish. The LETZ datasets were created to address the limitations of using Natural Language Inference (NLI) datasets for zero-shot classification in low-resource languages. Specifically, they aim to improve topic classification performance by providing more relevant and accessible data through dictionary entries.

Columns in the Dataset

Each dataset includes the following columns:

  • Text: The Luxembourgish sentence or phrase.
  • Label: The potentially associated topic label.
  • Class: A binary indicator where “1” denotes relevance (entailment) and “0” denotes irrelevance (non-entailment).
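
The three columns above map naturally onto an entailment setup: the text acts as the premise, the candidate topic label as the hypothesis, and Class as the binary target. The following is a rough sketch of that mapping and of zero-shot prediction by picking the best-scoring label. The column spellings, the helper names `to_entailment_pair` and `zero_shot_topic`, and the placeholder rows are assumptions for illustration and are not taken from the dataset; the scorer is a toy stand-in for a fine-tuned model.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: each row is a dict keyed by the three column names listed above;
# the actual column names in the files may be spelled differently.
Row = Dict[str, object]


def to_entailment_pair(row: Row) -> Tuple[str, str, int]:
    """Map a row to (premise, hypothesis, target)."""
    return str(row["Text"]), str(row["Label"]), int(row["Class"])


def zero_shot_topic(
    text: str,
    candidate_labels: List[str],
    score: Callable[[str, str], float],
) -> str:
    """Return the candidate label with the highest entailment score for `text`."""
    if not candidate_labels:
        raise ValueError("candidate_labels must not be empty")
    return max(candidate_labels, key=lambda label: score(text, label))


# Placeholder rows, not real dataset entries.
rows = [
    {"Text": "sentence A", "Label": "topic X", "Class": 1},
    {"Text": "sentence A", "Label": "topic Y", "Class": 0},
]
pairs = [to_entailment_pair(r) for r in rows]

# Toy scorer standing in for a model: it simply looks up the Class value.
lookup = {(premise, hypothesis): float(target) for premise, hypothesis, target in pairs}
predicted = zero_shot_topic(
    "sentence A",
    ["topic X", "topic Y"],
    lambda text, label: lookup[(text, label)],
)
```

In practice the `score` callable would wrap a model that outputs an entailment probability for a (text, label) pair; the lookup table here only keeps the example self-contained.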

Dataset Description

Source Data

The original Luxembourg Online Dictionary (LOD) data can be downloaded from the Luxembourgish Open Data Platform or can be accessed via their API. All of their data is available under a Creative Commons Zero (CC0) license.

Citation Information

@inproceedings{philippy-etal-2024-forget,
    title = "Forget {NLI}, Use a Dictionary: Zero-Shot Topic Classification for Low-Resource Languages with Application to {L}uxembourgish",
    author = "Philippy, Fred and Haddadan, Shohreh and Guo, Siwen",
    editor = "Melero, Maite and Sakti, Sakriani and Soria, Claudia",
    booktitle = "Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.sigul-1.13",
    pages = "97--104"
}
