aisingapore/linguistic_diagnostics-syntax
Source: Hugging Face. Updated: 2024-12-20. Indexed: 2024-12-21.
Download link:
https://hf-mirror.com/datasets/aisingapore/linguistic_diagnostics-syntax
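Description:
LINDSEA Syntax is a linguistic diagnostic dataset for evaluating how well large language models (LLMs) understand linguistic phenomena in Indonesian, particularly syntax. The dataset contains only Indonesian data and comes with an additional few-shot split: the main split has 380 examples and the few-shot split has 5 examples. Each example contains a sentence pair, prompt templates, and metadata. The metadata records the language, linguistic phenomenon, category, subcategory, the correct and wrong sentences, and a boolean indicating whether the pair is shuffled. The total dataset size is 206,821 bytes and the download size is 42,471 bytes.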
The LINDSEA Syntax dataset is a linguistic diagnostic tool designed to evaluate a model's understanding of linguistic phenomena, particularly syntax, in Indonesian. The dataset includes splits for Indonesian, including a split of few-shot examples. Its features are ID, label, prompts, prompt templates, and metadata; the metadata covers language, linguistic phenomenon, category, subcategory, the correct sentence, the wrong sentence, and whether the pair is shuffled. The dataset statistics report the number of examples in each split and the number of tokens under different model tokenizers (such as GPT-4o, Gemma 2, and Llama 3). The dataset is sourced from BHASA and is licensed under CC BY 4.0.
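The description above suggests a minimal-pair setup: each example pairs a correct sentence with a wrong one, and a flag records whether the pair is shuffled. The following is a minimal sketch of how such pairs could be scored, not the dataset's official evaluation. The class `MinimalPair`, the helpers `present` and `accuracy`, the "A"/"B" labels, and the reading of "shuffled" as "the wrong sentence is shown first" are all assumptions made for illustration, and the two Indonesian sentences are invented rather than taken from the dataset.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MinimalPair:
    """Hypothetical record mirroring the metadata fields named in the description."""
    correct: str    # the acceptable sentence
    wrong: str      # the unacceptable sentence
    shuffled: bool  # assumed meaning: the wrong sentence is presented first


def present(pair: MinimalPair) -> Tuple[str, str, str]:
    """Return (sentence_a, sentence_b, label), where label marks the acceptable one."""
    if pair.shuffled:
        return pair.wrong, pair.correct, "B"
    return pair.correct, pair.wrong, "A"


def accuracy(pairs: List[MinimalPair], choose: Callable[[str, str], str]) -> float:
    """Fraction of pairs for which `choose` picks the acceptable sentence.

    `choose` stands in for a model call: it receives the two sentences in
    presentation order and returns "A" or "B".
    """
    hits = 0
    for pair in pairs:
        sentence_a, sentence_b, label = present(pair)
        if choose(sentence_a, sentence_b) == label:
            hits += 1
    return hits / len(pairs)


# Invented toy pairs for illustration only; they are not drawn from the dataset.
toy_pairs = [
    MinimalPair("Dia membaca buku itu.", "Dia buku membaca itu.", shuffled=False),
    MinimalPair("Dia membaca buku itu.", "Dia buku membaca itu.", shuffled=True),
]


def always_first(sentence_a: str, sentence_b: str) -> str:
    """A trivial baseline that always answers "A"."""
    return "A"


print(accuracy(toy_pairs, always_first))
```

Shuffling the presentation order matters here because a baseline that always picks the first sentence would otherwise score perfectly; with half the pairs shuffled, it is right only half the time.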
Provided by:
aisingapore



