
sapienzanlp/ghigliottinai

Source: Hugging Face. Updated: 2024-09-22. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/sapienzanlp/ghigliottinai
Resource description:
---
dataset_info:
  features:
  - name: w1
    dtype: string
  - name: w2
    dtype: string
  - name: w3
    dtype: string
  - name: w4
    dtype: string
  - name: w5
    dtype: string
  - name: choices
    sequence: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 6426
    num_examples: 62
  - name: test
    num_bytes: 57662
    num_examples: 553
  download_size: 44398
  dataset_size: 64088
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---

# ghigliottinAI MCQA

References:

- https://ghigliottin-ai.github.io/
- https://nlp4fun.github.io/

Starting from two EVALITA tasks, nlp4fun (EVALITA 2018) and ghigliottin-AI (EVALITA 2020), we collected about 600 distinct games taken from the TV show "L'Eredità" and from its board game.

"La Ghigliottina" is a complex game: solving it requires broad knowledge of Italian culture. Given five different, seemingly unrelated words, the solution is a single word that expresses a concept shared by all of them.

The original game is not well-posed: the solution is not unique, and listing all possible solutions is not feasible. We therefore reframed the problem as multiple-choice QA, where four candidate words are listed and all but one of them are incorrect answers to the game.

## Distractor Generation

For each game, the three distractors were chosen from among all possible Italian words. Each distractor was chosen to be close to 3 of the 5 hints and distant from the other two, as measured by cosine similarity between FastText static embeddings. In addition, each distractor has length at most len(solution) + 1.

With this setting, we obtained three words per game that are not possible solutions, making the task relatively simple for humans but considerably harder for language models.

## Example

Here is the structure of a single sample in this dataset.
```json
{
  "w1": string,     # text of the first hint
  "w2": string,     # text of the second hint
  "w3": string,     # text of the third hint
  "w4": string,     # text of the fourth hint
  "w5": string,     # text of the fifth hint
  "choices": list,  # list of candidate words: the correct one plus 3 distractors
  "label": int,     # index of the correct answer in choices
}
```

## Statistics

- Training: 62
- Test: 553

## Proposed Prompts

This section describes the prompts given to the model. We compute a perplexity score for each candidate completion of the prompt and take the one with the lowest perplexity as the model's answer.

For each subtask, we also define a description that is prepended to the prompt so that the model can understand the task.

Description of the task:

```txt
Ti viene chiesto di risolvere il gioco della ghigliottina.\nIl gioco della ghigliottina consiste nel trovare un concetto che lega cinque parole date. Tale concetto è esprimibile tramite una singola parola.\n\n
```

### MCQA style

Prompt:

```txt
Date le parole: {{w1}}, {{w2}}, {{w3}}, {{w4}}, {{w5}}\nDomanda: Quale tra i seguenti concetti è quello che lega le parole date?\nA. {{choices[0]}}\nB. {{choices[1]}}\nC. {{choices[2]}}\nD. {{choices[3]}}\nRisposta:
```

### Cloze style

In this case the gold answer is not the corresponding letter but the word itself.
```txt
Date le parole: {{w1}}, {{w2}}, {{w3}}, {{w4}}, {{w5}}\nDomanda: Quale tra i seguenti concetti è quello che lega le parole date?\n{{choices[0]}}\n{{choices[1]}}\n{{choices[2]}}\n{{choices[3]}}\nRisposta:
```

## Results

Results for the two prompting strategies are reported below.

| GhigliottinAI-MCQA | Accuracy (5-shot) |
| :-----: | :--: |
| Gemma-2B | 23.86 |
| QWEN2-1.5B | 39.24 |
| Mistral-7B | 42.49 |
| ZEFIRO | 40.86 |
| Llama-3-8B | 46.65 |
| Llama-3-8B-IT | 47.38 |
| ANITA | 41.95 |

| GhigliottinAI-CLOZE | Accuracy_norm (5-shot) |
| :-----: | :--: |
| Gemma-2B | 35.08 |
| QWEN2-1.5B | 33.81 |
| Mistral-7B | 39.60 |
| ZEFIRO | 41.22 |
| Llama-3-8B | 43.39 |
| Llama-3-8B-IT | 48.46 |
| ANITA | 48.64 |

## Acknowledgements

We would like to thank the authors of this resource for publicly releasing such an intriguing benchmark.

We also want to thank the students of the [MNLP-2024 course](https://naviglinlp.blogspot.com/), who explored different prompting strategies, reframing strategies, and distractor generation approaches in their first homework.

The original dataset is freely available for download: [link_1](https://github.com/ghigliottin-AI/ghigliottin-AI.github.io), [link_2](https://github.com/nlp4fun/nlp4fun.github.io).

## License

No license found on original data.
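
The selection rule in the Distractor Generation section can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the scoring formula (mean of the three highest hint similarities minus the mean of the two lowest) is an assumption, the function names `cosine` and `pick_distractors` are hypothetical, and the toy one-hot vectors stand in for real FastText embeddings.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_distractors(
    hints: List[str],
    solution: str,
    vocab: List[str],
    embed: Dict[str, Vector],
    k: int = 3,
) -> List[str]:
    """Return the k candidates closest to 3 hints and farthest from the other 2."""
    scored = []
    for word in vocab:
        if word == solution or word in hints:
            continue
        # Length constraint stated in the dataset card.
        if len(word) > len(solution) + 1:
            continue
        sims = sorted((cosine(embed[word], embed[h]) for h in hints), reverse=True)
        # Assumed scoring: reward the 3 best-matching hints, penalise the other 2.
        score = sum(sims[:3]) / 3 - sum(sims[3:]) / 2
        scored.append((score, word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


# Toy data: placeholder words and 6-dimensional vectors, NOT real embeddings.
HINTS = ["h1", "h2", "h3", "h4", "h5"]
EMBED: Dict[str, Vector] = {
    f"h{i + 1}": [1.0 if j == i else 0.0 for j in range(6)] for i in range(5)
}
EMBED.update(
    {
        "near3": [1, 1, 1, 0, 0, 0],           # close to exactly three hints
        "near5": [1, 1, 1, 1, 1, 0],           # equally close to all five hints
        "far": [0, 0, 0, 0, 0, 1],             # unrelated to every hint
        "waytoolongword": [1, 1, 1, 0, 0, 0],  # good match but too long
    }
)
VOCAB = ["near3", "near5", "far", "waytoolongword"]

print(pick_distractors(HINTS, "solution", VOCAB, EMBED, k=1))
```

In practice the vocabulary and vectors would come from a pretrained Italian FastText model; the card does not say which thresholds or ranking were used, so the score above should be treated as one plausible reading.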

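
The prompt templates from the Proposed Prompts section, together with the lowest-perplexity selection rule, can be sketched as below. This is an illustration under stated assumptions: `build_prompt`, `candidates`, and `predict` are hypothetical helper names, the `perplexity` callable is a stand-in for whatever language-model scorer is used, joining the prompt and the answer with a single space is an assumption, and the sample contains placeholder words rather than a real game.

```python
from typing import Callable, Dict, List

DESCRIPTION = (
    "Ti viene chiesto di risolvere il gioco della ghigliottina.\n"
    "Il gioco della ghigliottina consiste nel trovare un concetto che lega "
    "cinque parole date. Tale concetto è esprimibile tramite una singola parola.\n\n"
)
QUESTION = "Domanda: Quale tra i seguenti concetti è quello che lega le parole date?\n"


def build_prompt(sample: Dict, style: str = "mcqa") -> str:
    """Fill the MCQA or cloze template with one dataset sample."""
    hints = ", ".join(sample[f"w{i}"] for i in range(1, 6))
    if style == "mcqa":
        options = "".join(
            f"{letter}. {choice}\n" for letter, choice in zip("ABCD", sample["choices"])
        )
    elif style == "cloze":
        options = "".join(f"{choice}\n" for choice in sample["choices"])
    else:
        raise ValueError(f"unknown style: {style!r}")
    return f"{DESCRIPTION}Date le parole: {hints}\n{QUESTION}{options}Risposta:"


def candidates(sample: Dict, style: str) -> List[str]:
    """Answer strings to score: letters for MCQA, the words themselves for cloze."""
    return list("ABCD") if style == "mcqa" else list(sample["choices"])


def predict(sample: Dict, perplexity: Callable[[str], float], style: str = "mcqa") -> int:
    """Return the index of the candidate whose completed prompt scores lowest."""
    prompt = build_prompt(sample, style)
    # Assumption: the answer is appended after a single space.
    scores = [perplexity(f"{prompt} {answer}") for answer in candidates(sample, style)]
    return min(range(len(scores)), key=scores.__getitem__)


# Placeholder sample (not taken from the dataset) and a toy scorer.
SAMPLE = {
    "w1": "hint1", "w2": "hint2", "w3": "hint3", "w4": "hint4", "w5": "hint5",
    "choices": ["alpha", "beta", "gamma", "delta"],
    "label": 1,
}


def toy_perplexity(text: str) -> float:
    """Stand-in for a real model: prefers the second option in both styles."""
    return 1.0 if text.endswith((" B", " beta")) else 5.0


print(build_prompt(SAMPLE, "mcqa"))
print(predict(SAMPLE, toy_perplexity, "cloze"))
```

Keeping the scorer as a plain callable means the same `predict` function works for both prompt styles, and any length normalisation (as the Accuracy_norm column suggests for the cloze setting) can live inside the scorer.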
Provider:
sapienzanlp