sapienzanlp/pretens
Source: Hugging Face (updated 2024-10-05, indexed 2025-04-12)
Download link:
https://hf-mirror.com/datasets/sapienzanlp/pretens
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 314451
    num_examples: 5837
  - name: test
    num_bytes: 839852
    num_examples: 14560
  download_size: 345578
  dataset_size: 1154303
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
# Presupposed Taxonomies: Evaluating Neural Network Semantics (PreTENS)
Original Paper: https://aclanthology.org/2022.semeval-1.29.pdf
This dataset comes from the SemEval-2022 shared tasks.
The PreTENS task evaluates the semantic competence of language models, specifically their ability to recognize appropriate taxonomic relations between two nominal arguments.
We collected the Italian part of the original dataset, and only its first sub-task: **acceptability sentence classification**.
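A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and that the repository id `sapienzanlp/pretens` (taken from the page title) resolves on the Hub. The helper names `load_pretens` and `check_split_sizes` are illustrative and not part of the dataset; the expected sizes are copied from the metadata above.

```python
# Expected number of examples per split, copied from the dataset metadata.
EXPECTED_SIZES = {"train": 5837, "test": 14560}


def load_pretens():
    """Download both splits from the Hub (requires network access)."""
    # Imported lazily so the size check below works without the library.
    from datasets import load_dataset

    return load_dataset("sapienzanlp/pretens")


def check_split_sizes(splits):
    """Return True if every expected split is present with the expected length."""
    return all(
        name in splits and len(splits[name]) == size
        for name, size in EXPECTED_SIZES.items()
    )
```

With network access, `check_split_sizes(load_pretens())` should return `True` as long as the hosted copy still matches the metadata.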
## Example
Here you can see the structure of a single sample in this dataset.
```json
{
"text": string, # sample's text
"label": int, # 0: non ha senso (does not make sense), 1: ha senso (makes sense)
}
```
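The schema can be checked with a few lines of Python. This is an illustrative sketch: `LABEL_NAMES` and `validate_sample` are hypothetical names, and any sentence passed to it here is a placeholder rather than a real dataset entry.

```python
# Label ids and their Italian names, as given in the structure above.
LABEL_NAMES = {0: "non ha senso", 1: "ha senso"}


def validate_sample(sample):
    """Return the label name if the sample matches the schema, else raise ValueError."""
    text, label = sample.get("text"), sample.get("label")
    if not isinstance(text, str):
        raise ValueError("'text' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(label, int) or isinstance(label, bool) or label not in LABEL_NAMES:
        raise ValueError("'label' must be the integer 0 or 1")
    return LABEL_NAMES[label]
```

For example, `validate_sample({"text": "Una frase di esempio.", "label": 1})` returns `"ha senso"`.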
## Statistics
| PRETENS | 0 | 1 |
| :--------: | :----: | :----: |
| Training | 3029 | 2808 |
| Test | 7707 | 6853 |
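The class counts make it easy to derive a majority-class baseline, which is a useful floor when reading accuracy figures on this dataset. The sketch below only uses the numbers from the table; `COUNTS` and `majority_baseline` are illustrative names.

```python
# Class counts copied from the statistics table: (label 0, label 1).
COUNTS = {"train": (3029, 2808), "test": (7707, 6853)}


def majority_baseline(split):
    """Accuracy (in %) of always predicting the most frequent label in a split."""
    counts = COUNTS[split]
    return 100 * max(counts) / sum(counts)


for split, counts in COUNTS.items():
    print(f"{split}: total={sum(counts)}, majority baseline={majority_baseline(split):.2f}%")
```

The totals match the split sizes in the metadata (5837 and 14560), and always answering 0 would score about 52.93% on the test split.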
## Proposed Prompts
Here we describe the prompts given to the model and scored by perplexity: the prompt with the lowest perplexity is taken as the model's answer.
Moreover, for each subtask we define a task description that is prepended to the prompts so that the model can understand the task.
Description of the task: "Indica se le seguenti frasi hanno senso.\n\n"
### Cloze Style:
Label (**non ha senso**): "{{text}}\nLa frase precedente non ha senso"
Label (**ha senso**): "{{text}}\nLa frase precedente ha senso"
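The selection rule can be sketched as follows. This is not the authors' evaluation code: `score_tokens` is a hypothetical callable standing in for whatever language model is being evaluated, and scoring the whole prompt (rather than only the final sentence) is an assumption, since the card does not specify it. The `{{text}}` placeholder is written as `{text}` so that Python's `str.format` can fill it.

```python
import math

DESCRIPTION = "Indica se le seguenti frasi hanno senso.\n\n"
CLOZE_TEMPLATES = {
    0: "{text}\nLa frase precedente non ha senso",
    1: "{text}\nLa frase precedente ha senso",
}


def build_cloze_prompts(text):
    """Return one full prompt per label, each prefixed with the task description."""
    return {
        label: DESCRIPTION + template.format(text=text)
        for label, template in CLOZE_TEMPLATES.items()
    }


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities: exp(-mean(logprob))."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def predict_label(text, score_tokens):
    """Pick the label whose prompt has the lowest perplexity.

    `score_tokens` (hypothetical) maps a prompt string to a list of
    per-token log-probabilities produced by the model under evaluation.
    """
    prompts = build_cloze_prompts(text)
    return min(prompts, key=lambda label: perplexity(score_tokens(prompts[label])))
```

Passing the scorer in as an argument keeps the prompt logic independent of any particular model library.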
### MCQA Style:
```txt
{{text}}\nDomanda: La frase precedente ha senso? Rispondi sì o no:
```
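A small sketch of how the MCQA prompt might be assembled and a generated answer mapped back to a label. The question is written here as "La frase precedente ha senso?". The card does not describe how MCQA answers are scored, so `parse_answer` and its answer table are assumptions for illustration only.

```python
DESCRIPTION = "Indica se le seguenti frasi hanno senso.\n\n"
MCQA_TEMPLATE = "{text}\nDomanda: La frase precedente ha senso? Rispondi sì o no:"
# Accepting "si" without the accent is an assumption, not part of the card.
ANSWER_TO_LABEL = {"sì": 1, "si": 1, "no": 0}


def build_mcqa_prompt(text):
    """Prefix the task description and fill the sentence into the question template."""
    return DESCRIPTION + MCQA_TEMPLATE.format(text=text)


def parse_answer(generated):
    """Map the first word of a model completion to a label, or None if unrecognised."""
    words = generated.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,;:!?\"'")
    return ANSWER_TO_LABEL.get(first)
```

Returning `None` for unrecognised completions lets the caller decide whether to count them as errors or to retry.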
## Results
The following results were obtained with Cloze-style prompting on several English and Italian-adapted LLMs.
| Model | Accuracy on PreTENS (15-shot) |
| :-----: | :--: |
| Gemma-2B | 53.5 |
| QWEN2-1.5B | 56.47 |
| Mistral-7B | 66.5 |
| ZEFIRO | 62 |
| Llama-3-8B | 72.34 |
| Llama-3-8B-IT | 65.58 |
| ANITA | 66.1 |
## Acknowledgements
We would like to thank the authors of this resource for publicly releasing such an intriguing benchmark.
Additionally, we extend our gratitude to the students of the [MNLP-2024 course](https://naviglinlp.blogspot.com/), whose first homework explored various interesting prompting strategies.
The original dataset is freely available for download [here](https://github.com/shammur/SemEval2022Task3).
## License
The data are released under the [MIT](https://opensource.org/license/mit) license.
Provider: sapienzanlp



