PolyAI/banking77
Hugging Face · updated 2024-09-10 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/PolyAI/banking77
Resource description:
---
annotations_creators:
- expert-generated
extended:
- original
language_creators:
- expert-generated
language:
- en
license:
- cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- intent-classification
- multi-class-classification
paperswithcode_id: null
pretty_name: BANKING77
---
# Dataset Card for BANKING77
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [Github](https://github.com/PolyAI-LDN/task-specific-datasets)
- **Repository:** [Github](https://github.com/PolyAI-LDN/task-specific-datasets)
- **Paper:** [ArXiv](https://arxiv.org/abs/2003.04807)
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
BANKING77 is a dataset of online banking queries annotated with their corresponding intents.
It provides a very fine-grained set of intents in the banking domain.
It comprises 13,083 customer service queries labeled with 77 intents.
It focuses on fine-grained single-domain intent detection.
### Supported Tasks and Leaderboards
Intent classification, intent detection
### Languages
English
## Dataset Structure
### Data Instances
An example of 'train' looks as follows:
```
{
'label': 11, # integer label corresponding to "card_arrival" intent
'text': 'I am still waiting on my card?'
}
```
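As a minimal sketch of how a record with this shape might be checked before use, the snippet below validates the two fields against the description in this card (a string `text` and an integer `label` in the range 0-76). The helper names `is_valid_record` and `load_split` are hypothetical and not part of the dataset. `load_split` additionally assumes that the Hugging Face `datasets` package is installed and that the Hub is reachable; it is never called here.

```python
from typing import Any, Mapping

NUM_INTENTS = 77  # per this card, labels run from 0 to 76


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Return True if `text` is a string and `label` is an integer in 0..76."""
    text = record.get("text")
    label = record.get("label")
    return (
        isinstance(text, str)
        and isinstance(label, int)
        and not isinstance(label, bool)  # bool is a subclass of int; reject it
        and 0 <= label < NUM_INTENTS
    )


def load_split(split: str = "train"):
    """Hypothetical loader. Assumption: `datasets` is installed and online."""
    from datasets import load_dataset  # lazy import keeps the validator offline

    return load_dataset("PolyAI/banking77", split=split)


# The example instance shown above.
example = {"label": 11, "text": "I am still waiting on my card?"}
print(is_valid_record(example))
```

Keeping the validator free of third-party imports means it can be reused on records read from any source, not only from the Hub.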
### Data Fields
- `text`: a string feature.
- `label`: one of the classification labels (0-76), each corresponding to a unique intent.
Intent names are mapped to `label` in the following way:
| label | intent (category) |
|---:|:-------------------------------------------------|
| 0 | activate_my_card |
| 1 | age_limit |
| 2 | apple_pay_or_google_pay |
| 3 | atm_support |
| 4 | automatic_top_up |
| 5 | balance_not_updated_after_bank_transfer |
| 6 | balance_not_updated_after_cheque_or_cash_deposit |
| 7 | beneficiary_not_allowed |
| 8 | cancel_transfer |
| 9 | card_about_to_expire |
| 10 | card_acceptance |
| 11 | card_arrival |
| 12 | card_delivery_estimate |
| 13 | card_linking |
| 14 | card_not_working |
| 15 | card_payment_fee_charged |
| 16 | card_payment_not_recognised |
| 17 | card_payment_wrong_exchange_rate |
| 18 | card_swallowed |
| 19 | cash_withdrawal_charge |
| 20 | cash_withdrawal_not_recognised |
| 21 | change_pin |
| 22 | compromised_card |
| 23 | contactless_not_working |
| 24 | country_support |
| 25 | declined_card_payment |
| 26 | declined_cash_withdrawal |
| 27 | declined_transfer |
| 28 | direct_debit_payment_not_recognised |
| 29 | disposable_card_limits |
| 30 | edit_personal_details |
| 31 | exchange_charge |
| 32 | exchange_rate |
| 33 | exchange_via_app |
| 34 | extra_charge_on_statement |
| 35 | failed_transfer |
| 36 | fiat_currency_support |
| 37 | get_disposable_virtual_card |
| 38 | get_physical_card |
| 39 | getting_spare_card |
| 40 | getting_virtual_card |
| 41 | lost_or_stolen_card |
| 42 | lost_or_stolen_phone |
| 43 | order_physical_card |
| 44 | passcode_forgotten |
| 45 | pending_card_payment |
| 46 | pending_cash_withdrawal |
| 47 | pending_top_up |
| 48 | pending_transfer |
| 49 | pin_blocked |
| 50 | receiving_money |
| 51 | Refund_not_showing_up |
| 52 | request_refund |
| 53 | reverted_card_payment? |
| 54 | supported_cards_and_currencies |
| 55 | terminate_account |
| 56 | top_up_by_bank_transfer_charge |
| 57 | top_up_by_card_charge |
| 58 | top_up_by_cash_or_cheque |
| 59 | top_up_failed |
| 60 | top_up_limits |
| 61 | top_up_reverted |
| 62 | topping_up_by_card |
| 63 | transaction_charged_twice |
| 64 | transfer_fee_charged |
| 65 | transfer_into_account |
| 66 | transfer_not_received_by_recipient |
| 67 | transfer_timing |
| 68 | unable_to_verify_identity |
| 69 | verify_my_identity |
| 70 | verify_source_of_funds |
| 71 | verify_top_up |
| 72 | virtual_card_not_working |
| 73 | visa_or_mastercard |
| 74 | why_verify_identity |
| 75 | wrong_amount_of_cash_received |
| 76 | wrong_exchange_rate_for_cash_withdrawal |
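The table above can be used as a lookup in both directions. The following is a small sketch that hard-codes only a four-row excerpt of the table for brevity; a real mapping would contain all 77 rows. The names `ID_TO_INTENT`, `INTENT_TO_ID`, and `intent_name` are illustrative assumptions.

```python
# Excerpt of the label table; the full mapping has 77 entries (0-76).
ID_TO_INTENT = {
    0: "activate_my_card",
    11: "card_arrival",
    12: "card_delivery_estimate",
    76: "wrong_exchange_rate_for_cash_withdrawal",
}

# Reverse mapping, from intent name back to integer label.
INTENT_TO_ID = {name: idx for idx, name in ID_TO_INTENT.items()}


def intent_name(label: int) -> str:
    """Look up the intent name, failing loudly for ids outside this excerpt."""
    try:
        return ID_TO_INTENT[label]
    except KeyError:
        raise KeyError(f"label {label} is not in this excerpt of the mapping") from None


print(intent_name(11))  # card_arrival
```

Intent names should be copied exactly as they appear in the table, including capitalization and punctuation, since the reverse lookup is an exact string match.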
### Data Splits
| Dataset statistics | Train | Test |
| --- | --- | --- |
| Number of examples | 10 003 | 3 080 |
| Average character length | 59.5 | 54.2 |
| Number of intents | 77 | 77 |
| Number of domains | 1 | 1 |
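As a quick arithmetic check on the figures in this table, the sketch below recomputes the total size, the share of examples held out for testing, and the average number of examples per intent. Only averages are computed; the actual per-intent distribution is not stated in this card.

```python
TRAIN_EXAMPLES = 10_003
TEST_EXAMPLES = 3_080
NUM_INTENTS = 77

total = TRAIN_EXAMPLES + TEST_EXAMPLES
test_fraction = TEST_EXAMPLES / total
avg_train_per_intent = TRAIN_EXAMPLES / NUM_INTENTS
avg_test_per_intent = TEST_EXAMPLES / NUM_INTENTS

print(total)                          # 13083, matching the Dataset Summary
print(f"{test_fraction:.1%}")         # share of examples in the test split
print(f"{avg_train_per_intent:.1f}")  # average training examples per intent
print(f"{avg_test_per_intent:.1f}")   # average test examples per intent
```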
## Dataset Creation
### Curation Rationale
Previous intent detection datasets such as Web Apps, Ask Ubuntu, the Chatbot Corpus or SNIPS are limited to a small number of classes (<10), which oversimplifies the intent detection task and does not emulate the true environment of commercial systems. Although large-scale *multi-domain* datasets exist ([HWU64](https://github.com/xliuhw/NLU-Evaluation-Data) and [CLINC150](https://github.com/clinc/oos-eval)), the examples for each domain may not sufficiently capture the full complexity of that domain as encountered "in the wild". This dataset tries to fill the gap by providing a very fine-grained set of intents in a *single domain*, i.e. **banking**. Its focus on fine-grained single-domain intent detection makes it complementary to the other two multi-domain datasets.
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
The dataset does not contain any additional annotations.
#### Who are the annotators?
[N/A]
### Personal and Sensitive Information
[N/A]
## Considerations for Using the Data
### Social Impact of Dataset
The purpose of this dataset is to help develop better intent detection systems.
Any comprehensive intent detection evaluation should involve both coarser-grained multi-domain datasets and a fine-grained single-domain dataset such as BANKING77.
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[PolyAI](https://github.com/PolyAI-LDN)
### Licensing Information
Creative Commons Attribution 4.0 International
### Citation Information
```
@inproceedings{Casanueva2020,
author = {I{\~{n}}igo Casanueva and Tadas Temcinas and Daniela Gerz and Matthew Henderson and Ivan Vulic},
title = {Efficient Intent Detection with Dual Sentence Encoders},
year = {2020},
month = {mar},
note = {Data available at https://github.com/PolyAI-LDN/task-specific-datasets},
url = {https://arxiv.org/abs/2003.04807},
booktitle = {Proceedings of the 2nd Workshop on NLP for ConvAI - ACL 2020}
}
```
### Contributions
Thanks to [@dkajtoch](https://github.com/dkajtoch) for adding this dataset.
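Provider:
PolyAI
Summary of original information
Dataset overview
Dataset name
- BANKING77
Dataset description
- Dataset summary: BANKING77 contains 13,083 online banking queries, each annotated with its corresponding intent. It focuses on fine-grained intent detection within the single domain of banking and covers 77 distinct intents.
- Supported tasks: intent classification, intent detection.
- Language: English.
Dataset structure
- Data instances: each instance contains a text field and a label field. The text field holds the user's query; the label field is an integer corresponding to a specific intent.
- Data fields:
  - `text`: string, the user's query text.
  - `label`: integer from 0 to 76, corresponding to the 77 distinct intents.
- Data splits: the dataset is divided into a training set of 10,003 instances and a test set of 3,080 instances.
Dataset creation
- Data source: the dataset is original and does not use other datasets as a source.
- Annotation process: the annotations are expert-generated; no specific information about the annotators is provided.
- Personal and sensitive information: the dataset card lists this as not applicable (N/A).
Considerations for using the data
- Social impact: the dataset is intended to help develop more effective intent detection systems. A comprehensive evaluation of such systems should combine coarser-grained multi-domain datasets with a fine-grained single-domain dataset.
- Discussion of biases: no specific information provided.
- Other known limitations: no specific information provided.
Additional information
- Dataset license: Creative Commons Attribution 4.0 International.
- Citation information:
@inproceedings{Casanueva2020, author = {I{\~{n}}igo Casanueva and Tadas Temcinas and Daniela Gerz and Matthew Henderson and Ivan Vulic}, title = {Efficient Intent Detection with Dual Sentence Encoders}, year = {2020}, month = {mar}, note = {Data available at https://github.com/PolyAI-LDN/task-specific-datasets}, url = {https://arxiv.org/abs/2003.04807}, booktitle = {Proceedings of the 2nd Workshop on NLP for ConvAI - ACL 2020} }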
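Collected summary
Dataset introduction

Construction
BANKING77 was built from customer service queries in a single domain, banking. It was designed to address the limited number of classes and the shallow domain coverage of existing intent detection datasets: 13,083 customer service queries were collected and annotated with 77 distinct intents, providing a finer-grained resource for single-domain intent detection.
Features
The dataset's defining feature is its fine-grained intent taxonomy. It focuses on the banking domain and offers far more classes than earlier intent detection datasets (which typically have fewer than 10), allowing researchers and developers to train and evaluate intent detection systems in greater depth. BANKING77 is released under the Creative Commons Attribution 4.0 International license, which keeps the data open and accessible.
Usage
The dataset comes with a training set and a test set of 10,003 and 3,080 samples respectively. Each sample consists of a text and its corresponding intent label. Users can train intent classification models on the training set and evaluate them on the test set. The open license also allows users to share and adapt the data freely, provided they comply with its terms.
Background and challenges
Background overview
BANKING77 was created by the PolyAI team in 2020 to provide fine-grained annotated data for single-domain intent detection. It focuses on the banking domain and contains 13,083 customer service queries labeled with 77 intent classes, complementing large-scale multi-domain datasets, which lack depth in any single domain. The dataset was built to advance intent recognition systems and is relevant to natural language processing research, particularly on dialogue systems and intent recognition.
Current challenges
Challenges in constructing the dataset include: 1) capturing and annotating sufficiently fine-grained intent classes within a single domain; 2) ensuring accuracy and consistency during data collection and annotation; 3) handling biases that may exist within the domain. On the application side, the challenges include using these fine-grained intent classes to improve the accuracy and practicality of intent detection systems, and balancing model complexity against performance.
Common scenarios
Classic use cases
In natural language processing, BANKING77 is valued for its fine-grained intent classification. With 77 distinct banking query intents, it is a commonly used resource for intent recognition research. Classic use cases include building and evaluating banking customer service chatbots, which classify user queries by intent in order to give more tailored responses.
Practical applications
In practice, BANKING77 is used to develop intelligent customer service systems such as automated banking question-answering systems and chatbots. By understanding the intent behind a user's query, these systems can offer immediate help and solutions, improving the customer experience and reducing the cost of human support.
Derived and related work
Research building on BANKING77 includes, among other things, improved intent recognition models, work on cross-domain intent recognition, and context-aware dialogue systems. This work has advanced the automation of banking services and extended the use of NLP in financial services.
The content above was compiled and summarized by 遇见数据集.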



