
MinalMahala/banking77

Hugging Face, updated 2026-04-17, indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/MinalMahala/banking77
Resource description:
---
annotations_creators:
- expert-generated
extended:
- original
language_creators:
- expert-generated
language:
- en
license:
- cc-by-4.0
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- intent-classification
- multi-class-classification
paperswithcode_id: null
pretty_name: BANKING77
---

# Dataset Card for BANKING77

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [Github](https://github.com/PolyAI-LDN/task-specific-datasets)
- **Repository:** [Github](https://github.com/PolyAI-LDN/task-specific-datasets)
- **Paper:** [ArXiv](https://arxiv.org/abs/2003.04807)
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

BANKING77 is a dataset of online banking queries annotated with their corresponding intents. It provides a very fine-grained set of intents in the banking domain: 13,083 customer service queries labeled with 77 intents. It focuses on fine-grained single-domain intent detection.

### Supported Tasks and Leaderboards

Intent classification, intent detection

### Languages

English

## Dataset Structure

### Data Instances

An example from the 'train' split looks as follows:

```
{
  'label': 11, # integer label corresponding to "card_arrival" intent
  'text': 'I am still waiting on my card?'
}
```

### Data Fields

- `text`: a string feature.
- `label`: one of the classification labels (0-76), each corresponding to a unique intent.

Intent names are mapped to `label` in the following way:

| label | intent (category) |
|---:|:-------------------------------------------------|
| 0 | activate_my_card |
| 1 | age_limit |
| 2 | apple_pay_or_google_pay |
| 3 | atm_support |
| 4 | automatic_top_up |
| 5 | balance_not_updated_after_bank_transfer |
| 6 | balance_not_updated_after_cheque_or_cash_deposit |
| 7 | beneficiary_not_allowed |
| 8 | cancel_transfer |
| 9 | card_about_to_expire |
| 10 | card_acceptance |
| 11 | card_arrival |
| 12 | card_delivery_estimate |
| 13 | card_linking |
| 14 | card_not_working |
| 15 | card_payment_fee_charged |
| 16 | card_payment_not_recognised |
| 17 | card_payment_wrong_exchange_rate |
| 18 | card_swallowed |
| 19 | cash_withdrawal_charge |
| 20 | cash_withdrawal_not_recognised |
| 21 | change_pin |
| 22 | compromised_card |
| 23 | contactless_not_working |
| 24 | country_support |
| 25 | declined_card_payment |
| 26 | declined_cash_withdrawal |
| 27 | declined_transfer |
| 28 | direct_debit_payment_not_recognised |
| 29 | disposable_card_limits |
| 30 | edit_personal_details |
| 31 | exchange_charge |
| 32 | exchange_rate |
| 33 | exchange_via_app |
| 34 | extra_charge_on_statement |
| 35 | failed_transfer |
| 36 | fiat_currency_support |
| 37 | get_disposable_virtual_card |
| 38 | get_physical_card |
| 39 | getting_spare_card |
| 40 | getting_virtual_card |
| 41 | lost_or_stolen_card |
| 42 | lost_or_stolen_phone |
| 43 | order_physical_card |
| 44 | passcode_forgotten |
| 45 | pending_card_payment |
| 46 | pending_cash_withdrawal |
| 47 | pending_top_up |
| 48 | pending_transfer |
| 49 | pin_blocked |
| 50 | receiving_money |
| 51 | Refund_not_showing_up |
| 52 | request_refund |
| 53 | reverted_card_payment? |
| 54 | supported_cards_and_currencies |
| 55 | terminate_account |
| 56 | top_up_by_bank_transfer_charge |
| 57 | top_up_by_card_charge |
| 58 | top_up_by_cash_or_cheque |
| 59 | top_up_failed |
| 60 | top_up_limits |
| 61 | top_up_reverted |
| 62 | topping_up_by_card |
| 63 | transaction_charged_twice |
| 64 | transfer_fee_charged |
| 65 | transfer_into_account |
| 66 | transfer_not_received_by_recipient |
| 67 | transfer_timing |
| 68 | unable_to_verify_identity |
| 69 | verify_my_identity |
| 70 | verify_source_of_funds |
| 71 | verify_top_up |
| 72 | virtual_card_not_working |
| 73 | visa_or_mastercard |
| 74 | why_verify_identity |
| 75 | wrong_amount_of_cash_received |
| 76 | wrong_exchange_rate_for_cash_withdrawal |

### Data Splits

| Dataset statistics | Train | Test |
| --- | --- | --- |
| Number of examples | 10,003 | 3,080 |
| Average character length | 59.5 | 54.2 |
| Number of intents | 77 | 77 |
| Number of domains | 1 | 1 |

## Dataset Creation

### Curation Rationale

Previous intent detection datasets such as Web Apps, Ask Ubuntu, the Chatbot Corpus or SNIPS are limited to a small number of classes (<10), which oversimplifies the intent detection task and does not emulate the true environment of commercial systems. Although large-scale *multi-domain* datasets exist ([HWU64](https://github.com/xliuhw/NLU-Evaluation-Data) and [CLINC150](https://github.com/clinc/oos-eval)), the examples per domain may not sufficiently capture the full complexity of each domain as encountered "in the wild". This dataset tries to fill the gap and provides a very fine-grained set of intents in a *single domain*, i.e. **banking**. Its focus on fine-grained single-domain intent detection makes it complementary to the other two multi-domain datasets.

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

The dataset does not contain any additional annotations.

#### Who are the annotators?

[N/A]

### Personal and Sensitive Information

[N/A]

## Considerations for Using the Data

### Social Impact of Dataset

The purpose of this dataset is to help develop better intent detection systems.

Any comprehensive intent detection evaluation should involve both coarser-grained multi-domain datasets and a fine-grained single-domain dataset such as BANKING77.

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[PolyAI](https://github.com/PolyAI-LDN)

### Licensing Information

Creative Commons Attribution 4.0 International

### Citation Information

```
@inproceedings{Casanueva2020,
    author    = {I{\~{n}}igo Casanueva and Tadas Temcinas and Daniela Gerz and Matthew Henderson and Ivan Vulic},
    title     = {Efficient Intent Detection with Dual Sentence Encoders},
    year      = {2020},
    month     = {mar},
    note      = {Data available at https://github.com/PolyAI-LDN/task-specific-datasets},
    url       = {https://arxiv.org/abs/2003.04807},
    booktitle = {Proceedings of the 2nd Workshop on NLP for ConvAI - ACL 2020}
}
```

### Contributions

Thanks to [@dkajtoch](https://github.com/dkajtoch) for adding this dataset.
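
### Usage Sketch

The label-to-intent mapping under Data Fields and the counts under Data Splits can be exercised with a few lines of Python. The following is a minimal sketch, not part of the original card: the names `INTENT_NAMES`, `decode_label` and `SPLIT_SIZES` are hypothetical, and the dictionary holds only a handful of rows copied from the table rather than all 77 intents. It runs without downloading the dataset.

```python
# A few id -> intent pairs copied from the Data Fields table.
# This is deliberately partial; the full table has 77 rows (ids 0-76).
INTENT_NAMES = {
    0: "activate_my_card",
    11: "card_arrival",
    41: "lost_or_stolen_card",
    76: "wrong_exchange_rate_for_cash_withdrawal",
}

NUM_INTENTS = 77  # labels run from 0 to 76


def decode_label(label: int) -> str:
    """Return the intent name for a label id.

    Ids outside 0-76 raise ValueError; valid ids missing from the
    partial map above fall back to a placeholder string.
    """
    if not 0 <= label < NUM_INTENTS:
        raise ValueError(f"label must be in [0, {NUM_INTENTS - 1}], got {label}")
    return INTENT_NAMES.get(label, f"<intent {label}>")


# The example record shown under Data Instances.
example = {"label": 11, "text": "I am still waiting on my card?"}
decoded = decode_label(example["label"])

# Split sizes as reported in the Data Splits table.
SPLIT_SIZES = {"train": 10_003, "test": 3_080}
total = sum(SPLIT_SIZES.values())
test_fraction = SPLIT_SIZES["test"] / total

if __name__ == "__main__":
    print(decoded)                  # card_arrival
    print(total)                    # 13083
    print(round(test_fraction, 3))  # 0.235
```

The two split sizes add up to the 13,083 queries quoted in the Dataset Summary, and the test split is a little under a quarter of the data.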

Provider:
MinalMahala