xwjzds/extractive_qa_question_answering_hr

Source: Hugging Face; updated 2024-03-22; indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/xwjzds/extractive_qa_question_answering_hr
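Resource summary:
HR-Multiwoz is a fully labeled dataset of 5,980 extractive question-answer pairs spanning 10 human resources (HR) domains, intended for evaluating large language model (LLM) agents. It is the first publicly available conversation dataset in the HR domain for NLP research. Each record has three fields: question, answer, and answer context. The dataset is designed to evaluate the transfer-learning ability of extractive QA algorithms and is not intended for training. It is in English and contains synthetic questions.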

Provider:
xwjzds
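Summary of original information

Dataset overview

HR-Multiwoz is a labeled dataset of 5,980 extractive QA pairs spanning 10 HR domains, intended for evaluating LLM agents. It is the first labeled, open-source conversation dataset in the HR domain for NLP research.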

Dataset details

  • Language: English
  • License: MIT

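Dataset structure

  • Data instances: each entry contains answer_context, question, and answer.
  • Data fields:
    • question: string, the question
    • answer: string, the answer
    • answer_context: string, the context that contains the answer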
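The three-field layout above can be mirrored in code. The sketch below is a minimal illustration, not code from the dataset authors: `HRQARecord` and `answer_span` are hypothetical names, and it assumes that, as "extractive" implies, each `answer` appears verbatim inside its `answer_context`. It recovers the character offsets of the answer, which SQuAD-style QA tooling typically expects as an `answer_start` field that this card does not list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HRQARecord:
    """Hypothetical container for one entry: the three fields listed on the card."""
    question: str
    answer: str
    answer_context: str


def answer_span(record: HRQARecord) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the answer inside answer_context.

    Assumes the answer is copied verbatim from the context. Returns None for an
    empty answer or one that does not occur in the context. If the answer occurs
    more than once, the first occurrence is used.
    """
    if not record.answer:
        return None
    start = record.answer_context.find(record.answer)
    if start == -1:
        return None
    return start, start + len(record.answer)


# Invented example record, not taken from the dataset.
rec = HRQARecord(
    question="How many vacation days do I have left?",
    answer="12 days",
    answer_context="Your current balance is 12 days of paid time off.",
)
start, end = answer_span(rec)
print(start, end, rec.answer_context[start:end])  # → 24 31 12 days
```

Real records could presumably be loaded with the Hugging Face `datasets` library via `load_dataset("xwjzds/extractive_qa_question_answering_hr")`; the split names are not stated on this page, so check the dataset card before relying on them.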

Uses

  • Direct use: designed to evaluate the transfer-learning ability of extractive QA algorithms.
  • Out-of-scope use: not suitable for training.
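The card says the data is meant for evaluating how well extractive QA models transfer, not for training, but it does not name an evaluation metric. A common choice for extractive QA is the exact-match (EM) and token-level F1 scoring of the SQuAD v1.1 evaluation script; the sketch below reimplements that scoring on the assumption that it suits this dataset. The function names are hypothetical, and the predictions would come from whatever model is being evaluated.

```python
import re
import string
from collections import Counter
from typing import Dict, List

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD v1.1-style normalization: lowercase, drop punctuation,
    drop the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: List[str], golds: List[str]) -> Dict[str, float]:
    """Average EM and F1 over aligned prediction/gold pairs, as percentages."""
    if not golds or len(predictions) != len(golds):
        raise ValueError("predictions and golds must be non-empty and equal length")
    n = len(golds)
    em = sum(exact_match(p, g) for p, g in zip(predictions, golds))
    f1 = sum(token_f1(p, g) for p, g in zip(predictions, golds))
    return {"exact_match": 100.0 * em / n, "f1": 100.0 * f1 / n}


# Invented prediction/gold pairs for illustration only.
print(evaluate(["The 12 days.", "12 days"], ["12 days", "12 days of paid time off"]))
# → {'exact_match': 50.0, 'f1': 75.0}
```

The original SQuAD script takes the maximum score over several gold answers per question; since each record here carries a single answer string, the sketch compares each prediction against one gold answer.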

Dataset creation

  • Data producers: non-Amazon
  • Personal and sensitive information: none

Bias, risks, and limitations

  • The dataset contains only English data, and its questions are synthetically generated.

Citation

@inproceedings{xu-etal-2024-hr,
    title = "{HR}-{M}ulti{WOZ}: A Task Oriented Dialogue ({TOD}) Dataset for {HR} {LLM} Agent",
    author = "Xu, Weijie and Huang, Zicheng and Hu, Wenxiang and Fang, Xi and Cherukuri, Rajesh and Nayyar, Naumaan and Malandri, Lorenzo and Sengamedu, Srinivasan",
    editor = "Hruschka, Estevam and Lake, Thom and Otani, Naoki and Mitchell, Tom",
    booktitle = "Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.nlp4hr-1.5",
    pages = "59--72",
    abstract = "Recent advancements in Large Language Models (LLMs) have been reshaping Natural Language Processing (NLP) task in several domains. Their use in the field of Human Resources (HR) has still room for expansions and could be beneficial for several time consuming tasks. Examples such as time-off submissions, medical claims filing, and access requests are noteworthy, but they are by no means the sole instances. However the aforementioned developments must grapple with the pivotal challenge of constructing a high-quality training dataset. On one hand, most conversation datasets are solving problems for customers not employees. On the other hand, gathering conversations with HR could raise privacy concerns. To solve it, we introduce HR-Multiwoz, a fully-labeled dataset of 550 conversations spanning 10 HR domains. Our work has the following contributions:(1) It is the first labeled open-sourced conversation dataset in the HR domain for NLP research. (2) It provides a detailed recipe for the data generation procedure along with data analysis and human evaluations. The data generation pipeline is transferrable and can be easily adapted for labeled conversation data generation in other domains. (3) The proposed data-collection pipeline is mostly based on LLMs with minimal human involvement for annotation, which is time and cost-efficient.",
}
