Multi-Turn-Insurance-Underwriting
收藏魔搭社区2025-12-04 更新2025-06-07 收录
下载链接:
https://modelscope.cn/datasets/AI-ModelScope/Multi-Turn-Insurance-Underwriting
下载链接
链接失效反馈官方服务:
资源简介:
# Dataset Card for Multi-Turn-Insurance-Underwriting
## Dataset Summary
This dataset includes sample traces and associated metadata from multi-turn interactions between a commercial underwriter and AI assistant. We built the system in [langgraph](https://langchain-ai.github.io/langgraph/) with [model context protocol](https://modelcontextprotocol.io) and ReAct agents. In each sample, the underwriter has a specific task to solve related to a recent application for insurance by a small business. We created a diverse sample dataset covering 6 distinct types of tasks, several of which have subtasks involving more nuanced, complex underwriting logic. Our tasks require an average of 3-7 steps of reasoning and tool use, with a total of 10-20 conversational turns.
Note: This is an expert-verified sample from a much larger set that includes 1500-1600 conversations per model. Our most up-to-date accuracy results from this larger set can be found on [our leaderboard page](https://leaderboard.snorkel.ai/category/SnorkelUnderwrite).
- **Curated by:** Snorkel AI
- **License:** Apache-2.0 License
Here's an example multi-turn conversation in this dataset:
<img src="conversation_sample.png" style="width:500px;height:1200px;">
## Dataset Creation
<img src="dataset_diagram.png" style="width:400px;height:400px;">
This diagram shows the architecture of how we create the dataset, with assistant responses interleaved with questions, ending with a final answer to the task.
- The input is one of 6 distinct types of tasks represented by a seed question.
- The underwriter then asks a detailed question related to that task.
- The assistant co-pilot then reasons about the question, asking follow up questions when necessary and calling tools to get extra information. The assistant is able to access tools like an appetite matrix, a sql database, and underwriting guidelines, all hooked up through model context protocol.
- After multiple steps of reasoning and interaction with a user, the co-pilot produces a final solution for the task, along with a rationale.
### Curation Rationale
This dataset aims to support evaluation of large language models on multi-turn tasks, which require proper reasoning, tool calling, and interaction with a user.
The scenario is set up to have verifiable answers, while being complex enough to be challenging for state-of-the-art LLMs.
### Source Data Collection and Processing
Each company and task was validated by a Chartered Property Casualty Underwriter (CPCU). Datapoints also contained reference answers and whether the final task answer from the AI model is correct.
## Quickstart
To load the dataset, use the following code:
```python
from datasets import load_dataset
ds = load_dataset("snorkelai/Multi-Turn-Insurance-Underwriting")
```
## Dataset Task Types
These are the 6 task types in our dataset, along with their respective frequencies in the dataset.
| Task Type | Frequency |
| -------------------------- | --------- |
| `Appetite Check` | 92 |
| `Product Recommendations` | 89 |
| `Policy Limits` | 84 |
| `Small Business Eligibility Check` | 71 |
| `Deductibles` | 34 |
| `Business Classification` | 10 |
Here is more information about each of the task types:
1. **`Appetite Check`**
Determine whether the company is within underwriting appetite for a specific LOB.
_Example prompt: “Hi, I need help determining whether a company applying for insurance for a specific LOB is in appetite.”_
2. **`Product Recommendations`**
Suggest other relevant insurance products or lines of business for a company.
_Example prompt: “Hi, I'd like to figure out what other insurance products, or LOBs, I should offer this company.”_
3. **`Policy Limits`**
Recommend suitable policy limits for a company and specific line of business.
_Example prompt: “What policy limits should I offer this company for this LOB?”_
4. **`Small Business Eligibility Check`**
Determine if the company qualifies as a small business.
_Example prompt: “Hi, I need help determining whether a company qualifies as a 'small business'.”_
5. **`Deductibles`**
Suggest an appropriate policy deductible for a given company and line of business.
_Example prompt: “What policy deductible should I offer this company for this LOB?”_
6. **`Business Classification`**
Find the correct 6-digit NAICS code based on a company’s business description.
_Example prompt: “Hi, I need help finding the correct 6-digit NAICS classification code for this company.”_
## Dataset Structure
This is a random, expert-verified sample of conversations from some of the top performing models from our leaderboard:
- o4-mini
- o3
- Claude 4 Opus
- Claude 4 Sonnet
- Gemini 2.5 Pro
Here's a description of each field in the dataset along with their data types and descriptions:
- `company task id (int)`: Unique identifier of company/task combination that can be leveraged to do direct comparison of different models on the same core task.
- `assistant model name (string)`: The AI assistant model used to generate responses.
- `task (string)`: The core task the underwriter has to solve for (one of the six tasks described above).
- `company name (string)`: Name of the company.
- `company description (string)`: Text description of the company, containing details about the business operations, industry, and other relevant information.
- `building construction (string)`: Information about the physical structure or construction type of the company's building.
- `state (string)`: The U.S. state where the company is located.
- `lob (string)`: Line of Business - the category or type of insurance coverage being considered.
- `annual revenue (int)`: The company's yearly revenue in USD.
- `number of employees (int)`: The total count of employees working at the company.
- `number of vehicles (int)`: The number of vehicles operated and owed by the company.
- `total payroll (int)`: The company's yearly annual payroll in USD.
- `trace (list of dictionaries in json format)`: Contains the full trace including an initial question, many steps of reasoning interleaved with questions/answers between the assistant and underwriter, and a final answer to the question.
- Each step in the trace will contain (all values serialized):
- `id`: A unique id for that step
- `role`: The role for that step - user or assistant
- `type`: Type of role (for distinguishing types of assistant responses) - user, user-facing assistant, internal assistant or tool
- `content`: The text content of that step
- `response_metadata`: Optional metadata about that step
- 'usage_metadata': Optional metadata about token consumption for that step
- 'tool_calls': Optional data about tool use
- 'additional_kwargs': Optional data about additional keyword arugments
- `reference_answer (string)`: The ground truth or expected answer for the insurance-related query.
- `correct (boolean)`: Boolean value (True/False) indicating whether the AI assistant's response matched the reference answer correctly. This is obtained by running a LLM-as-judge over the generated final answer and the reference answer.
A note on types: 'internal assistant' reflects content from the ReAct agent not intended for the user, such as tool calls, while the 'tool' type reflects results returned by the tool call. In conversation view on Hugging Face, assistant responses are not disguished so some of the internal dialogue and tool results will be shown as part of the conversation.
#### Personal and Sensitive Information
The dataset was developed based on synthetic and public company and underwriting information so there is no sensitive data risk.
## Bias, Risks, and Limitations
This dataset was constructed to be realistic and was co-created with experts (e.g., CPCUs) who validated that the data is indeed representative of real-world scenarios. However, the dataset was not collected from real-world logs so it may lack the natural variability, ambiguity, or errors found in real underwriter-agent conversations.
Insurance is also a highly regulated field, and underwriting rules can vary by state, country, or regulatory updates and industry trends (e.g., climate risk, cyber insurance). The dataset may become stale or non-compliant over time if not updated regularly.
## Citation
If you find this dataset helpful, please cite us:
```
@misc{snorkelai2025multiturninsurance,
author = {Snorkel AI},
title = {Multi-Turn Insurance Underwriting},
year = {2025},
howpublished = {\url{https://huggingface.co/datasets/snorkelai/Multi-Turn-Insurance-Underwriting}},
}
```
# 多轮保险核保数据集卡片
## 数据集概述
本数据集收录了商业保险核保员与AI智能体之间多轮交互的样本轨迹及关联元数据。我们基于LangGraph(langgraph)、模型上下文协议(Model Context Protocol)以及ReAct AI智能体构建了该系统。在每个样本中,核保员需完成与某小型企业近期提交的保险投保申请相关的特定任务。我们构建了涵盖6类不同任务的多样化样本数据集,其中部分任务包含涉及更精细、复杂核保逻辑的子任务。本数据集的任务平均需要3-7轮推理与工具调用,交互总轮次为10-20轮。
注:本样本为更大规模数据集的专家验证子集,每个模型对应1500-1600条对话。该大规模数据集的最新准确率结果可参见[我们的排行榜页面](https://leaderboard.snorkel.ai/category/SnorkelUnderwrite)。
- **数据整理方:** Snorkel AI
- **许可证:** Apache-2.0许可证
本数据集的多轮对话示例如下:
<img src="conversation_sample.png" style="width:500px;height:1200px;">
## 数据集构建
<img src="dataset_diagram.png" style="width:400px;height:400px;">
本图示展示了本数据集的构建架构,其中AI智能体的回复与提问交替出现,最终生成任务的最终答案。
- 输入为6类任务之一,以初始提问作为任务标识。
- 随后核保员会提出与该任务相关的详细问题。
- AI协作助手会针对该问题进行推理,必要时提出跟进问题并调用工具获取额外信息。该助手可通过模型上下文协议接入承保意愿矩阵、SQL数据库及核保指南等工具。
- 经过多轮推理与用户交互后,协作助手将生成该任务的最终解决方案及推理依据。
### 数据整理依据
本数据集旨在支持针对多轮任务的大语言模型(Large Language Model,LLM)评估,这类任务需要合理的推理、工具调用以及与用户的交互。该场景设置了可验证的答案,同时复杂度足以对当前最先进的大语言模型构成挑战。
### 源数据采集与处理
每家企业及对应任务均由特许财产意外险核保员(Chartered Property Casualty Underwriter,CPCU)进行验证。数据点还包含参考答案,以及AI模型生成的最终任务答案是否正确的标记。
## 快速入门
加载本数据集的代码示例如下:
python
from datasets import load_dataset
ds = load_dataset("snorkelai/Multi-Turn-Insurance-Underwriting")
## 数据集任务类型
本数据集包含6类任务,下表列出了各类任务的出现频次。
| 任务类型 | 出现频次 |
| -------------------------- | --------- |
| `承保意愿核查` | 92 |
| `保险产品推荐` | 89 |
| `保单限额设定` | 84 |
| `小型企业资质核查` | 71 |
| `免赔额设定` | 34 |
| `企业业务分类` | 10 |
各类任务的详细说明如下:
1. **`承保意愿核查`**
判定某企业是否符合特定保险业务线(Line of Business,LOB)的承保意愿要求。
_示例提问:“您好,我需要确认某申请特定保险业务线的企业是否符合承保意愿要求。”_
2. **`保险产品推荐`**
为企业推荐其他相关的保险产品或保险业务线。
_示例提问:“您好,我想了解应为该企业提供哪些其他保险产品或保险业务线。”_
3. **`保单限额设定`**
为企业及特定保险业务线推荐合适的保单限额。
_示例提问:“针对该企业的此保险业务线,我应为其设定多少保单限额?”_
4. **`小型企业资质核查`**
判定某企业是否具备小型企业资质。
_示例提问:“您好,我需要确认某企业是否符合‘小型企业’的认定标准。”_
5. **`免赔额设定`**
为特定企业及保险业务线推荐合适的保单免赔额。
_示例提问:“针对该企业的此保险业务线,我应为其设定多少保单免赔额?”_
6. **`企业业务分类`**
根据企业的业务描述,匹配正确的北美行业分类系统(North American Industry Classification System,NAICS)6位代码。
_示例提问:“您好,我需要为某企业匹配正确的6位NAICS行业分类代码。”_
## 数据集结构
本数据集为从我们排行榜上的部分顶尖模型生成的对话中随机选取的专家验证样本,涉及的模型包括:
- o4-mini
- o3
- Claude 4 Opus
- Claude 4 Sonnet
- Gemini 2.5 Pro
下表为数据集中各字段的类型及说明:
- `企业任务ID(整数型)`:企业与任务组合的唯一标识符,可用于直接对比不同模型在同一核心任务上的表现。
- `助手模型名称(字符串型)`:用于生成回复的AI智能体模型。
- `任务类型(字符串型)`:核保员需完成的核心任务(即上述6类任务之一)。
- `企业名称(字符串型)`:投保企业的名称。
- `企业描述(字符串型)`:企业的文本描述,包含业务运营、所属行业及其他相关信息。
- `建筑结构(字符串型)`:企业经营场所的物理结构或建筑类型信息。
- `所在州(字符串型)`:企业所在的美国州份。
- `保险业务线(字符串型)`:待审核的保险保障类别或类型。
- `年营收(整数型)`:企业的年度营收(单位:美元)。
- `员工总数(整数型)`:企业的在职员工总人数。
- `车辆数量(整数型)`:企业运营及拥有的车辆总数。
- `年度薪酬总额(整数型)`:企业的年度薪酬总支出(单位:美元)。
- `交互轨迹(JSON格式字典列表)`:包含完整的交互轨迹,涵盖初始提问、多轮AI智能体与核保员之间的问答推理步骤,以及最终的问题答案。
- 每个交互步骤包含以下字段(所有值均已序列化):
- `id`:该交互步骤的唯一标识符
- `role`:该步骤的角色,可选值为`user`(核保员)或`assistant`(AI智能体)
- `type`:角色类型(用于区分AI智能体回复的类别),可选值为`user`、`面向用户的助手`、`内部助手`或`tool`(工具)
- `content`:该步骤的文本内容
- `response_metadata`:该步骤的可选元数据
- `usage_metadata`:该步骤的Token使用量元数据(可选)
- `tool_calls`:关于工具调用的可选数据
- `additional_kwargs`:关于额外关键字参数的可选数据
- `参考答案(字符串型)`:保险相关查询的标准答案或预期答案。
- `正确性标记(布尔型)`:布尔值(True/False),用于标记AI智能体的回复是否与参考答案一致,该标记通过大语言模型作为评审者对生成的最终答案与参考答案进行比对得到。
关于角色类型的说明:`内部助手`类型对应ReAct AI智能体的非面向用户内容,例如工具调用请求;而`tool`类型对应工具调用返回的结果。在Hugging Face的对话视图中,助手回复未做区分,因此部分内部对话及工具结果会作为对话的一部分展示。
### 个人与敏感信息说明
本数据集基于合成数据及公开的企业与核保信息构建,不存在敏感数据泄露风险。
## 偏倚、风险与局限性
本数据集旨在贴合真实场景,且与特许财产意外险核保员(CPCU)等专家共同构建,经专家验证数据能够真实反映现实核保场景。但本数据集并非从真实交互日志中采集,因此可能缺乏真实核保员与AI智能体对话中存在的自然变异性、歧义性或错误。
保险行业属于高度监管领域,核保规则会因州/国家、监管更新及行业趋势(如气候风险、网络保险)而发生变化。若未定期更新,本数据集可能会过时或不符合当前监管要求。
## 引用
若本数据集对您有所帮助,请引用如下文献:
@misc{snorkelai2025multiturninsurance,
author = {Snorkel AI},
title = {Multi-Turn Insurance Underwriting},
year = {2025},
howpublished = {url{https://huggingface.co/datasets/snorkelai/Multi-Turn-Insurance-Underwriting}},
}
提供机构:
maas
创建时间:
2025-06-04



