endishai/lexenvs-tasks
Source: Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26
Download link: https://hf-mirror.com/datasets/endishai/lexenvs-tasks
Description:
---
license: apache-2.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- reinforcement-learning
- evaluation
- credit-cards
- grpo
- rl-environment
- reward-model
size_categories:
- n<1K
pretty_name: LexEnvs Credit Card Optimization Tasks
---
# LexEnvs — Credit Card Optimization Tasks
A dataset of 164 evaluation tasks for training and benchmarking RL agents on credit card optimization. Each task presents a user scenario with spending patterns, constraints, and preferences, and asks the agent to recommend optimal credit cards with expected value (EV) calculations.
## Dataset Description
This dataset is the task suite for the [LexEnvs Harbor RL Environment](https://github.com/endishai/lexenvs), a stateless evaluation server that scores agent responses on a multi-dimensional rubric.
### Task Structure
Each task is a JSON object containing:
- **prompt** — A user scenario with spending profile and constraints, plus references to a shared knowledge base and system prompt
- **scoring** — Weighted evaluation dimensions with automated and human-review components
- **reference_solution** — Ground truth card recommendations, EV breakdowns, and expert notes
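The three fields above can be checked with a small helper before a task is fed to an agent. This is a minimal sketch that only assumes the top-level key names listed above; the nested contents of each field are not documented here, so the example object is a hypothetical, empty stand-in rather than a real task.

```python
import json

# Top-level keys named in the task structure description.
REQUIRED_KEYS = ("prompt", "scoring", "reference_solution")


def missing_task_keys(task: dict) -> list[str]:
    """Return the required top-level keys that are absent from a task object."""
    return [key for key in REQUIRED_KEYS if key not in task]


# Hypothetical, heavily abbreviated task; real tasks carry richer nested data.
example = json.loads('{"prompt": {}, "scoring": {}, "reference_solution": {}}')
print(missing_task_keys(example))  # []
```

A non-empty return value lists exactly which fields a malformed task file lacks.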
### Difficulty Levels
| Difficulty | Count | Description |
|-----------|-------|-------------|
| Easy | ~30 | Single card recommendation, straightforward constraints |
| Medium | ~60 | Multi-card optimization, interacting constraints |
| Hard | ~74 | Complex portfolios, conflicting constraints, edge cases |
Tasks prefixed with `objective_` use fully automated scoring (no human review needed).
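Because the task ID prefix determines how a task is scored, it can be useful to route tasks by prefix. The sketch below assumes IDs follow the `easy_*`, `medium_*`, `hard_*`, and `objective_*` patterns described in this card; the function names and the ID `objective_01` are illustrative and not part of the dataset.

```python
KNOWN_PREFIXES = ("objective", "easy", "medium", "hard")


def task_category(task_id: str) -> str:
    """Map a task ID to its prefix category, or 'unknown' if none matches."""
    for prefix in KNOWN_PREFIXES:
        if task_id.startswith(prefix + "_"):
            return prefix
    return "unknown"


def needs_human_review(task_id: str) -> bool:
    """Standard tasks include human-reviewed dimensions; objective tasks do not."""
    return task_category(task_id) in {"easy", "medium", "hard"}


print(task_category("easy_01"), needs_human_review("easy_01"))
print(task_category("objective_01"), needs_human_review("objective_01"))
```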
### Scoring Dimensions
Tasks are scored on weighted dimensions that vary by task type:
**Standard tasks** (`easy_*`, `medium_*`, `hard_*`):
- **EV Accuracy** (40%) — How close the agent's EV calculation is to the reference
- **Constraint Compliance** (30%) — Correct cards recommended, housing options matched
- **Reasoning Quality** (20%) — Quality of tradeoff analysis (human review)
- **Constraint Prioritization** (10%) — Handling of ambiguous/conflicting constraints (human review)
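As a worked illustration of the standard-task weights, the sketch below combines per-dimension scores into one number. It assumes each dimension is scored in [0, 1] and that the final score is a plain weighted sum; the dictionary keys are hypothetical names, and the Harbor server's actual aggregation may differ.

```python
# Weights for standard tasks as listed above (keys are illustrative names).
STANDARD_WEIGHTS = {
    "ev_accuracy": 0.40,
    "constraint_compliance": 0.30,
    "reasoning_quality": 0.20,
    "constraint_prioritization": 0.10,
}


def weighted_score(dimension_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores; assumes every score lies in [0, 1]."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * dimension_scores[name] for name in weights)


# Made-up scores: 0.4*0.9 + 0.3*1.0 + 0.2*0.5 + 0.1*0.5 = 0.81
scores = {
    "ev_accuracy": 0.9,
    "constraint_compliance": 1.0,
    "reasoning_quality": 0.5,
    "constraint_prioritization": 0.5,
}
print(round(weighted_score(scores, STANDARD_WEIGHTS), 2))  # 0.81
```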
**Objective tasks** (`objective_*`):
- **EV Accuracy** (30%) — Computed against card database ground truth
- **Card Selection** (25%) — F1 score of recommended vs. optimal cards
- **Factual Fidelity** (30%) — Accuracy of claims about card features
- **Constraint Compliance** (15%) — Adherence to user constraints and issuer rules
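The Card Selection dimension is described as an F1 score between recommended and optimal cards. A minimal sketch of that metric over sets of card names follows; how the server normalizes or matches card names is not specified here, so exact string matching and the card names used are assumptions.

```python
def card_selection_f1(recommended: set[str], optimal: set[str]) -> float:
    """F1 between recommended and optimal card sets, using exact name matching."""
    true_positives = len(recommended & optimal)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(recommended)
    recall = true_positives / len(optimal)
    return 2 * precision * recall / (precision + recall)


# Placeholder card names: 2 of 3 recommendations are optimal, both optimal cards found.
# precision = 2/3, recall = 1, F1 = 0.8
f1 = card_selection_f1({"Card A", "Card B", "Card C"}, {"Card A", "Card B"})
print(round(f1, 2))  # 0.8
```

Recommending extra cards lowers precision, and omitting optimal cards lowers recall, so the metric penalizes both over- and under-recommendation.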
## Included Files
- `tasks/` — 164 task definition JSON files
- `knowledge_base.md` — Shared knowledge base (~56K chars) covering credit card issuers, transfer partners, point valuations, and application rules
- `system_prompt_template.md` — Shared system prompt template referenced by all tasks
- `card_database.json` — Structured card data used for automated EV computation
- `card_prefix_to_issuer.json` — Card name prefix to issuer mapping
## Usage
### With the Datasets Library
```python
from datasets import load_dataset

dataset = load_dataset("endishai/lexenvs-tasks")

# Browse tasks
for task in dataset["train"]:
    print(task["task_id"], task["metadata"]["difficulty"])
```
### With the Harbor Evaluation Server
The tasks are designed to be served by the LexEnvs Harbor server, which handles knowledge base injection, scoring, and reward computation:
```python
import httpx

# Placeholder for the text produced by your agent.
agent_response = "<your agent's answer>"

# List available tasks
tasks = httpx.get("http://localhost:8000/api/tasks").json()

# Get a task prompt (includes system prompt + knowledge base)
task = httpx.get("http://localhost:8000/api/tasks/easy_01").json()

# Evaluate an agent's answer
result = httpx.post(
    "http://localhost:8000/api/tasks/easy_01/evaluate",
    json={"answer": agent_response},
).json()
print(result["reward"])  # float in [0, 1]
```
## Citation
If you use this dataset in your research, please cite:
```bibtex
@misc{lexenvs2026,
title={LexEnvs: A Harbor RL Environment for Credit Card Optimization},
author={Imberman, Daniel and Book, Kenny and Loeber, John},
year={2026},
url={https://github.com/endishai/lexenvs}
}
```
## License
Apache License 2.0 — see [LICENSE](https://github.com/endishai/lexenvs/blob/main/LICENSE).
Provider: endishai



