
endishai/lexenvs-tasks

Source: Hugging Face | Updated: 2026-04-20 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/endishai/lexenvs-tasks
Description:
---
license: apache-2.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- reinforcement-learning
- evaluation
- credit-cards
- grpo
- rl-environment
- reward-model
size_categories:
- n<1K
pretty_name: LexEnvs Credit Card Optimization Tasks
---

# LexEnvs: Credit Card Optimization Tasks

A dataset of 164 evaluation tasks for training and benchmarking RL agents on credit card optimization. Each task presents a user scenario with spending patterns, constraints, and preferences, and asks the agent to recommend optimal credit cards with expected value (EV) calculations.

## Dataset Description

This dataset is the task suite for the [LexEnvs Harbor RL Environment](https://github.com/endishai/lexenvs), a stateless evaluation server that scores agent responses on a multi-dimensional rubric.

### Task Structure

Each task is a JSON object containing:

- **prompt**: A user scenario with spending profile and constraints, plus references to a shared knowledge base and system prompt
- **scoring**: Weighted evaluation dimensions with automated and human-review components
- **reference_solution**: Ground truth card recommendations, EV breakdowns, and expert notes

### Difficulty Levels

| Difficulty | Count | Description |
|-----------|-------|-------------|
| Easy | ~30 | Single card recommendation, straightforward constraints |
| Medium | ~60 | Multi-card optimization, interacting constraints |
| Hard | ~74 | Complex portfolios, conflicting constraints, edge cases |

Tasks prefixed with `objective_` use fully automated scoring (no human review needed).

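The prefix convention can be turned into a small routing helper. The sketch below is illustrative only: it assumes task IDs always start with one of the prefixes named in this card (`easy_`, `medium_`, `hard_`, `objective_`), and the function names `task_kind` and `needs_human_review` are hypothetical, not part of the LexEnvs codebase.

```python
# Hypothetical helpers: the names and the strict prefix check are assumptions.
STANDARD_PREFIXES = ("easy_", "medium_", "hard_")
OBJECTIVE_PREFIX = "objective_"


def task_kind(task_id: str) -> str:
    """Return 'objective' or 'standard' based on the task ID prefix."""
    if task_id.startswith(OBJECTIVE_PREFIX):
        return "objective"
    if task_id.startswith(STANDARD_PREFIXES):
        return "standard"
    raise ValueError(f"unrecognized task ID prefix: {task_id!r}")


def needs_human_review(task_id: str) -> bool:
    """Standard tasks include human-reviewed dimensions; objective tasks do not."""
    return task_kind(task_id) == "standard"
```

A training loop that needs a reward without a human in the loop could use `needs_human_review` to restrict itself to `objective_` tasks.
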
### Scoring Dimensions

Tasks are scored on weighted dimensions that vary by task type:

**Standard tasks** (`easy_*`, `medium_*`, `hard_*`):

- **EV Accuracy** (40%): How close the agent's EV calculation is to the reference
- **Constraint Compliance** (30%): Correct cards recommended, housing options matched
- **Reasoning Quality** (20%): Quality of tradeoff analysis (human review)
- **Constraint Prioritization** (10%): Handling of ambiguous/conflicting constraints (human review)

**Objective tasks** (`objective_*`):

- **EV Accuracy** (30%): Computed against card database ground truth
- **Card Selection** (25%): F1 score of recommended vs. optimal cards
- **Factual Fidelity** (30%): Accuracy of claims about card features
- **Constraint Compliance** (15%): Adherence to user constraints and issuer rules

## Included Files

- `tasks/`: 164 task definition JSON files
- `knowledge_base.md`: Shared knowledge base (~56K chars) covering credit card issuers, transfer partners, point valuations, and application rules
- `system_prompt_template.md`: Shared system prompt template referenced by all tasks
- `card_database.json`: Structured card data used for automated EV computation
- `card_prefix_to_issuer.json`: Card name prefix to issuer mapping

## Usage

### With the Datasets Library

```python
from datasets import load_dataset

dataset = load_dataset("endishai/lexenvs-tasks")

# Browse tasks
for task in dataset["train"]:
    print(task["task_id"], task["metadata"]["difficulty"])
```

### With the Harbor Evaluation Server

The tasks are designed to be served by the LexEnvs Harbor server, which handles knowledge base injection, scoring, and reward computation:

```python
import httpx

# List available tasks
tasks = httpx.get("http://localhost:8000/api/tasks").json()

# Get a task prompt (includes system prompt + knowledge base)
task = httpx.get("http://localhost:8000/api/tasks/easy_01").json()

# Placeholder: replace with the text your agent produced for this task
agent_response = "..."

# Evaluate an agent's answer
result = httpx.post(
    "http://localhost:8000/api/tasks/easy_01/evaluate",
    json={"answer": agent_response},
).json()
print(result["reward"])  # float in [0, 1]
```

## Citation

If you use this dataset in your research, please cite:

```bibtex
@misc{lexenvs2026,
  title={LexEnvs: A Harbor RL Environment for Credit Card Optimization},
  author={Imberman, Daniel and Book, Kenny and Loeber, John},
  year={2026},
  url={https://github.com/endishai/lexenvs}
}
```

## License

Apache License 2.0; see [LICENSE](https://github.com/endishai/lexenvs/blob/main/LICENSE).

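As a rough illustration of the Scoring Dimensions section, the sketch below combines per-dimension scores into a single reward using the weights listed in this card, and computes a set-based F1 for card selection. It is a minimal sketch under stated assumptions, not the Harbor server's implementation: the dictionary keys, the function names `card_f1` and `weighted_reward`, the assumption that every dimension score lies in [0, 1], and the use of a plain weighted sum are all hypothetical.

```python
# Weights are taken from the Scoring Dimensions section of this card.
# The key names are hypothetical, chosen only for this illustration.
STANDARD_WEIGHTS = {
    "ev_accuracy": 0.40,
    "constraint_compliance": 0.30,
    "reasoning_quality": 0.20,
    "constraint_prioritization": 0.10,
}
OBJECTIVE_WEIGHTS = {
    "ev_accuracy": 0.30,
    "card_selection": 0.25,
    "factual_fidelity": 0.30,
    "constraint_compliance": 0.15,
}


def card_f1(recommended, optimal):
    """Set-based F1 between recommended and optimal card names."""
    rec, opt = set(recommended), set(optimal)
    true_positives = len(rec & opt)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(rec)
    recall = true_positives / len(opt)
    return 2 * precision * recall / (precision + recall)


def weighted_reward(scores, weights):
    """Weighted sum of dimension scores (each assumed to be in [0, 1])."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)
```

For example, an objective task whose answer matches one of two optimal cards and adds one non-optimal card would get `card_f1(["A", "B"], ["B", "C"]) == 0.5`, which then contributes 25% of the combined reward under these assumed weights.
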
Provider:
endishai