endishai/lexenvs-tasks

Name: endishai/lexenvs-tasks
Creator: endishai
Published: 2026-04-20 19:00:30
License: Apache-2.0

Source: Hugging Face. Updated: 2026-04-20. Indexed: 2026-04-26.

Download link:

https://hf-mirror.com/datasets/endishai/lexenvs-tasks


Description:

---
license: apache-2.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- reinforcement-learning
- evaluation
- credit-cards
- grpo
- rl-environment
- reward-model
size_categories:
- n<1K
pretty_name: LexEnvs Credit Card Optimization Tasks
---

# LexEnvs: Credit Card Optimization Tasks

A dataset of 164 evaluation tasks for training and benchmarking RL agents on credit card optimization. Each task presents a user scenario with spending patterns, constraints, and preferences, and asks the agent to recommend optimal credit cards with expected value (EV) calculations.

## Dataset Description

This dataset is the task suite for the [LexEnvs Harbor RL Environment](https://github.com/endishai/lexenvs), a stateless evaluation server that scores agent responses on a multi-dimensional rubric.

### Task Structure

Each task is a JSON object containing:

- **prompt**: A user scenario with spending profile and constraints, plus references to a shared knowledge base and system prompt
- **scoring**: Weighted evaluation dimensions with automated and human-review components
- **reference_solution**: Ground truth card recommendations, EV breakdowns, and expert notes

### Difficulty Levels

| Difficulty | Count | Description |
|-----------|-------|-------------|
| Easy | ~30 | Single card recommendation, straightforward constraints |
| Medium | ~60 | Multi-card optimization, interacting constraints |
| Hard | ~74 | Complex portfolios, conflicting constraints, edge cases |

Tasks prefixed with `objective_` use fully automated scoring (no human review needed).

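The prefix convention used in this card (`easy_`, `medium_`, and `hard_` for standard tasks, `objective_` for fully automated ones) can be turned into a small routing helper, for example to decide which results still need a human reviewer. The sketch below is illustrative only: `classify_task` and `TaskKind` are hypothetical names that are not part of LexEnvs, the rule that the family is the text before the first underscore is an assumption, and `objective_01` is a made-up ID.

```python
from typing import NamedTuple

# Prefixes taken from the dataset card; the parsing rule is an assumption.
STANDARD_PREFIXES = ("easy", "medium", "hard")


class TaskKind(NamedTuple):
    family: str               # "easy", "medium", "hard", or "objective"
    needs_human_review: bool  # standard tasks include human-reviewed dimensions


def classify_task(task_id: str) -> TaskKind:
    """Infer the task family from the text before the first underscore."""
    prefix, sep, _ = task_id.partition("_")
    if not sep:
        raise ValueError(f"task ID has no prefix: {task_id!r}")
    if prefix == "objective":
        # Objective tasks are scored fully automatically.
        return TaskKind("objective", needs_human_review=False)
    if prefix in STANDARD_PREFIXES:
        return TaskKind(prefix, needs_human_review=True)
    raise ValueError(f"unknown task prefix: {prefix!r}")


print(classify_task("easy_01"))
print(classify_task("objective_01"))  # hypothetical ID
```

Raising on unknown prefixes, rather than guessing a default, keeps a renamed or malformed task ID from being silently routed to the wrong scoring path.
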
### Scoring Dimensions

Tasks are scored on weighted dimensions that vary by task type:

**Standard tasks** (`easy_*`, `medium_*`, `hard_*`):

- **EV Accuracy** (40%): How close the agent's EV calculation is to the reference
- **Constraint Compliance** (30%): Correct cards recommended, housing options matched
- **Reasoning Quality** (20%): Quality of tradeoff analysis (human review)
- **Constraint Prioritization** (10%): Handling of ambiguous or conflicting constraints (human review)

**Objective tasks** (`objective_*`):

- **EV Accuracy** (30%): Computed against card database ground truth
- **Card Selection** (25%): F1 score of recommended vs. optimal cards
- **Factual Fidelity** (30%): Accuracy of claims about card features
- **Constraint Compliance** (15%): Adherence to user constraints and issuer rules

## Included Files

- `tasks/`: 164 task definition JSON files
- `knowledge_base.md`: Shared knowledge base (~56K chars) covering credit card issuers, transfer partners, point valuations, and application rules
- `system_prompt_template.md`: Shared system prompt template referenced by all tasks
- `card_database.json`: Structured card data used for automated EV computation
- `card_prefix_to_issuer.json`: Card name prefix to issuer mapping

## Usage

### With the Datasets Library

```python
from datasets import load_dataset

dataset = load_dataset("endishai/lexenvs-tasks")

# Browse tasks
for task in dataset["train"]:
    print(task["task_id"], task["metadata"]["difficulty"])
```

### With the Harbor Evaluation Server

The tasks are designed to be served by the LexEnvs Harbor server, which handles knowledge base injection, scoring, and reward computation:

```python
import httpx

# List available tasks
tasks = httpx.get("http://localhost:8000/api/tasks").json()

# Get a task prompt (includes system prompt + knowledge base)
task = httpx.get("http://localhost:8000/api/tasks/easy_01").json()

# Placeholder: replace with the full answer text produced by your agent
agent_response = "..."

# Evaluate an agent's answer
result = httpx.post(
    "http://localhost:8000/api/tasks/easy_01/evaluate",
    json={"answer": agent_response},
).json()
print(result["reward"])  # float in [0, 1]
```

## Citation

If you use this dataset in your research, please cite:

```bibtex
@misc{lexenvs2026,
  title={LexEnvs: A Harbor RL Environment for Credit Card Optimization},
  author={Imberman, Daniel and Book, Kenny and Loeber, John},
  year={2026},
  url={https://github.com/endishai/lexenvs}
}
```

## License

Apache License 2.0. See [LICENSE](https://github.com/endishai/lexenvs/blob/main/LICENSE).

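The weights listed under Scoring Dimensions sum to 100% for each task type, which suggests a weighted-sum reward. The sketch below shows one way that combination and the Card Selection F1 score could be computed. It is an illustration under assumptions, not the Harbor server's implementation: the card does not state how per-dimension scores are aggregated, and the dictionary keys, `card_selection_f1`, `weighted_reward`, and the card names are all hypothetical.

```python
# Weights copied from the "Scoring Dimensions" section of the card.
# The dictionary keys are hypothetical names, not the server's field names.
STANDARD_WEIGHTS = {
    "ev_accuracy": 0.40,
    "constraint_compliance": 0.30,
    "reasoning_quality": 0.20,
    "constraint_prioritization": 0.10,
}
OBJECTIVE_WEIGHTS = {
    "ev_accuracy": 0.30,
    "card_selection": 0.25,
    "factual_fidelity": 0.30,
    "constraint_compliance": 0.15,
}


def card_selection_f1(recommended: set[str], optimal: set[str]) -> float:
    """F1 score between the recommended and the optimal card sets."""
    true_positives = len(recommended & optimal)
    if true_positives == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_positives / len(recommended)
    recall = true_positives / len(optimal)
    return 2 * precision * recall / (precision + recall)


def weighted_reward(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed aggregation: weighted sum of per-dimension scores in [0, 1]."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical objective task: one of two recommended cards is optimal.
f1 = card_selection_f1({"Card A", "Card B"}, {"Card B", "Card C"})
reward = weighted_reward(
    {
        "ev_accuracy": 1.0,
        "card_selection": f1,
        "factual_fidelity": 1.0,
        "constraint_compliance": 1.0,
    },
    OBJECTIVE_WEIGHTS,
)
print(f1, round(reward, 3))
```

In this example precision and recall are both 1/2, so the F1 score is 0.5 and the Card Selection dimension contributes 0.25 × 0.5 = 0.125 to the total; the other three dimensions contribute their full weights, giving a reward of about 0.875.
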
