chrisiyer/cardgames-sftdata-trimmed

Name: chrisiyer/cardgames-sftdata-trimmed
Creator: chrisiyer
Published: 2026-04-16 19:47:53
License: 暂无描述

Hugging Face2026-04-16 更新2026-04-26 收录

下载链接：

https://hf-mirror.com/datasets/chrisiyer/cardgames-sftdata-trimmed

下载链接

链接失效反馈

官方服务：

资源简介：

--- pretty_name: Cardgames SFT Data (Trimmed) task_categories: - image-text-to-text language: - en size_categories: - 10K<n<100K configs: - config_name: blackjack data_files: - split: train path: blackjack/train-* - split: validation path: blackjack/validation-* - split: test path: blackjack/test-* - config_name: numberline data_files: - split: train path: numberline/train-* - split: validation path: numberline/validation-* - split: test path: numberline/test-* - config_name: ezpoints data_files: - split: train path: ezpoints/train-* - split: validation path: ezpoints/validation-* - split: test path: ezpoints/test-* - config_name: points24 data_files: - split: train path: points24/train-* - split: validation path: points24/validation-* - split: test path: points24/test-* --- # Cardgames SFT Data (Trimmed) ## Dataset Summary This dataset is a modified version of data published alongside *Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning* ([website](https://rl4vlm.github.io), [data](https://huggingface.co/LEVI-Project/sft-data/tree/main)). If you use this data, please make sure to cite their work! It contains four card- and game-based vision-language decision tasks: 1. `numberline` 2. `ezpoints` 3. `points24` 4. `blackjack` Compared with the original published SFT data, this version simplifies prompts and target outputs so that each example contains only the image, prompt, and target output. In tasks that were originally structured as sequential action traces, we also collapse those traces into single supervised examples when appropriate. Each task is provided as its own dataset config with `train`, `validation`, and `test` splits, and each example contains the fields: - `id` - `image` - `prompt` - `output` The dataset is fully compatible with Hugging Face `datasets` and can be loaded directly with `load_dataset()`. Example: ```python from datasets import load_dataset dataset = load_dataset("chrisiyer/cardgames-sftdata-trimmed", "blackjack") train_split = dataset["train"] example = train_split[0] print(example["prompt"]) print(example["output"]) print(example["image"]) ``` ## Task Descriptions ### 1. Numberline Given a target number and a current number, presented in an image, the model must decide whether to move the current number up or down. This task is largely unmodified from the original version. The prompts and outputs were shortened to include just the image, prompt, and target output, excluding intermediate outputs such as target chain-of-thought text and other auxiliary fields. ### 2. EZPoints Given two cards that either add up to 12 or multiply to 12, the model must give a formula using their values that evaluates to 12, for example `3*4`. The prompts and outputs were shortened. We also eliminated the sequential design of the published SFT outputs, so that each trial is a single example in which the image is presented once and the full formula appears in the output. ### 3. Points24 Given four cards, the model must give a formula using their values that evaluates to 24. The prompts and outputs were shortened, and the sequential structure was eliminated. Note that models in the original paper performed poorly on this task. ### 4. Blackjack Given a dealer hand and a player hand, the model must decide whether to `hit` or `stand`. The prompts and outputs were shortened. The sequential trials were retained, for example when a trial following a `hit` contains the same cards plus one additional card, but every trial has a unique target label and can be treated as an independent supervised example. ## Source Data Original source data: - RL4VLM project website: [https://rl4vlm.github.io](https://rl4vlm.github.io) - Published SFT data: [https://huggingface.co/LEVI-Project/sft-data/tree/main](https://huggingface.co/LEVI-Project/sft-data/tree/main) ## Notes This dataset is intended as a simplified supervised fine-tuning version of the original task data for vision-language model training and continual-learning experiments.

提供机构：

chrisiyer

5,000+

优质数据集

54 个

任务类型

进入经典数据集