chrisiyer/cardgames-sftdata-trimmed
Source: Hugging Face. Updated 2026-04-16; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/chrisiyer/cardgames-sftdata-trimmed
Resource description:
---
pretty_name: Cardgames SFT Data (Trimmed)
task_categories:
- image-text-to-text
language:
- en
size_categories:
- 10K<n<100K
configs:
- config_name: blackjack
  data_files:
  - split: train
    path: blackjack/train-*
  - split: validation
    path: blackjack/validation-*
  - split: test
    path: blackjack/test-*
- config_name: numberline
  data_files:
  - split: train
    path: numberline/train-*
  - split: validation
    path: numberline/validation-*
  - split: test
    path: numberline/test-*
- config_name: ezpoints
  data_files:
  - split: train
    path: ezpoints/train-*
  - split: validation
    path: ezpoints/validation-*
  - split: test
    path: ezpoints/test-*
- config_name: points24
  data_files:
  - split: train
    path: points24/train-*
  - split: validation
    path: points24/validation-*
  - split: test
    path: points24/test-*
---
# Cardgames SFT Data (Trimmed)
## Dataset Summary
This dataset is a modified version of data published alongside *Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning* ([website](https://rl4vlm.github.io), [data](https://huggingface.co/LEVI-Project/sft-data/tree/main)).
If you use this data, please cite the original work.
It contains four card- and game-based vision-language decision tasks:
1. `numberline`
2. `ezpoints`
3. `points24`
4. `blackjack`
Compared with the original published SFT data, this version simplifies prompts and target outputs so that each example contains only the image, prompt, and target output. For EZPoints and Points24, which were originally structured as sequential action traces, we also collapse those traces into single supervised examples. Blackjack keeps its sequential trials, each with its own target label.
Each task is provided as its own dataset config with `train`, `validation`, and `test` splits, and each example contains the fields:
- `id`
- `image`
- `prompt`
- `output`
The dataset is fully compatible with Hugging Face `datasets` and can be loaded directly with `load_dataset()`.
Example:
```python
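from datasets import load_dataset

# Load one task config; the other configs are "numberline", "ezpoints", and "points24".
dataset = load_dataset("chrisiyer/cardgames-sftdata-trimmed", "blackjack")
train_split = dataset["train"]

# Inspect the first training example.
example = train_split[0]
print(example["prompt"])
print(example["output"])
print(example["image"])  # the image field of the example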
```
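To work with all four tasks at once, each config can be loaded in turn. The following is a minimal sketch: the helper name `load_all_configs` and its optional `loader` argument are hypothetical conveniences (not part of the dataset or of `datasets`), added so the helper can be tried with a stub instead of a real download.

```python
from typing import Callable, Dict, Optional

REPO_ID = "chrisiyer/cardgames-sftdata-trimmed"
CONFIG_NAMES = ["blackjack", "numberline", "ezpoints", "points24"]


def load_all_configs(loader: Optional[Callable] = None) -> Dict[str, object]:
    """Load every task config into a dict keyed by config name.

    `loader` is a hypothetical hook; by default it is `datasets.load_dataset`.
    """
    if loader is None:
        # Imported lazily so the helper can also be exercised with a stub loader.
        from datasets import load_dataset
        loader = load_dataset
    return {name: loader(REPO_ID, name) for name in CONFIG_NAMES}


# Usage (downloads the data from the Hub):
# all_tasks = load_all_configs()
# for name, splits in all_tasks.items():
#     print(name, {split: len(ds) for split, ds in splits.items()})
```

Keeping the configs in a plain dict makes it straightforward to iterate over tasks in a fixed order, which may be convenient for continual-learning setups.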
## Task Descriptions
### 1. Numberline
Given a target number and a current number, presented in an image, the model must decide whether to move the current number up or down.
This task is largely unmodified from the original version. The prompts and outputs were shortened to include just the image, prompt, and target output, excluding intermediate outputs such as target chain-of-thought text and other auxiliary fields.
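The decision rule described above amounts to a simple comparison, sketched below. The labels `up`, `down`, and `stay` are hypothetical placeholders: the actual strings in the `output` field may differ, and the handling of the case where the two numbers are already equal is an assumption.

```python
def numberline_action(current: int, target: int) -> str:
    """Return the direction that moves `current` toward `target`.

    The returned labels are illustrative, not the dataset's actual output strings.
    """
    if current < target:
        return "up"
    if current > target:
        return "down"
    # Assumption: no move is needed once the numbers match.
    return "stay"


print(numberline_action(current=2, target=5))
print(numberline_action(current=7, target=5))
```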
### 2. EZPoints
Given two cards that either add up to 12 or multiply to 12, the model must give a formula using their values that evaluates to 12, for example `3*4`.
The prompts and outputs were shortened. We also eliminated the sequential design of the published SFT outputs, so that each trial is a single example in which the image is presented once and the full formula appears in the output.
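One way to score a predicted formula is to check that it uses exactly the two card values and evaluates to 12. The sketch below is an assumption about how such a check could look, not the original project's evaluation code; the function name `check_formula` is hypothetical, and only `+`, `-`, `*`, and `/` on numeric literals are accepted.

```python
import ast
import operator
from typing import List

# Binary operators the checker accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST, used: List[float]) -> float:
    """Evaluate a restricted arithmetic expression and record every literal."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def check_formula(formula: str, card_values: List[int], target: int = 12) -> bool:
    """True if `formula` uses exactly `card_values` and evaluates to `target`."""
    used: List[float] = []
    try:
        tree = ast.parse(formula.strip(), mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return sorted(used) == sorted(card_values) and abs(value - target) < 1e-9


print(check_formula("3*4", [3, 4]))
print(check_formula("3+4", [3, 4]))
```

Parsing with `ast` rather than calling `eval` keeps arbitrary code in a model's output from being executed.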
### 3. Points24
Given four cards, the model must give a formula using their values that evaluates to 24.
The prompts and outputs were shortened, and the sequential structure was eliminated. Note that models in the original paper performed poorly on this task.
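To sanity-check whether a set of four card values admits a solution at all, a brute-force search is enough at this size. The sketch below assumes the usual rules of the 24 game (each value used exactly once, combined with `+`, `-`, `*`, `/` and parentheses); those rules, and the function name `solve_24`, are assumptions rather than details confirmed by this dataset. Mapping face cards to numeric values is not covered.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

Item = Tuple[Fraction, str]  # (exact value, expression that produced it)


def _search(items: List[Item], target: Fraction) -> Optional[str]:
    """Repeatedly combine two items until one remains; return a matching expression."""
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None
    for i in range(len(items)):
        for j in range(len(items)):
            if i == j:
                continue
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            candidates = [
                (a + b, f"({ea}+{eb})"),
                (a - b, f"({ea}-{eb})"),
                (a * b, f"({ea}*{eb})"),
            ]
            if b != 0:
                candidates.append((a / b, f"({ea}/{eb})"))
            for candidate in candidates:
                found = _search(rest + [candidate], target)
                if found is not None:
                    return found
    return None


def solve_24(values: List[int], target: int = 24) -> Optional[str]:
    """Return one formula that uses every value once and equals `target`, or None."""
    items = [(Fraction(v), str(v)) for v in values]
    return _search(items, Fraction(target))


print(solve_24([1, 2, 3, 4]))
print(solve_24([1, 1, 1, 1]))
```

Using `Fraction` keeps intermediate divisions exact, so solutions that pass through non-integer values are not lost to floating-point rounding.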
### 4. Blackjack
Given a dealer hand and a player hand, the model must decide whether to `hit` or `stand`.
The prompts and outputs were shortened. The sequential trials were retained: for example, a trial following a `hit` contains the same cards plus one additional card. Each trial has its own target label, so it can be treated as an independent supervised example.
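Because each trial can be scored on its own, a per-example accuracy is a natural metric. The sketch below assumes that both the model response and the target `output` contain the word `hit` or `stand` somewhere in their text; the exact output format is not specified here, and the helper names `extract_action` and `action_accuracy` are hypothetical.

```python
import re
from typing import Iterable, Optional

_ACTION_RE = re.compile(r"\b(hit|stand)\b", re.IGNORECASE)


def extract_action(text: str) -> Optional[str]:
    """Return the last `hit` or `stand` mentioned in `text`, lowercased, or None."""
    matches = _ACTION_RE.findall(text)
    return matches[-1].lower() if matches else None


def action_accuracy(predictions: Iterable[str], targets: Iterable[str]) -> float:
    """Fraction of examples whose predicted action matches the target action."""
    pairs = list(zip(predictions, targets))
    if not pairs:
        return 0.0
    correct = 0
    for prediction, target in pairs:
        action = extract_action(prediction)
        # A response with no recognizable action counts as incorrect.
        if action is not None and action == extract_action(target):
            correct += 1
    return correct / len(pairs)


print(action_accuracy(["I will hit.", "Stand"], ["hit", "hit"]))
```

Taking the last mentioned action is a design choice for responses that reason before answering; a stricter parser may be preferable if the output format is fixed.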
## Source Data
Original source data:
- RL4VLM project website: [https://rl4vlm.github.io](https://rl4vlm.github.io)
- Published SFT data: [https://huggingface.co/LEVI-Project/sft-data/tree/main](https://huggingface.co/LEVI-Project/sft-data/tree/main)
## Notes
This dataset is intended as a simplified supervised fine-tuning version of the original task data for vision-language model training and continual-learning experiments.
Provided by:
chrisiyer



