LeoChen085/SlipSFTDataset
Source: Hugging Face. Updated 2026-03-12. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/LeoChen085/SlipSFTDataset
Resource description:
---
dataset_info:
- config_name: ecg_cot
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  - name: rationale
    dtype: string
  - name: template_id
    dtype: int64
  - name: question_type
    dtype: string
  - name: question
    dtype: string
  - name: ecg_id
    list: int64
  - name: ecg_paths
    list: string
  - name: clinical_contexts
    list: string
  - name: correct_answer
    dtype: string
  - name: possible_answers
    list: string
  splits:
  - name: train
    num_bytes: 16133017544
    num_examples: 159313
  - name: test
    num_bytes: 4158528280
    num_examples: 41093
  - name: val
    num_bytes: 3150877403
    num_examples: 31137
  download_size: 8509879577
  dataset_size: 23442423227
- config_name: har_cot
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  - name: label
    dtype: string
  - name: x_axis
    list: float64
  - name: y_axis
    list: float64
  - name: z_axis
    list: float64
  splits:
  - name: train
    num_bytes: 551920015
    num_examples: 68542
  - name: test
    num_bytes: 66205732
    num_examples: 8222
  - name: val
    num_bytes: 70190023
    num_examples: 8718
  download_size: 409754270
  dataset_size: 688315770
- config_name: m4_caption
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 229635248
    num_examples: 80000
  - name: test
    num_bytes: 28368480
    num_examples: 10000
  - name: val
    num_bytes: 28973908
    num_examples: 10000
  download_size: 162851450
  dataset_size: 286977636
- config_name: sleep_cot
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  - name: label
    dtype: string
  - name: original_data
    list: float64
  splits:
  - name: train
    num_bytes: 191257251
    num_examples: 7434
  - name: test
    num_bytes: 23927338
    num_examples: 930
  - name: val
    num_bytes: 23930039
    num_examples: 930
  download_size: 82054039
  dataset_size: 239114628
- config_name: tsqa
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  splits:
  - name: train
    num_bytes: 99951607
    num_examples: 38400
  - name: test
    num_bytes: 12778327
    num_examples: 4800
  - name: val
    num_bytes: 12596007
    num_examples: 4800
  download_size: 71388620
  dataset_size: 125325941
configs:
- config_name: ecg_cot
  data_files:
  - split: train
    path: ecg_cot/train-*
  - split: test
    path: ecg_cot/test-*
  - split: val
    path: ecg_cot/val-*
- config_name: har_cot
  data_files:
  - split: train
    path: har_cot/train-*
  - split: test
    path: har_cot/test-*
  - split: val
    path: har_cot/val-*
- config_name: m4_caption
  data_files:
  - split: train
    path: m4_caption/train-*
  - split: test
    path: m4_caption/test-*
  - split: val
    path: m4_caption/val-*
- config_name: sleep_cot
  data_files:
  - split: train
    path: sleep_cot/train-*
  - split: test
    path: sleep_cot/test-*
  - split: val
    path: sleep_cot/val-*
- config_name: tsqa
  data_files:
  - split: train
    path: tsqa/train-*
  - split: test
    path: tsqa/test-*
  - split: val
    path: tsqa/val-*
license: mit
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- time-series
- sensor
- question-answering
- captioning
- supervised-finetuning
size_categories:
- 100K<n<1M
---
# SLIP SFT Dataset
Supervised fine-tuning (SFT) data used to train [SLIP](https://github.com/yuc0805/SLIP)_SFT for sensor question answering and captioning tasks. This dataset is derived from the [OpenTSLM](https://github.com/Ilovecodinghhh/OpenTSLM) benchmark. Please refer to the original OpenTSLM repository for full dataset details, the licensing of individual sources, and documentation.
## Configurations
| Config | Task | Train | Val | Test |
|--------|------|-------|-----|------|
| `ecg_cot` | ECG question answering (free-form, chain-of-thought) | 159,313 | 31,137 | 41,093 |
| `har_cot` | Human activity recognition QA (free-form, chain-of-thought) | 68,542 | 8,718 | 8,222 |
| `sleep_cot` | Sleep stage QA (free-form, chain-of-thought) | 7,434 | 930 | 930 |
| `tsqa` | General time-series QA (multiple choice) | 38,400 | 4,800 | 4,800 |
| `m4_caption` | Time-series caption generation | 80,000 | 10,000 | 10,000 |
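The split sizes in the table can be tallied to check the card's `100K<n<1M` size category and to compare split proportions across configs. The sketch below hard-codes the counts from the table; the names `SPLIT_COUNTS`, `config_total`, and `train_fraction` are illustrative and not part of the dataset or any library.

```python
# Split sizes copied from the Configurations table: (train, val, test).
SPLIT_COUNTS = {
    "ecg_cot": (159_313, 31_137, 41_093),
    "har_cot": (68_542, 8_718, 8_222),
    "sleep_cot": (7_434, 930, 930),
    "tsqa": (38_400, 4_800, 4_800),
    "m4_caption": (80_000, 10_000, 10_000),
}


def config_total(name: str) -> int:
    """Total number of examples across all three splits of one config."""
    return sum(SPLIT_COUNTS[name])


def train_fraction(name: str) -> float:
    """Share of a config's examples that fall in the train split."""
    train, _val, _test = SPLIT_COUNTS[name]
    return train / config_total(name)


# Sum over every config to compare against the size category.
grand_total = sum(config_total(name) for name in SPLIT_COUNTS)

for name in SPLIT_COUNTS:
    print(f"{name:<11} total={config_total(name):>7,} train={train_fraction(name):.1%}")
print(f"all configs total={grand_total:,}")
```

With these counts, `tsqa` and `m4_caption` follow an exact 80/10/10 split, while `ecg_cot` keeps roughly 69% of its examples for training.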
## Usage
```python
from datasets import load_dataset
# Load a specific config
ds = load_dataset("LeoChen085/SlipSFTDataset", "har_cot")
```
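Every example contains `time_series` (a nested list of float64 values), `time_series_text` (a list of strings holding the textual representation), `pre_prompt` / `post_prompt` (instruction framing), and `answer` (target output). Some configs add further fields: `ecg_cot` includes `rationale`, `question`, `question_type`, `template_id`, `ecg_id`, `ecg_paths`, `clinical_contexts`, `correct_answer`, and `possible_answers`; `har_cot` includes `label` plus `x_axis`, `y_axis`, and `z_axis`; `sleep_cot` includes `label` and `original_data`; and `m4_caption` includes `id`.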
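A minimal sketch of working with a single example, assuming rows behave like plain Python dicts (as rows returned by the `datasets` library do). The helper names `series_shape` and `build_prompt` are hypothetical, and the prompt order used here (`pre_prompt`, then each `time_series_text` entry, then `post_prompt`) is an assumption rather than the confirmed training format; the SLIP or OpenTSLM code is the reference for that. The sample row is synthetic and not taken from the dataset.

```python
from typing import Any, Dict, List


def series_shape(example: Dict[str, Any]) -> List[int]:
    """Return the length of each inner series in `time_series`.

    `time_series` is a nested list (list of lists of floats), so one row
    may hold several series, possibly of different lengths.
    """
    return [len(series) for series in example["time_series"]]


def build_prompt(example: Dict[str, Any]) -> str:
    """Join the text fields into one prompt string.

    Assumed order (not confirmed by the dataset card):
    pre_prompt, each time_series_text entry, post_prompt.
    """
    parts = [example["pre_prompt"], *example["time_series_text"], example["post_prompt"]]
    # Drop empty pieces so the result has no blank lines.
    return "\n".join(part.strip() for part in parts if part and part.strip())


# Synthetic row containing only the five fields shared by every config.
sample = {
    "pre_prompt": "You are given accelerometer data.",
    "time_series_text": ["x-axis readings:", "y-axis readings:"],
    "time_series": [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]],
    "post_prompt": "Which activity is shown?",
    "answer": "walking",
}

print(series_shape(sample))
print(build_prompt(sample))
```

With the `datasets` library installed, the same helpers can be applied to a loaded row, for example `build_prompt(ds["train"][0])`.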
## Related Resources
- **SLIP code and models:** [https://github.com/yuc0805/SLIP](https://github.com/yuc0805/SLIP)
- **SLIP pretraining + evaluation data:** [LeoChen085/SlipDataset](https://huggingface.co/datasets/LeoChen085/SlipDataset)
- **Original SFT data source:** [OpenTSLM](https://github.com/Ilovecodinghhh/OpenTSLM)
- **Paper:** *Learning Transferable Sensor Models via Language-Informed Pretraining*
## Citation
```bibtex
@article{chen2026slip,
title={Learning Transferable Sensor Models via Language-Informed Pretraining},
author={Chen, Yuliang and Pillai, Arvind and Wu, Yu Yvonne and Griffin, Tess Z. and Marsch, Lisa and Heinz, Michael V. and Jacobson, Nicholas C. and Campbell, Andrew},
year={2026}
}
```
## Dataset Card Contact
Yuliang Chen — yuliang.chen.gr@dartmouth.edu
Provider:
LeoChen085



