LeoChen085/SlipSFTDataset

Name: LeoChen085/SlipSFTDataset
Creator: LeoChen085
Published: 2026-03-12 15:15:40
License: MIT

Source: Hugging Face. Updated 2026-03-12; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/LeoChen085/SlipSFTDataset

Description:

---
dataset_info:
- config_name: ecg_cot
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  - name: rationale
    dtype: string
  - name: template_id
    dtype: int64
  - name: question_type
    dtype: string
  - name: question
    dtype: string
  - name: ecg_id
    list: int64
  - name: ecg_paths
    list: string
  - name: clinical_contexts
    list: string
  - name: correct_answer
    dtype: string
  - name: possible_answers
    list: string
  splits:
  - name: train
    num_bytes: 16133017544
    num_examples: 159313
  - name: test
    num_bytes: 4158528280
    num_examples: 41093
  - name: val
    num_bytes: 3150877403
    num_examples: 31137
  download_size: 8509879577
  dataset_size: 23442423227
- config_name: har_cot
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  - name: label
    dtype: string
  - name: x_axis
    list: float64
  - name: y_axis
    list: float64
  - name: z_axis
    list: float64
  splits:
  - name: train
    num_bytes: 551920015
    num_examples: 68542
  - name: test
    num_bytes: 66205732
    num_examples: 8222
  - name: val
    num_bytes: 70190023
    num_examples: 8718
  download_size: 409754270
  dataset_size: 688315770
- config_name: m4_caption
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 229635248
    num_examples: 80000
  - name: test
    num_bytes: 28368480
    num_examples: 10000
  - name: val
    num_bytes: 28973908
    num_examples: 10000
  download_size: 162851450
  dataset_size: 286977636
- config_name: sleep_cot
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  - name: label
    dtype: string
  - name: original_data
    list: float64
  splits:
  - name: train
    num_bytes: 191257251
    num_examples: 7434
  - name: test
    num_bytes: 23927338
    num_examples: 930
  - name: val
    num_bytes: 23930039
    num_examples: 930
  download_size: 82054039
  dataset_size: 239114628
- config_name: tsqa
  features:
  - name: answer
    dtype: string
  - name: post_prompt
    dtype: string
  - name: pre_prompt
    dtype: string
  - name: time_series
    list:
      list: float64
  - name: time_series_text
    list: string
  splits:
  - name: train
    num_bytes: 99951607
    num_examples: 38400
  - name: test
    num_bytes: 12778327
    num_examples: 4800
  - name: val
    num_bytes: 12596007
    num_examples: 4800
  download_size: 71388620
  dataset_size: 125325941
configs:
- config_name: ecg_cot
  data_files:
  - split: train
    path: ecg_cot/train-*
  - split: test
    path: ecg_cot/test-*
  - split: val
    path: ecg_cot/val-*
- config_name: har_cot
  data_files:
  - split: train
    path: har_cot/train-*
  - split: test
    path: har_cot/test-*
  - split: val
    path: har_cot/val-*
- config_name: m4_caption
  data_files:
  - split: train
    path: m4_caption/train-*
  - split: test
    path: m4_caption/test-*
  - split: val
    path: m4_caption/val-*
- config_name: sleep_cot
  data_files:
  - split: train
    path: sleep_cot/train-*
  - split: test
    path: sleep_cot/test-*
  - split: val
    path: sleep_cot/val-*
- config_name: tsqa
  data_files:
  - split: train
    path: tsqa/train-*
  - split: test
    path: tsqa/test-*
  - split: val
    path: tsqa/val-*
license: mit
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- time-series
- sensor
- question-answering
- captioning
- supervised-finetuning
size_categories:
- 100K<n<1M
---

# SLIP SFT Dataset

Supervised finetuning (SFT) data used to train [SLIP](https://github.com/yuc0805/SLIP)_SFT for sensor question answering and captioning tasks.

This dataset is derived from the [OpenTSLM](https://github.com/Ilovecodinghhh/OpenTSLM) benchmark. Please refer to the original OpenTSLM repository for full dataset details, licensing of individual sources, and documentation.

## Configurations

| Config | Task | Train | Val | Test |
|--------|------|-------|-----|------|
| `ecg_cot` | ECG question answering (free-form, chain-of-thought) | 159,313 | 31,137 | 41,093 |
| `har_cot` | Human activity recognition QA (free-form, chain-of-thought) | 68,542 | 8,718 | 8,222 |
| `sleep_cot` | Sleep stage QA (free-form, chain-of-thought) | 7,434 | 930 | 930 |
| `tsqa` | General time-series QA (multiple choice) | 38,400 | 4,800 | 4,800 |
| `m4_caption` | Time-series caption generation | 80,000 | 10,000 | 10,000 |

## Usage

```python
from datasets import load_dataset

# Load a specific config
ds = load_dataset("LeoChen085/SlipSFTDataset", "har_cot")
```

Each example contains `time_series` (nested list of float64), `time_series_text` (textual representation), `pre_prompt` / `post_prompt` (instruction framing), and `answer` (target output). Some configs include additional fields such as `rationale`, `label`, or `question`.

## Related Resources

- **SLIP code and models:** [https://github.com/yuc0805/SLIP](https://github.com/yuc0805/SLIP)
- **SLIP pretraining + evaluation data:** [LeoChen085/SlipDataset](https://huggingface.co/datasets/LeoChen085/SlipDataset)
- **Original SFT data source:** [OpenTSLM](https://github.com/Ilovecodinghhh/OpenTSLM)
- **Paper:** *Learning Transferable Sensor Models via Language-Informed Pretraining*

## Citation

```bibtex
@article{chen2026slip,
  title={Learning Transferable Sensor Models via Language-Informed Pretraining},
  author={Chen, Yuliang and Pillai, Arvind and Wu, Yu Yvonne and Griffin, Tess Z. and Marsch, Lisa and Heinz, Michael V. and Jacobson, Nicholas C. and Campbell, Andrew},
  year={2026}
}
```

## Dataset Card Contact

Yuliang Chen (yuliang.chen.gr@dartmouth.edu)
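
## Usage Sketch: Assembling a Prompt

As a follow-up to the Usage section above, the sketch below shows one way the per-example fields might be combined into a single prompt string. The ordering (`pre_prompt`, then each `time_series_text` entry, then `post_prompt`) is an assumption, since the card only describes these fields as instruction framing; the OpenTSLM repository is the place to confirm the intended layout. The helper names (`build_prompt`, `series_lengths`, `stream_first_example`) and the toy example are hypothetical and are not part of the dataset or the SLIP code. Streaming is used in the loader because the `ecg_cot` config alone is listed at roughly 8.5 GB to download.

```python
from typing import Any, Dict, List


def build_prompt(example: Dict[str, Any]) -> str:
    """Join the text fields of one example into a single prompt string.

    Assumption: pre_prompt comes first, then each time_series_text entry,
    then post_prompt. The dataset card does not specify this order.
    """
    parts = [example["pre_prompt"], *example["time_series_text"], example["post_prompt"]]
    # Skip empty fields so the prompt has no blank lines.
    return "\n".join(p.strip() for p in parts if p and p.strip())


def series_lengths(example: Dict[str, Any]) -> List[int]:
    """Return the number of samples in each inner list of time_series."""
    return [len(channel) for channel in example["time_series"]]


def stream_first_example(config: str = "har_cot", split: str = "train") -> Dict[str, Any]:
    """Fetch one real example without downloading the whole config.

    Requires the `datasets` package and network access, so it is not
    called in this sketch.
    """
    from datasets import load_dataset

    ds = load_dataset("LeoChen085/SlipSFTDataset", config, split=split, streaming=True)
    return next(iter(ds))


# Hypothetical stand-in that mimics the documented schema; the values are made up.
toy_example = {
    "pre_prompt": "You are given accelerometer data.",
    "time_series": [[0.1, 0.2, 0.3], [1.0, 1.1, 1.2]],
    "time_series_text": ["x-axis readings:", "y-axis readings:"],
    "post_prompt": "Which activity is shown?",
    "answer": "walking",
}

prompt = build_prompt(toy_example)
lengths = series_lengths(toy_example)
print(prompt)
print(lengths, "->", toy_example["answer"])
```

To try this on real data, replace `toy_example` with the result of `stream_first_example("har_cot")` and inspect the fields before settling on a prompt layout.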

Provider:

LeoChen085
