
cmu-lti/stateful

Source: Hugging Face. Updated 2025-11-26; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/cmu-lti/stateful
Resource description:
---
language:
- en
tags:
- software-engineering
- code
- swe-bench
- stateful
- user-modeling
- theory-of-mind
size_categories:
- 1K<n<10K
task_categories:
- text-generation
- question-answering
---

# Stateful SWE Dataset

## Dataset Summary

The **Stateful SWE Dataset** extends the [cmu-lti/interactive-swe](https://huggingface.co/datasets/cmu-lti/interactive-swe) dataset with user profile assignments for studying stateful interactions in software engineering tasks. Each instance from the original interactive-swe dataset is enriched with a randomly assigned user profile that defines interaction preferences and coding standards.

This dataset enables research into:

- **Theory of Mind (ToM)** modeling for AI agents
- **Stateful user interactions** in software engineering
- **Personalized code assistance** based on user preferences
- **User behavior modeling** in programming contexts

## Dataset Details

- **Total instances**: 500
- **User profiles**: 15 distinct profiles
- **Base dataset**: cmu-lti/interactive-swe
- **Assignment**: Random profile assignment with seed 42
- **Version**: 1.0.0

## Dataset Structure

### Original Interactive-SWE Columns (15 columns)

All original columns from cmu-lti/interactive-swe are preserved:

- `repo`: Repository name
- `instance_id`: Unique identifier from original dataset
- `base_commit`: Base commit hash
- `patch`: Code changes
- `test_patch`: Test-related changes
- `problem_statement`: Description of the issue
- `hints_text`: Additional hints
- `created_at`: Original timestamp
- `version`: Version information
- `FAIL_TO_PASS`: Test information
- `PASS_TO_PASS`: Test information
- `environment_setup_commit`: Environment setup details
- `difficulty`: Problem difficulty level
- `original_issue`: Link to original issue
- `files`: Comma-separated list of affected files

### New Stateful Columns (8 columns)

- `user_profile_id`: Assigned user profile identifier
- `user_roleplay_prompt`: Second-person narrative describing the user
- `interaction_preferences`: JSON string with verbosity, timing, and response style preferences
- `coding_preferences`: Comma-separated string of user's technical preferences
- `stateful_instance_id`: New unique identifier for stateful instances
- `assignment_seed`: Random seed used for profile assignment
- `dataset_version`: Version of the stateful dataset
- `created_at_stateful`: Timestamp when stateful instance was created

## User Profile Types

The dataset includes 15 diverse user profiles with varying:

- **Verbosity preferences**: concise vs verbose
- **Question timing**: upfront vs ongoing clarification
- **Response style**: short vs long responses
- **Coding preferences**: frameworks, testing, documentation, etc.

## Usage Example

```python
from datasets import load_dataset
import json

# Load the stateful dataset
dataset = load_dataset("cmu-lti/stateful", split="test")

# Access an instance with its user profile
instance = dataset[0]
print(f"Problem: {instance['problem_statement']}")
print(f"User Profile: {instance['user_profile_id']}")

# Parse interaction preferences from JSON string
prefs = json.loads(instance['interaction_preferences'])
print(f"Interaction Style: {prefs}")

# Parse coding preferences from comma-separated string
coding_prefs = instance['coding_preferences'].split(',')
print(f"Coding Preferences: {coding_prefs[:3]}...")  # Show first 3
```

## Citation

If you use this dataset, please cite both the original interactive-swe dataset and this stateful extension:

```bibtex
@dataset{stateful_swe_2025,
  title={Stateful SWE Dataset: User Profile Extensions for Interactive Software Engineering},
  author={CMU ToM-SWE Team},
  year={2025},
  url={https://huggingface.co/datasets/cmu-lti/stateful}
}
```

## License

This dataset follows the same license as the original cmu-lti/interactive-swe dataset.

## Dataset Creation

Created using the ToM-SWE framework for Theory of Mind modeling in software engineering contexts.
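
Because `interaction_preferences` and `coding_preferences` are stored as strings, it can help to decode them once and then group instances by `user_profile_id`. The following is a minimal sketch under stated assumptions, not part of the dataset's tooling: the helper names `parse_stateful_fields` and `group_by_profile` are hypothetical, and the sample rows (profile IDs, instance IDs, and the keys inside the JSON string) are invented for illustration, since the card does not document the exact key names.

```python
import json
from collections import defaultdict


def parse_stateful_fields(row):
    """Decode the two string-encoded preference columns of one row."""
    # interaction_preferences is documented as a JSON string.
    interaction = json.loads(row["interaction_preferences"])
    # coding_preferences is documented as comma-separated; strip stray spaces
    # and drop empty entries.
    coding = [p.strip() for p in row["coding_preferences"].split(",") if p.strip()]
    return interaction, coding


def group_by_profile(rows):
    """Map each user_profile_id to the stateful_instance_ids assigned to it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["user_profile_id"]].append(row["stateful_instance_id"])
    return dict(groups)


# Hypothetical rows: every value below, including the JSON keys, is invented
# for illustration and is not taken from the real dataset.
sample_rows = [
    {
        "user_profile_id": "profile_a",
        "stateful_instance_id": "example-1",
        "interaction_preferences": '{"verbosity": "concise", "question_timing": "upfront"}',
        "coding_preferences": "pytest, type hints",
    },
    {
        "user_profile_id": "profile_b",
        "stateful_instance_id": "example-2",
        "interaction_preferences": '{"verbosity": "verbose", "question_timing": "ongoing"}',
        "coding_preferences": "docstrings",
    },
    {
        "user_profile_id": "profile_a",
        "stateful_instance_id": "example-3",
        "interaction_preferences": '{"verbosity": "concise", "question_timing": "upfront"}',
        "coding_preferences": "pytest, type hints",
    },
]

for row in sample_rows:
    interaction, coding = parse_stateful_fields(row)
    print(row["stateful_instance_id"], interaction, coding)

print(group_by_profile(sample_rows))
```

With the real data, the same helpers should work on rows obtained by iterating over the object returned by `load_dataset("cmu-lti/stateful", split="test")`, since each row is a plain dictionary keyed by column name.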
Provider:
cmu-lti