
cmudrc/OpenSeeSimE-Structural-Mini

Source: Hugging Face. Updated 2026-04-24. Indexed 2026-05-10.
Download link:
https://hf-mirror.com/datasets/cmudrc/OpenSeeSimE-Structural-Mini
Resource description:
---
dataset_info:
  features:
  - name: file_name
    dtype: string
  - name: source_file
    dtype: string
  - name: question
    dtype: string
  - name: question_type
    dtype: string
  - name: question_id
    dtype: int32
  - name: answer
    dtype: string
  - name: answer_choices
    list: string
  - name: correct_choice_idx
    dtype: int32
  - name: image
    dtype: image
  - name: video
    dtype: video
  - name: media_type
    dtype: string
  splits:
  - name: test
    num_examples: 1120
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
license: mit
task_categories:
- visual-question-answering
language:
- en
size_categories:
- 1K<n<10K
tags:
- engineering
- simulation
- stratified-subset
---

# OpenSeeSimE-Structural-Mini

A **stratified 1% subset** of [`cmudrc/OpenSeeSimE-Structural`](https://huggingface.co/datasets/cmudrc/OpenSeeSimE-Structural) for evaluating vision-language models at a reduced compute footprint while preserving the joint distribution of simulation type, question type, media type, and question id.

## Subset Provenance

- **Parent dataset**: [`cmudrc/OpenSeeSimE-Structural`](https://huggingface.co/datasets/cmudrc/OpenSeeSimE-Structural) (102,678 rows total)
- **Rows in this subset**: **1,120** (1.09% of parent)
- **Source classes**: `Beams`, `Dog Bone`, `Hip Implant`, `Pressure Vessel`, `Wall Bracket`
- **Parquet shards**: 1 | **Storage**: ~1.73 GB
- **Sampling**: per-stratum shuffle with `numpy.random.default_rng(42)`, then take `ceil(n * fraction)` from each stratum. Any non-empty stratum contributes at least 1 row.
- **Strata**: `(source_file, question_type, media_type, question_id)`, all four jointly.
- **Nesting**: the 1% subset is a literal subset of the 10% subset (same shuffled prefix is taken for every fraction).

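The sampling and nesting rules above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `stratified_subset`, the sorted iteration order over strata, and the use of a single shared generator are all assumptions. Only the seed, the `ceil(n * fraction)` rule, and the four stratum keys come from the card.

```python
import math
from collections import defaultdict

import numpy as np

# Stratum keys as listed in the card; everything else here is illustrative.
STRATA_KEYS = ("source_file", "question_type", "media_type", "question_id")


def stratified_subset(rows, fraction, seed=42):
    """Return sorted row indices kept by a per-stratum shuffled-prefix sample.

    `rows` is a sequence of dicts containing the four stratum keys.
    (Hypothetical helper; the real pipeline may differ in ordering details.)
    """
    # Group row indices by their joint stratum.
    strata = defaultdict(list)
    for idx, row in enumerate(rows):
        strata[tuple(row[k] for k in STRATA_KEYS)].append(idx)

    rng = np.random.default_rng(seed)
    kept = []
    # Assumed: strata are visited in sorted order so the shuffle of each
    # stratum is the same no matter which fraction is requested.
    for key in sorted(strata):
        members = np.array(strata[key])
        rng.shuffle(members)  # in-place shuffle of this stratum
        # ceil() of a positive value is >= 1, so every non-empty stratum
        # contributes at least one row.
        take = math.ceil(len(members) * fraction)
        kept.extend(int(i) for i in members[:take])
    return sorted(kept)
```

Because each stratum is shuffled identically for every fraction and only the prefix length changes, the indices selected at a smaller fraction are always contained in those selected at a larger one, which is the nesting property described above.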
## Composition

### By `source_file`

| source_file     |   rows |   pct |
|:----------------|-------:|------:|
| Beams           |    240 | 21.43 |
| Dog Bone        |    220 | 19.64 |
| Hip Implant     |    220 | 19.64 |
| Pressure Vessel |    220 | 19.64 |
| Wall Bracket    |    220 | 19.64 |

### By `media_type`

| media_type   |   rows |
|:-------------|-------:|
| image        |    560 |
| video        |    560 |

### By `(source_file, question_type)`

| source_file     |   Binary |   Multiple Choice |   Spatial |   Total |
|:----------------|---------:|------------------:|----------:|--------:|
| Beams           |       72 |               120 |        48 |     240 |
| Dog Bone        |       66 |               110 |        44 |     220 |
| Hip Implant     |       66 |               110 |        44 |     220 |
| Pressure Vessel |       66 |               110 |        44 |     220 |
| Wall Bracket    |       66 |               110 |        44 |     220 |

## Feature Schema

Identical to the parent dataset. See [`cmudrc/OpenSeeSimE-Structural`](https://huggingface.co/datasets/cmudrc/OpenSeeSimE-Structural) for full documentation of simulation generation, ground-truth extraction, preprocessing, limitations, and intended use.

```python
{
    'file_name': str,             # Unique identifier
    'source_file': str,           # Base simulation model
    'question': str,              # Question text
    'question_type': str,         # 'Binary', 'Multiple Choice', 'Spatial'
    'question_id': int,           # Question identifier (1-20)
    'answer': str,                # Ground truth answer
    'answer_choices': list[str],  # Options
    'correct_choice_idx': int,    # Index of correct answer
    'image': Image,               # PIL Image (1920x1440) or null for video rows
    'video': Video,               # Video bytes or null for image rows
    'media_type': str,            # 'image' or 'video'
}
```

## Intended Use

- Benchmark evaluation of vision-language models on engineering simulation question answering at reduced compute cost
- Smoke-testing of evaluation pipelines before running the full benchmark
- Comparative studies where storage or bandwidth constraints matter

## License

MIT, same as parent. Free for academic and commercial use with attribution.

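For the benchmark evaluation mentioned under Intended Use, a scoring step might look like the sketch below. It assumes each row is a dict with the `question_type` and `correct_choice_idx` fields from the schema, and that a model's output has already been mapped to a choice index comparable with `correct_choice_idx`. The helper name `accuracy_by_question_type` is hypothetical. Rows could plausibly be obtained with `datasets.load_dataset("cmudrc/OpenSeeSimE-Structural-Mini", split="test")`, but that step is left out so the example runs offline.

```python
from collections import Counter


def accuracy_by_question_type(rows, predicted_idx):
    """Compute accuracy per question_type.

    rows: sequence of dicts with 'question_type' and 'correct_choice_idx'.
    predicted_idx: sequence of predicted choice indices, aligned with rows.
    (Hypothetical helper, not part of the dataset or any library.)
    """
    if len(rows) != len(predicted_idx):
        raise ValueError("rows and predicted_idx must have the same length")

    totals, hits = Counter(), Counter()
    for row, pred in zip(rows, predicted_idx):
        qtype = row["question_type"]
        totals[qtype] += 1
        # A prediction counts as correct when it matches the stored index.
        if pred == row["correct_choice_idx"]:
            hits[qtype] += 1
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}
```

Reporting accuracy separately for `Binary`, `Multiple Choice`, and `Spatial` questions keeps the different chance baselines from being averaged together.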
## Citation

```bibtex
@article{ezemba2024opensesime,
  title={OpenSeeSimE: A Large-Scale Benchmark to Assess Vision-Language Model Question Answering Capabilities in Engineering Simulations},
  author={Ezemba, Jessica and Pohl, Jason and Tucker, Conrad and McComb, Christopher},
  year={2025}
}
```

## Contact

**Jessica Ezemba**, jezemba@andrew.cmu.edu
Department of Mechanical Engineering, Carnegie Mellon University
Provider:
cmudrc