cmudrc/OpenSeeSimE-Structural-Small

Source: Hugging Face | Updated: 2026-04-24 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/cmudrc/OpenSeeSimE-Structural-Small
Resource description:
---
dataset_info:
  features:
  - name: file_name
    dtype: string
  - name: source_file
    dtype: string
  - name: question
    dtype: string
  - name: question_type
    dtype: string
  - name: question_id
    dtype: int32
  - name: answer
    dtype: string
  - name: answer_choices
    list: string
  - name: correct_choice_idx
    dtype: int32
  - name: image
    dtype: image
  - name: video
    dtype: video
  - name: media_type
    dtype: string
  splits:
  - name: test
    num_examples: 10343
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
license: mit
task_categories:
- visual-question-answering
language:
- en
size_categories:
- 10K<n<100K
tags:
- engineering
- simulation
- stratified-subset
---

# OpenSeeSimE-Structural-Small

A **stratified 10% subset** of [`cmudrc/OpenSeeSimE-Structural`](https://huggingface.co/datasets/cmudrc/OpenSeeSimE-Structural) for evaluating vision-language models at a reduced compute footprint while preserving the joint distribution of simulation type, question type, media type, and question id.

## Subset Provenance

- **Parent dataset**: [`cmudrc/OpenSeeSimE-Structural`](https://huggingface.co/datasets/cmudrc/OpenSeeSimE-Structural) (102,678 rows total)
- **Rows in this subset**: **10,343** (10.07% of parent)
- **Source classes**: `Beams`, `Dog Bone`, `Hip Implant`, `Pressure Vessel`, `Wall Bracket`
- **Parquet shards**: 4 | **Storage**: ~15.60 GB
- **Sampling**: per-stratum shuffle with `numpy.random.default_rng(42)`, then take `ceil(n * fraction)` from each stratum. Any non-empty stratum contributes at least 1 row.
- **Strata**: `(source_file, question_type, media_type, question_id)`, all four jointly.
- **Nesting**: the 1% subset is a literal subset of the 10% subset (the same shuffled prefix is taken for every fraction).
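
The sampling and nesting rules in the list above can be illustrated with a minimal sketch. It is not the authors' script: the function name `stratified_prefix_sample`, the use of a single generator shared across strata, and the sorted stratum order are all assumptions made for illustration. Only the seed, the `ceil(n * fraction)` rule, and the four-part stratum key come from the card.

```python
import math
from collections import defaultdict

import numpy as np


def stratified_prefix_sample(strata_keys, fraction, seed=42):
    """Return sorted row indices chosen by a per-stratum shuffle and prefix take.

    Hypothetical helper: one generator is shared by all strata and strata are
    visited in sorted order (both assumptions, not documented in the card).
    """
    groups = defaultdict(list)
    for idx, key in enumerate(strata_keys):
        groups[key].append(idx)

    rng = np.random.default_rng(seed)
    selected = []
    for key in sorted(groups):
        members = np.array(groups[key])
        # The shuffle does not depend on `fraction`, so a smaller fraction
        # always takes a prefix of what a larger fraction takes.
        shuffled = rng.permutation(members)
        take = math.ceil(len(members) * fraction)  # at least 1 for a non-empty stratum
        selected.extend(int(i) for i in shuffled[:take])
    return sorted(selected)


# Toy strata keyed by (source_file, question_type, media_type, question_id).
# The stratum sizes are invented for this example.
keys = (
    [("Beams", "Binary", "image", 1)] * 25
    + [("Dog Bone", "Spatial", "video", 2)] * 7
    + [("Hip Implant", "Multiple Choice", "image", 3)] * 1
)
ten_percent = stratified_prefix_sample(keys, 0.10)
one_percent = stratified_prefix_sample(keys, 0.01)
```

With these toy strata the 10% draw keeps 5 rows (3 + 1 + 1) and the 1% draw keeps 3 (one per stratum), and every row in the 1% draw also appears in the 10% draw. Because each stratum rounds up, the overall share ends up slightly above the nominal fraction, which is consistent with the 10.07% reported for this subset.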

## Composition

### By `source_file`

| source_file     |   rows |   pct |
|:----------------|-------:|------:|
| Beams           |   2088 | 20.19 |
| Pressure Vessel |   2074 | 20.05 |
| Dog Bone        |   2061 | 19.93 |
| Hip Implant     |   2060 | 19.92 |
| Wall Bracket    |   2060 | 19.92 |

### By `media_type`

| media_type   |   rows |
|:-------------|-------:|
| image        |   5192 |
| video        |   5151 |

### By `(source_file, question_type)`

| source_file     |   Binary |   Multiple Choice |   Spatial |   Total |
|:----------------|---------:|------------------:|----------:|--------:|
| Beams           |      627 |              1045 |       416 |    2088 |
| Dog Bone        |      619 |              1030 |       412 |    2061 |
| Hip Implant     |      618 |              1030 |       412 |    2060 |
| Pressure Vessel |      622 |              1040 |       412 |    2074 |
| Wall Bracket    |      618 |              1030 |       412 |    2060 |

## Feature Schema

Identical to the parent dataset. See [`cmudrc/OpenSeeSimE-Structural`](https://huggingface.co/datasets/cmudrc/OpenSeeSimE-Structural) for full documentation of simulation generation, ground-truth extraction, preprocessing, limitations, and intended use.

```python
{
    'file_name': str,             # Unique identifier
    'source_file': str,           # Base simulation model
    'question': str,              # Question text
    'question_type': str,         # 'Binary', 'Multiple Choice', 'Spatial'
    'question_id': int,           # Question identifier (1-20)
    'answer': str,                # Ground truth answer
    'answer_choices': list[str],  # Options
    'correct_choice_idx': int,    # Index of correct answer
    'image': Image,               # PIL Image (1920x1440) or null for video rows
    'video': Video,               # Video bytes or null for image rows
    'media_type': str,            # 'image' or 'video'
}
```

## Intended Use

- Benchmark evaluation of vision-language models on engineering simulation question answering at reduced compute cost
- Smoke-testing of evaluation pipelines before running the full benchmark
- Comparative studies where storage or bandwidth constraints matter

## License

MIT, same as parent. Free for academic and commercial use with attribution.
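
As a usage sketch for the feature schema and the smoke-testing use case above, the snippet below scores a predictor against `correct_choice_idx` and reports accuracy per `question_type`. The helper `accuracy_by_question_type`, the `predict` callable, and the toy rows are hypothetical and are not taken from the dataset. The sketch also assumes the predictor returns an index in the same convention as `correct_choice_idx`, which the card does not spell out.

```python
from collections import defaultdict


def accuracy_by_question_type(rows, predict):
    """Score `predict` against `correct_choice_idx`, grouped by `question_type`.

    `rows` is any iterable of dicts carrying the documented schema fields;
    `predict(question, answer_choices)` is a hypothetical callable that
    returns the index of the chosen option.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        guess = predict(row["question"], row["answer_choices"])
        totals[row["question_type"]] += 1
        if guess == row["correct_choice_idx"]:
            hits[row["question_type"]] += 1
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}


# Invented rows that only mimic the schema. Real rows could come from
# datasets.load_dataset("cmudrc/OpenSeeSimE-Structural-Small", split="test"),
# which is not run here.
toy_rows = [
    {"question": "q1", "question_type": "Binary",
     "answer_choices": ["Yes", "No"], "correct_choice_idx": 0},
    {"question": "q2", "question_type": "Binary",
     "answer_choices": ["Yes", "No"], "correct_choice_idx": 1},
    {"question": "q3", "question_type": "Multiple Choice",
     "answer_choices": ["A", "B", "C", "D"], "correct_choice_idx": 0},
]


def always_first(question, answer_choices):
    """Trivial baseline that always picks the first option."""
    return 0


scores = accuracy_by_question_type(toy_rows, always_first)
```

Grouping by `question_type` mirrors the composition tables, so per-type scores on this subset can be compared directly with per-type scores on the parent benchmark.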

## Citation

```bibtex
@article{ezemba2024opensesime,
  title={OpenSeeSimE: A Large-Scale Benchmark to Assess Vision-Language Model Question Answering Capabilities in Engineering Simulations},
  author={Ezemba, Jessica and Pohl, Jason and Tucker, Conrad and McComb, Christopher},
  year={2025}
}
```

## Contact

**Jessica Ezemba** (jezemba@andrew.cmu.edu)
Department of Mechanical Engineering, Carnegie Mellon University
Provider:
cmudrc