AhmedMudasir/MAS-DATASET

Name: AhmedMudasir/MAS-DATASET
Creator: AhmedMudasir
Published: 2026-04-28 05:58:10
License: No description provided

Hugging Face · Updated 2026-04-28 · Indexed 2026-05-03

Download link:

https://hf-mirror.com/datasets/AhmedMudasir/MAS-DATASET

Resource description:

---
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- en
- ur
tags:
- multi-agent-system
- human-evaluation
- Pakistan
- credibility
- professional-ux
pretty_name: DATASET-MAS
size_categories:
- "n<1K"
configs:
- config_name: default
  data_files:
  - split: train
    path: Metadata/metadata.csv
---

# DATASET-MAS: Professional Evaluation of Multi-Agent Systems in Pakistan

### 1. Dataset Summary

This dataset captures the first-use experiences of 42 domain-expert professionals in Pakistan with Atypica.ai, a four-stage multi-agent system (MAS) pipeline. It provides a specialized resource for assessing how expert users judge AI credibility, localization, and technical depth in an emerging market context.

### 2. Dataset Composition

The dataset is organized by Participant ID (P01–P42) and includes:

* 42 AI-Generated Reports (`Reports/` folder): Business intelligence documents produced by the Atypica.ai MAS pipeline.
* 42 Anonymized Transcripts (`Transcripts/` folder): Clean text records of semi-structured interviews where experts evaluated the reports.
* Metadata (`Metadata/metadata.csv`): Details on the age, gender, professional domain, and years of experience for all 42 participants.

### 3. Annotation Definitions (Data Dictionary)

To ensure the analysis is reproducible, the following definitions were used to categorize expert feedback:

* Domain error: A factual or logical mistake identified by a participant using their specific professional expertise (e.g., P24 identifying the omission of "Chromite").
* Localization failure: Missing, incorrect, or culturally insensitive information specific to the Pakistani context (e.g., incorrect tax rates or missing local landmarks).

*Note: While the broader study also evaluates Efficiency and Process Transparency, "Domain Error" and "Localization Failure" serve as the primary categorical labels for the machine learning benchmark tasks in this dataset.*

### 4. Benchmarking Tasks

This dataset supports the following research tasks:

1. Credibility Prediction: Using expert transcripts to predict trust levels in specific AI-generated business outputs.
2. Localization Quality Scoring: Measuring the accuracy of AI-generated cultural, legal, and economic content for non-Western regions.
3. Process Transparency Analysis: Evaluating user disorientation during complex multi-agent reasoning phases.

### 5. Ethics & Privacy

* Informed Consent: All 42 participants provided explicit informed consent prior to the study.
* Anonymization: All transcripts and reports have been manually scrubbed of real names, company identities, and sensitive contact information.
* Institutional Oversight: This research was conducted at the Design AI Lab, College of Design and Innovation, Tongji University, under the supervision of Professor Fan Ling.

### 6. Licensing

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
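Because the card organizes everything by Participant ID (P01–P42) across the `Reports/` and `Transcripts/` folders, a small consistency check can help after downloading the files. The following is a minimal sketch, not the authors' tooling. It assumes that each file name contains its participant ID as `P` followed by two digits; the card does not document the actual file names or extensions, so the helper names and the matching rule here are hypothetical.

```python
import re
from collections import defaultdict

# The dataset card states that participant IDs run from P01 to P42.
NUM_PARTICIPANTS = 42

# Assumption (not documented in the card): each file name embeds its
# participant ID as "P" plus exactly two digits, e.g. "P07".
_ID_PATTERN = re.compile(r"(?<![A-Za-z0-9])P(\d{2})(?!\d)")


def participant_ids(n=NUM_PARTICIPANTS):
    """Return the zero-padded IDs 'P01' .. 'P<n>'."""
    return [f"P{i:02d}" for i in range(1, n + 1)]


def index_by_participant(paths):
    """Group relative file paths by the participant ID in the file name.

    Paths with no recognizable ID, or an ID outside P01..P42, are skipped.
    """
    valid = set(participant_ids())
    index = defaultdict(list)
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # look only at the file name
        match = _ID_PATTERN.search(name)
        if match and f"P{match.group(1)}" in valid:
            index[f"P{match.group(1)}"].append(path)
    return dict(index)


def missing_pairs(index):
    """List participants lacking a file under Reports/ or Transcripts/."""
    missing = []
    for pid in participant_ids():
        files = index.get(pid, [])
        has_report = any(f.startswith("Reports/") for f in files)
        has_transcript = any(f.startswith("Transcripts/") for f in files)
        if not (has_report and has_transcript):
            missing.append(pid)
    return missing
```

Passing a list of repository-relative paths to `index_by_participant` and then calling `missing_pairs` on the result reports any participant without both a report and a transcript. Separately, the `configs` block in the front matter maps the default `train` split to `Metadata/metadata.csv`, which suggests that `load_dataset("AhmedMudasir/MAS-DATASET", split="train")` from the Hugging Face `datasets` library would return the participant metadata table rather than the reports or transcripts; that has not been verified here.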

Provider:

AhmedMudasir
