
AhmedMudasir/MAS-DATASET

Hugging Face · Updated 2026-04-28 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/AhmedMudasir/MAS-DATASET
Resource summary:
---
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- en
- ur
tags:
- multi-agent-system
- human-evaluation
- Pakistan
- credibility
- professional-ux
pretty_name: DATASET-MAS
size_categories:
- "n<1K"
configs:
- config_name: default
  data_files:
  - split: train
    path: Metadata/metadata.csv
---

# DATASET-MAS: Professional Evaluation of Multi-Agent Systems in Pakistan

### 1. Dataset Summary

This dataset captures the first-use experiences of 42 domain-expert professionals in Pakistan with Atypica.ai, a four-stage multi-agent system (MAS) pipeline. It provides a specialized resource for assessing how expert users judge AI credibility, localization, and technical depth in an emerging market context.

### 2. Dataset Composition

The dataset is organized by Participant ID (P01–P42) and includes:

* 42 AI-Generated Reports (`Reports/` folder): Business intelligence documents produced by the Atypica.ai MAS pipeline.
* 42 Anonymized Transcripts (`Transcripts/` folder): Clean text records of semi-structured interviews where experts evaluated the reports.
* Metadata (`Metadata/metadata.csv`): Details on the age, gender, professional domain, and years of experience for all 42 participants.

### 3. Annotation Definitions (Data Dictionary)

To ensure the analysis is reproducible, the following definitions were used to categorize expert feedback:

* Domain error: A factual or logical mistake identified by a participant using their specific professional expertise (e.g., P24 identifying the omission of "Chromite").
* Localization failure: Missing, incorrect, or culturally insensitive information specific to the Pakistani context (e.g., incorrect tax rates or missing local landmarks).

*Note: While the broader study also evaluates Efficiency and Process Transparency, "Domain Error" and "Localization Failure" serve as the primary categorical labels for the machine learning benchmark tasks in this dataset.*

### 4. Benchmarking Tasks

This dataset supports the following research tasks:

1. Credibility Prediction: Using expert transcripts to predict trust levels in specific AI-generated business outputs.
2. Localization Quality Scoring: Measuring the accuracy of AI-generated cultural, legal, and economic content for non-Western regions.
3. Process Transparency Analysis: Evaluating user disorientation during complex multi-agent reasoning phases.

### 5. Ethics & Privacy

* Informed Consent: All 42 participants provided explicit informed consent prior to the study.
* Anonymization: All transcripts and reports have been manually scrubbed of real names, company identities, and sensitive contact information.
* Institutional Oversight: This research was conducted at the Design AI Lab, College of Design and Innovation, Tongji University, under the supervision of Professor Fan Ling.

### 6. Licensing

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
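
The composition described in Section 2 (42 participants, IDs P01 through P42, with one metadata row each) suggests a simple completeness check before using the data. The following is a minimal sketch, not part of the dataset's own tooling. The column name `participant_id` is an assumption, since the card does not list the CSV header, and the sample rows are invented purely for illustration.

```python
import csv
import io


def participant_ids(count: int = 42) -> list[str]:
    """Return zero-padded participant IDs: P01, P02, ..., P42."""
    return [f"P{i:02d}" for i in range(1, count + 1)]


def missing_participants(csv_text: str, id_column: str = "participant_id") -> list[str]:
    """List expected IDs that do not appear in the metadata CSV text.

    `id_column` is a hypothetical column name; inspect the real header of
    Metadata/metadata.csv and adjust it before relying on this check.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {row[id_column].strip() for row in reader}
    return [pid for pid in participant_ids() if pid not in seen]


if __name__ == "__main__":
    # Invented sample rows, for illustration only (not real dataset content).
    sample = "participant_id,domain\nP01,Mining\nP02,Finance\n"
    print(len(missing_participants(sample)), "participants missing from sample")
```

In practice, the text of `Metadata/metadata.csv` would be passed in place of the sample string. An empty result would indicate that every ID from P01 to P42 has a metadata row.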
Provider:
AhmedMudasir