AhmedMudasir/MAS-DATASET
Source: Hugging Face. Updated 2026-04-28; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/AhmedMudasir/MAS-DATASET
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- en
- ur
tags:
- multi-agent-system
- human-evaluation
- Pakistan
- credibility
- professional-ux
pretty_name: DATASET-MAS
size_categories:
- "n<1K"
configs:
- config_name: default
  data_files:
  - split: train
    path: Metadata/metadata.csv
---
# DATASET-MAS: Professional Evaluation of Multi-Agent Systems in Pakistan
### 1. Dataset Summary
This dataset captures the first-use experiences of 42 domain-expert professionals in Pakistan with Atypica.ai, a four-stage multi-agent system (MAS) pipeline. It provides a specialized resource for assessing how expert users judge AI credibility, localization, and technical depth in an emerging market context.
### 2. Dataset Composition
The dataset is organized by Participant ID (P01–P42) and includes:
* 42 AI-Generated Reports (`Reports/` folder): Business intelligence documents produced by the Atypica.ai MAS pipeline.
* 42 Anonymized Transcripts (`Transcripts/` folder): Clean text records of semi-structured interviews where experts evaluated the reports.
* Metadata (`Metadata/metadata.csv`): Details on the age, gender, professional domain, and years of experience for all 42 participants.
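
The layout above can be navigated with a small helper. The following is a minimal sketch, not part of the dataset itself: it assumes each report and transcript file is named after its participant ID, and the file extensions (`.pdf`, `.txt`) are placeholders, since the card does not state the file naming or formats. The function names `participant_ids` and `expected_files` are hypothetical.

```python
from pathlib import Path


def participant_ids(n: int = 42) -> list[str]:
    """Return zero-padded participant IDs: P01, P02, ..., P{n}."""
    return [f"P{i:02d}" for i in range(1, n + 1)]


def expected_files(
    root: str,
    pid: str,
    report_ext: str = ".pdf",      # ASSUMPTION: report format is not stated in the card
    transcript_ext: str = ".txt",  # ASSUMPTION: transcript format is not stated in the card
) -> dict[str, Path]:
    """Build the paths where one participant's files would live.

    ASSUMPTION: files are named after the participant ID. Only the folder
    names (Reports/, Transcripts/) come from the dataset card.
    """
    base = Path(root)
    return {
        "report": base / "Reports" / f"{pid}{report_ext}",
        "transcript": base / "Transcripts" / f"{pid}{transcript_ext}",
    }


if __name__ == "__main__":
    # List any expected file that is missing from a local copy of the dataset.
    for pid in participant_ids():
        for kind, path in expected_files(".", pid).items():
            if not path.exists():
                print(f"{pid}: missing {kind} at {path}")
```

If the actual file names differ, only `expected_files` needs adjusting; the ID scheme P01 to P42 is taken directly from the card.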
### 3. Annotation Definitions (Data Dictionary)
To ensure the analysis is reproducible, the following definitions were used to categorize expert feedback:
* Domain error: A factual or logical mistake identified by a participant using their specific professional expertise (e.g., P24 identifying the omission of "Chromite").
* Localization failure: Missing, incorrect, or culturally insensitive information specific to the Pakistani context (e.g., incorrect tax rates or missing local landmarks).
*Note: While the broader study also evaluates Efficiency and Process Transparency, "Domain error" and "Localization failure" serve as the primary categorical labels for the machine learning benchmark tasks in this dataset.*
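
For readers who want to work with these two labels programmatically, one possible encoding is sketched below. It is an illustrative assumption rather than the dataset's own schema: the class names `FeedbackLabel` and `Annotation`, the string values, and the `note` field are all hypothetical. The single example record reuses the P24 case cited in the definitions above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FeedbackLabel(Enum):
    """The two primary categorical labels named in the data dictionary."""
    DOMAIN_ERROR = "domain_error"                  # ASSUMPTION: string value is illustrative
    LOCALIZATION_FAILURE = "localization_failure"  # ASSUMPTION: string value is illustrative


@dataclass(frozen=True)
class Annotation:
    """One labeled piece of expert feedback (hypothetical record shape)."""
    participant_id: str
    label: FeedbackLabel
    note: str


def count_labels(annotations: list[Annotation]) -> Counter:
    """Count how many annotations carry each label."""
    return Counter(a.label for a in annotations)


# The only concrete case given in the card: P24 flagged a missing mineral.
example = [
    Annotation("P24", FeedbackLabel.DOMAIN_ERROR, 'omission of "Chromite"'),
]

if __name__ == "__main__":
    for label, n in count_labels(example).items():
        print(label.value, n)
```

Using an enum keeps the label set closed, so a misspelled label fails at construction time instead of silently creating a third category.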
### 4. Benchmarking Tasks
This dataset supports the following research tasks:
1. Credibility Prediction: Using expert transcripts to predict trust levels in specific AI-generated business outputs.
2. Localization Quality Scoring: Measuring the accuracy of AI-generated cultural, legal, and economic content for non-Western regions.
3. Process Transparency Analysis: Evaluating user disorientation during complex multi-agent reasoning phases.
### 5. Ethics & Privacy
* Informed Consent: All 42 participants provided explicit informed consent prior to the study.
* Anonymization: All transcripts and reports have been manually scrubbed of real names, company identities, and sensitive contact information.
* Institutional Oversight: This research was conducted at the Design AI Lab, College of Design and Innovation, Tongji University, under the supervision of Professor Fan Ling.
### 6. Licensing
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
Provider:
AhmedMudasir



