
survey_social_value_th2025

ModelScope Community, updated 2025-08-08, indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/scb10x/survey_social_value_th2025
Resource description:
# Social Attitudes and Values Survey Dataset

This dataset contains survey questions and responses designed to explore social attitudes and values among people in Thailand in 2025. It includes a comprehensive set of carefully crafted questions and collected responses, intended to support research on social perspectives, values, and cultural attitudes, as well as research on crowdsourcing algorithms.

This dataset was used to evaluate our proposed crowdsourcing algorithm [[link to the paper to be updated]](https://arxiv.org/).

## Dataset Structure

The dataset contains 1,000 unique questions, 20 of which are repeated for quality control, giving 1,020 examples per split. The data was constructed in English (using contexts relevant to Thai culture) and then translated into Thai for Thai respondents. There are two splits:

- English (en): 1,020 examples
- Thai (th): 1,020 examples (the translated version that was presented to respondents)

Note that the `responses` column is identical in both splits.

### Features:

- `Q_ID` (string): Unique identifier for each question.
- `context` (string): Contextual information provided for each question.
- `question` (string): The survey question.
- `scale` (string): Descriptions of the -5, 0, and +5 scale points on an 11-point scale.
- `responses` (list of integers): Collected responses from 39 workers\* (in the same order, from `respondent1` to `respondent39`).
- `repeat` (string): Indicates whether a question was repeated for quality control purposes. If the value is not `None`, it is the `Q_ID` of the original question.

\*Originally, 40 workers were recruited, but 1 worker did not follow the instructions, so their responses are not included in this dataset.

## Dataset Creation Process

### Question Generation

- **Topic Selection**: Initial topics (420 in total) were sourced from the Pew Research Center. Topics were evaluated by the Claude 3.5 Sonnet model based on clarity, general knowledge required, and sensitivity. Topics with ratings ≥ 3 (out of 5) were retained, resulting in 301 topics.
- **Question Generation**: 10 survey questions were generated per topic using Claude 3.5 Sonnet, yielding 3,010 initial questions. Each data point includes context, a question, and descriptions for scores of -5, 0, and +5.
- **Question Refinement**: Questions underwent further evaluation for clarity, relevance, accuracy, and neutrality of scale descriptions, narrowing the final set down to 1,000 unique questions.

### Data Annotation and Collection

- **Worker Annotation**: Responses were collected from 40 recruited Thai annotators, predominantly students and freelancers. Annotators responded to 1,020 questions (1,000 unique + 20 repeated) presented in Thai via Google Forms.
- **Quality Control**:
  - Workers were instructed to limit zero-score responses to under 10%.
  - Repeated questions were included to assess response consistency.
  - Data from annotators failing quality guidelines was excluded.

## Usage and Applications

The dataset is suitable for research in:

- Crowdsourcing algorithms
- Statistical analysis for cross-cultural studies, opinion mining, etc.

## Dataset Statistics

- **Total Size**: ~1.7 MB
- **Download Size**: ~420 KB
- **Number of Examples**: 1,020
- **Number of Respondents**: 39

## Citation

If you use this dataset, please cite our paper:

```bibtex
@inproceedings{your_paper_citation,
  title={Your Paper Title},
  author={Authors},
  booktitle={Conference or Journal},
  year={Year}
}
```

If you have any queries about the dataset, please contact potsawee@scb10x.com.

## Acknowledgments

Thanks to Adisai Na-Thalang and Chanakan Wittayasakpan for facilitating the data collection from respondents.

## Licensing

This dataset is licensed under a [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) license. You are free to share, adapt, and use this dataset for any purpose, even commercially, provided that appropriate credit is given.
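
## Example: Checking Response Quality (illustrative sketch)

The quality-control rules described above (zero-score responses under 10%, and repeated questions to assess consistency) can be checked from the `Q_ID`, `responses`, and `repeat` fields alone. The following is a minimal sketch, not the authors' procedure: the helper names (`is_repeat`, `zero_fraction`, `repeat_gap`), the use of mean absolute difference as a consistency measure, and the `toy_rows` data are all assumptions made for illustration. Because the card describes `repeat` as a string that may be `None`, the sketch accepts both Python `None` and the literal string `"None"`.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def is_repeat(row: Row) -> bool:
    """True if the row is a repeated question (its `repeat` field names an original Q_ID)."""
    value = row["repeat"]
    # Assumption: a missing value may be stored as Python None or as the string "None".
    return value is not None and value != "None"


def zero_fraction(rows: List[Row], respondent: int) -> float:
    """Fraction of one respondent's answers that are exactly 0."""
    answers = [row["responses"][respondent] for row in rows]
    return sum(a == 0 for a in answers) / len(answers)


def repeat_gap(rows: List[Row], respondent: int) -> float:
    """Mean absolute difference between answers to repeated questions and their originals."""
    originals = {row["Q_ID"]: row for row in rows if not is_repeat(row)}
    gaps = [
        abs(row["responses"][respondent] - originals[row["repeat"]]["responses"][respondent])
        for row in rows
        if is_repeat(row)
    ]
    if not gaps:
        raise ValueError("no repeated questions found")
    return sum(gaps) / len(gaps)


# Hypothetical toy data with 3 respondents; the real dataset has 39 per row.
toy_rows: List[Row] = [
    {"Q_ID": "Q1", "responses": [3, 0, -5], "repeat": None},
    {"Q_ID": "Q2", "responses": [-2, 0, 4], "repeat": None},
    {"Q_ID": "Q3", "responses": [5, 1, -1], "repeat": None},
    {"Q_ID": "Q4", "responses": [3, 2, -3], "repeat": "Q1"},  # repeat of Q1
]

for r in range(3):
    print(r, zero_fraction(toy_rows, r), repeat_gap(toy_rows, r))
```

On the real data, `respondent` would range over indices 0 to 38, and the 10% guideline would correspond to `zero_fraction(rows, r) < 0.10`. The source does not state how consistency on repeated questions was scored or what threshold led to exclusion, so `repeat_gap` should be read as one plausible measure only.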

Provider:
maas
Created:
2025-05-23