School Leavers Study for Latent Code Identification Replication
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://data.mendeley.com/datasets/gzhfdtmhcm
Resource description:
Essay database for replicating the analyses in the study: Latent Code Identification [LACOID]: A Machine Learning-Based Integrative Framework [and Open-Source Software] to Classify Big Textual Data, Rebuild Contextualized/Unaltered Meanings, and Avoid Aggregation Bias
Manuel S. González Canché
Accepted version available here: https://drive.google.com/file/d/17uADm0_7ygT2FhYp_2krRrXS-Kj3sjXh/view?usp=sharing
Labeling or classifying textual data and qualitative evidence is an expensive and consequential challenge. The rigor and consistency behind the construction of these labels ultimately shape research findings and conclusions. Addressing this challenge poses a multifaceted methodological conundrum: human reasoning is needed for classifications that lead to deeper and more nuanced understandings; however, this same manual classification comes with a well-documented increase in classification inconsistencies and errors, particularly when dealing with vast numbers of documents and teams of coders. An alternative to human coding consists of machine learning-assisted techniques. These data science and visualization techniques offer tools for data classification that are cost-effective and consistent but are prone to losing participants’ meanings or voices for two main reasons: (a) these classifications typically aggregate all text inputs into a single topic or code, and (b) the words that make up the texts are analyzed outside of their original contexts. To address this challenge and analytic conundrum, we present an analytic framework and software tool that address the following question: How can vast amounts of qualitative evidence be classified effectively and efficiently without losing context or the original voices of our research participants, while leveraging the nuances that human reasoning brings to the qualitative and mixed methods analytic tables? This framework mirrors the line-by-line coding employed in human/manual code identification but relies on machine learning to classify texts in minutes rather than months. The resulting outputs provide complete transparency of the classification process and help recreate the contextualized, original, and unaltered meanings embedded in the input documents, as provided by our participants. We offer access to the textual database required to replicate all the analyses.
We hope this opportunity to become familiar with the analytic framework and software may result in expanded access to data science tools for analyzing qualitative evidence.
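To make point (a) above concrete, the toy sketch below contrasts document-level coding, which assigns one code to a whole essay, with line-by-line coding, which keeps every sentence paired with its own code. It is not the LACOID classifier: the keyword codebook `CODEBOOK`, the naive sentence splitter, and the example essay are hypothetical stand-ins chosen only to illustrate aggregation bias.

```python
import re
from collections import Counter

# Hypothetical codebook (illustrative only, not LACOID's method):
# each code is defined by a small set of keywords.
CODEBOOK = {
    "finances": {"money", "tuition", "cost", "loan"},
    "family": {"mother", "father", "family", "parents"},
}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def code_sentence(sentence):
    # Pick the code with the most keyword hits; None if nothing matches.
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = {code: sum(w in kws for w in words) for code, kws in CODEBOOK.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def code_document(text):
    # Document-level coding: collapse the essay into its single most frequent code.
    codes = [c for c in map(code_sentence, split_sentences(text)) if c]
    return Counter(codes).most_common(1)[0][0] if codes else None

def code_lines(text):
    # Line-by-line coding: keep each sentence next to its own code.
    return [(s, code_sentence(s)) for s in split_sentences(text)]

# Hypothetical example essay.
essay = "My parents wanted me to stay. Tuition cost too much money. I took a loan anyway."
print(code_document(essay))
for sentence, code in code_lines(essay):
    print(code, "|", sentence)
```

In this toy run, the document-level label is `finances`, which drops the family theme raised in the first sentence; the line-level output keeps that theme next to the participant's original wording.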
Replication steps and outcomes (pages 12 and 13 in the paper)
First, download and extract the data from this repository.
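A minimal Python sketch of the extraction step, assuming the files have been downloaded from the Mendeley Data page as a single zip archive. The function name `extract_archive`, the archive name, and the destination folder in the example comment are hypothetical. It uses only the standard library and returns the extracted file paths so they can be checked before running the analyses.

```python
import zipfile
from pathlib import Path

def extract_archive(archive_path, dest_dir):
    """Extract a downloaded zip archive and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
        names = zf.namelist()
    # Keep only regular files (directory entries in a zip end with "/").
    return sorted(dest / n for n in names if not n.endswith("/"))

# Example (hypothetical archive and folder names):
# files = extract_archive("gzhfdtmhcm.zip", "lacoid_data")
```

Returning the sorted list of paths makes it easy to confirm that every essay file was extracted before loading the data into the LACOID software.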
Created: 2022-11-18



