Applications of Modern Machine Learning Approaches to Address Real World Problems
Source: DataCite Commons | Updated: 2024-11-11 | Indexed: 2025-04-17
Download link:
https://curate.nd.edu/articles/dataset/Applications_of_Modern_Machine_Learning_Approaches_to_Address_Real_World_Problems/26131414/1
Resource description:
In an era with increasingly available complex real-world data, various quantitative methods have been developed to extract valuable insights from the data. Machine learning (ML) techniques have proven to be instrumental in modeling such intricate data. This dissertation encompasses a collection of illustrative examples that showcase the efficacy of ML models in analyzing data from various domains.
First, I introduce an innovative l0 regularization technique, coupled with Tucker decomposition, in the framework of tensor regression (TR) and apply it to simulated linear, binomial, and Poisson data and a real human face image dataset for age prediction. The results suggest improved predictions by TR with l0 regularization compared to other decomposition-based TR approaches, with or without regularization, while also being able to identify important predictors.
Second, I investigate the shift in sentiment during and after the COVID-19 pandemic using college subreddit data and examine the effects of different community-level factors on sentiment. A pre-trained RoBERTa (Robustly Optimized BERT Pretraining Approach) model was used to learn text embeddings from the Reddit messages, and a graph attention network (GAT) was leveraged to learn the relational information among posted messages. I applied model stacking to combine the prediction probabilities from RoBERTa and GAT to yield the final sentiment classification and used a generalized linear mixed-effects model to estimate the effects of various covariates. The odds of negative sentiment in 2020, 2021, and 2022 were statistically significantly higher than in 2019, with 2020 showing the largest increase. In-person learning, larger enrollment, being a public rather than a private school, and very high research activity were also associated with statistically significantly higher odds of negative sentiment.
Third, my collaborators and I leverage the CodeBERT model to predict simulated running time in gem5, a simulation framework for various computer architecture configurations. We generate a dataset that contains both C code and its simulated running time in gem5. We apply the CodeBERT model in three distinct ways to predict the simulated running time and achieve a mean absolute error of 0.546 in regression and an accuracy of 0.696 in classification. To our knowledge, this is the first work that uses ML models to predict gem5 simulation metrics.
In summary, the work in this dissertation and the findings from each of the three projects demonstrate the effectiveness of various ML techniques in different learning tasks using real-world data of different types in various domains.
Provider:
University of Notre Dame
Created:
2024-07-17