DSE Capstone - American Gut Project Cohort 2019. In Data Science & Engineering Master of Advanced Study (DSE MAS) Capstone Projects
Source: DataCite Commons. Updated 2026-04-17; indexed 2026-05-06.
Download link:
https://library.ucsd.edu/dc/object/bb2666864s
Resource description:
Abstract:
The American Gut Project (AGP) [1] is the largest crowd-sourced citizen-science collection of gut microbiome samples available today. Knowledge of the microbiome is still in its early stages, and the enormous number of poorly understood organism and gene effects makes accurate interpretation of results difficult. Reducing this high-dimensional space with fundamentally different embedding techniques can capture different aspects of the microbiome data to aid research. Dimensionality reduction techniques such as Word2Vec, hyperbolic embeddings, and Principal Coordinates Analysis (PCoA) were used to reduce each sample's dimensionality and to explore their respective strengths. The embeddings were validated by using them as features for a supervised machine learning model that classifies microbiome body sites (e.g. sebum, feces, saliva). Compared with the state of the art, PCoA on underlying phylogenetic distances, the different embeddings kept the baseline logistic regression model's F1 score within an acceptable margin of +/- 0.1. The comparison covered the actual dimension sizes, the model's prediction metrics, and a representation of the samples' clusters. This paper discusses the analysis, architecture, and visualization of the project, which approached the main technical challenge of gaining a better understanding of microbiota.
This project was completed by the Cohort 4 (2017-2019) group of the DSE MAS program. The data come from the Rob Knight Lab at UCSD and are hosted on the Qiita website under study #10317.
This project contains various analyses on microbiome data, survey data, drug data, and diet data. It also contains a Luigi pipeline and a Plotly Dash application for front end usage.
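The validation criterion in the abstract (an embedding is acceptable if the classifier's F1 score stays within +/- 0.1 of the PCoA baseline) can be illustrated with a minimal sketch. It is not the project's code: the abstract does not say how F1 was averaged across body sites, so macro averaging is an assumption here, and the function names and toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true.

    Macro averaging is an assumption; the abstract does not specify it.
    """
    labels = sorted(set(y_true))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def within_margin(baseline_f1, candidate_f1, margin=0.1):
    """True if the candidate score lies within +/- margin of the baseline."""
    return abs(candidate_f1 - baseline_f1) <= margin


# Hypothetical body-site labels and predictions, for illustration only.
y_true = ["feces", "feces", "saliva", "saliva", "sebum", "sebum"]
pred_baseline = ["feces", "feces", "saliva", "saliva", "sebum", "sebum"]
pred_candidate = ["feces", "feces", "saliva", "sebum", "sebum", "sebum"]

baseline_score = macro_f1(y_true, pred_baseline)
candidate_score = macro_f1(y_true, pred_candidate)
acceptable = within_margin(baseline_score, candidate_score)
```

In practice the predictions would come from a logistic regression model trained once per embedding, with the same train/test split used for every embedding so that the scores are comparable.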
Provider:
UC San Diego Library Digital Collections
Created:
2019-06-06



