Climate impacts to inland fishes: Topic modeling of literature trends analysis script
Source: DataCite Commons. Updated: 2023-08-08. Indexed: 2024-07-13
Download link:
https://www.sciencebase.gov/catalog/item/64c93cdbd34e70357a34c0cd
Description:
This script applies topic modeling to analyze literature trends of climate impacts to inland fish, based on the papers within the Fish and Climate Change Database (FiCli, DOI: 10.5066/P9973SMC). Sections 1-8 loaded the .bib file containing all of the papers in the database and cleaned the text. Cleaning included combining the title, abstract, and keywords; removing non-informative words; stemming words; removing punctuation; and forming phrases (e.g., "climate change" became "climate_change"). Sections 9-10 divided the papers into discrete topics by identifying the ideal number of topics and then using Latent Dirichlet Allocation (LDA) modeling with Gibbs sampling to assign topics to each paper. Sections 11-17 analyzed the topic modeling results and generated figures for the paper. Section 11 produced a graph of paper counts by year, along with information about whether each paper described documented or projected climate change impacts. This included loading projected/documented data from FiCli, calculating the ratio of documented papers over time, and graphing the results. The graph also featured lines representing major climate change policy milestones. Section 12 plotted topic count over time as a percent stacked bar plot with lines representing policy milestones. This plot was used in combination with the Section 11 graph to see how policy changes corresponded to changes in the literature. Section 13 calculated the word weight matrix (weight of each word in each topic), the article weight matrix (weight of each topic in each paper), and distance matrices (distance between the words/articles of each topic pair). Section 14 identified topic similarity based on word distributions by plotting the word weight distance matrix using non-metric multidimensional scaling (NMDS). Closer points had more similar word distributions; points were sized by paper count and colored by documented/projected status.
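The distance-matrix step described for Section 13 can be illustrated with a minimal Python sketch. This is not the published script: the Euclidean metric, the function name `pairwise_distances`, and the toy `word_weights` values are all assumptions made for illustration, since the description does not state which distance measure was used.

```python
from math import sqrt


def pairwise_distances(weights):
    """Return a symmetric matrix of Euclidean distances between rows.

    Each row is one topic's weight vector (hypothetical layout).
    Euclidean distance is an assumption; the source does not name a metric.
    """
    n = len(weights)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sqrt(sum((a - b) ** 2 for a, b in zip(weights[i], weights[j])))
            dist[i][j] = dist[j][i] = d
    return dist


# Made-up word weights: rows are topics, columns are words.
word_weights = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.1, 0.8],
]
topic_distances = pairwise_distances(word_weights)
```

In this toy input the first two topics share most of their word weight, so their distance is smaller than the distance from either to the third topic. A matrix of this shape is what an NMDS ordination such as the one in Section 14 would take as input.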
Section 15 identified topics as general or specific based on the article weight matrix, with specific papers having a greater weight for one topic. Points were sized by paper count and colored by documented/projected status, as in Section 14. Section 16 identified research gaps between topics by plotting the word weight and article weight distance matrices as a heatmap. Topic gaps had high word and article distances. Section 17 determined how topic weights varied over time by calculating the mean change in topic weight between successive years and plotting the results as a bar plot.
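The Section 17 calculation can be sketched in Python under one reading of the description: average each topic's weight across the papers published in a year, take the differences between consecutive years, and average those differences per topic. The function name `mean_yearly_change`, the record layout, and the sample values are hypothetical, and the published script may aggregate differently.

```python
from collections import defaultdict


def mean_yearly_change(records, n_topics):
    """Return the mean year-to-year change in weight for each topic.

    records: list of (year, [weight of each topic in one paper]).
    Years with no papers are skipped, so "successive" means consecutive
    years present in the data (an assumption of this sketch).
    """
    by_year = defaultdict(list)
    for year, weights in records:
        by_year[year].append(weights)
    years = sorted(by_year)

    # Mean weight of each topic across the papers of each year.
    yearly_means = [
        [sum(w[t] for w in by_year[y]) / len(by_year[y]) for t in range(n_topics)]
        for y in years
    ]

    changes = []
    for t in range(n_topics):
        diffs = [
            yearly_means[i + 1][t] - yearly_means[i][t]
            for i in range(len(years) - 1)
        ]
        changes.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return changes


# Made-up article weights for two topics across three years.
records = [
    (2000, [0.8, 0.2]),
    (2000, [0.6, 0.4]),
    (2001, [0.5, 0.5]),
    (2002, [0.3, 0.7]),
]
changes = mean_yearly_change(records, n_topics=2)
```

A negative value marks a topic whose average weight declined over the period and a positive value marks one that grew, which is the quantity a bar plot like the one described would display per topic.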
Provider:
National and Regional Climate Adaptation Science Centers
Created:
2023-08-01



