
Dataset on Bibliographic, Textual, and Embedding Data for General Relativity and Gravitation Publications (1911–2000)

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14581502
Description:
Dataset Overview
This dataset supplements the paper “Trajectories of Change: Approaches for Tracking Knowledge Evolution,” currently under review. It includes bibliographic, textual, and embedding data for 180,785 publications in General Relativity and Gravitation (GRG), spans 1911 to 2000, and is based on the NASA Astrophysics Data System (NASA/ADS). The data are provided as a single Parquet file with 33 columns.

Usage
The dataset is directly compatible with the UnigramKLD and EmbeddingDensities classes of the semanticlayertools Python package.

Data Structure

| Column | Format | Description | Example |
|---|---|---|---|
| Bibcode | string | Unique publication identifier. | "1995PASP..107..803U" |
| Author | string | Authors, as comma-separated names. | "Urry CM, Padovani P" |
| Title | string | Title of the publication. | "Unified Schemes for Radio-Loud Active Galactic Nuclei" |
| Title_en | string | Title translated into English. | "Unified Schemes for Radio-Loud Active Galactic Nuclei" |
| Year | integer | Year of publication. | 1995 |
| Journal | string | Journal name. | "Publications of the Astronomical Society of the Pacific" |
| Journal Abbreviation | string | Abbreviated journal name. | "PASP" |
| Volume | string | Volume number (if applicable). | "107" |
| Issue | string | Issue number (if applicable). | "19" |
| First Page | string | Starting page. | "803" |
| Last Page | string | Ending page. | "25" |
| Abstract | string | Abstract text. | "The appearance of active galactic nuclei (AGN) depends strongly on orientation, dominating classification..." |
| Abstract_en | string | Abstract translated into English. | "The appearance of active galactic nuclei (AGN) depends strongly on orientation, dominating classification..." |
| Keywords | string | Comma-separated keywords. | "galaxies: active, galaxies: fundamental parameters, astrophysics" |
| DOI | string | Digital Object Identifier. | "10.1086/133630" |
| Affiliation | string | Author affiliations. | "AA(University of XYZ), AB(-)" |
| Category | string | Publication type (e.g., article, book). | "article" |
| Citation Count | float | Number of citations. | 4380.0 |
| References | array of strings | Bibcodes of the cited publications. | ["1966Natur.209..751H", "1966Natur.211..468R", "1968ApJ...151..393S"] |
| PDF_URL | string | Link to the publication PDF. | "https://ui.adsabs.harvard.edu/link_gateway/1995PASP..107..803U/ADS_PDF" |
| Title_lang | string | Language of the title. | "en" |
| Abstract_lang | string | Language of the abstract. | "en" |
| full_text | string | Full text of the publication (where available). | "Unified Schemes for Radio-Loud Active Galactic Nuclei. The appearance of AGN depends so strongly on..." |
| tokens | array of strings | Tokenized title and abstract text for computational analysis. | ["unify", "schemes", "radio", "loud", "active", "galactic", "nuclei"] |
| UMAP-1 | float32 | UMAP embedding coordinate 1. | 10.423940 |
| UMAP-2 | float32 | UMAP embedding coordinate 2. | 7.890975 |
| Cluster | integer | Cluster label for topic modeling or grouping. | 15 |
| Name | string | Descriptive cluster name. | "15_radio_quasars_sources_galaxies" |
| KeyBERT | string | Key phrases extracted with KeyBERT. | "radio galaxies, high redshift, radio sources, optical imaging" |
| OpenAI | string | Embedding-based descriptive phrases. | "Cosmological Evolution of Radio-Loud Quasars" |
| MMR | string | Key phrases extracted with Maximal Marginal Relevance (MMR). | "quasars, radio sources, redshift, luminosity, star formation" |
| POS | string | Key terms extracted with part-of-speech tagging. | "radio, quasars, sources, galaxies, redshift, optical" |
| full_embeddings | array of floats | Text embeddings generated with OpenAI's text-embedding-3-large model. | "[ 0.01164897 -0.00343577 -0.03168862 ... 0.00237622]" |
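The `Year` and `tokens` columns are enough to compare vocabulary between periods. As an illustration, here is a minimal sketch of a smoothed unigram Kullback-Leibler divergence in plain Python. It is not the semanticlayertools `UnigramKLD` implementation: the helper names (`unigram_counts`, `unigram_kld`), the add-alpha smoothing over the union vocabulary, and the example period boundaries are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical helpers for illustration; not the semanticlayertools API.


def unigram_counts(rows: Iterable[Mapping], start: int, end: int) -> Counter:
    """Count tokens over all rows whose Year falls in [start, end]."""
    counts: Counter = Counter()
    for row in rows:
        if start <= row["Year"] <= end:
            counts.update(row["tokens"])
    return counts


def unigram_kld(p_counts: Counter, q_counts: Counter, alpha: float = 1.0) -> float:
    """KL divergence D(P || Q), in nats, between two unigram distributions.

    Both distributions use add-alpha smoothing over the union of the two
    vocabularies (an assumed choice), so every probability is positive
    and the result is finite.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive to keep the divergence finite")
    vocab = set(p_counts) | set(q_counts)
    if not vocab:
        return 0.0
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kld = 0.0
    for word in vocab:
        # Counter returns 0 for unseen words, so smoothing alone sets their mass.
        p = (p_counts[word] + alpha) / p_total
        q = (q_counts[word] + alpha) / q_total
        kld += p * math.log(p / q)
    return kld
```

For example, `unigram_kld(unigram_counts(rows, 1911, 1950), unigram_counts(rows, 1951, 2000))` gives one measure of how far the earlier vocabulary diverges from the later one. With the real file, `rows` could be built with pandas, assuming a Parquet engine such as pyarrow is installed and with `grg.parquet` as a placeholder for the downloaded file name: `pd.read_parquet("grg.parquet", columns=["Year", "tokens"]).to_dict("records")`. Smoothing over the shared vocabulary keeps the divergence finite even when a term occurs in only one period.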
Created:
2024-12-31