Dataset on Bibliographic, Textual, and Embedding Data for General Relativity and Gravitation Publications (1911–2000)
Indexed in NIAID Data Ecosystem: 2026-05-02
Download link:
https://zenodo.org/record/14581502
Resource description:
Dataset Overview
This dataset supplements the paper “Trajectories of Change: Approaches for Tracking Knowledge Evolution,” currently under review. It contains bibliographic, textual, and embedding data for 180,785 publications in General Relativity and Gravitation (GRG) published between 1911 and 2000, drawn from the NASA Astrophysics Data System (NASA/ADS). The data are provided as a single Parquet file with 33 columns.
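As a quick sanity check on a download, the file's column names can be compared against the 33 listed in the Data Structure table. The sketch below is a minimal, stdlib-only illustration that assumes the Parquet header uses exactly those names, including spaces and hyphens; `EXPECTED_COLUMNS` and `check_schema` are hypothetical helpers, not part of any package.

```python
# Column names as listed in the Data Structure table. Assumption: the
# Parquet header uses exactly these strings (spaces and hyphens included).
EXPECTED_COLUMNS = [
    "Bibcode", "Author", "Title", "Title_en", "Year", "Journal",
    "Journal Abbreviation", "Volume", "Issue", "First Page", "Last Page",
    "Abstract", "Abstract_en", "Keywords", "DOI", "Affiliation", "Category",
    "Citation Count", "References", "PDF_URL", "Title_lang", "Abstract_lang",
    "full_text", "tokens", "UMAP-1", "UMAP-2", "Cluster", "Name", "KeyBERT",
    "OpenAI", "MMR", "POS", "full_embeddings",
]


def check_schema(columns):
    """Compare column names against the documented schema.

    Returns (missing, unexpected): documented columns absent from
    `columns`, and columns present but not documented (sorted).
    """
    found = set(columns)
    expected = set(EXPECTED_COLUMNS)
    missing = [c for c in EXPECTED_COLUMNS if c not in found]
    unexpected = sorted(found - expected)
    return missing, unexpected


if __name__ == "__main__":
    # Simulate a header that lacks the last column and has an extra one.
    print(check_schema(EXPECTED_COLUMNS[:-1] + ["extra"]))
    # -> (['full_embeddings'], ['extra'])
```

With pandas and a Parquet engine such as pyarrow installed, the real header can be passed in as `list(pandas.read_parquet(path).columns)`.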
Usage
The dataset is directly compatible with the UnigramKLD and EmbeddingDensities classes of the semanticlayertools Python package.
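The signatures of these classes are not documented here, so the following is not the semanticlayertools API. It is an independent, hedged sketch of the idea the name UnigramKLD suggests: group the `tokens` lists into year windows, then compute the Kullback–Leibler divergence between the smoothed unigram distributions of two windows. The helper names, the 10-year window width starting in 1911, and the additive smoothing constant `alpha` are all assumptions. Smoothing is needed because the divergence is infinite whenever a token appears in one window but not in the other.

```python
import math
from collections import Counter, defaultdict


def group_by_window(records, start=1911, width=10):
    """Group the `tokens` lists of records into fixed-width year windows.

    `records` is an iterable of dicts with "Year" and "tokens" keys. With
    the defaults the windows are 1911-1920, 1921-1930, ..., 1991-2000,
    each keyed by its first year.
    """
    windows = defaultdict(list)
    for rec in records:
        first = start + ((rec["Year"] - start) // width) * width
        windows[first].append(list(rec["tokens"]))
    return dict(windows)


def unigram_distribution(token_lists, vocab, alpha=1.0):
    """Additively smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def unigram_kld(tokens_a, tokens_b, alpha=1.0):
    """KL divergence D(P_a || P_b), in nats, between two groups of documents.

    Both distributions share the union vocabulary; `alpha` must be > 0 so
    every probability is positive and the logarithm is always defined.
    """
    tokens_a = [list(t) for t in tokens_a]
    tokens_b = [list(t) for t in tokens_b]
    vocab = {tok for tokens in tokens_a + tokens_b for tok in tokens}
    p = unigram_distribution(tokens_a, vocab, alpha)
    q = unigram_distribution(tokens_b, vocab, alpha)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)


if __name__ == "__main__":
    # Toy records using the documented column names; not real dataset rows.
    records = [
        {"Year": 1916, "tokens": ["field", "equation", "gravitation"]},
        {"Year": 1919, "tokens": ["light", "deflection", "eclipse"]},
        {"Year": 1995, "tokens": ["gravitational", "wave", "detector"]},
        {"Year": 2000, "tokens": ["black", "hole", "field", "equation"]},
    ]
    windows = group_by_window(records)
    print(sorted(windows))  # -> [1911, 1991]
    print(unigram_kld(windows[1911], windows[1991]))
```

KL divergence is asymmetric, so comparing a window against its successor and the reverse generally gives different values; which direction to use depends on the question being asked.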
Data Structure
Column | Format | Description | Example
--- | --- | --- | ---
Bibcode | string | Unique publication identifier. | "1995PASP..107..803U"
Author | string | Authors listed as comma-separated names. | "Urry CM, Padovani P"
Title | string | Title of the publication. | "Unified Schemes for Radio-Loud Active Galactic Nuclei"
Title_en | string | Title translated into English. | "Unified Schemes for Radio-Loud Active Galactic Nuclei"
Year | integer | Year of publication. | 1995
Journal | string | Journal name. | "Publications of the Astronomical Society of the Pacific"
Journal Abbreviation | string | Abbreviated journal name. | "PASP"
Volume | string | Volume number (if applicable). | "107"
Issue | string | Issue number (if applicable). | "19"
First Page | string | Starting page. | "803"
Last Page | string | Ending page. | "25"
Abstract | string | Abstract text. | "The appearance of active galactic nuclei (AGN) depends strongly on orientation, dominating classification..."
Abstract_en | string | Abstract translated into English. | "The appearance of active galactic nuclei (AGN) depends strongly on orientation, dominating classification..."
Keywords | string | Comma-separated keywords. | "galaxies: active, galaxies: fundamental parameters, astrophysics"
DOI | string | Digital Object Identifier. | "10.1086/133630"
Affiliation | string | Author affiliations. | "AA(University of XYZ), AB(-)"
Category | string | Publication type (e.g., article, book). | "article"
Citation Count | float | Number of citations. | 4380.0
References | array of strings | List of cited Bibcodes. | ["1966Natur.209..751H", "1966Natur.211..468R", "1968ApJ...151..393S"]
PDF_URL | string | Link to the publication PDF. | "https://ui.adsabs.harvard.edu/link_gateway/1995PASP..107..803U/ADS_PDF"
Title_lang | string | Language of the title. | "en"
Abstract_lang | string | Language of the abstract. | "en"
full_text | string | Full text of the publication (where available). | "Unified Schemes for Radio-Loud Active Galactic Nuclei. The appearance of AGN depends so strongly on..."
tokens | array of strings | Tokenized text of the title and abstract for computational analysis. | ["unify", "schemes", "radio", "loud", "active", "galactic", "nuclei"]
UMAP-1 | float32 | UMAP embedding coordinate 1. | 10.423940
UMAP-2 | float32 | UMAP embedding coordinate 2. | 7.890975
Cluster | integer | Cluster label for topic modeling or grouping. | 15
Name | string | Descriptive cluster name. | "15_radio_quasars_sources_galaxies"
KeyBERT | string | Key phrases extracted via KeyBERT. | "radio galaxies, high redshift, radio sources, optical imaging"
OpenAI | string | Embedding-based descriptive phrases. | "Cosmological Evolution of Radio-Loud Quasars"
MMR | string | Key phrases extracted via Maximal Marginal Relevance (MMR). | "quasars, radio sources, redshift, luminosity, star formation"
POS | string | Key terms extracted via part-of-speech tagging. | "radio, quasars, sources, galaxies, redshift, optical"
full_embeddings | array of floats | Text embeddings generated using OpenAI's text-embedding-3-large model. | "[ 0.01164897 -0.00343577 -0.03168862 ... 0.00237622]"
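The Bibcode values, and each entry in References, follow the fixed-width 19-character ADS bibcode layout: year (4 characters), journal abbreviation (5), volume (4), qualifier (1), page (4), and first-author initial (1), with dots as padding. The parser below is an illustrative sketch, not ADS tooling. It recovers fields that should agree with Year, Journal Abbreviation, Volume, and First Page, and it treats the qualifier as an opaque character rather than handling the special cases the ADS documentation describes.

```python
from typing import NamedTuple


class Bibcode(NamedTuple):
    year: int
    journal: str
    volume: str
    qualifier: str
    page: str
    initial: str


def parse_bibcode(code: str) -> Bibcode:
    """Split a 19-character ADS bibcode into its fixed-width fields.

    Leading/trailing padding dots are stripped from each field; a qualifier
    that is only a padding dot becomes "".
    """
    if len(code) != 19:
        raise ValueError(f"expected 19 characters, got {len(code)}: {code!r}")
    return Bibcode(
        year=int(code[0:4]),
        journal=code[4:9].strip("."),
        volume=code[9:13].strip("."),
        qualifier=code[13].strip("."),
        page=code[14:18].strip("."),
        initial=code[18],
    )


if __name__ == "__main__":
    print(parse_bibcode("1995PASP..107..803U"))
    # -> Bibcode(year=1995, journal='PASP', volume='107', qualifier='', page='803', initial='U')
    refs = ["1966Natur.209..751H", "1966Natur.211..468R", "1968ApJ...151..393S"]
    print([parse_bibcode(r).year for r in refs])  # -> [1966, 1966, 1968]
```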
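Several string columns pack lists into a single value: Author, Keywords, KeyBERT, MMR, and POS are comma-separated, and Name prefixes the Cluster number to underscore-joined terms. The two hypothetical helpers below unpack them. They assume that no single item contains a comma and that the number before the first underscore in Name equals Cluster; the table above does not guarantee either. Missing values (None, or a float NaN as pandas would produce) are returned as empty lists.

```python
def split_list(value):
    """Split a comma-separated column value into trimmed, non-empty items."""
    if not isinstance(value, str):
        return []  # None, NaN, or another non-string placeholder
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_cluster_name(name):
    """Split a Name such as "15_radio_quasars" into (15, ["radio", "quasars"])."""
    head, _, rest = name.partition("_")
    terms = rest.split("_") if rest else []
    return int(head), terms


if __name__ == "__main__":
    print(split_list("Urry CM, Padovani P"))  # -> ['Urry CM', 'Padovani P']
    print(split_list("galaxies: active, galaxies: fundamental parameters, astrophysics"))
    # -> ['galaxies: active', 'galaxies: fundamental parameters', 'astrophysics']
    print(parse_cluster_name("15_radio_quasars_sources_galaxies"))
    # -> (15, ['radio', 'quasars', 'sources', 'galaxies'])
```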
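Because References holds cited bibcodes, a citation network restricted to this dataset can be built by keeping only references that are themselves Bibcode values in the file. The sketch below counts such internal citations per publication; the function name and the dict-per-record layout are assumptions for illustration. These counts are not expected to match Citation Count, which presumably also includes citations from outside the 180,785 records.

```python
from collections import Counter


def internal_citation_counts(records):
    """Count, for each Bibcode, how many records in the collection cite it.

    `records` is an iterable of dicts with "Bibcode" and "References" keys
    (References may be None). Only references that point to another record
    in the collection are counted, and each citing record counts once.
    """
    records = list(records)
    known = {rec["Bibcode"] for rec in records}
    counts = Counter({code: 0 for code in known})
    for rec in records:
        refs = rec.get("References")
        cited = set(refs if refs is not None else ()) & known
        counts.update(cited)
    return counts


if __name__ == "__main__":
    # Toy records with placeholder bibcodes; "X" lies outside the collection.
    toy = [
        {"Bibcode": "A", "References": ["B", "C", "X"]},
        {"Bibcode": "B", "References": ["C"]},
        {"Bibcode": "C", "References": None},
    ]
    print(internal_citation_counts(toy).most_common())
    # -> [('C', 2), ('B', 1), ('A', 0)]
```

The check `refs is not None` is used instead of `refs or ()` so the function also accepts NumPy arrays, whose truth value is ambiguous.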
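The EmbeddingDensities class named under Usage is not documented here. As an assumption-labeled illustration of one way to measure density on the two-dimensional map, the sketch below evaluates an isotropic Gaussian kernel density estimate at a point from (UMAP-1, UMAP-2) pairs. The bandwidth `h` is a free parameter chosen by hand, and evaluating the estimate separately for each year window is one possible way to see where publications concentrate over time.

```python
import math


def gaussian_kde_2d(points, x, y, h=0.5):
    """Isotropic 2-D Gaussian kernel density estimate at (x, y).

    `points` is a sequence of (UMAP-1, UMAP-2) pairs; `h` is the kernel
    bandwidth in UMAP units (an arbitrary choice, not from the dataset).
    The estimate integrates to 1 over the plane.
    """
    if not points:
        raise ValueError("need at least one point")
    norm = 1.0 / (2.0 * math.pi * h * h * len(points))
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2.0 * h * h))
    return norm * total


if __name__ == "__main__":
    # Toy coordinates; the first pair is the example row from the table.
    pts = [(10.423940, 7.890975), (10.5, 7.9), (3.0, 1.0)]
    print(gaussian_kde_2d(pts, 10.45, 7.9))  # high: near two points
    print(gaussian_kde_2d(pts, 0.0, 0.0))    # low: far from all points
```

For many points or many evaluation locations, a vectorised implementation (for example with NumPy or SciPy) would be much faster than this pure-Python loop.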
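The full_embeddings vectors come from OpenAI's text-embedding-3-large model. Their length is not stated above (the model's default output size is 3,072), so the sketch checks that vectors match before comparing them. A common use is finding semantically similar publications by cosine similarity. The pure-Python version below ranks records against a query vector; `cosine` and `most_similar` are illustrative names, and for all 180,785 records a vectorised NumPy matrix product would be far faster.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def most_similar(query, records, k=5):
    """Return the k (similarity, Bibcode) pairs closest to `query`, best first.

    `records` is an iterable of dicts with "Bibcode" and "full_embeddings" keys.
    """
    scored = (
        (cosine(query, rec["full_embeddings"]), rec["Bibcode"]) for rec in records
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings.
    toy = [
        {"Bibcode": "A", "full_embeddings": [1.0, 0.0]},
        {"Bibcode": "B", "full_embeddings": [0.0, 1.0]},
        {"Bibcode": "C", "full_embeddings": [1.0, 1.0]},
    ]
    print([code for _, code in most_similar([1.0, 0.1], toy, k=2)])  # -> ['A', 'C']
```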
Created:
2024-12-31



