
PeytonT/paper_graph

Source: Hugging Face. Updated 2026-04-23; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/PeytonT/paper_graph
Description:
---
pretty_name: Paper Universe Graph
viewer: true
tags:
- datasets
- graph
- scientific-papers
- arxiv
- retrieval
- embeddings
size_categories:
- 10M<n<100M
configs:
- config_name: paper_nodes
  default: true
  data_files:
  - split: train
    path: "paper_nodes/*.parquet"
- config_name: paper_category_edges
  default: false
  data_files:
  - split: train
    path: "paper_category_edges/*.parquet"
- config_name: paper_knn
  default: false
  data_files:
  - split: train
    path: "paper_knn/*.parquet"
- config_name: category_nodes
  default: false
  data_files:
  - split: train
    path: "category_nodes/*.parquet"
- config_name: category_knn
  default: false
  data_files:
  - split: train
    path: "category_knn/*.parquet"
- config_name: topic_nodes
  default: false
  data_files:
  - split: train
    path: "topic_nodes/*.parquet"
- config_name: paper_topic_edges
  default: false
  data_files:
  - split: train
    path: "paper_topic_edges/*.parquet"
- config_name: year_nodes
  default: false
  data_files:
  - split: train
    path: "year_nodes/*.parquet"
- config_name: paper_year_edges
  default: false
  data_files:
  - split: train
    path: "paper_year_edges/*.parquet"
- config_name: paper_embeddings
  default: false
  data_files:
  - split: train
    path: "paper_embeddings/*.parquet"
- config_name: paper_fulltext_embeddings
  default: false
  data_files:
  - split: train
    path: "paper_fulltext_embeddings/*.parquet"
---

# Generated with Research Library: https://github.com/peytontolbert/Research_Library

# Paper Universe Graph Dataset

Parquet-first export of the paper-universe graph already built under the Repository Library.

This dataset preserves:

- paper nodes with metadata references and 3D coordinates
- paper/category/year/topic graph layers
- optional paper-to-paper and category-to-category similarity edges
- metadata and full-text paper embedding splits

## Coverage

- papers: `1000000`
- categories: `156`
- years: `19`
- topics: `1483957`
- embedding dimension: `384`
- full-text embeddings included: `true`

## Configs

- default viewer/config: `paper_nodes`
- `paper_nodes`: `1000000` rows
- `paper_category_edges`: `1744492` rows
- `paper_knn`: `20000062` rows
- `category_nodes`: `156` rows
- `category_knn`: `1248` rows
- `topic_nodes`: `1483957` rows
- `paper_topic_edges`: `3000000` rows
- `year_nodes`: `19` rows
- `paper_year_edges`: `1000000` rows
- `paper_embeddings`: `1000000` rows
- `paper_fulltext_embeddings`: `1000000` rows

## Loading

```python
from datasets import load_dataset

paper_nodes = load_dataset("PeytonT/paper_graph", "paper_nodes")
paper_category_edges = load_dataset("PeytonT/paper_graph", "paper_category_edges")
paper_knn = load_dataset("PeytonT/paper_graph", "paper_knn")
category_nodes = load_dataset("PeytonT/paper_graph", "category_nodes")
category_knn = load_dataset("PeytonT/paper_graph", "category_knn")
topic_nodes = load_dataset("PeytonT/paper_graph", "topic_nodes")
paper_topic_edges = load_dataset("PeytonT/paper_graph", "paper_topic_edges")
year_nodes = load_dataset("PeytonT/paper_graph", "year_nodes")
paper_year_edges = load_dataset("PeytonT/paper_graph", "paper_year_edges")
paper_embeddings = load_dataset("PeytonT/paper_graph", "paper_embeddings")
paper_fulltext_embeddings = load_dataset("PeytonT/paper_graph", "paper_fulltext_embeddings")
```

## Notes

- `paper_nodes` stores metadata references and coordinates, not the full paper body.
- The original full text remains in the source paper dataset referenced by the manifest.
- `paper_embeddings` is the metadata/title+abstract embedding split.
- `paper_fulltext_embeddings` is the aggregated full-body embedding split when available.

## Visualization Assets

The export includes the local paper-universe visualizations and viewer payload when present:

- `manifest.json`
- `progress.json`
- `render_manifest.json`
- `viewer_manifest.json`
- `universe_3d.png`
- `universe_3d_detailed.png`
- `nodes_3d_sample.html`
- `universe_3d_hover.html`
- `interactive/categories.json`
- `interactive/manifest.json`
- `interactive/papers_200000.json`
- `interactive/papers_50000.json`
- `interactive/years.json`

### Universe 3D Overview

![Paper universe 3D overview](./universe_3d.png)

### Universe 3D Detailed View

![Paper universe 3D detailed view](./universe_3d_detailed.png)

- [Open the sampled 3D HTML view](./nodes_3d_sample.html)
- [Open the interactive hover view](./universe_3d_hover.html)
- Interactive viewer payload is included under `interactive/`.
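
## Example: Streaming an Edge Config

The larger edge configs (for example `paper_knn`, at about 20 million rows) may be easier to explore by streaming than by downloading in full. The sketch below is one possible approach, not part of the export itself. `stream_config` and `build_adjacency` are hypothetical helper names. The edge column names are not documented in this card, so they are passed in as parameters, and the `"src"` / `"dst"` keys in the toy rows are placeholders only.

```python
from collections import defaultdict
from itertools import islice


def stream_config(name, repo="PeytonT/paper_graph"):
    """Lazily iterate over one config without downloading it in full."""
    # Imported here so build_adjacency can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo, name, split="train", streaming=True)


def build_adjacency(rows, src_key, dst_key, limit=None):
    """Group edge rows (dicts) into {source: [targets]}.

    `src_key` and `dst_key` are parameters because the real column
    names are an unknown here; check the dataset viewer first.
    """
    adjacency = defaultdict(list)
    # islice(rows, None) consumes every row; an integer caps the scan.
    for row in islice(rows, limit):
        adjacency[row[src_key]].append(row[dst_key])
    return dict(adjacency)


# Toy rows standing in for an edge config; "src" and "dst" are hypothetical keys.
toy_edges = [
    {"src": "p1", "dst": "p2"},
    {"src": "p1", "dst": "p3"},
    {"src": "p2", "dst": "p1"},
]
neighbors = build_adjacency(toy_edges, "src", "dst")
print(neighbors)
```

With the real data, the same helper could be fed `stream_config("paper_knn")` together with the actual source and target column names and a `limit`, so that only the first rows are read.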
Provider:
PeytonT