A Curated Dataset of Research Abstracts on AI and Large Language Models in Information Management and Librarianship (2020-2024)

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://data.mendeley.com/datasets/hs9hy54hzv

Resource description:

This dataset contains 303 curated abstracts of research articles focusing on the application of Artificial Intelligence (AI) and Large Language Models (LLMs) in information management, library science, and related services. The data was systematically collected from the Scopus database for publications between 2020 and 2024. The original search result yielded 498 documents. This was refined to 307 records upon exporting abstract data. A final set of 303 high-quality abstracts was established after a rigorous preprocessing and cleaning pipeline to ensure data integrity and suitability for natural language processing (NLP) and semantic analysis tasks.

This collection is ideal for researchers working in:

- Text Mining and Semantic Clustering
- Topic Modeling and Research Trend Analysis
- Natural Language Processing (NLP) Applications
- Bibliometric and Scientometric Studies
- AI in Libraries and Information Services

Keywords: Artificial Intelligence; Large Language Models; Natural Language Processing; Information Management; Librarianship; Research Abstracts; Text Dataset; Scopus.

2. For the Cluster Assignments File

File Name: abstracts_AI_LLM_librarianship_cluster_assignments.csv

Title: Semantic Cluster Assignments for a Corpus of AI/LLM in Librarianship Research Abstracts

Description: This file provides the semantic cluster assignments for the corresponding dataset "A Curated Dataset of Research Abstracts on AI and Large Language Models in Information Management and Librarianship (2020-2024)". The clusters were generated by applying K-means clustering to average GloVe word embeddings of the article abstracts. The analysis identified 7 distinct, semantically coherent research themes, which are detailed in the associated research article. This data is provided to facilitate reproducibility, further analysis, and to serve as a ground truth for comparative studies in semantic clustering and topic discovery.
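The clustering approach named in the description (K-means over averaged word embeddings) can be illustrated with a minimal sketch. This is not the authors' pipeline: the 2-D vectors in `TOY_VECTORS` are made-up stand-ins for real GloVe embeddings, the helper names `mean_embedding` and `kmeans` are hypothetical, and the toy run uses k=2 with a deterministic initialisation rather than the 7 clusters reported for the dataset.

```python
import math

# Toy 2-D vectors standing in for real GloVe embeddings (illustrative values only).
TOY_VECTORS = {
    "library": [1.0, 0.0],
    "catalog": [0.9, 0.1],
    "patient": [0.0, 1.0],
    "clinical": [0.1, 0.9],
}


def mean_embedding(text, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    found = [vectors[tok] for tok in text.lower().split() if tok in vectors]
    if not found:
        return None
    dim = len(found[0])
    return [sum(vec[i] for vec in found) / len(found) for i in range(dim)]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, initialised with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


abstracts = [
    "library catalog",
    "patient clinical",
    "catalog library library",
    "clinical patient patient",
]
doc_vectors = [mean_embedding(a, TOY_VECTORS) for a in abstracts]
labels = kmeans(doc_vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

With real data, the toy dictionary would be replaced by pretrained GloVe vectors and k set to 7; results would also depend on tokenisation and initialisation choices that this listing does not specify.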
The file includes the document identifier and its corresponding cluster label (0-6). The dominant themes for each cluster are:

Cluster 0: Healthcare Information Systems
Cluster 1: Information Retrieval Systems
Cluster 2: Research Data Management (RDM)
Cluster 3: Digital Library Adoption
Cluster 4: Knowledge Management
Cluster 5: Reference Services
Cluster 6: Scientific Publishing

Keywords: Semantic Clustering; K-means; Cluster Labels; Research Themes; Topic Discovery; GloVe Embeddings; Text Mining.
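As a rough sketch of how the label-to-theme mapping might be used, the following counts documents per theme. The theme names come from the description above, but the column names `doc_id` and `cluster`, the sample rows, and the helper `theme_counts` are assumptions for illustration; the actual header of the CSV file is not documented in this listing.

```python
import csv
import io
from collections import Counter

# Theme names as listed in the dataset description.
CLUSTER_THEMES = {
    0: "Healthcare Information Systems",
    1: "Information Retrieval Systems",
    2: "Research Data Management (RDM)",
    3: "Digital Library Adoption",
    4: "Knowledge Management",
    5: "Reference Services",
    6: "Scientific Publishing",
}

# Hypothetical sample rows; the real column names and identifiers may differ.
SAMPLE_CSV = """doc_id,cluster
D001,0
D002,3
D003,3
D004,6
"""


def theme_counts(csv_text, label_column="cluster"):
    """Count documents per theme from CSV text with an integer cluster column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        label = int(row[label_column])
        if label not in CLUSTER_THEMES:
            raise ValueError(f"unexpected cluster label: {label}")
        counts[CLUSTER_THEMES[label]] += 1
    return counts


counts = theme_counts(SAMPLE_CSV)
for theme, n in counts.most_common():
    print(f"{theme}: {n}")
```

To apply this to the downloaded file, the sample text would be replaced by the file's contents and `label_column` adjusted to match its real header.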

Created:

2025-10-01