Topic Extraction Dataset
Source: DataCite Commons | Updated: 2024-04-03 | Indexed: 2024-08-19
Download link:
https://figshare.com/articles/dataset/Topic_Extraction_Dataset/25533532
Description:
This dataset comprises 9,691 articles from the medical domain. Topics were extracted with two distinct methods, the TextRank algorithm and a large language model (LLM), each applied in conjunction with the keywords present in the articles. Fields include the article title, publication year, PMID, keyword list, topics derived with TextRank, and topics identified by the LLM.
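Because the dataset pairs each article's keywords with topics from two extraction methods, one natural use is to check how closely each method's topics agree with the keywords. The following is a minimal Python sketch under stated assumptions: the field names (`keywords`, `textrank_topics`, `llm_topics`, and so on) and the choice of Jaccard overlap are hypothetical illustrations, not the dataset's actual column names or the authors' evaluation method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ArticleRecord:
    # Hypothetical field names; the dataset's real column names may differ.
    title: str
    year: int
    pmid: str
    keywords: list[str]
    textrank_topics: list[str]
    llm_topics: list[str]


def jaccard(a: list[str], b: list[str]) -> float:
    """Jaccard overlap of two term lists, compared case-insensitively."""
    set_a = {term.strip().lower() for term in a}
    set_b = {term.strip().lower() for term in b}
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def keyword_agreement(record: ArticleRecord) -> dict[str, float]:
    """Score each method's topics against the article's own keywords."""
    return {
        "textrank": jaccard(record.textrank_topics, record.keywords),
        "llm": jaccard(record.llm_topics, record.keywords),
    }
```

Exact string matching is a deliberately simple choice here; topics and keywords that differ only in wording would need stemming or embedding-based comparison to be counted as agreeing.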
Provider:
figshare
Created:
2024-04-03



