Machine Learning Articles Extracted from Google Scholar
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14063123
Description:
This dataset was created as a web scraping exercise that captures academic information from Google Scholar. It contains data on Machine Learning research articles: each article's title, authors, summary, direct link, citation count, and APA reference. The data was collected with Python and Selenium, as practice in extracting data from websites with dynamic content.
The dataset was generated as an academic exercise in learning and applying web scraping techniques; no in-depth analysis of the collected data was intended. It serves as a resource for learning and evaluating web data collection methods.
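The listing does not include the scraping script itself. As a rough illustration of the kind of collection described above, here is a minimal sketch using Python and Selenium. The CSS selectors (`div.gs_ri`, `h3.gs_rt`, `div.gs_a`, `div.gs_rs`, `div.gs_fl`), the "Cited by N" link text, and the function names are assumptions about the page markup, not details taken from the dataset's authors, and may need adjusting. The APA citation field is left out because it would require opening the cite dialog for each result.

```python
import re
from typing import Dict, List, Optional


def parse_citation_count(link_text: str) -> Optional[int]:
    """Extract the integer from text such as 'Cited by 1,234'; None if absent."""
    match = re.search(r"Cited by\s+([\d,]+)", link_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def scrape_results(query_url: str) -> List[Dict[str, object]]:
    """Collect one record per result block on a single results page."""
    # Imported here so parse_citation_count works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(query_url)
        records = []
        # ASSUMPTION: every selector below is illustrative and may not match
        # the live page markup.
        for block in driver.find_elements(By.CSS_SELECTOR, "div.gs_ri"):
            links = block.find_elements(By.CSS_SELECTOR, "h3.gs_rt a")
            authors = block.find_elements(By.CSS_SELECTOR, "div.gs_a")
            summary = block.find_elements(By.CSS_SELECTOR, "div.gs_rs")
            footer = block.find_elements(By.CSS_SELECTOR, "div.gs_fl")
            records.append({
                "title": links[0].text if links else "",
                "link": links[0].get_attribute("href") if links else "",
                "authors": authors[0].text if authors else "",
                "description": summary[0].text if summary else "",
                "citations": parse_citation_count(footer[0].text) if footer else None,
            })
        return records
    finally:
        # Always close the browser, even if a lookup fails.
        driver.quit()
```

Using `find_elements` (plural) for the inner lookups returns an empty list when a piece is missing, so a result without a snippet or citation link yields an empty field instead of raising an exception.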
Included Fields:
title: Title of the research article.
link: Direct link to the article.
authors: Names of the article’s authors.
description: Summary or brief description of the article.
citations: Number of times the article has been cited on Google Scholar.
APA_citation: APA-formatted citation of the article.
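To show how the fields listed above might be consumed, here is a minimal loading sketch. It assumes the data is distributed as a CSV file with these six column names as its header. The listing does not state the file format, so both the format and the `load_articles` helper are hypothetical.

```python
import csv
import io

# Column names taken from the field list above.
EXPECTED_FIELDS = [
    "title", "link", "authors", "description", "citations", "APA_citation",
]


def load_articles(handle):
    """Read article rows from an open CSV handle and validate the header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        raw = (row["citations"] or "").strip()
        # Keep citations as an int when possible; None marks a blank or
        # non-numeric cell.
        row["citations"] = int(raw) if raw.isdigit() else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the real file (made-up values).
    sample = io.StringIO(
        "title,link,authors,description,citations,APA_citation\n"
        "Example title,http://example.org,A. Author,Short summary,12,Ref\n"
    )
    for article in load_articles(sample):
        print(article["title"], article["citations"])
```

For a file on disk, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, as the `csv` module documentation recommends.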
This dataset was created solely for educational purposes and to demonstrate the application of web scraping techniques in a controlled environment.
Created:
2024-11-11



