links.bulk.csv
Source: DataCite Commons | Updated: 2020-08-27 | Indexed: 2024-07-27
Download link:
https://figshare.com/articles/links_bulk_csv/8094362/1
Description:
Using a web scraper, we checked the status of internet links detected in scientific papers.

The data schema is detailed below:
* type - The part of the manuscript in which the link was found.
* journal - Title of the journal where the paper was published.
* id - PubMed's primary identifier for the paper.
* year - The year the paper was published.
* link - URL parsed from the manuscript text.
* code - HTTP/FTP status code returned when trying to access the URL. (-1 indicates a timeout.)
* flag.uniqueness - Whether the link appears only once in the data. '0' means it is unique.
* newtest - The protocol used to determine the status code in our revised pipeline. Only listed for links that were reevaluated.
* oldcode - The status recorded for this link prior to the pipeline revision. Only listed for links that were reevaluated.
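As a rough illustration of how the schema above might be consumed, the following sketch tallies link status by category using only the Python standard library. Several things here are assumptions rather than facts about the dataset: that the file is comma-delimited with a header row matching the column names, that `newtest` and `oldcode` are empty for links that were not reevaluated, and that the coarse status buckets in `classify` are a useful grouping. The sample rows, including the `type` values, are invented for the example and are not taken from links.bulk.csv.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows that follow the documented column names.
# These values are made up for illustration only.
SAMPLE_CSV = """type,journal,id,year,link,code,flag.uniqueness,newtest,oldcode
abstract,Example Journal,1001,2012,http://example.org/tool,200,0,,
body,Example Journal,1002,2014,http://example.org/db,404,1,,
body,Another Journal,1003,2014,http://example.org/db,404,1,,
abstract,Another Journal,1004,2016,ftp://example.org/data,-1,0,ftp,404
"""


def classify(code: str) -> str:
    """Map a raw status code string to a coarse bucket (assumed grouping)."""
    try:
        value = int(code)
    except ValueError:
        return "unparsed"
    if value == -1:
        return "timeout"  # the description states that -1 indicates a timeout
    if 200 <= value < 300:
        return "ok"
    if 300 <= value < 400:
        return "redirect"
    if 400 <= value < 600:
        return "error"
    return "other"


def summarize(handle):
    """Count status buckets, unique links, and reevaluated links."""
    by_status = Counter()
    unique_links = 0
    reevaluated = 0
    for row in csv.DictReader(handle):
        by_status[classify(row["code"])] += 1
        # Per the description, '0' marks a link that appears only once.
        if row["flag.uniqueness"] == "0":
            unique_links += 1
        # Assumption: oldcode is blank unless the link was reevaluated.
        if row["oldcode"]:
            reevaluated += 1
    return by_status, unique_links, reevaluated


by_status, unique_links, reevaluated = summarize(io.StringIO(SAMPLE_CSV))
print(dict(by_status), unique_links, reevaluated)
```

To run this against the real file, one would pass an opened file handle instead of the `io.StringIO` wrapper, for example `open("links.bulk.csv", newline="", encoding="utf-8")`, after checking that the delimiter and missing-value conventions match the assumptions above.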
Provider:
figshare
Created:
2019-05-08



