Sources for a reproducible IT blog corpus
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/4569733
Description:
The dataset comprises the homepages of several hundred IT blogs and websites, hand-picked to represent discourse at the intersection of technology and society in Germany and the United States.
The corresponding text collection can be reproduced by downloading the current content of these sources to the user's local machine, which also brings the collection up to date: see https://zenodo.org/record/4552529 and https://github.com/adbar/trafilatura.
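As a rough illustration of that download-and-extract step, the sketch below reads a list of source URLs and extracts the main text of each page with trafilatura's `fetch_url` and `extract` functions. It is a minimal sketch under assumptions, not the authors' pipeline: the one-URL-per-line input format and the helper names `parse_source_list` and `rebuild_corpus` are hypothetical, and it only processes the listed pages rather than discovering further articles on each site.

```python
from typing import Callable, Dict, Iterable, List, Optional


def parse_source_list(lines: Iterable[str]) -> List[str]:
    """Return the unique http(s) URLs found in the lines, in input order.

    Assumes one URL per line (hypothetical format); other lines are skipped.
    """
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def rebuild_corpus(
    urls: Iterable[str],
    fetch: Optional[Callable[[str], Optional[str]]] = None,
    extract: Optional[Callable[[str], Optional[str]]] = None,
) -> Dict[str, str]:
    """Download each URL and keep the extracted main text, keyed by URL.

    By default the download and extraction are delegated to trafilatura;
    both steps can be swapped out, for example for offline testing.
    """
    if fetch is None or extract is None:
        import trafilatura  # imported lazily so the helpers work without it

        fetch = fetch or trafilatura.fetch_url
        extract = extract or trafilatura.extract

    corpus = {}
    for url in urls:
        html = fetch(url)
        if html is None:  # download failed, skip this source
            continue
        text = extract(html)
        if text:  # extraction may yield nothing for some pages
            corpus[url] = text
    return corpus
```

With trafilatura installed, a call such as `rebuild_corpus(parse_source_list(open("sources.txt")))` would fetch the pages live, where `sources.txt` is a placeholder file name. Because web content changes over time, repeated runs are expected to yield an updated rather than identical collection.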
Online searches on the text corpus are also available: https://www.dwds.de/d/korpora/it_blogs
Paper "A Reproducible IT-Blog Corpus": doi.org/10.5334/johd.35
Created:
2022-03-17



