
Parsed DMOZ data

Source: DataCite Commons. Updated: 2025-05-11. Indexed: 2025-05-17.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/OMV93V
Description:
DMOZ is a large, communally maintained open directory that categorizes web content. The data are posted in a complex XML format. The Python scripts posted at https://github.com/suriyan/dmoz_csv were used to parse the data posted at http://rdf.dmoz.org/ on June 12, 2016, to produce the CSV file posted here. The structure of the file is "URL","Category 1","Category 2",... Because the categories are separated by commas, calling read_csv without the right options can be problematic. Code to read in the file is available at https://gist.github.com/soodoku/a97e6cf2800429d1c541ac2fb65e4c98
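As a rough illustration of the read_csv caveat above, here is a minimal sketch that reads the rows with Python's standard csv module rather than pandas. This is not the code from the linked gist. It assumes that each row is a quoted URL followed by a varying number of quoted category fields and that there is no header row. The helper name read_dmoz_rows and the sample rows are hypothetical.

```python
import csv
import io


def read_dmoz_rows(lines):
    """Yield (url, categories) pairs from an iterable of CSV text lines.

    Assumption: the first field of each row is the URL and every
    remaining field is a category; rows may differ in length.
    """
    for row in csv.reader(lines):
        if not row:
            # csv.reader returns an empty list for blank lines; skip them.
            continue
        url, *categories = row
        # Drop empty cells, which can appear if rows were padded.
        yield url, [c for c in categories if c]


# Hypothetical sample rows in the documented layout.
sample = io.StringIO(
    '"http://example.com/","Top","Arts","Music"\n'
    '"http://example.org/","Top","Science"\n'
)
rows = list(read_dmoz_rows(sample))
for url, categories in rows:
    print(url, "/".join(categories))
```

For the real file, the same function could be given a file object opened with newline="" as the csv module documentation recommends, for example open("dmoz.csv", newline="", encoding="utf-8"), where both the file name and the encoding are assumptions.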
Provider:
Harvard Dataverse
Created:
2016-06-13