
Language-agnostic Website Embedding and Classification (Curlie dataset)

DataCite Commons, updated 2022-05-02, indexed 2024-07-29
Download link:
https://figshare.com/articles/dataset/Language-agnostic_Website_Embedding_and_Classification_Curlie_dataset_/16621669/1
Resource description:
curlie_labels.csv.gz: Curlie scraped content with URLs, category path and language
curlie_labels_langs_matchings.json.gz: matchings between English categories and other languages
html_content.json.gz: content of the HTML files for retained websites (homepage, accessible, non-regional)
enriched_curlie_emb_and_pred.gz: embedding and probabilities vector for retained websites

For the probability vector, the index is defined by the alphabetical order of the category names:
Arts: 0
Business: 1
Computers: 2
Games: 3
Health: 4
Home: 5
Kids_and_Teens: 6
News: 7
Recreation: 8
Reference: 9
Science: 10
Shopping: 11
Society: 12
Sports: 13
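The index-to-category mapping described above can be sketched in Python. This is a minimal illustration, not the authors' code: the description does not say how enriched_curlie_emb_and_pred.gz is laid out on disk, so the sketch does not load the file. The helper name `top_categories` and the example vector are hypothetical and made up for illustration.

```python
# Category names in the order given by the dataset description (alphabetical),
# so that CATEGORIES[i] is the label for index i of the probability vector.
CATEGORIES = [
    "Arts", "Business", "Computers", "Games", "Health", "Home",
    "Kids_and_Teens", "News", "Recreation", "Reference", "Science",
    "Shopping", "Society", "Sports",
]


def top_categories(probabilities, k=3):
    """Return the k (category, probability) pairs with the highest probability.

    Hypothetical helper: assumes `probabilities` is a sequence of 14 floats
    indexed as in CATEGORIES.
    """
    if len(probabilities) != len(CATEGORIES):
        raise ValueError(
            f"expected {len(CATEGORIES)} probabilities, got {len(probabilities)}"
        )
    ranked = sorted(
        zip(CATEGORIES, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]


# Made-up probability vector, for illustration only (not taken from the dataset).
example = [0.02, 0.05, 0.70, 0.01, 0.01, 0.01, 0.01,
           0.03, 0.02, 0.04, 0.06, 0.02, 0.01, 0.01]

print(top_categories(example, k=2))
# prints [('Computers', 0.7), ('Science', 0.06)]
```

Keeping the labels in a single ordered list means the index of each label matches the table above, which is easy to check against the description before applying it to real vectors.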
Provider:
figshare
Created:
2022-01-06