snats/url-classifications
Source: Hugging Face. Updated 2024-08-15. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/snats/url-classifications
Description:
The URL Classifications Dataset is a collection of URL classifications for PDF documents, derived primarily from the SafeDocs corpus. It consists of several CSV files holding different subsets of the classifications, both raw and processed; each file has two main fields, the URL and its classification label. Supported tasks include text classification, URL-based document classification, and PDF content inference. The content and labels are primarily in English. There is no official train/validation/test split, so users are expected to create their own splits to suit their needs.
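Since the dataset ships without official splits, one way to create them is a seeded shuffle followed by fixed-ratio cuts. The following is a minimal sketch, not the dataset author's method: the column names `url` and `label`, the sample rows, the 80/10/10 ratios, and the helper `split_rows` are all assumptions made for illustration, and the real CSV files may use different headers.

```python
import csv
import io
import random


def split_rows(rows, train=0.8, val=0.1, seed=0):
    """Shuffle rows with a fixed seed and cut them into train/val/test lists.

    Hypothetical helper: whatever is left after the train and val cuts
    becomes the test set.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded, so the split is reproducible
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return (
        rows[:n_train],
        rows[n_train:n_train + n_val],
        rows[n_train + n_val:],
    )


# In-memory stand-in for one of the dataset's CSV files.
# The header names and every row here are made up for this example.
sample_csv = io.StringIO(
    "url,label\n"
    + "".join(f"https://example.org/doc{i}.pdf,class_{i % 2}\n" for i in range(10))
)

rows = list(csv.DictReader(sample_csv))
train_rows, val_rows, test_rows = split_rows(rows)
```

To use this on a real file, replace `sample_csv` with an opened CSV from the dataset and adjust the column names to match its header. A plain shuffle ignores class balance and can place URLs from the same site in both train and test, so a stratified or per-domain split may be a better fit depending on the task.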
Provided by:
snats



