sdananya/wiki_data_with_label_chunk_74
Source: Hugging Face. Updated 2025-02-13; indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/sdananya/wiki_data_with_label_chunk_74
Resource description:
The dataset contains information about web articles in the following fields: article title (title), article text (text), article URL (url), Wikipedia article ID (wiki_id), page views (views), paragraph ID (paragraph_id), language (langs), embedding (emb), keywords (keywords), labels (labels), and categories (categories). It has a single training split of 4,488,716 bytes containing 1,000 examples. These features suggest that the dataset may be used for text classification, information retrieval, or related natural language processing tasks.
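The field list and split size above can be turned into a quick sanity check. The following is a minimal sketch, not an official loader: the column names and sizes are copied from the description, while the helper names (`missing_columns`, `average_bytes_per_example`, `load_train_split`) are hypothetical. The optional loader assumes the Hugging Face `datasets` package is installed and that the repository is still available under the same ID.

```python
# Column names as listed in the description; their value types are not
# documented there, so this sketch only checks that the names are present.
EXPECTED_COLUMNS = [
    "title", "text", "url", "wiki_id", "views", "paragraph_id",
    "langs", "emb", "keywords", "labels", "categories",
]

# Split statistics quoted in the description.
TRAIN_BYTES = 4488716
TRAIN_EXAMPLES = 1000


def missing_columns(record):
    """Return the expected column names that are absent from a record."""
    return [name for name in EXPECTED_COLUMNS if name not in record]


def average_bytes_per_example():
    """Average size of one training example, from the quoted statistics."""
    return TRAIN_BYTES / TRAIN_EXAMPLES


def load_train_split():
    """Optional: fetch the train split (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset("sdananya/wiki_data_with_label_chunk_74", split="train")


if __name__ == "__main__":
    print(f"average bytes per example: {average_bytes_per_example():.3f}")
    # A record with only two fields is reported as missing the other nine.
    print(missing_columns({"title": "Example", "text": "Example text"}))
```

After calling `load_train_split()`, passing its first row to `missing_columns` would confirm whether the hosted data matches the described schema. The average of roughly 4.5 KB per example fits rows that carry an embedding vector alongside the text, though the description does not state the embedding dimension.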
Provider:
sdananya



