Lao News classification
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14967274
Description:
Dataset Card for Lao News classification
Lao News classification dataset
This dataset contains Lao news articles collected from laopost.com for news classification.
Dataset Details
Dataset Description
Curated by: Wannaphong Phatthiyaphaibun
Language(s) (NLP): Lao
License: cc-by-4.0
Uses
Direct Use
News classification
Dataset Structure
The dataset is divided into three splits: train, validation, and test. Each split contains news articles, with each article represented as a dictionary with the following fields:
title: The title of the news article (string).
text: The main content of the news article (string).
category: The category of the news article (string).
date: The publication date of the news article (string).
url: The URL of the news article (string).
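The record layout above can be sketched as a typed dictionary with a small validity check. This is an illustrative sketch only: the names `NewsArticle` and `is_valid_article` are hypothetical, and the example values are placeholders rather than data from the dataset.

```python
from typing import TypedDict


class NewsArticle(TypedDict):
    """One record, using the five string fields listed above."""
    title: str
    text: str
    category: str
    date: str
    url: str


def is_valid_article(record: dict) -> bool:
    """Return True if the record has exactly the expected fields, all strings."""
    expected = set(NewsArticle.__annotations__)
    return set(record) == expected and all(
        isinstance(value, str) for value in record.values()
    )


# Placeholder values for illustration only; not taken from the dataset.
example: NewsArticle = {
    "title": "<article title>",
    "text": "<article body>",
    "category": "ຂ່າວຕ່າງປະເທດ",
    "date": "<publication date string>",
    "url": "<article url>",
}

print(is_valid_article(example))
```

A check like this may be useful before training, since the card does not say whether any field can be empty or missing.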
The dataset is structured as a DatasetDict object, which contains three Dataset objects, one for each split.
The train split contains 9196 news articles.
The validation split contains 3066 news articles.
The test split contains 3066 news articles.
The three splits follow the usual convention: train for fitting models, validation for tuning, and test for final evaluation. The sizes correspond to roughly a 60/20/20 split of 15328 articles. The criteria used to assign articles to splits are not stated, although the per-split category counts listed below show similar class proportions across the three splits.
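The split proportions can be checked directly from the stated sizes. The snippet below is a minimal sketch that uses only the three numbers given above; the variable names are illustrative.

```python
# Split sizes as stated in the dataset card.
split_sizes = {"train": 9196, "validation": 3066, "test": 3066}

total = sum(split_sizes.values())
proportions = {name: size / total for name, size in split_sizes.items()}

# Print each split with its size and its share of all articles.
for name, share in proportions.items():
    print(f"{name:<10} {split_sizes[name]:>5}  {share:.1%}")
print(f"{'total':<10} {total:>5}")
```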
Categories
Train
ຂ່າວຕ່າງປະເທດ 3417
ຂ່າວພາຍໃນ 3219
ຂ່າວທ້ອງຖິ່ນ 1307
ຂ່າວເຫດການ 459
ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ 404
ຂ່າວບັນເທິງ 240
ຂ່າວທ່ອງທ່ຽວ 150
Validation
ຂ່າວຕ່າງປະເທດ 1163
ຂ່າວພາຍໃນ 1026
ຂ່າວທ້ອງຖິ່ນ 449
ຂ່າວເຫດການ 157
ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ 137
ຂ່າວບັນເທິງ 86
ຂ່າວທ່ອງທ່ຽວ 48
Test
ຂ່າວຕ່າງປະເທດ 1185
ຂ່າວພາຍໃນ 1059
ຂ່າວທ້ອງຖິ່ນ 431
ຂ່າວເຫດການ 147
ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ 136
ຂ່າວບັນເທິງ 64
ຂ່າວທ່ອງທ່ຽວ 44
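The counts above show a clear class imbalance: the two largest categories account for most articles in every split. The sketch below uses only those counts to compute a majority-class baseline and inverse-frequency class weights. The weighting formula, total / (number of classes × class count), is one common heuristic and an assumption here, not something the dataset card prescribes; the function names are illustrative.

```python
# Category labels and per-split counts, copied from the listing above.
LABELS = [
    "ຂ່າວຕ່າງປະເທດ",
    "ຂ່າວພາຍໃນ",
    "ຂ່າວທ້ອງຖິ່ນ",
    "ຂ່າວເຫດການ",
    "ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ",
    "ຂ່າວບັນເທິງ",
    "ຂ່າວທ່ອງທ່ຽວ",
]
COUNTS = {
    "train": [3417, 3219, 1307, 459, 404, 240, 150],
    "validation": [1163, 1026, 449, 157, 137, 86, 48],
    "test": [1185, 1059, 431, 147, 136, 64, 44],
}


def majority_baseline(train_counts, eval_counts):
    """Accuracy on eval when always predicting the most frequent train class."""
    majority_index = train_counts.index(max(train_counts))
    return eval_counts[majority_index] / sum(eval_counts)


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts)
    n_classes = len(counts)
    return [total / (n_classes * count) for count in counts]


baseline = majority_baseline(COUNTS["train"], COUNTS["test"])
weights = balanced_class_weights(COUNTS["train"])

print(f"majority-class accuracy on test: {baseline:.3f}")
for label, weight in zip(LABELS, weights):
    print(f"{label}: {weight:.2f}")
```

Any classifier trained on this data should beat the majority baseline by a clear margin, and macro-averaged metrics are likely more informative than plain accuracy for the small categories.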
Dataset Creation
We collected the news articles and their categories from laopost.com.
Categories
ຂ່າວຕ່າງປະເທດ: Foreign news
ຂ່າວພາຍໃນ: Laos internal news
ຂ່າວທ້ອງຖິ່ນ: Local news
ຂ່າວເຫດການ: Event news, such as accidents, crimes, illegal activities
ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ: Health and environmental news
ຂ່າວບັນເທິງ: Entertainment news
ຂ່າວທ່ອງທ່ຽວ: Travel news
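For classification, the seven Lao category strings usually need to be mapped to integer ids. The sketch below builds that mapping together with the English glosses given above. The id order simply follows the listing order, which is an assumption; the dataset card does not define label ids, and the names `LABEL2ID`, `ID2LABEL`, and `encode` are illustrative.

```python
# English glosses for the seven categories, as listed above.
CATEGORY_GLOSS = {
    "ຂ່າວຕ່າງປະເທດ": "Foreign news",
    "ຂ່າວພາຍໃນ": "Laos internal news",
    "ຂ່າວທ້ອງຖິ່ນ": "Local news",
    "ຂ່າວເຫດການ": "Event news",
    "ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ": "Health and environmental news",
    "ຂ່າວບັນເທິງ": "Entertainment news",
    "ຂ່າວທ່ອງທ່ຽວ": "Travel news",
}

# Assumption: ids follow the listing order; the card does not specify ids.
LABEL2ID = {label: index for index, label in enumerate(CATEGORY_GLOSS)}
ID2LABEL = {index: label for label, index in LABEL2ID.items()}


def encode(category: str) -> int:
    """Map a Lao category string to its id; raises KeyError if unknown."""
    return LABEL2ID[category]


for label, index in LABEL2ID.items():
    print(index, label, "-", CATEGORY_GLOSS[label])
```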
Other categories were not collected for this dataset because they contain too few articles, duplicate another category (for example, ອຸບັດເຫດແລະປາກົດການຫຍໍ້ທໍ້ duplicates ຂ່າວເຫດການ), or are no longer updated on the website (for example, ມູມໄອທີລາວ, the IT news tag, was last updated on 22/11/2024).
Licensing Information
The dataset is released under the Creative Commons Attribution 4.0 International license. The use of this dataset is also subject to CommonCrawl's Terms of Use.
Citation
If you use this dataset in your project or research, you can cite it as follows:
BibTeX:
@dataset{phatthiyaphaibun_2025_14967275,
author = {Phatthiyaphaibun, Wannaphong},
title = {Lao News classification},
month = mar,
year = 2025,
publisher = {Zenodo},
version = {1.0.0},
doi = {10.5281/zenodo.14967275},
url = {https://doi.org/10.5281/zenodo.14967275},
}
APA:
Phatthiyaphaibun, W. (2025). Lao News classification (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14967275
Created:
2025-03-04



