
Lao News classification

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14967274
Description:
Dataset Card for Lao News classification

This dataset contains Lao news articles collected from laopost.com for news classification.

Dataset Details

Dataset Description

- Curated by: Wannaphong Phatthiyaphaibun
- Language(s) (NLP): Lao
- License: cc-by-4.0

Uses

Direct Use: News classification

Dataset Structure

The dataset is divided into three splits: train, validation, and test. Each split contains news articles, with each article represented as a dictionary with the following fields:

- title: The title of the news article (string).
- text: The main content of the news article (string).
- category: The category of the news article (string).
- date: The publication date of the news article (string).
- url: The URL of the news article (string).

The dataset is structured as a DatasetDict object, which contains three Dataset objects, one for each split.

- The train split contains 9196 news articles.
- The validation split contains 3066 news articles.
- The test split contains 3066 news articles.

This is roughly a 60/20/20 train/validation/test split, intended for training, evaluating, and testing machine learning models. The criteria used to create the splits are not stated. The category proportions are similar across the three splits, which suggests the splits are meant to be representative of the data.

Categories

Number of articles per category (train / validation / test):

- ຂ່າວຕ່າງປະເທດ: 3417 / 1163 / 1185
- ຂ່າວພາຍໃນ: 3219 / 1026 / 1059
- ຂ່າວທ້ອງຖິ່ນ: 1307 / 449 / 431
- ຂ່າວເຫດການ: 459 / 157 / 147
- ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ: 404 / 137 / 136
- ຂ່າວບັນເທິງ: 240 / 86 / 64
- ຂ່າວທ່ອງທ່ຽວ: 150 / 48 / 44

Dataset Creation

The news articles and their categories were collected from laopost.com.

Category names in English:

- ຂ່າວຕ່າງປະເທດ: Foreign news
- ຂ່າວພາຍໃນ: Laos internal news
- ຂ່າວທ້ອງຖິ່ນ: Local news
- ຂ່າວເຫດການ: Event news, such as accidents, crimes, and illegal activities
- ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ: Health and environmental news
- ຂ່າວບັນເທິງ: Entertainment news
- ຂ່າວທ່ອງທ່ຽວ: Travel news

Other categories were not collected because they contain too few articles, duplicate another category (for example, ອຸບັດເຫດແລະປາກົດການຫຍໍ້ທໍ້ and ຂ່າວເຫດການ), or are no longer updated on the website (for example, ມູມໄອທີລາວ, or IT news, was last updated on 22/11/2024).

Licensing Information

The dataset is released under the Creative Commons Attribution 4.0 International license. The use of this dataset is also subject to CommonCrawl's Terms of Use.

Citation

If you use this dataset in your project or research, you can cite it as follows.

BibTeX:

    @dataset{phatthiyaphaibun_2025_14967275,
      author    = {Phatthiyaphaibun, Wannaphong},
      title     = {Lao News classification},
      month     = mar,
      year      = 2025,
      publisher = {Zenodo},
      version   = {1.0.0},
      doi       = {10.5281/zenodo.14967275},
      url       = {https://doi.org/10.5281/zenodo.14967275},
    }

APA:

Phatthiyaphaibun, W. (2025). Lao News classification (1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.14967275
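The per-category counts in the card can be cross-checked against the stated split sizes, and they also show how imbalanced the labels are. The sketch below is a minimal illustration using only the numbers printed in the card. The helper names (`split_sizes`, `category_shares`, `balanced_class_weights`) are hypothetical, and the inverse-frequency weighting is a common heuristic assumed here for illustration, not something the dataset author prescribes.

```python
# Per-category article counts, copied from the dataset card.
COUNTS = {
    "train": {
        "ຂ່າວຕ່າງປະເທດ": 3417,
        "ຂ່າວພາຍໃນ": 3219,
        "ຂ່າວທ້ອງຖິ່ນ": 1307,
        "ຂ່າວເຫດການ": 459,
        "ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ": 404,
        "ຂ່າວບັນເທິງ": 240,
        "ຂ່າວທ່ອງທ່ຽວ": 150,
    },
    "validation": {
        "ຂ່າວຕ່າງປະເທດ": 1163,
        "ຂ່າວພາຍໃນ": 1026,
        "ຂ່າວທ້ອງຖິ່ນ": 449,
        "ຂ່າວເຫດການ": 157,
        "ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ": 137,
        "ຂ່າວບັນເທິງ": 86,
        "ຂ່າວທ່ອງທ່ຽວ": 48,
    },
    "test": {
        "ຂ່າວຕ່າງປະເທດ": 1185,
        "ຂ່າວພາຍໃນ": 1059,
        "ຂ່າວທ້ອງຖິ່ນ": 431,
        "ຂ່າວເຫດການ": 147,
        "ສຸຂະພາບ ແລະ ສີ່ງແວດລ້ອມ": 136,
        "ຂ່າວບັນເທິງ": 64,
        "ຂ່າວທ່ອງທ່ຽວ": 44,
    },
}

# Split sizes as stated in the card.
EXPECTED_SIZES = {"train": 9196, "validation": 3066, "test": 3066}


def split_sizes(counts):
    """Total number of articles in each split."""
    return {split: sum(per_cat.values()) for split, per_cat in counts.items()}


def category_shares(counts):
    """Fraction of each split that belongs to each category."""
    shares = {}
    for split, per_cat in counts.items():
        total = sum(per_cat.values())
        shares[split] = {cat: n / total for cat, n in per_cat.items()}
    return shares


def balanced_class_weights(per_cat):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed heuristic for illustration; rarer categories get larger weights.
    """
    n_samples = sum(per_cat.values())
    n_classes = len(per_cat)
    return {cat: n_samples / (n_classes * n) for cat, n in per_cat.items()}


sizes = split_sizes(COUNTS)
shares = category_shares(COUNTS)
weights = balanced_class_weights(COUNTS["train"])

for split, total in sizes.items():
    print(split, total, "matches card:", total == EXPECTED_SIZES[split])
for cat, w in weights.items():
    print(f"{cat}: train share {shares['train'][cat]:.3f}, weight {w:.2f}")
```

Weights like these could be passed to a loss function or classifier that accepts per-class weights, so that small categories such as ຂ່າວທ່ອງທ່ຽວ are not swamped by ຂ່າວຕ່າງປະເທດ and ຂ່າວພາຍໃນ during training.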
Created:
2025-03-04