
Boredoom17/Nepali-Flow-Formal

Source: Hugging Face. Updated 2026-04-02. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Boredoom17/Nepali-Flow-Formal
Resource description:
---
pretty_name: Nepali-Flow-Formal
task_categories:
- text-generation
- text-classification
- other
language:
- ne
tags:
- nepali
- corpus
- formal
- news
- devanagari
- low-resource
license: other
size_categories:
- 1M<n<10M
---

# Nepali-Flow-Formal

## What's This?

This dataset has formal Nepali writing, the kind you'd find in news articles, encyclopedias, and research papers. Good for training language models on clear, well-written Nepali.

## What's Inside

**6,735,808 rows** from three places:

- IRIISNEPAL dataset (MIT license)
- Nepali Wikipedia
- Nepali news outlets (Kantipur, Setopati, etc.)

Mostly in Devanagari script. Formal writing: no slang or memes.

## Schema

- text
- source
- domain
- script
- lang
- date_collected
- license

Notes:

- domain values include formal, encyclopedia, and news
- script is predominantly devanagari

## Construction Notes

- Source material is normalized to text rows.
- Duplicate and malformed records are reduced during preprocessing.
- Metadata is preserved to enable source-aware and license-aware filtering.

## Good For

- Training language models on formal Nepali
- News classification and topic modeling
- Tasks where proper grammar matters
- Understanding how formal Nepali differs from casual speech

## Fair Warnings

- Mostly news articles, so it might not help with casual speech understanding
- Some articles lean toward specific outlets' editorial styles
- A few random formatting hiccups might be hiding in there

## License Statement

Mixed-source formal aggregate:

- MIT (IRIISNEPAL)
- CC BY-SA 4.0 (Wikipedia)
- source-dependent (scraped news)

## How to Cite

If you use this in research, cite it like:

```
Aadarsha Chhetri. (2026). Nepali-Flow-Formal. https://huggingface.co/datasets/Boredoom17/Nepali-Flow-Formal
```

Or in BibTeX:

```bibtex
@dataset{aadarsha2026formal,
  author = {Aadarsha Chhetri},
  title = {Nepali-Flow-Formal},
  year = {2026},
  url = {https://huggingface.co/datasets/Boredoom17/Nepali-Flow-Formal}
}
```
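## Filtering Sketch

The card says the metadata is kept so rows can be filtered by source and license. Here's a minimal sketch of what that might look like, using plain Python dicts shaped like the schema. Heads up: the helper name `filter_rows`, the sample rows, and the exact strings stored in `source`, `domain`, and `license` are all assumptions made for illustration. Check the real values in the dataset before relying on them.

```python
from typing import Dict, Iterable, Iterator, Optional, Set

Row = Dict[str, str]


def filter_rows(
    rows: Iterable[Row],
    licenses: Optional[Set[str]] = None,
    domains: Optional[Set[str]] = None,
) -> Iterator[Row]:
    """Yield rows whose license and domain are in the allowed sets.

    A filter set to None is skipped, so every row passes it.
    """
    for row in rows:
        if licenses is not None and row.get("license") not in licenses:
            continue
        if domains is not None and row.get("domain") not in domains:
            continue
        yield row


# Hypothetical rows: field names follow the schema, values are placeholders.
sample_rows = [
    {"text": "...", "source": "iriisnepal", "domain": "formal",
     "script": "devanagari", "lang": "ne",
     "date_collected": "unknown", "license": "MIT"},
    {"text": "...", "source": "wikipedia", "domain": "encyclopedia",
     "script": "devanagari", "lang": "ne",
     "date_collected": "unknown", "license": "CC BY-SA 4.0"},
    {"text": "...", "source": "news", "domain": "news",
     "script": "devanagari", "lang": "ne",
     "date_collected": "unknown", "license": "source-dependent"},
]

# Keep only rows with an explicit open license (assumed label strings).
open_rows = list(filter_rows(sample_rows, licenses={"MIT", "CC BY-SA 4.0"}))

# Keep only news rows, whatever their license.
news_rows = list(filter_rows(sample_rows, domains={"news"}))
```

The same kind of check could probably be passed as a predicate to the `filter` method in the Hugging Face `datasets` library, but that hasn't been verified against this dataset.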
Provided by:
Boredoom17