Boredoom17/Nepali-Flow-Roman
Source: Hugging Face. Updated: 2026-04-02. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Boredoom17/Nepali-Flow-Roman
Resource overview:
---
pretty_name: Nepali-Flow-Roman
task_categories:
- text-classification
- translation
language:
- ne
tags:
- nepali
- roman-nepali
- romanized
- transliteration
- social-media
- low-resource
license: cc-by-4.0
size_categories:
- 100K<n<1M
---
# Nepali-Flow-Roman
## What's This?
Nepali written in Latin letters, the way people type when they don't have a Nepali keyboard.
It contains **307,999 comments** from YouTube, all in Latin script.
## Why It Matters
If you're building a chatbot or NLP tool for Nepali, you need to handle Roman input. This dataset gives you real-world examples of that input to work with.
## Good For
- Models that understand Roman Nepali
- Roman → Devanagari converters
- Chatbots
- Studying how people actually type Nepali in Latin script
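
For the chatbot case, a likely first step is telling Roman input apart from Devanagari input. Here is a minimal sketch using only the Python standard library. The function name `latin_ratio` is made up for illustration and is not part of this dataset or any library.

```python
import unicodedata


def latin_ratio(text: str) -> float:
    """Return the share of alphabetic characters that are Latin letters.

    Hypothetical helper: digits, punctuation, emoji, and combining marks
    are ignored because str.isalpha() is False for them.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    latin = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith("LATIN")
    )
    return latin / len(letters)


if __name__ == "__main__":
    for sample in ["ramro chha", "राम्रो छ", "ok छ"]:
        print(repr(sample), round(latin_ratio(sample), 2))
```

A ratio near 1.0 suggests Roman input and a ratio near 0.0 suggests Devanagari. Where to put the cutoff for mixed text is a judgment call you would tune on your own data.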
## Heads Up
- Spelling is inconsistent: there's no standard way to romanize Nepali, so the same word can show up in several forms
- Some comments mix in other languages
- Real YouTube data, unfiltered
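
Because spelling varies so much, a light normalization pass before training or matching can help. The sketch below is one possible approach, not how this dataset was processed. It lowercases the text, collapses any character repeated three or more times, and tidies whitespace. The name `normalize_roman` and the "three or more" rule are assumptions made for illustration.

```python
import re
import unicodedata

# Any single character repeated 3+ times in a row, e.g. "ooooo" or "!!!".
_REPEAT = re.compile(r"(.)\1{2,}")
_SPACES = re.compile(r"\s+")


def normalize_roman(text: str) -> str:
    """Hypothetical normalizer for noisy Roman Nepali comments.

    Assumption: runs of 3+ identical characters are emphasis, so they are
    collapsed to one. Double letters such as "chha" are left untouched.
    """
    text = unicodedata.normalize("NFKC", text).lower()
    text = _REPEAT.sub(r"\1", text)
    return _SPACES.sub(" ", text).strip()


if __name__ == "__main__":
    print(normalize_roman("Ramrooooo   chha!!!"))
```

This only removes surface noise. It does not merge genuinely different spellings of the same word, which would need a learned or dictionary-based mapping.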
## How to Cite
```
Aadarsha Chhetri. (2026). Nepali-Flow-Roman. https://huggingface.co/datasets/Boredoom17/Nepali-Flow-Roman
```
---
Provider:
Boredoom17