
planwise-data/somali-stem-dataset

Hugging Face | Updated 2026-03-23 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/planwise-data/somali-stem-dataset
Resource description:
---
license: cc-by-4.0
task_categories:
- translation
- token-classification
- text-classification
language:
- so
- en
tags:
- somali
- stem
- low-resource
- nlp
- africa
- biology
- chemistry
- physics
- mathematics
- geography
- education
- terminology
pretty_name: Somali STEM Terminology Dataset
size_categories:
- 1K<n<10K
---

# Somali STEM Terminology Dataset (Sample)

## Overview

A bilingual English–Somali terminology dataset covering 5 STEM domains. This is a **100-row public sample**. The full dataset contains 3,000+ terms.

Somali is spoken by 20+ million people but is critically under-represented in AI training data. This is the only known structured Somali STEM lexicon in machine-readable format.

## Fields

| Field | Description |
|-------|-------------|
| English | Scientific term in English |
| Somali | Somali translation |
| Subject | Physics / Mathematics / Chemistry / Biology / Geography |
| Source | Historical Somali educational materials |
| definition_en | Definition in English |
| definition_so | Definition in Somali |
| example_en | Usage example in English |
| example_so | Usage example in Somali |
| difficulty_level | basic / intermediate / advanced |

## Distribution

| Subject | Rows |
|---------|------|
| Physics | 20 |
| Mathematics | 20 |
| Chemistry | 20 |
| Biology | 20 |
| Geography | 20 |
| **Total** | **100** |

## Example

```json
{
  "English": "Photosynthesis",
  "Somali": "Iftiinka-cunid",
  "Subject": "Biology",
  "Source": "Historical Somali educational materials",
  "definition_en": "The process by which green plants use sunlight, water, and carbon dioxide to produce glucose and oxygen.",
  "definition_so": "Habka ay dhirta cagaartu ku isticmaalaan iftiinka qorraxda, biyaha, iyo carbon dioxide si ay u soo saaraan glucose iyo ogsajiin.",
  "example_en": "During photosynthesis, a leaf absorbs sunlight and converts water and CO2 into sugar stored for energy.",
  "example_so": "Muddada iftiinka-cunidda, caleen waxay nuugtaa iftiinka qorraxda oo u bedeshaa biyaha iyo CO2 sonkor kaydiyay tamarta.",
  "difficulty_level": "basic"
}
```

## Use Cases

- Machine translation Somali ↔ English
- NLP for low-resource languages
- STEM education tools for Somalia and diaspora
- Domain-specific fine-tuning of language models
- Named Entity Recognition for scientific text

## Source

Compiled and structured from historical Somali educational materials.

## Notes

- This repository contains a public sample only
- Full dataset available for research collaboration and licensing upon request

## License

CC BY 4.0: free to use with attribution, including commercial use.

Contact via HuggingFace for full dataset access and licensing.
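The field table and example record in the dataset card can be turned into a quick sanity check before the rows are used for training. The sketch below is illustrative only: it assumes each row has already been loaded as a Python dict keyed by the field names listed in the card, and the helper names `validate_record` and `subject_counts` are hypothetical, not part of the dataset or any library. The card does not state the file format used in the repository, so loading is left out.

```python
from collections import Counter

# Field names and allowed values copied from the dataset card.
REQUIRED_FIELDS = [
    "English", "Somali", "Subject", "Source",
    "definition_en", "definition_so",
    "example_en", "example_so", "difficulty_level",
]
SUBJECTS = {"Physics", "Mathematics", "Chemistry", "Biology", "Geography"}
LEVELS = {"basic", "intermediate", "advanced"}


def validate_record(record):
    """Return a list of problems in one row (empty list if none found)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    if record.get("Subject") not in SUBJECTS:
        problems.append(f"unexpected Subject: {record.get('Subject')!r}")
    if record.get("difficulty_level") not in LEVELS:
        problems.append(
            f"unexpected difficulty_level: {record.get('difficulty_level')!r}"
        )
    return problems


def subject_counts(records):
    """Count rows per Subject, for comparison with the card's distribution."""
    return Counter(r.get("Subject") for r in records)


# The example record from the dataset card.
sample = {
    "English": "Photosynthesis",
    "Somali": "Iftiinka-cunid",
    "Subject": "Biology",
    "Source": "Historical Somali educational materials",
    "definition_en": (
        "The process by which green plants use sunlight, water, and "
        "carbon dioxide to produce glucose and oxygen."
    ),
    "definition_so": (
        "Habka ay dhirta cagaartu ku isticmaalaan iftiinka qorraxda, "
        "biyaha, iyo carbon dioxide si ay u soo saaraan glucose iyo ogsajiin."
    ),
    "example_en": (
        "During photosynthesis, a leaf absorbs sunlight and converts "
        "water and CO2 into sugar stored for energy."
    ),
    "example_so": (
        "Muddada iftiinka-cunidda, caleen waxay nuugtaa iftiinka qorraxda "
        "oo u bedeshaa biyaha iyo CO2 sonkor kaydiyay tamarta."
    ),
    "difficulty_level": "basic",
}

if __name__ == "__main__":
    print(validate_record(sample))
    print(subject_counts([sample]))
```

Run over the public sample, `subject_counts` should match the distribution table in the card (20 rows per subject) if the sample is as described; any non-empty result from `validate_record` points to a row worth inspecting by hand.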
Provider:
planwise-data