
MihaiPopa-1/OmniSurgical-1.0

Hugging Face | Updated 2026-04-08 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/MihaiPopa-1/OmniSurgical-1.0
Resource description:
---
language:
- abk
- abq
- abs
- acm
- adh
- adi
- ady
- aeb
- afr
- agx
- aii
- aim
- ain
- ajz
- akb
- aln
- als
- alt
- amh
- anp
- aoz
- apc
- apt
- arb
- arg
- arq
- ars
- ary
- arz
- asm
- ast
- atb
- ava
- awa
- ayp
- ayr
- azb
- azj
- bak
- bam
- ban
- bar
- bas
- bbc
- bbk
- bcl
- bdq
- bel
- ben
- bew
- bho
- bhp
- bis
- biu
- bjn
- bod
- bos
- brh
- brx
- bts
- btx
- bug
- bul
- bwi
- bxr
- cat
- cbk
- ccp
- ceb
- ces
- cfm
- cha
- che
- chr
- chu
- chv
- cjs
- ckb
- ckt
- cmn
- cnh
- cnw
- cos
- crh
- crj
- crk
- crl
- crs
- csb
- csw
- csy
- ctd
- cym
- czt
- dak
- dan
- dar
- deu
- dik
- diu
- div
- dje
- dks
- dln
- dng
- dnw
- doi
- dru
- dsb
- dtp
- dty
- dzo
- ekk
- ell
- enl
- enm
- epo
- ess
- eus
- eve
- ewo
- ext
- fao
- fas
- ffm
- fij
- fil
- fin
- fit
- fkv
- fmu
- fra
- fro
- frp
- fry
- fuf
- fur
- fuv
- gag
- gaz
- gcf
- gla
- gle
- glg
- glk
- glv
- gmh
- gnb
- goh
- gom
- gos
- grc
- gsw
- gug
- guj
- guz
- hac
- hae
- hak
- hat
- hau
- haw
- hbo
- heb
- her
- hif
- hil
- hin
- hmr
- hne
- hns
- hrv
- hrx
- hsb
- hun
- hwc
- hye
- hyw
- iba
- ibg
- ibo
- ife
- ike
- ikt
- ilo
- ina
- ind
- inh
- isl
- ita
- ivv
- jav
- jpn
- jun
- kaa
- kab
- kac
- kak
- kal
- kam
- kan
- kas
- kat
- kaz
- kbd
- kca
- kdh
- kdr
- kea
- kei
- kgp
- kha
- khk
- khm
- kik
- kin
- kir
- kiu
- kjb
- kjh
- kmr
- knc
- koi
- kor
- kos
- kpv
- krj
- krl
- kru
- ksh
- ksw
- ktj
- ktz
- kua
- kum
- kwn
- kyu
- kzj
- lad
- lao
- lat
- lbe
- ldn
- lew
- lez
- lfn
- lim
- lin
- lis
- lit
- lki
- lld
- lmk
- lnd
- lrc
- ltg
- ltz
- lud
- lug
- luo
- lus
- lvs
- lwg
- lzh
- mag
- mah
- mai
- mak
- mal
- mar
- mas
- mbf
- mdf
- mer
- mfe
- mfg
- mfy
- mhi
- mhr
- mhy
- min
- mip
- mjw
- mkd
- mlt
- mni
- mnk
- mns
- mnw
- moh
- mph
- mqy
- mri
- mrj
- mrw
- mtg
- mui
- mup
- mus
- mvp
- mwf
- mwl
- mww
- mya
- myv
- myx
- mzh
- nah
- nan
- nap
- naq
- nbu
- nde
- ndo
- nds
- new
- nio
- njn
- njo
- nld
- nmf
- nmz
- nno
- nob
- nog
- non
- npi
- npo
- nrf
- nri
- nrm
- nse
- nus
- nya
- nyn
- nzm
- obo
- oci
- ojb
- olo
- orv
- ory
- oss
- ota
- oto
- otw
- pam
- pan
- pap
- pbt
- pcd
- pck
- pcm
- pfl
- plt
- pmq
- pmx
- pnb
- pnt
- pol
- por
- pov
- ppk
- pps
- prg
- pui
- pxm
- quc
- qul
- qup
- qus
- quz
- raw
- rcf
- rel
- rhg
- ria
- rjs
- rmc
- rml
- rmn
- rmy
- rnl
- roh
- ron
- rtm
- rue
- run
- rus
- sah
- san
- sat
- sck
- scn
- sda
- sdc
- sdh
- ses
- sgc
- sgh
- sid
- sin
- sju
- skr
- slk
- slv
- sma
- sme
- smj
- smn
- smo
- sms
- smt
- sna
- snd
- som
- sot
- spa
- srd
- srp
- ssw
- sun
- swe
- swg
- swh
- syc
- syl
- szl
- tab
- tam
- taq
- tat
- tcy
- tcz
- tel
- tet
- tgk
- tha
- thl
- tig
- tir
- tkl
- tkr
- tlh
- tly
- tok
- ton
- tpi
- tpw
- trc
- trp
- trs
- ttj
- tuk
- tur
- tuv
- twx
- tyv
- tzl
- tzm
- udm
- uig
- ukr
- urd
- uzn
- uzs
- vap
- vie
- vot
- vro
- war
- way
- wba
- wbm
- wes
- whk
- wlx
- wol
- wsg
- wwa
- xal
- xho
- xmm
- xmv
- xog
- yaz
- ydd
- yor
- yrk
- yrl
- yua
- yue
- zea
- zgh
- zom
- zsm
- zul
task_categories:
- text-generation
- translation
datasets:
- HuggingFaceFW/finetranslations
license: apache-2.0
size_categories:
- 10K<n<100K
---

# OmniSurgical 1.0

OmniSurgical is a dataset with which you can train your very own massively multilingual machine translation models by fine-tuning existing LLMs!

# Formats

We provide the dataset in 2 formats: JSONL and JSONZ (zipped JSONL).

The names speak for themselves: `OmniSurgical_120_Clean.jsonz` is the processed file, and `train.jsonz` is the shuffled version of the same file, used to fine-tune existing LLMs (I fine-tuned Qwen 3 0.6B for this!).

# Data Used

I used only 120 sentences per language from [HF's FineTranslations](https://huggingface.co/datasets/HuggingFaceFW/finetranslations) dataset, which means 60 sentences per language pair! The original dataset was translated back into English using [Gemma 3 27B](https://huggingface.co/google/gemma-3-27b-it).
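The relationship between the two files described under Formats (a processed file and a shuffled copy of it) can be sketched in Python. This is only a sketch under assumptions: it treats "JSONZ" as an ordinary ZIP archive holding one or more JSONL members, which the card implies but does not spell out, and the helper names `read_jsonz`, `write_jsonz`, and `shuffled`, the member name `data.jsonl`, and the seed are all hypothetical. No record fields are assumed, since the card does not document the schema.

```python
import io
import json
import random
import zipfile


def read_jsonz(source):
    """Read every JSON record from each member of a ZIP archive of JSONL text.

    `source` may be a file path or a binary file-like object.
    Assumption: a .jsonz file is a plain ZIP archive (not gzip).
    """
    records = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                for line in io.TextIOWrapper(member, encoding="utf-8"):
                    line = line.strip()
                    if line:  # skip blank lines
                        records.append(json.loads(line))
    return records


def write_jsonz(records, target, member_name="data.jsonl"):
    """Write records as one JSONL member inside a compressed ZIP archive."""
    payload = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, payload)


def shuffled(records, seed=0):
    """Return a shuffled copy of the records; the input list is left as-is."""
    copy = list(records)
    random.Random(seed).shuffle(copy)
    return copy


if __name__ == "__main__":
    # Hypothetical local paths; download the files from the dataset page first.
    clean = read_jsonz("OmniSurgical_120_Clean.jsonz")
    write_jsonz(shuffled(clean, seed=42), "train_reshuffled.jsonz")
```

A fixed seed keeps the shuffle reproducible; the published `train.jsonz` was presumably shuffled with a different, unknown seed, so the output order here will not match it.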
Provider:
MihaiPopa-1