
MihaiPopa-1/OmniSurgical-1.1

Hugging Face, updated 2026-04-08, indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/MihaiPopa-1/OmniSurgical-1.1
Resource description:
---
language:
- emj
- ron
- sul
datasets:
- HuggingFaceFW/finetranslations
license: apache-2.0
size_categories:
- 10K<n<100K
task_categories:
- text-generation
- translation
---

# OmniSurgical 1.1

OmniSurgical 1.1 is an expansion (DLC) for [OmniSurgical 1.0](https://huggingface.co/datasets/MihaiPopa-1/OmniSurgical-1.0) that fixes some issues in the previous version.

# What's Fixed and What's New

* 2 new languages: Emoji (teaching how to speak in a completely different script) and Sulfuristic Speak (my own simple language for OmniTranslate 1.1, made to fit the Chaos Cubed Minecraft vibe)
* Fixed the bug with diacritics when translating English to Romanian!

# Formats

We provide the dataset in 1 format: JSONL (because JSONZ would be so small!)

And the names speak for themselves: `OmniSurgical_120_Clean.jsonl` is the processed file and `OmniSurgical_120_Shuffled.jsonl` is the shuffled version of the same file, used to fine-tune existing LLMs (I fine-tuned Qwen 3 0.6B for this!)

# Data Used

I used a mix of 3 sources:

* 200 English to Romanian sentences from [HF's FineTranslations](https://huggingface.co/datasets/HuggingFaceFW/finetranslations) (only sentences with diacritics) to fix the Romanian diacritic bug.
* 506 English to Sulfuristic Speak (and vice versa) sentences in total (~10 written by me plus 493 generated using a Python program, vibe-coded by Gemini 2.5 Flash in Colab, Gemini 3.1 Flash Lite and Gemini 3.1 Pro in AI Studio)
* 284 English to Emoji (and vice versa) sentences in total (all generated by Gemini 3.1 Flash Lite)

The original [FineTranslations](https://huggingface.co/datasets/HuggingFaceFW/finetranslations) was translated back into English using [Gemma 3 27B](https://huggingface.co/google/gemma-3-27b-it).
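For illustration only, here is a minimal Python sketch of the two steps the card describes: keeping Romanian text that contains diacritics, and producing a shuffled copy of a JSONL file. This is not the script actually used to build the dataset. The helper names (`has_romanian_diacritics`, `read_jsonl`, `write_jsonl`, `shuffle_jsonl`) and the `seed` parameter are assumptions, and because the card does not document the record schema, each line is treated as an opaque JSON object.

```python
import json
import random

# Romanian letters with diacritics, including the legacy cedilla forms (ş, ţ).
RO_DIACRITICS = set("ăâîșțĂÂÎȘȚşţŞŢ")


def has_romanian_diacritics(text):
    """Return True if the text contains at least one Romanian diacritic."""
    return any(ch in RO_DIACRITICS for ch in text)


def read_jsonl(path):
    """Read a JSONL file into a list of objects, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path, records):
    """Write one JSON object per line.

    ensure_ascii=False keeps emoji and diacritics as literal characters
    instead of \\uXXXX escapes.
    """
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def shuffle_jsonl(src, dst, seed=0):
    """Write a shuffled copy of src to dst and return the record count.

    The seed is a hypothetical parameter that makes the shuffle reproducible.
    """
    records = read_jsonl(src)
    random.Random(seed).shuffle(records)
    write_jsonl(dst, records)
    return len(records)
```

Under these assumptions, `shuffle_jsonl("OmniSurgical_120_Clean.jsonl", "my_shuffled.jsonl", seed=42)` would write a reproducible shuffled copy of the processed file. `has_romanian_diacritics` can be applied to whichever field holds the Romanian text once the record layout is known.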
Provided by:
MihaiPopa-1