MihaiPopa-1/OmniSurgical-1.1
Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/MihaiPopa-1/OmniSurgical-1.1
Resource description:
---
language:
- emj
- ron
- sul
datasets:
- HuggingFaceFW/finetranslations
license: apache-2.0
size_categories:
- 10K<n<100K
task_categories:
- text-generation
- translation
---
# OmniSurgical 1.1
OmniSurgical 1.1 is an expansion (DLC) for [OmniSurgical 1.0](https://huggingface.co/datasets/MihaiPopa-1/OmniSurgical-1.0) that fixes some issues from the previous version.
# What's Fixed and What's New
* 2 new languages: Emoji (for teaching a model to speak in a completely different script) and Sulfuristic Speak (my own simple language, made for OmniTranslate 1.1 to fit the Chaos Cubed Minecraft vibe)
* Fixed the bug with diacritics when translating English to Romanian!
# Formats
The dataset comes in 1 format: JSONL (a JSONZ version would be so small that it isn't needed!)
The names speak for themselves: `OmniSurgical_120_Clean.jsonl` is the processed file, and `OmniSurgical_120_Shuffled.jsonl` is a shuffled version of the same file, used to fine-tune existing LLMs (I fine-tuned Qwen 3 0.6B with it!)
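For illustration, here is a minimal sketch of how a shuffled copy of a JSONL file could be produced. It is not the script used to build this dataset: the helper names `read_jsonl` and `write_shuffled_jsonl` and the fixed seed are assumptions, and nothing is assumed about the fields inside each record.

```python
import json
import random
from pathlib import Path


def read_jsonl(path):
    """Return one parsed JSON object per non-empty line of the file."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def write_shuffled_jsonl(src, dst, seed=0):
    """Write a shuffled copy of src to dst and return the number of records.

    Hypothetical helper: the seed only makes the order reproducible.
    """
    records = read_jsonl(src)
    random.Random(seed).shuffle(records)
    with Path(dst).open("w", encoding="utf-8") as handle:
        for record in records:
            # ensure_ascii=False keeps diacritics and emoji readable in the file
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

Calling `write_shuffled_jsonl("OmniSurgical_120_Clean.jsonl", "my_shuffled.jsonl")` would then write a reshuffled copy next to the processed file; the output name here is only an example.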
# Data Used
I used a mix of 3 sources:
* 200 English to Romanian sentences from [HF's FineTranslations](https://huggingface.co/datasets/HuggingFaceFW/finetranslations) (only with diacritics) for fixing the Romanian diacritic bug.
* 506 English to Sulfuristic Speak (and vice-versa) sentences in total (~10 written by me plus 493 generated using a Python program, vibe-coded by Gemini 2.5 Flash in Colab, Gemini 3.1 Flash Lite and Gemini 3.1 Pro in AI Studio)
* 284 English to Emoji (and vice-versa) sentences in total (all generated by Gemini 3.1 Flash Lite)
The original [FineTranslations](https://huggingface.co/datasets/HuggingFaceFW/finetranslations) was translated back into English using [Gemma 3 27B](https://huggingface.co/google/gemma-3-27b-it)
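The card does not say how the "only with diacritics" filter was applied to the Romanian sentences. One possible way to do it is a simple character check, sketched below. The helper names and the mapping from legacy cedilla letters to comma-below letters are assumptions, not the author's method.

```python
import unicodedata

# Romanian diacritic letters: a-breve, a-circumflex, i-circumflex,
# s-comma-below, t-comma-below (lowercase, then uppercase).
ROMANIAN_DIACRITICS = set(
    "\u0103\u00e2\u00ee\u0219\u021b"
    "\u0102\u00c2\u00ce\u0218\u021a"
)

# Assumption: legacy cedilla forms of s/t are treated as the comma-below forms.
CEDILLA_TO_COMMA = str.maketrans(
    "\u015f\u0163\u015e\u0162",
    "\u0219\u021b\u0218\u021a",
)


def normalize_romanian(text):
    """Compose characters (NFC) and map cedilla s/t to comma-below s/t."""
    return unicodedata.normalize("NFC", text).translate(CEDILLA_TO_COMMA)


def has_romanian_diacritics(text):
    """Return True if the text contains at least one Romanian diacritic letter."""
    return any(ch in ROMANIAN_DIACRITICS for ch in normalize_romanian(text))
```

A list of candidate sentences could then be filtered with `[s for s in sentences if has_romanian_diacritics(s)]`. Note that a Romanian sentence which simply has no diacritic letters would also be dropped by this check.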
Provider:
MihaiPopa-1



