dipteshkanojia/llama-2-qe-2023-indic-multi
收藏数据集概述
许可证
- CC BY-NC-SA 4.0
语言
- 英语 (en)
- 印地语 (hi)
- 马拉地语 (mr)
- 古吉拉特语 (gu)
- 泰米尔语 (ta)
- 泰卢固语 (te)
标签
- 质量评估
- llama-2-format
- 指令调优
- WMT 2023 数据
数据规模
- 10K<n<100K
描述
- 该数据集用于微调 meta-llama/Llama-2-13b-chat-hf 模型,作为 WMT 2023 共享任务的一部分。
- 数据集包含从训练和验证集中拼接和打乱的 En-Gu, Hi, Mr, Ta, Te 数据。
- 排除了约 10 个以上的样本提示,用于上下文学习场景的测试集。
样本提示
<s>[INST] <<SYS>> You are a quality estimation model which accuractely predicts the translation quality as mean z_score. For perfectly meaningful translation, predict high z_score and for a meaningless or erroneous translation predict low z_score. Penalize the z_score on translation errors within [TGT] based on source sentence in [SRC]. Do not consider any other exisiting translation evaluation metrics. <</SYS>> For the following translation from English to Marathi, [SRC] Mudiyettu performers purify themselves through fasting and prayer, then draw a huge image of goddess Kali, called as kalam, on the temple floor with coloured powders, wherein the spirit of the goddess is invoked. [/SRC][TGT] मुडियेट्टु कलाकार उपवास आणि प्रार्थनेद्वारे स्वतःला शुद्ध करतात, त्यानंतर मंदिराच्या मजल्यावर काळम नावाच्या देवीची मोठी प्रतिमा काढतात, ज्यात देवीच्या आत्म्याची प्रार्थना केली जाते. [/TGT], predict the z_score [/INST] z_score: -0.4986</s>



