bethgelab/lit-benchmark
Source: Hugging Face. Updated 2026-04-22; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/bethgelab/lit-benchmark
Description:
LiT is a multilingual benchmark for evaluating meaning preservation across multihop translation chains and targeted robustness cases. The release is organized around three public benchmark views: `lit` (the main 200-example benchmark, combining abstracts, pragmatics, and informal language), `robustness` (60 targeted hard cases for stress-testing models), and `extended` (the full 260-example release). Additional files include subset-specific views, category metadata, multihop traces, and raw judge outputs.
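The card gives `extended` as the full 260-example release, which matches the 200 `lit` examples plus the 60 `robustness` examples. The following is a minimal sketch of that relationship, assuming that `extended` is exactly the disjoint union of the other two views and that every record carries a unique `id` field. The field name and the helpers `build_extended` and `check_sizes` are hypothetical and are not part of the released files. The sketch runs on synthetic records, not the real data.

```python
from typing import Dict, List

# Example counts stated on the dataset card.
EXPECTED_SIZES = {"lit": 200, "robustness": 60, "extended": 260}

# Hypothetical record shape: a dict with at least a unique "id" key.
Record = Dict[str, object]


def build_extended(lit: List[Record], robustness: List[Record]) -> List[Record]:
    """Combine the main and robustness views into one list.

    Assumes (hypothetically) that the two views are disjoint by "id"
    and that `extended` is simply their concatenation.
    """
    lit_ids = {r["id"] for r in lit}
    rob_ids = {r["id"] for r in robustness}
    overlap = lit_ids & rob_ids
    if overlap:
        raise ValueError(f"views share {len(overlap)} ids; expected disjoint views")
    return list(lit) + list(robustness)


def check_sizes(views: Dict[str, List[Record]]) -> None:
    """Raise if any view's length differs from the count on the card."""
    for name, expected in EXPECTED_SIZES.items():
        actual = len(views[name])
        if actual != expected:
            raise AssertionError(f"{name}: expected {expected} examples, got {actual}")


if __name__ == "__main__":
    # Synthetic placeholder records standing in for the real views.
    lit = [{"id": f"lit-{i}"} for i in range(200)]
    robustness = [{"id": f"rob-{i}"} for i in range(60)]
    extended = build_extended(lit, robustness)
    check_sizes({"lit": lit, "robustness": robustness, "extended": extended})
    print(len(extended))  # 260
```

Checking for shared IDs before concatenating guards against double-counting an example that appears in both views. If the actual release does share examples between views, the disjointness assumption would need to be dropped and the size check adjusted.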
Provided by:
bethgelab



