BWB
📐 The BlonDe Package:
Package Overview
BlonDe is an automatic evaluation metric designed for document-level machine translation (MT). It addresses the limitations of traditional metrics like BLEU by explicitly tracking discourse phenomena, enhancing the evaluation of translations at the document level. BlonDe categorizes discourse-related spans and computes a similarity-based F1 measure for these categorized spans, providing a more selective and context-aware metric compared to sentence-level metrics.
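To make the idea of an F1 measure over categorized spans concrete, here is a minimal sketch. It is not the BlonDe implementation: the `(category, token)` span representation, the exact-match clipping, and the function name `categorized_span_f1` are assumptions chosen for illustration, and the real metric's similarity and weighting details are not reproduced here.

```python
from collections import Counter


def categorized_span_f1(sys_spans, ref_spans):
    """Toy F1 over categorized spans (illustrative only, not BlonDe itself).

    Each span is a (category, token) pair, e.g. ("pronoun", "he").
    A system span counts as matched at most as many times as it
    appears in the reference (clipped counts).
    """
    sys_counts = Counter(sys_spans)
    ref_counts = Counter(ref_spans)

    # Counter intersection keeps the minimum count per span.
    overlap = sum((sys_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(sys_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans extracted from a system output and a reference.
system = [("pronoun", "he"), ("pronoun", "she"), ("tense", "VBD")]
reference = [
    ("pronoun", "he"),
    ("pronoun", "he"),
    ("tense", "VBD"),
    ("entity", "Alice"),
]
score = categorized_span_f1(system, reference)
```

In this toy case two of the three system spans match, so precision is 2/3, recall is 2/4, and the F1 is 4/7.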
Features
- BlonDe: The main metric, integrating dBlonDe with sentence-level measurements.
- dBlonDe: Measures discourse phenomena such as entities, tense, pronouns, and discourse markers.
- BlonDe+: An enhanced version that incorporates human annotations for ambiguous or omitted phrases and manually annotated named entities.
⏳ Installation
BlonDe requires Python 3.6 or higher. Installation steps include updating necessary Python packages and installing BlonDe from PyPI or directly from the GitHub repository.
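Before installing, it may help to confirm that the interpreter meets the stated minimum. The check below is a small sketch; the helper name `python_is_supported` is hypothetical and is not part of BlonDe.

```python
import sys

# Minimum version stated in the installation notes.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print("BlonDe requires Python 3.6 or higher.")
```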
Usage
BlonDe offers both command-line interface (CLI) and Python module usage. Example inputs are provided for demonstration.
Command-line Usage
Basic usage involves specifying reference and system files. Additional options include using human-annotated spans for BlonDe+ and refined named entities.
Using BlonDe from Python
BlonDe can be used programmatically by creating an instance of the BLONDE class. It supports scoring for both single documents and entire corpora, with options to include human annotations and refined named entities.
📙 The BWB Dataset:
Dataset Overview
The BWB dataset is a large-scale Chinese-English document-level parallel corpus, consisting of Chinese online novels and their professionally translated English counterparts. It spans various genres and is the largest known document-level translation dataset.
Statistics
- Train: 196,304 documents, 9,576,566 sentences, 325.4M words
- Test: 80 documents, 2,632 sentences, 68.0K words
- Dev: 79 documents, 2,618 sentences, 67.4K words
- Total: 196K documents, 9.58M sentences, 460.8M words
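The document and sentence totals can be checked directly from the per-split counts. The sketch below only sums the figures listed above; the dictionary layout and the name `split_totals` are illustrative.

```python
# (documents, sentences) per split, copied from the statistics above.
SPLITS = {
    "train": (196_304, 9_576_566),
    "test": (80, 2_632),
    "dev": (79, 2_618),
}


def split_totals(splits):
    """Sum document and sentence counts across all splits."""
    total_docs = sum(docs for docs, _ in splits.values())
    total_sents = sum(sents for _, sents in splits.values())
    return total_docs, total_sents


total_docs, total_sents = split_totals(SPLITS)
print(f"{total_docs:,} documents, {total_sents:,} sentences")
# prints "196,463 documents, 9,581,816 sentences"
```

These sums round to the 196K documents and 9.58M sentences reported in the total row.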
Annotation Format
The test set is annotated with detailed information including original Chinese text, reference English text, named entities, and error types with corresponding spans.
Error Types
- ambiguity
- ellipsis-pronoun
- ellipsis-other
- named entity
- tense
- sentence-level
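When processing the annotations, the six error types can be modeled as an enumeration so that unexpected labels are caught early. This is a sketch under assumptions: the `ErrorAnnotation` record with `start` and `end` offsets and the `parse_error_type` helper are hypothetical and do not reflect the dataset's actual file layout.

```python
from enum import Enum
from typing import NamedTuple


class ErrorType(Enum):
    """The error types listed for the BWB test-set annotations."""

    AMBIGUITY = "ambiguity"
    ELLIPSIS_PRONOUN = "ellipsis-pronoun"
    ELLIPSIS_OTHER = "ellipsis-other"
    NAMED_ENTITY = "named entity"
    TENSE = "tense"
    SENTENCE_LEVEL = "sentence-level"


class ErrorAnnotation(NamedTuple):
    """Hypothetical record: an error type plus the span it covers."""

    error_type: ErrorType
    start: int  # assumed start offset of the span
    end: int    # assumed end offset of the span (exclusive)


def parse_error_type(label):
    """Map a raw label to an ErrorType; raises ValueError if unknown."""
    return ErrorType(label.strip().lower())


annotation = ErrorAnnotation(parse_error_type("tense"), start=4, end=9)
```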
Example
Two example files are provided: chs_re.txt (original Chinese text) and ref_re.txt (reference English text). They illustrate the sentence-level alignment and content of the dataset.
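Sentence-aligned files of this kind can be read in parallel. The sketch below assumes one sentence per line and that line i of chs_re.txt corresponds to line i of ref_re.txt; both are assumptions about the layout, and `read_aligned` is a hypothetical helper.

```python
import io


def read_aligned(chs_file, ref_file):
    """Pair up lines from two open text files, assuming 1:1 alignment."""
    chs_lines = [line.rstrip("\n") for line in chs_file]
    ref_lines = [line.rstrip("\n") for line in ref_file]
    if len(chs_lines) != len(ref_lines):
        raise ValueError(
            f"line counts differ: {len(chs_lines)} vs {len(ref_lines)}"
        )
    return list(zip(chs_lines, ref_lines))


# Placeholder content standing in for the real files.
chs = io.StringIO("第一句。\n第二句。\n")
ref = io.StringIO("The first sentence.\nThe second sentence.\n")
pairs = read_aligned(chs, ref)

# With real files, open both with an explicit encoding, for example:
# with open("chs_re.txt", encoding="utf-8") as c, \
#         open("ref_re.txt", encoding="utf-8") as r:
#     pairs = read_aligned(c, r)
```

Raising on a length mismatch is a deliberate choice here: silently truncating with `zip` would hide alignment problems.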




