kytrungchauwork/eng_viet_paralell_sense_tagger

Name: kytrungchauwork/eng_viet_paralell_sense_tagger
Creator: kytrungchauwork
Published: 2026-04-08 20:32:44
License: No description available

Hugging Face · Updated 2026-04-08 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/kytrungchauwork/eng_viet_paralell_sense_tagger

Resource description:

---
license: mit
language:
- vi
- en
size_categories:
- 10K<n<100K
---

# EN-VI Parallel Sense Dataset

This dataset contains parallel English-Vietnamese sentences annotated with word senses. It is designed for token classification tasks such as predicting the sense of each word in a sentence.

## Dataset Details

- **Languages:** English (en), Vietnamese (vi)
- **Task:** Token Classification / Word Sense Disambiguation
- **Number of examples:** ~[fill in the actual count]
- **Format:** CSV or JSONL
  - Columns for CSV: `en_tokens`, `en_labels`, `vi_tokens`, `vi_labels`
  - Tokens within a field are separated by spaces, and the i-th label corresponds to the i-th token.
- **Labels:**
  - `O` for tokens without a sense annotation
  - Sense labels follow WordNet or a custom sense inventory, e.g., `self.n.01`, `bản_ngã.n.01`.

## Example

### CSV format

| en_tokens | en_labels | vi_tokens | vi_labels |
|-----------|-----------|-----------|-----------|
| Why not a self - fueling cycle in which we all can participate ? | O O O self.n.01 O O O O O O O can.n.01 O O | Tại_sao lại không có một vòng lặp_tự hoạt_động mà tất_cả chúng_ta có_thể tham_gia chứ ? | O O O O O O bản_ngã.n.01 O O O O lon.n.01 O O O |

### JSONL format

```json
{
  "en": {
    "tokens": ["Why", "not", "a", "self", "-", "fueling", "cycle", "in", "which", "we", "all", "can", "participate", "?"],
    "labels": ["O", "O", "O", "self.n.01", "O", "O", "O", "O", "O", "O", "O", "can.n.01", "O", "O"]
  },
  "vi": {
    "tokens": ["Tại_sao", "lại", "không", "có", "một", "vòng", "lặp_tự", "hoạt_động", "mà", "tất_cả", "chúng_ta", "có_thể", "tham_gia", "chứ", "?"],
    "labels": ["O", "O", "O", "O", "O", "O", "bản_ngã.n.01", "O", "O", "O", "O", "lon.n.01", "O", "O", "O"]
  }
}
```
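The token/label alignment described in the card can be checked with a few lines of Python. This is a minimal sketch, not the dataset author's tooling: the helper names `align` and `sense_tagged` are hypothetical, and the record is the JSONL example from the card written out as a single line.

```python
import json

# The JSONL example record from the dataset card, written as one line.
RECORD_LINE = (
    '{"en": {"tokens": ["Why", "not", "a", "self", "-", "fueling", "cycle", '
    '"in", "which", "we", "all", "can", "participate", "?"], '
    '"labels": ["O", "O", "O", "self.n.01", "O", "O", "O", "O", "O", "O", '
    '"O", "can.n.01", "O", "O"]}, '
    '"vi": {"tokens": ["Tại_sao", "lại", "không", "có", "một", "vòng", '
    '"lặp_tự", "hoạt_động", "mà", "tất_cả", "chúng_ta", "có_thể", '
    '"tham_gia", "chứ", "?"], '
    '"labels": ["O", "O", "O", "O", "O", "O", "bản_ngã.n.01", "O", "O", '
    '"O", "O", "lon.n.01", "O", "O", "O"]}}'
)


def align(tokens, labels):
    """Pair each token with its label; fail loudly if the lengths differ.

    Hypothetical helper, not part of the dataset.
    """
    if len(tokens) != len(labels):
        raise ValueError(
            f"{len(tokens)} tokens but {len(labels)} labels: {tokens!r}"
        )
    return list(zip(tokens, labels))


def sense_tagged(pairs):
    """Keep only the (token, label) pairs that carry a sense label (not 'O')."""
    return [(tok, lab) for tok, lab in pairs if lab != "O"]


record = json.loads(RECORD_LINE)
en_senses = sense_tagged(align(record["en"]["tokens"], record["en"]["labels"]))
vi_senses = sense_tagged(align(record["vi"]["tokens"], record["vi"]["labels"]))

print(en_senses)  # [('self', 'self.n.01'), ('can', 'can.n.01')]
print(vi_senses)  # [('lặp_tự', 'bản_ngã.n.01'), ('có_thể', 'lon.n.01')]
```

For the CSV variant, the same helper applies after splitting each field on whitespace, e.g. `align(row["en_tokens"].split(), row["en_labels"].split())`, assuming the rows are read with `csv.DictReader`.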

Provided by:

kytrungchauwork
