
MHBS-IHB/fishmt5

Source: Hugging Face | Updated: 2025-03-07 | Indexed: 2025-11-01
Download link:
https://hf-mirror.com/datasets/MHBS-IHB/fishmt5
Resource description:
---
license: apache-2.0
task_categories:
- translation
language:
- zh
- la
size_categories:
- 1M<n<10M
tags:
- bilingual
pretty_name: Fish Names Chinese-Latin Parallel Corpora
---

# Fish Names Chinese-Latin Parallel Corpora

## Dataset Overview

We curated over 60,000 authoritative Chinese-Latin bilingual parallel corpora for fish names by integrating cross-source data, including Eschmeyer's Catalog of Fishes online database. Using a dual translation approach, we applied the Multilingual Text-to-Text Transfer Transformer (mT5) model to generate missing Chinese names.

*Note: The current release provides 10,000 paired data entries.*

## Dataset Details

- **Total Curated Records:** > 60,000 authoritative pairs
- **Current Release:** 10,000 carefully reviewed Chinese-Latin name pairs
- **Languages:** Chinese, Latin
- **Data Sources:**
  - Eschmeyer's Catalog of Fishes online database
  - Other cross-source authoritative fish name databases
- **Methodology:**
  - Data integration from multiple sources
  - Dual translation approach using mT5 to generate missing Chinese names
  - Rigorous quality control and review

## Intended Use

- **Research:** Fish taxonomy, biodiversity studies, and ecological research
- **Translation:** Evaluation and development of bilingual translation models
- **Corpus Development:** Creation of high-quality multilingual corpora for biocultural diversity studies

## Limitations

- **Data Size:** Although the full dataset includes over 60,000 pairs, only a subset of 10,000 pairs is provided in the current release.
- **Review Status:** The associated research article is currently under review. Future updates will expand the dataset and include additional metadata.

## Citation

If you use this dataset in your research, please cite the forthcoming publication (currently under review).

## Contact

For any questions or further information, please contact the dataset curators.
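
## Example Usage (illustrative sketch)

The card does not document the file layout, so the following is only a minimal sketch of how the released pairs might be loaded and reshaped for translation experiments. The repository id comes from the download link above. The `train` split and the column names `latin` and `chinese` are assumptions rather than documented facts, and the helper names `load_fish_pairs` and `to_translation_examples` are hypothetical; check the dataset viewer and adjust them before use.

```python
REPO_ID = "MHBS-IHB/fishmt5"  # repository id taken from the download link


def load_fish_pairs(split="train"):
    """Load the released pairs from the Hugging Face Hub (needs network access).

    Assumption: the repository exposes a "train" split. Verify the real
    split name in the dataset viewer.
    """
    # Imported lazily so the rest of this module works without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


def to_translation_examples(rows, latin_key="latin", chinese_key="chinese"):
    """Reshape raw rows into {"translation": {"la": ..., "zh": ...}} records.

    `latin_key` and `chinese_key` are hypothetical column names. Rows with a
    missing or blank name on either side are skipped.
    """
    examples = []
    for row in rows:
        latin = (row.get(latin_key) or "").strip()
        chinese = (row.get(chinese_key) or "").strip()
        if latin and chinese:
            examples.append({"translation": {"la": latin, "zh": chinese}})
    return examples
```

Calling `to_translation_examples(load_fish_pairs())` would then yield records keyed by the language codes declared in the card metadata (`la`, `zh`). The nested `translation` layout is one common convention for translation data in the Hugging Face ecosystem, not something this dataset prescribes.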
Provider:
MHBS-IHB