SAMPLE Chinese Language Datasets | 583K Translations | 141K Words | Translations Data | Textual ...
Source: Databricks. Indexed 2025-11-22.
Download link:
https://marketplace.databricks.com/details/823d4e25-0595-404e-bdd5-6e7f4448ada4/Oxford-Languages_SAMPLE-Chinese-Language-Datasets-583K-Translations-141K-Words-Translations-Data-Textual-
Resource description:
Comprehensive Chinese language datasets with linguistic annotations, including headwords, definitions, word senses, usage examples, part-of-speech (POS) tags, semantic metadata, and contextual usage details. The datasets cover both the Simplified and Traditional writing systems.
Our Chinese language datasets are carefully compiled and annotated by language and linguistic experts. The following datasets are available for licensing:
1. Mandarin Chinese (simplified) Monolingual Dictionary Data
2. Mandarin Chinese (traditional) Monolingual Dictionary Data
3. Mandarin Chinese (simplified) Bilingual Dictionary Data
4. Mandarin Chinese (traditional) Bilingual Dictionary Data
5. Mandarin Chinese (simplified) Synonyms and Antonyms Data
Key Features (approximate numbers):
1. Mandarin Chinese (simplified) Monolingual Dictionary Data
Our Mandarin Chinese (simplified) monolingual dataset features clear definitions, headwords, examples, and comprehensive coverage of the Mandarin Chinese language as it is spoken today.
- Words: 81,300
- Senses: 62,400
- Example sentences: 80,700
- Format: XML and JSON formats
- Delivery: Email (link-based file sharing) and REST API
2. Mandarin Chinese (traditional) Monolingual Dictionary Data
Our Mandarin Chinese (traditional) monolingual dataset features clear definitions, headwords, examples, and comprehensive coverage of the Mandarin Chinese language as it is spoken today.
- Words: 60,100
- Senses: 144,700
- Example sentences: 29,900
- Format: XML format
- Delivery: Email (link-based file sharing)
3. Mandarin Chinese (simplified) Bilingual Dictionary Data
The bilingual data provides translations in both directions: from English to Mandarin Chinese (simplified) and from Mandarin Chinese (simplified) to English. It is reviewed and updated annually by our in-house team of language experts, and it offers comprehensive coverage of the language with a substantial volume of high-quality translated words.
- Translations: 367,600
- Senses: 204,500
- Example translations: 150,900
- Format: XML and JSON formats
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually
4. Mandarin Chinese (traditional) Bilingual Dictionary Data
The bilingual data provides translations in both directions: from English to Mandarin Chinese (traditional) and from Mandarin Chinese (traditional) to English. It is reviewed and updated annually by our in-house team of language experts, and it offers comprehensive coverage of the language with a substantial volume of high-quality translated words.
- Translations: 215,600
- Senses: 202,800
- Example sentences: 149,700
- Format: XML format
- Delivery: Email (link-based file sharing)
5. Mandarin Chinese (simplified) Synonyms and Antonyms Data
The Mandarin Chinese (simplified) Synonyms and Antonyms Dataset is a leading resource offering comprehensive, up-to-date coverage of word relationships in contemporary Mandarin Chinese. It includes rich linguistic detail such as precise definitions and part-of-speech (POS) tags, making it an essential asset for developing AI systems and language technologies that require deep semantic understanding.
- Synonyms: 3,800
- Antonyms: 3,180
- Format: XML format
- Delivery: Email (link-based file sharing)
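The headline figures in the listing title (583K translations, 141K words) appear to be the sums of the two bilingual translation counts and the two monolingual word counts given above. The listing does not state this explicitly, so treat it as an inference; a quick check using only the approximate figures quoted here:

```python
# Approximate figures copied from the listing above.
translations = {
    "simplified bilingual": 367_600,
    "traditional bilingual": 215_600,
}
words = {
    "simplified monolingual": 81_300,
    "traditional monolingual": 60_100,
}

# Assumption: the title totals are simple sums across the two writing systems.
total_translations = sum(translations.values())
total_words = sum(words.values())

print(f"{total_translations:,} translations")  # 583,200
print(f"{total_words:,} words")                # 141,400
```

Because the simplified and traditional datasets may overlap in content, these totals are best read as combined record counts rather than counts of unique words or translations.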
Use Cases:
We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, text-to-speech (TTS), dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD).
If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.
Pricing:
Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed under term-based IP agreements, with tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.
Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.
Please note that some datasets may have rights restrictions. Contact us for more information.
About the sample:
The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our datasets, we provide a sample in CSV format for preview purposes only.
If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.
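Since the preview sample is delivered as CSV, a first look at its structure can be done with the Python standard library. The sketch below is illustrative only: the listing does not document the sample's column names, so the code discovers them from the header row, and the demo data (`headword`, `pos`, `definition`) is a hypothetical stand-in, not the actual sample schema.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle, max_rows=5):
    """Summarize a CSV: column names, row count, non-empty cells per column."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    filled = Counter()
    preview = []
    total = 0
    for row in reader:
        total += 1
        for name in columns:
            # Missing trailing cells come back as None; treat them as empty.
            if (row.get(name) or "").strip():
                filled[name] += 1
        if len(preview) < max_rows:
            preview.append(row)
    return {
        "columns": columns,
        "rows": total,
        "filled": dict(filled),
        "preview": preview,
    }


# Hypothetical stand-in data; the real sample's columns may differ.
demo = io.StringIO("headword,pos,definition\n你好,interjection,hello\n书,noun,\n")
summary = summarize_csv(demo)
print(summary["columns"])
print(summary["rows"], summary["filled"])
```

For a downloaded file, the same function can be called on `open(path, encoding="utf-8-sig", newline="")`, where `path` is wherever the sample was saved; the `utf-8-sig` encoding is a guess that tolerates a byte-order mark if one is present.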
Provider:
Oxford Languages



