
SAMPLE German Language Datasets | 393K Translations | NLP | Dictionary Display | Machine ...

Databricks, indexed 2025-11-22
Download link:
https://marketplace.databricks.com/details/405c179c-ee0c-4687-9059-94973f47c904/Oxford-Languages_SAMPLE-German-Language-Datasets-393K-Translations-NLP-Dictionary-Display-Machine-
Resource description:
Comprehensive German language datasets with linguistic annotations, including headwords, definitions, word senses, usage examples, part-of-speech (POS) tags, semantic metadata, and contextual usage details. Our German language datasets are carefully compiled and annotated by language and linguistic experts.

The following German datasets are available for license:
1. German Monolingual Dictionary Data
2. German Bilingual Dictionary Data
3. German Word List Data

Key Features (approximate numbers):

1. German Monolingual Dictionary Data
Our German monolingual data features clear definitions, headwords, examples, and comprehensive coverage of the German language as spoken today.
- Words: 85,500
- Senses: 78,000
- Example sentences: 55,000
- Format: XML
- Delivery: Email (link-based file sharing)

2. German Bilingual Dictionary Data
The bilingual data provides translations in both directions, from English to German and from German to English. It is reviewed and updated annually by our in-house team of language experts. It offers comprehensive coverage of the language, with a substantial volume of high-quality translations.
- Translations: 393,000
- Senses: 207,500
- Example translations: 129,500
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually

3. German Word List Data
This language data contains a carefully curated and comprehensive list of German words.
- Wordforms: 338,000
- Format: CSV and TXT
- Delivery: Email (link-based file sharing)

Use Cases:
We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, text-to-speech (TTS), dictionary display tools, games, translations, word embedding, and word sense disambiguation (WSD). If you have a specific use case in mind that isn't listed here, we'd be happy to explore it with you. Get in touch with us at Growth.OL@oup.com to start the conversation.

Pricing:
Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements, with tiered pricing for API-delivered data. Whether you're integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs. Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.

About the sample:
The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our dataset, we provide a sample in CSV format for preview purposes only. If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.
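Since the preview sample is delivered as CSV, a first look at its structure might be taken along the lines of the sketch below. This is only an illustration using Python's standard csv module: the listing does not document the sample's column names, so the helper name preview_csv and the columns in the demo data (headword, pos, translation) are hypothetical placeholders, not the actual schema.

```python
import csv
import io
from typing import IO, Dict, List, Tuple


def preview_csv(handle: IO[str], limit: int = 5) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return the header names and at most `limit` data rows of a CSV stream."""
    reader = csv.DictReader(handle)
    rows: List[Dict[str, str]] = []
    for row in reader:
        if len(rows) >= limit:
            break
        rows.append(row)
    # fieldnames is None for an empty stream, so fall back to an empty list.
    return list(reader.fieldnames or []), rows


# Hypothetical stand-in for the sample file. The real column names are not
# stated in the listing; these are placeholders for illustration only.
demo = io.StringIO("headword,pos,translation\nHaus,noun,house\nlaufen,verb,run\n")
columns, rows = preview_csv(demo, limit=1)
print(columns)
print(rows)
```

For a downloaded file, the same helper could be called with an open file handle, for example open(path, newline="", encoding="utf-8"), where the path and the encoding are again assumptions to check against the actual sample.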
Provider:
Oxford Languages