
SAMPLE EMEA Data Suite | 3.3M Translations | 1.9M Words | 22 Languages | Natural Language ...

Databricks, indexed 2025-10-09
Download link:
https://marketplace.databricks.com/details/fc9d18ba-54b4-449a-a016-8292763d267f/Oxford-Languages_SAMPLE-EMEA-Data-Suite-3.3M-Translations-1.9M-Words-22-Languages-Natural-Language-
Resource description:
EMEA Data Suite offers 43 high-quality language datasets covering 23 languages spoken in the region. Ideal for NLP, AI, LLMs, translation, and education, it combines linguistic depth and regional authenticity to power scalable, multilingual language technologies.

Discover our expertly curated language datasets in the EMEA Data Suite. Compiled and annotated by language and linguistic experts, this suite offers high-quality resources tailored to your needs.

This suite includes:
- Monolingual and Bilingual Dictionary Data: Featuring headwords, definitions, word senses, part-of-speech (POS) tags, and semantic metadata.
- Sentence Corpora: Curated examples of real-world usage with contextual annotations for training and evaluation.
- Synonyms & Antonyms: Lexical relations to support semantic search, paraphrasing, and language understanding.
- Audio Data: Native speaker recordings for speech recognition, TTS, and pronunciation modeling.
- Word Lists: Frequency-ranked and thematically grouped lists for vocabulary training and NLP tasks.

Each language may contain one or more types of language data. Depending on the dataset, we can provide these in formats such as XML, JSON, TXT, XLSX, CSV, WAV, MP3, and more. Delivery is currently available via email (link-based sharing) or REST API.

If you require more information about a specific dataset, please contact us at Growth.OL@oup.com.

Below are the different types of datasets available for each language, along with their key features and approximate metrics. If you have any questions or require additional assistance, please don't hesitate to contact us.

1. Arabic Monolingual Dictionary Data: 66,500 words | 98,700 senses | 70,000 example sentences.
2. Arabic Bilingual Dictionary Data: 116,600 translations | 88,300 senses | 74,700 example translations.
3. Arabic Synonyms and Antonyms Data: 55,100 synonyms.
4. British English Monolingual Dictionary Data: 146,000 words | 230,000 senses | 149,000 example sentences.
5. British English Synonyms and Antonyms Data: 600,000 synonyms | 22,000 antonyms.
6. British English Pronunciations with Audio: 250,000 transcriptions (IPA) | 180,000 audio files.
7. Catalan Monolingual Dictionary Data: 29,800 words | 47,400 senses | 25,600 example sentences.
8. Catalan Bilingual Dictionary Data: 76,800 translations | 109,350 senses | 26,900 example translations.
9. Croatian Monolingual Dictionary Data: 129,600 words | 164,760 senses | 34,630 example sentences.
10. Croatian Bilingual Dictionary Data: 100,700 translations | 91,600 senses | 10,180 example translations.
11. Czech Bilingual Dictionary Data: 426,473 translations | 199,800 senses | 95,000 example translations.
12. Danish Bilingual Dictionary Data: 129,000 translations | 91,500 senses | 23,000 example translations.
13. French Monolingual Dictionary Data: 42,000 words | 56,000 senses | 43,000 example sentences.
14. French Bilingual Dictionary Data: 380,000 translations | 199,000 senses | 146,000 example translations.
15. German Monolingual Dictionary Data: 85,500 words | 78,000 senses | 55,000 example sentences.
16. German Bilingual Dictionary Data: 393,000 translations | 207,500 senses | 129,500 example translations.
17. German Word List Data: 338,000 wordforms.
18. Greek Monolingual Dictionary Data: 47,800 translations | 46,309 senses | 2,388 example sentences.
19. Hebrew Monolingual Dictionary Data: 85,600 words | 104,100 senses | 94,000 example sentences.
20. Hebrew Bilingual Dictionary Data: 67,000 translations | 49,000 senses | 19,500 example translations.
21. Hungarian Monolingual Dictionary Data: 90,500 words | 155,300 senses | 42,500 example sentences.
22. Italian Monolingual Dictionary Data: 102,500 words | 231,580 senses | 48,200 example sentences.
23. Italian Bilingual Dictionary Data: 492,000 translations | 251,600 senses | 157,100 example translations.
24. Italian Synonyms and Antonyms Data: 197,000 synonyms | 62,000 antonyms.
25. Latvian Monolingual Dictionary Data: 36,000 words | 43,600 senses | 73,600 example sentences.
26. Polish Bilingual Dictionary Data: 287,400 translations | 216,900 senses | 19,800 example translations.
27. Portuguese Monolingual Dictionary Data: 143,600 words | 285,500 senses | 69,300 example sentences.
28. Portuguese Bilingual Dictionary Data: 300,000 translations | 158,000 senses | 117,800 example translations.
29. Portuguese Synonyms and Antonyms Data: 196,000 synonyms | 90,000 antonyms.
30. Romanian Monolingual Dictionary Data: 66,900 words | 113,500 senses | 2,700 example sentences.
31. Romanian Bilingual Dictionary Data: 77,500 translations | 63,870 senses | 33,730 example translations.
32. Russian Monolingual Dictionary Data: 65,950 words | 57,500 senses | 51,900 example sentences.
33. Russian Bilingual Dictionary Data: 230,100 translations | 122,200 senses | 69,600 example translations.
34. Slovak Bilingual Dictionary Data: 254,300 translations | 172,100 senses | 85,000 example translations.
35. Spanish Monolingual Dictionary Data: 73,000 words | 123,000 senses | 104,000 example sentences.
36. Spanish Bilingual Dictionary Data: 221,300 translations | 103,500 senses | 83,800 example translations.
37. Spanish Sentences Data: 1,840,000 sentences.
38. Spanish Synonyms and Antonyms Data: 127,700 synonyms | 9,500 antonyms.
39. Spanish Audio Data: 20,900 audio files.
40. Spanish Word List Data: 450,000 wordforms.
41. Turkish Bilingual Dictionary Data: 70,000 translations | 47,800 senses | 4,000 example translations.
42. Ukrainian Bilingual Dictionary Data: 81,700 translations | 74,300 senses | 8,000 example translations.
43. Welsh Bilingual Dictionary Data: 19,900 translations | 10,925 senses | 9,500 example translations.

Use Cases: We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD).
If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.

Pricing: Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements and tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs. Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals. Please note that some datasets may have rights restrictions. Contact us for more information.

About the sample: The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our datasets, we provide a sample in CSV format for preview purposes only. If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.
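
The per-dataset metrics in the description follow a regular "name: count unit | count unit" pattern, so they can be turned into structured records for comparison or totals. The following is a minimal Python sketch of that idea. The function names `parse_entry` and `totals` are hypothetical helpers introduced here for illustration, not part of any Oxford Languages or Databricks tooling, and only three entries are copied in as sample input.

```python
import re
from collections import defaultdict

# Three entries copied verbatim from the description; the other
# entries follow the same "name: count unit | count unit" pattern.
ENTRIES = [
    "Arabic Monolingual Dictionary Data: 66,500 words | 98,700 senses | 70,000 example sentences.",
    "Arabic Bilingual Dictionary Data: 116,600 translations | 88,300 senses | 74,700 example translations.",
    "Welsh Bilingual Dictionary Data: 19,900 translations | 10,925 senses | 9,500 example translations.",
]

# A count with thousands separators, followed by its unit label.
METRIC = re.compile(r"([\d,]+)\s+([A-Za-z() ]+)")


def parse_entry(line):
    """Split one listing entry into (dataset name, {unit: count})."""
    name, _, metrics = line.rstrip(".").partition(":")
    parsed = {}
    for part in metrics.split("|"):
        match = METRIC.search(part)
        if match:
            count = int(match.group(1).replace(",", ""))
            parsed[match.group(2).strip()] = count
    return name.strip(), parsed


def totals(lines):
    """Sum each unit (words, senses, ...) across all given entries."""
    summed = defaultdict(int)
    for line in lines:
        _, metrics = parse_entry(line)
        for unit, count in metrics.items():
            summed[unit] += count
    return dict(summed)


if __name__ == "__main__":
    for entry in ENTRIES:
        print(parse_entry(entry))
    print(totals(ENTRIES))
```

Because the published figures are described as approximate, any totals computed this way are rough estimates and not authoritative counts.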
Provider:
Oxford Languages