
SAMPLE Portuguese Language Datasets | 300K Translations | Natural Language Processing (NLP) Data ...

Databricks, indexed 2025-11-22
Download link:
https://marketplace.databricks.com/details/54b96115-3255-4146-8fa5-af98d376155d/Oxford-Languages_SAMPLE-Portuguese-Language-Datasets-300K-Translations-Natural-Language-Processing-(NLP)-Data-
Resource description:
Comprehensive Portuguese language datasets with linguistic annotations, including headwords, definitions, word senses, usage examples, part-of-speech (POS) tags, semantic metadata, and contextual usage details. Perfect for powering dictionary platforms, NLP, AI models, and translation systems.

Our Portuguese language datasets are carefully compiled and annotated by language and linguistic experts. The following Portuguese datasets are available for license:

1. Portuguese Monolingual Dictionary Data
2. Portuguese Bilingual Dictionary Data

Key Features (approximate numbers):

1. Portuguese Monolingual Dictionary Data

Our Portuguese monolingual data covers both EU and LATAM varieties, featuring clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Portuguese language.

- Words: 143,600
- Senses: 285,500
- Example sentences: 69,300
- Format: XML
- Delivery: Email (link-based file sharing)

2. Portuguese Bilingual Dictionary Data

The bilingual data provides translations in both directions, from English to Portuguese and from Portuguese to English. It is reviewed and updated annually by our in-house team of language experts. It offers comprehensive coverage of the language, with a substantial volume of high-quality translations spanning both EU and LATAM Portuguese varieties.

- Translations: 300,000
- Senses: 158,000
- Example translations: 117,800
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually

Use Cases:

We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, text-to-speech (TTS), dictionary display tools, games, translation, word embeddings, and word sense disambiguation (WSD). If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.

Pricing:

Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements, with tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs. Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.

About the sample:

The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our dataset, we provide a sample in CSV format for preview purposes only. If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.
Provider:
Oxford Languages