SAMPLE French Language Datasets | 150+ Years of Research | AI | NLP | LLMs | Dictionary Display ...
Source: Databricks | Indexed: 2025-11-22
Download link:
https://marketplace.databricks.com/details/13260340-fe98-41a4-825e-e5ea060fafaf/Oxford-Languages_SAMPLE-French-Language-Datasets-150+-Years-of-Research-AI-NLP-LLMs-Dictionary-Display-
Description:
This linguistically rich French dataset offers comprehensive annotations including headwords, definitions, senses, real-world examples, POS tags, semantic metadata, and usage info. Optimized for use in NLP, dictionary tools, and language model fine-tuning.
Our French language datasets are meticulously curated and annotated by experienced linguists and language experts, ensuring exceptional accuracy, consistency, and linguistic depth. The following French datasets are available for license:
1. French Monolingual Dictionary Data
2. French Bilingual Dictionary Data
Key Features (approximate numbers):
1. French Monolingual Dictionary Data
Our French monolingual dataset delivers clear, reliable definitions and authentic usage examples, featuring a high volume of headwords and in-depth coverage.
- Words: 42,000
- Senses: 56,000
- Example sentences: 43,000
- Format: XML
- Delivery: Email (link-based file sharing)
- Update frequency: annually
2. French Bilingual Dictionary Data
The bilingual data provides translations in both directions, from English to French and from French to English. It is reviewed and updated annually by our in-house team of language experts. It offers comprehensive coverage of the language, with a substantial volume of high-quality translations spanning European, African, and Canadian French varieties.
- Translations: 380,000
- Senses: 199,000
- Example translations: 146,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually
Use Cases:
We work continually with our clients on new use cases as language technology evolves. These include natural language processing (NLP) applications, text-to-speech (TTS), dictionary display tools, games, translation, word embeddings, and word sense disambiguation (WSD).
If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Don’t hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.
Pricing:
Oxford Languages offers flexible pricing based on use case and delivery format. Datasets are licensed through term-based IP agreements, with tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.
Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.
About the sample:
The samples offer a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). To help you explore the structure and features of our datasets, we provide a sample in CSV format for preview purposes only.
If you need the complete original sample or more details about any dataset, please contact us (Growth.OL@oup.com) to request access or further information.
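As a starting point for exploring the CSV sample's structure, the following is a minimal Python sketch that lists the columns, counts the rows, and reports how many rows have a value in each column. It makes no claim about the actual layout of the sample: the column names `headword`, `pos`, and `definition` and the rows in `DEMO` are hypothetical stand-ins used only to make the example runnable.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle):
    """Return (column_names, row_count, non_empty_counts) for a CSV stream."""
    reader = csv.DictReader(handle)
    non_empty = Counter()
    row_count = 0
    for row in reader:
        row_count += 1
        for name, value in row.items():
            # Skip overflow cells (key None) and count only filled values.
            if name is not None and value not in (None, ""):
                non_empty[name] += 1
    columns = list(reader.fieldnames or [])
    return columns, row_count, {c: non_empty[c] for c in columns}


# Hypothetical stand-in data; the real sample's columns may differ.
DEMO = "headword,pos,definition\nchat,noun,petit félin domestique\nmanger,verb,\n"

columns, rows, filled = summarize_csv(io.StringIO(DEMO))
print(columns, rows, filled)
```

To run the same summary on a downloaded file, pass an open file handle instead of the in-memory string, for example `open("sample.csv", newline="", encoding="utf-8")`, where the file name and the UTF-8 encoding are assumptions to verify against the file you receive.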
Provider:
Oxford Languages



