
SAMPLE British English Language Datasets | 150+ Years of Research | Audio Data | Natural ...

Source: Databricks, indexed 2025-11-22
Download link:
https://marketplace.databricks.com/details/444217d6-9f0c-4a62-ae3b-93ca65210eb9/Oxford-Languages_SAMPLE-British-English-Language-Datasets-150+-Years-of-Research-Audio-Data-Natural-
Resource description:
Derived from over 150 years of lexical research, these comprehensive textual and audio datasets focus on British English and provide linguistically annotated data. They are ideal for NLP applications, machine learning (ML), LLM training and/or fine-tuning, as well as educational and game apps. Our British English language datasets are meticulously curated and annotated by experienced linguists and language experts, ensuring exceptional accuracy, consistency, and linguistic depth.

The following British English datasets are available for license:

1. British English Monolingual Dictionary Data
2. British English Synonyms and Antonyms Data
3. British English Pronunciations with Audio

Key Features (approximate numbers):

1. British English Monolingual Dictionary Data

Our British English monolingual dataset delivers clear, reliable definitions and authentic usage examples, featuring a high volume of headwords and in-depth coverage of the British variety of English. As one of the world's most authoritative lexical resources, it's trusted by leading academic, AI, and language technology organizations.

- Headwords: 146,000
- Senses: 230,000
- Sentence examples: 149,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: twice a year

2. British English Synonyms and Antonyms Data

This British English language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for NLP tasks such as semantic search, word sense disambiguation, and language generation.

- Synonyms: 600,000
- Antonyms: 22,000
- Usage examples: 39,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing)
- Update frequency: annually

3. British English Pronunciations with Audio (word-level)

This dataset provides IPA transcriptions and clean audio data in contemporary British English. It includes syllabified transcriptions, variant spellings, POS tags, and pronunciation group identifiers. The audio files are supplied separately and linked where available for seamless integration, which makes the dataset well suited to teams building TTS systems, ASR models, and pronunciation engines.

- Transcriptions (IPA): 250,000
- Audio files: 180,000
- Format: XLSX (transcriptions), MP3 and WAV (audio files)
- Update frequency: annually

Use Cases:

We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, machine translation, AI training and fine-tuning, word embedding, and word sense disambiguation (WSD).

If you have a specific use case in mind that isn't listed here, we'd be happy to explore it with you. Don't hesitate to get in touch with us at Growth.OL@oup.com to start the conversation.

Pricing:

Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements, with tiered pricing for API-delivered data. Whether you're integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs.

Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.

Please note that some datasets may have rights restrictions. Contact us for more information.

About the sample:

To help you explore the structure and features of our dataset on this platform, we provide a sample in CSV and/or JSON format for one of the presented datasets, for preview purposes only, as shown on this page. This sample offers a quick and accessible overview of the data's contents and organization.

Our full datasets are available in various formats, depending on the language and type of data you require. These may include XML, JSON, TXT, XLSX, CSV, WAV, MP3, and other file types.

Please contact us (Growth.OL@oup.com) if you would like to receive the original sample with full details.
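
As a rough illustration of how separately supplied audio files might be linked to transcription rows "where available", the Python sketch below joins the two by a pronunciation identifier. The field names (`headword`, `ipa`, `pron_id`), the file-naming scheme (`<pron_id>.mp3` or `<pron_id>.wav`), and the example rows are all assumptions made up for illustration; the listing does not document the actual schema.

```python
from pathlib import PurePosixPath


def link_audio(rows, audio_paths):
    """Attach an audio path to each transcription row where one exists.

    rows: iterable of dicts with hypothetical keys 'headword', 'ipa', 'pron_id'.
    audio_paths: iterable of file paths assumed to be named '<pron_id>.<ext>'.
    """
    # Index audio files by file stem; keep the first file seen per identifier.
    by_id = {}
    for path in audio_paths:
        p = PurePosixPath(path)
        if p.suffix.lower() in {".mp3", ".wav"}:
            by_id.setdefault(p.stem, str(p))
    # Rows without a matching file get audio=None.
    return [{**row, "audio": by_id.get(row["pron_id"])} for row in rows]


# Invented rows for illustration only; not taken from the dataset.
rows = [
    {"headword": "example", "ipa": "ɪɡˈzɑːmpl", "pron_id": "p0001"},
    {"headword": "sample", "ipa": "ˈsɑːmpl", "pron_id": "p0002"},
]
linked = link_audio(rows, ["audio/p0001.mp3", "audio/notes.txt"])
```

Keeping unmatched rows with `audio` set to `None`, rather than dropping them, reflects the listing's figures: there are fewer audio files than transcriptions, so some rows will have no recording.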
Provider:
Oxford Languages