cstr/de-wiktionary-semantic

Source: Hugging Face. Updated 2025-11-22. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/cstr/de-wiktionary-semantic
Resource description:
---
license: cc-by-sa-3.0
task_categories:
- text-retrieval
language:
- de
tags:
- wiktionary
- dictionary
- german
- linguistics
- morphology
- semantics
- normalized
- lossless
size_categories:
- 1M<n<10M
---

# German Wiktionary - Normalized SQLite Database with Semantic Web Data

This is a lossless, fully normalized SQLite database of German Wiktionary, capturing every field from the `cstr/de-wiktionary-extracted` dataset.

## 🎯 Key Features

- **✅ 100% Lossless**: All fields are captured, including:
  - 🔗 **Wikilinks** in definitions (semantic connections)
  - 📝 **Qualifiers** (e.g., "archaic", "Swiss", "informal")
  - 🏷️ **Sense IDs** (unique identifiers)
  - 🌐 **Wikidata IDs** (for semantic web linking)
  - 📚 **Attestations** (historical citations)
  - 🎭 **Head templates** (morphological data like Genus/Plural)
  - 📖 **Info templates** (structured metadata)
- **⚡ Fast Queries**: Fully indexed schema for sub-20ms queries
- **🔗 Complete Semantic Web**: All relations preserved with sense-level granularity
- **📱 Mobile-ready**: Optimized for sqflite (Flutter) and local DB use cases

## 📊 Database Statistics

- **Entries**: 970,801
- **Word Senses**: 3,098,364
- **Definitions (Glosses)**: 3,087,300
- **Wikilinks**: 0
- **Sense IDs**: 3,098,364
- **Qualifiers**: Embedded in senses
- **Translations**: 1,131,251
- **Word Forms**: 6,100,090
- **Head Templates**: 0
- **Pronunciations**: 2,327,762
- **Examples**: 427,322
- **Attestations**: 0
- **Wikidata IDs**: 0
- **Synonyms**: 161,563
- **Antonyms**: 76,054
- **Hypernyms**: 133,059
- **Hyponyms**: 0

## 🏗️ Database Schema (40+ Tables)

### New Tables (vs Previous Versions)

- **head_templates**: Morphological templates (crucial for German inflection/gender)
- **entry_wikipedia**: Wikipedia cross-references
- **sense_links**: Wikilinks in definitions
- **sense_raw_tags**: Unstructured tags
- **sense_wikidata**: Wikidata identifiers
- **sense_wikipedia**: Wikipedia at sense level
- **attestations**: Historical citations
- **info_templates**: Structured metadata

### Core Tables

- **entries**: Core word data with etymology
- **senses**: Definitions with qualifier, senseid, head_nr
- **translations**: Multi-language translations
- **examples**: Usage examples
- **semantic relations**: hypernyms/hyponyms/meronyms/holonyms/coordinate_terms

## 📖 Usage

### Download

```python
from huggingface_hub import hf_hub_download
import sqlite3
import gzip
import shutil

# Download compressed database
db_gz_path = hf_hub_download(
    repo_id="cstr/de-wiktionary-semantic",
    filename="de_wiktionary_normalized_full.db.gz",
    repo_type="dataset"
)

# Decompress next to the downloaded archive
db_path = db_gz_path.replace('.gz', '')
with gzip.open(db_gz_path, 'rb') as f_in:
    with open(db_path, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

# Connect
conn = sqlite3.connect(db_path)
```

### Example Queries

```python
# Create a cursor on the connection opened in the Download step
cursor = conn.cursor()

# Get definition with wikilinks for "Hund"
cursor.execute('''
    SELECT g.gloss_text, GROUP_CONCAT(l.link_text, ', ') as links
    FROM entries e
    JOIN senses s ON e.id = s.entry_id
    JOIN glosses g ON s.id = g.sense_id
    LEFT JOIN sense_links l ON s.id = l.sense_id
    WHERE e.word = ? AND e.lang = 'Deutsch'
    GROUP BY g.id
''', ('Hund',))
print(cursor.fetchall())

# Find Wikidata ID for a sense
cursor.execute('''
    SELECT e.word, w.wikidata_id
    FROM entries e
    JOIN senses s ON e.id = s.entry_id
    JOIN sense_wikidata w ON s.id = w.sense_id
    WHERE e.word = ?
''', ('Katze',))
print(cursor.fetchall())
```

## 📜 License

CC-BY-SA 4.0 (same as source)
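
## 🔍 Inspecting Table Contents

The statistics above report zero rows for several features (Wikilinks, Head Templates, Attestations, Wikidata IDs, Hyponyms), so queries that join the corresponding tables may return no matches. The following is a minimal sketch for checking which tables actually hold data before writing such joins. The helper name `table_row_counts` is hypothetical and not part of the dataset. It relies only on SQLite's built-in `sqlite_master` catalog, and the demo uses a tiny in-memory stand-in rather than the real file.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        # Skip SQLite's internal bookkeeping tables.
        if not row[0].startswith("sqlite_")
    ]
    counts = {}
    for name in names:
        # Names come from the catalog, not user input; quote them anyway.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    # In-memory stand-in (assumption); use sqlite3.connect(db_path) for the real file.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, word TEXT)")
    demo.execute("CREATE TABLE sense_links (sense_id INTEGER, link_text TEXT)")
    demo.execute("INSERT INTO entries (word) VALUES ('Hund'), ('Katze')")
    for table, n in table_row_counts(demo).items():
        print(f"{table}: {n}")
```

Against the real database, pass the connection opened in the Download step. Counting every row of the larger tables requires a full scan, so the first run may take noticeably longer than an indexed lookup.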
Provided by:
cstr