
RichardDelome/wikidata_truthy

Source: Hugging Face. Updated 2026-04-18; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/RichardDelome/wikidata_truthy
Description:
---
license: cc0-1.0
language:
- en
- multilingual
tags:
- wikidata
- knowledge-graph
- parquet
- duckdb
size_categories:
- 1B<n<10B
pretty_name: Wikidata Truthy (Parquet)
---

# Wikidata Truthy (Parquet)

The complete [Wikidata](https://www.wikidata.org/) truthy snapshot (March 2026), converted from the raw N-Triples dump into optimized Parquet files.

**1.7 billion statements** across 108 million entities, queryable directly with DuckDB, with no SPARQL endpoint needed.

## Files

| File | Rows | Size | Description |
|------|------|------|-------------|
| `statements.parquet` | 1,703,849,656 | 7.6 GB | All truthy statements (Q-item as subject) |
| `labels.parquet` | 372,000,059 | 3.4 GB | Labels for all entities in all available languages |
| `items.parquet` | 108,425,819 | 2.1 GB | One row per entity with best available label (English preferred) |
| `items_descriptions.parquet` | 101,268,772 | 340 MB | English descriptions for entities |
| `property_statements.parquet` | 300,032 | 2.9 MB | Statements about properties (P-item as subject) |
| `properties.parquet` | 13,304 | 446 KB | Property labels and descriptions |

## Schemas

### statements.parquet

The main table. Each row is one truthy statement about a Wikidata item.

| Column | Type | Description |
|--------|------|-------------|
| `subject` | INT32 | Item ID (e.g. `31` for Q31/Belgium) |
| `property` | INT16 | Property ID (e.g. `31` for P31/instance-of) |
| `object_id` | INT32 | Target entity ID when the object is a Wikidata entity, NULL otherwise |
| `object_value` | VARCHAR | Literal value (number, date, string, coordinate), NULL when `object_id` is set |
| `datatype` | TINYINT | Type code for the object (see below) |

**Datatype codes:**

| Code | Type | Example |
|------|------|---------|
| 0 | Entity reference (Q) | Q5 (human) |
| 1 | Time | 2026-03-23T13:50:18Z |
| 2 | Quantity | +5821746 |
| 3 | String | "1000063" |
| 4 | Monolingual text | Belgium@en |
| 5 | Globe coordinate | Point(5.47 49.49) |
| 6 | URL | http://... |
| 7 | Other | External URIs, unknown types |
| 8 | Entity reference (P) | Property reference |

### items.parquet

Lookup table: one row per entity with the best available label.

| Column | Type | Description |
|--------|------|-------------|
| `qid` | INT32 | Entity ID |
| `label` | VARCHAR | Best label (English preferred, falls back through 40 languages) |
| `label_lang` | VARCHAR | Language of the chosen label |

### items_descriptions.parquet

| Column | Type | Description |
|--------|------|-------------|
| `qid` | INT32 | Entity ID |
| `description` | VARCHAR | English description |

### labels.parquet

All labels in all languages.

| Column | Type | Description |
|--------|------|-------------|
| `qid` | INT32 | Entity ID |
| `lang` | VARCHAR | Language code |
| `label` | VARCHAR | Label text |

### properties.parquet

| Column | Type | Description |
|--------|------|-------------|
| `pid` | INT16 | Property ID |
| `label` | VARCHAR | Property label (English) |
| `description` | VARCHAR | Property description (English) |

### property_statements.parquet

Statements where the subject is a property (P-entity). The layout follows `statements`, with a narrower `subject` column and an additional `object_label` column.

| Column | Type | Description |
|--------|------|-------------|
| `subject` | INT16 | Property ID |
| `property` | INT16 | Property ID |
| `object_id` | INT32 | Target entity ID (nullable) |
| `object_label` | VARCHAR | Resolved label of target entity (nullable) |
| `object_value` | VARCHAR | Literal value (nullable) |
| `datatype` | TINYINT | Type code |

## Quick Start with DuckDB

No download needed; query directly from Hugging Face:

```sql
-- Find all "instance of" (P31) values for Q42 (Douglas Adams)
SELECT s.subject, p.label AS property, i.label AS value
FROM 'hf://datasets/RichardDelome/wikidata_truthy/statements.parquet' s
JOIN 'hf://datasets/RichardDelome/wikidata_truthy/properties.parquet' p
  ON p.pid = s.property
JOIN 'hf://datasets/RichardDelome/wikidata_truthy/items.parquet' i
  ON i.qid = s.object_id
WHERE s.subject = 42
  AND s.property = 31;
```

```sql
-- Look up an entity by name
SELECT qid, label
FROM 'hf://datasets/RichardDelome/wikidata_truthy/items.parquet'
WHERE label = 'Douglas Adams';
```

```sql
-- All items that are "instance of" "human" (Q5) born in a specific year
SELECT i.label, s2.object_value AS birth_date
FROM 'hf://datasets/RichardDelome/wikidata_truthy/statements.parquet' s1
JOIN 'hf://datasets/RichardDelome/wikidata_truthy/statements.parquet' s2
  ON s2.subject = s1.subject
 AND s2.property = 569            -- P569 = date of birth
JOIN 'hf://datasets/RichardDelome/wikidata_truthy/items.parquet' i
  ON i.qid = s1.subject
WHERE s1.property = 31
  AND s1.object_id = 5            -- P31 = instance of, Q5 = human
  AND s2.object_value LIKE '+1952%'
LIMIT 20;
```

## Source

Built from [Wikidata's latest-truthy N-Triples dump](https://dumps.wikimedia.org/wikidatawiki/entities/) (March 26, 2026).

Only **truthy** statements are included: for each property on each item, the current best-ranked values (preferred-rank statements when any exist, otherwise normal-rank ones). Deprecated statements are always excluded.

## License

The data is from Wikidata and is available under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/).
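
## Decoding IDs and datatype codes

The schemas above store entity and property IDs as bare integers (`42` for Q42, `31` for P31) and object types as small integer codes. When post-processing query results outside DuckDB, a small helper can convert between the two forms. The following is a minimal Python sketch, not part of the dataset: the function names `parse_entity_id`, `format_entity_id`, and `datatype_name` are hypothetical, and the code table is copied by hand from the "Datatype codes" table in this card.

```python
import re
from typing import Tuple

# Codes copied from the "Datatype codes" table of this dataset card.
DATATYPE_NAMES = {
    0: "Entity reference (Q)",
    1: "Time",
    2: "Quantity",
    3: "String",
    4: "Monolingual text",
    5: "Globe coordinate",
    6: "URL",
    7: "Other",
    8: "Entity reference (P)",
}

# A Wikidata identifier here is a Q or P prefix followed by a positive integer.
_ID_PATTERN = re.compile(r"^([QP])([1-9][0-9]*)$")


def parse_entity_id(text: str) -> Tuple[str, int]:
    """Split an identifier such as 'Q42' into its prefix and integer part."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a Wikidata Q/P identifier: {text!r}")
    return match.group(1), int(match.group(2))


def format_entity_id(prefix: str, number: int) -> str:
    """Rebuild an identifier such as 'P31' from a prefix and an integer."""
    if prefix not in ("Q", "P") or number <= 0:
        raise ValueError(f"invalid identifier parts: {prefix!r}, {number!r}")
    return f"{prefix}{number}"


def datatype_name(code: int) -> str:
    """Return the human-readable name for a datatype code."""
    return DATATYPE_NAMES.get(code, f"unknown code {code}")


if __name__ == "__main__":
    print(parse_entity_id("Q42"))
    print(format_entity_id("P", 31))
    print(datatype_name(5))
```

The integer from `parse_entity_id` is the value to use in filters such as `WHERE s.subject = 42`, and `format_entity_id` turns a `subject`, `property`, or `object_id` value back into the identifier used on wikidata.org.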
Provided by:
RichardDelome