RichardDelome/wikidata_truthy
Hugging Face | Updated 2026-04-18 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/RichardDelome/wikidata_truthy
Resource description:
---
license: cc0-1.0
language:
- en
- multilingual
tags:
- wikidata
- knowledge-graph
- parquet
- duckdb
size_categories:
- 1B<n<10B
pretty_name: Wikidata Truthy (Parquet)
---
# Wikidata Truthy — Parquet
The complete [Wikidata](https://www.wikidata.org/) truthy snapshot (March 2026), converted from the raw N-Triples dump into optimized Parquet files.
**1.7 billion statements** across 108 million entities, queryable directly with DuckDB — no SPARQL endpoint needed.
## Files
| File | Rows | Size | Description |
|------|------|------|-------------|
| `statements.parquet` | 1,703,849,656 | 7.6 GB | All truthy statements (Q-item as subject) |
| `labels.parquet` | 372,000,059 | 3.4 GB | Labels for all entities in all available languages |
| `items.parquet` | 108,425,819 | 2.1 GB | One row per entity with best available label (English preferred) |
| `items_descriptions.parquet` | 101,268,772 | 340 MB | English descriptions for entities |
| `property_statements.parquet` | 300,032 | 2.9 MB | Statements about properties (P-entity as subject) |
| `properties.parquet` | 13,304 | 446 KB | Property labels and descriptions |
## Schemas
### statements.parquet
The main table. Each row is one truthy statement about a Wikidata item.
| Column | Type | Description |
|--------|------|-------------|
| `subject` | INT32 | Item ID (e.g. `31` for Q31/Belgium) |
| `property` | INT16 | Property ID (e.g. `31` for P31/instance-of) |
| `object_id` | INT32 | Target entity ID when the object is a Wikidata entity, NULL otherwise |
| `object_value` | VARCHAR | Literal value (number, date, string, coordinate), NULL when `object_id` is set |
| `datatype` | TINYINT | Type code for the object (see below) |
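The integer columns drop the `Q`/`P` prefix used on wikidata.org, so IDs need converting before filtering and after reading results. A minimal Python sketch of that conversion follows; the helper names `to_int_id` and `to_entity_id` are illustrative and not part of the dataset or of any library.

```python
import re

# Matches Wikidata-style IDs such as "Q42" or "P31".
_ID_PATTERN = re.compile(r"^([QP])([1-9][0-9]*)$")


def to_int_id(entity_id: str) -> int:
    """Strip the Q/P prefix: 'Q31' -> 31, 'P31' -> 31."""
    match = _ID_PATTERN.match(entity_id.strip().upper())
    if match is None:
        raise ValueError(f"not a Q/P identifier: {entity_id!r}")
    return int(match.group(2))


def to_entity_id(number: int, prefix: str = "Q") -> str:
    """Re-attach a prefix: (31, 'Q') -> 'Q31', (31, 'P') -> 'P31'."""
    if prefix not in ("Q", "P"):
        raise ValueError(f"prefix must be 'Q' or 'P', got {prefix!r}")
    if number < 1:
        raise ValueError(f"ID must be positive, got {number}")
    return f"{prefix}{number}"
```

Since the prefix is gone, the column decides the namespace: `31` in `subject` means Q31, while `31` in `property` means P31. For `object_id`, the `datatype` code tells the two apart (code 8 marks a property reference).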
**Datatype codes:**
| Code | Type | Example |
|------|------|---------|
| 0 | Entity reference (Q) | Q5 (human) |
| 1 | Time | 2026-03-23T13:50:18Z |
| 2 | Quantity | +5821746 |
| 3 | String | "1000063" |
| 4 | Monolingual text | Belgium@en |
| 5 | Globe coordinate | Point(5.47 49.49) |
| 6 | URL | http://... |
| 7 | Other | External URIs, unknown types |
| 8 | Entity reference (P) | Property reference |
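When rows are post-processed outside SQL, the codes above can be mirrored in a small enum. The Python sketch below also decodes a row's object columns into a native value. It assumes literals look exactly like the examples in the table (for instance `Point(longitude latitude)` for coordinates) and returns anything it cannot parse unchanged. The names `Datatype` and `decode_object` are illustrative.

```python
from decimal import Decimal, InvalidOperation
from enum import IntEnum


class Datatype(IntEnum):
    """Codes from the `datatype` column, as listed in the table above."""
    ENTITY_Q = 0
    TIME = 1
    QUANTITY = 2
    STRING = 3
    MONOLINGUAL_TEXT = 4
    GLOBE_COORDINATE = 5
    URL = 6
    OTHER = 7
    ENTITY_P = 8


def decode_object(datatype, object_id, object_value):
    """Turn one row's object columns into a Python value."""
    kind = Datatype(datatype)
    if kind is Datatype.ENTITY_Q:
        return f"Q{object_id}"
    if kind is Datatype.ENTITY_P:
        return f"P{object_id}"
    if object_value is None:
        return None
    if kind is Datatype.QUANTITY:
        try:
            return Decimal(object_value)  # accepts a leading '+'
        except InvalidOperation:
            return object_value
    if kind is Datatype.MONOLINGUAL_TEXT:
        # Split on the last '@' so text that itself contains '@' stays intact.
        text, sep, lang = object_value.rpartition("@")
        return (text, lang) if sep else (object_value, None)
    if kind is Datatype.GLOBE_COORDINATE:
        # Assumed layout: "Point(longitude latitude)".
        inner = object_value.strip()
        if inner.lower().startswith("point(") and inner.endswith(")"):
            parts = inner[6:-1].split()
            if len(parts) == 2:
                try:
                    return (float(parts[0]), float(parts[1]))
                except ValueError:
                    pass
        return object_value
    # TIME, STRING, URL and OTHER are left as strings.
    return object_value
```

For example, `decode_object(4, None, "Belgium@en")` gives the pair `("Belgium", "en")`, and `decode_object(0, 5, None)` gives `"Q5"`.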
### items.parquet
Lookup table: one row per entity with the best available label.
| Column | Type | Description |
|--------|------|-------------|
| `qid` | INT32 | Entity ID |
| `label` | VARCHAR | Best label (English preferred, falls back through 40 languages) |
| `label_lang` | VARCHAR | Language of the chosen label |
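The "best label" rule can be pictured as a walk down a language preference list. The Python sketch below illustrates the idea only: the card does not list the 40 fallback languages or their order, so `PREFERRED_LANGS` is a hypothetical stand-in, and returning `None` when nothing matches is an assumption.

```python
# Hypothetical stand-in: the card says English comes first and that there
# are 40 fallback languages, but it does not name them or give their order.
PREFERRED_LANGS = ["en", "fr", "de", "es"]


def pick_best_label(labels, preferred=PREFERRED_LANGS):
    """Return (label, lang) for the first preferred language with a label.

    `labels` maps language code -> label text, i.e. the rows of
    labels.parquet for one entity. Returns None if no preferred language
    is present (an assumption; the dataset's behaviour here is not stated).
    """
    for lang in preferred:
        text = labels.get(lang)
        if text:
            return text, lang
    return None
```

With this list, `pick_best_label({"fr": "Belgique", "de": "Belgien"})` picks the French label, which corresponds to a row with `label = 'Belgique'` and `label_lang = 'fr'`.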
### items_descriptions.parquet
| Column | Type | Description |
|--------|------|-------------|
| `qid` | INT32 | Entity ID |
| `description` | VARCHAR | English description |
### labels.parquet
All labels in all languages.
| Column | Type | Description |
|--------|------|-------------|
| `qid` | INT32 | Entity ID |
| `lang` | VARCHAR | Language code |
| `label` | VARCHAR | Label text |
### properties.parquet
| Column | Type | Description |
|--------|------|-------------|
| `pid` | INT16 | Property ID |
| `label` | VARCHAR | Property label (English) |
| `description` | VARCHAR | Property description (English) |
### property_statements.parquet
Same layout as `statements`, but for statements whose subject is a property (P-entity). Two differences: `subject` is an INT16 property ID, and there is an extra `object_label` column.
| Column | Type | Description |
|--------|------|-------------|
| `subject` | INT16 | Property ID |
| `property` | INT16 | Property ID |
| `object_id` | INT32 | Target entity ID (nullable) |
| `object_label` | VARCHAR | Resolved label of target entity (nullable) |
| `object_value` | VARCHAR | Literal value (nullable) |
| `datatype` | TINYINT | Type code |
## Quick Start with DuckDB
No download needed — query directly from Hugging Face:
```sql
-- Find all "instance of" (P31) values for Q42 (Douglas Adams)
SELECT s.subject, p.label AS property, i.label AS value
FROM 'hf://datasets/RichardDelome/wikidata_truthy/statements.parquet' s
JOIN 'hf://datasets/RichardDelome/wikidata_truthy/properties.parquet' p ON p.pid = s.property
JOIN 'hf://datasets/RichardDelome/wikidata_truthy/items.parquet' i ON i.qid = s.object_id
WHERE s.subject = 42 AND s.property = 31;
```
```sql
-- Look up an entity by name
SELECT qid, label
FROM 'hf://datasets/RichardDelome/wikidata_truthy/items.parquet'
WHERE label = 'Douglas Adams';
```
```sql
-- Up to 20 humans (P31 "instance of" = Q5) born in a given year (here 1952)
SELECT i.label, s2.object_value AS birth_date
FROM 'hf://datasets/RichardDelome/wikidata_truthy/statements.parquet' s1
JOIN 'hf://datasets/RichardDelome/wikidata_truthy/statements.parquet' s2
  ON s2.subject = s1.subject AND s2.property = 569  -- P569 = date of birth
JOIN 'hf://datasets/RichardDelome/wikidata_truthy/items.parquet' i
  ON i.qid = s1.subject
WHERE s1.property = 31 AND s1.object_id = 5         -- P31 = instance of, Q5 = human
  -- Accept the year with or without a leading '+' sign
  AND (s2.object_value LIKE '1952-%' OR s2.object_value LIKE '+1952-%')
LIMIT 20;
```
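The queries above repeat the same `hf://` prefix. When such queries are generated from Python, a small helper can fill in the paths and the integer IDs. The sketch below only builds the SQL text. Running it needs a DuckDB connection, which is not shown here. The function names are illustrative.

```python
REPO = "RichardDelome/wikidata_truthy"
FILES = {
    "statements", "labels", "items", "items_descriptions",
    "property_statements", "properties",
}


def hf_path(table: str) -> str:
    """Return the hf:// path of one of the dataset's Parquet files."""
    if table not in FILES:
        raise ValueError(f"unknown table: {table!r}")
    return f"hf://datasets/{REPO}/{table}.parquet"


def entity_values_query(qid: int, pid: int) -> str:
    """SQL listing the labelled entity values of property `pid` on item `qid`.

    Both IDs are coerced with int() so only integers reach the SQL text.
    """
    return (
        "SELECT s.subject, p.label AS property, i.label AS value\n"
        f"FROM '{hf_path('statements')}' s\n"
        f"JOIN '{hf_path('properties')}' p ON p.pid = s.property\n"
        f"JOIN '{hf_path('items')}' i ON i.qid = s.object_id\n"
        f"WHERE s.subject = {int(qid)} AND s.property = {int(pid)};"
    )
```

`entity_values_query(42, 31)` yields the text of the first query in this section (without its comment line), which can then be passed to whichever DuckDB client is in use.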
## Source
Built from [Wikidata's latest-truthy N-Triples dump](https://dumps.wikimedia.org/wikidatawiki/entities/) (March 26, 2026).
Only **truthy** statements are included: for each property on each item, the statements with the best non-deprecated rank. That means preferred-rank statements if any exist, otherwise normal-rank ones. Deprecated statements are never included.
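Rank filtering happens upstream in Wikidata's dump, and the files here carry no rank column. Purely as an illustration of the truthy rule (preferred-rank statements if any exist, otherwise normal-rank ones, never deprecated ones), the Python sketch below selects truthy values from `(value, rank)` pairs for one item and property. The input shape and the function name are assumptions; this is not the code that produced the dump.

```python
def truthy_values(claims):
    """Select truthy values from [(value, rank), ...] for one item/property.

    rank is 'preferred', 'normal' or 'deprecated'. Preferred statements win
    if there are any; otherwise normal ones are used. Deprecated statements
    are never returned.
    """
    preferred = [value for value, rank in claims if rank == "preferred"]
    if preferred:
        return preferred
    return [value for value, rank in claims if rank == "normal"]
```

One consequence is that a property can still have several truthy values on one item, for example when multiple statements share the normal rank and none is preferred.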
## License
The data is from Wikidata and is available under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/).
Provider:
RichardDelome



