Labeled Dataset of Wikidata Edit History Changes
Source: DataCite Commons. Updated: 2026-05-04. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19764414
Description:
This dataset contains manually labeled changes extracted from Wikidata's edit history. It was used to train and evaluate a classifier for change type classification (see "ML-based Change Type Classification in Wikidata").
Two files are provided:
wikidata_edit_history_labeled_changes.csv: contains labeled changes only for the datatypes quantity, time, entity, and string.
wikidata_edit_history_labeled_changes_globecoordinate.csv: contains labeled changes only for the datatype globecoordinate. Here, latitude and longitude changes were labeled separately, so there are two label columns, one for each (label_latitude, label_longitude).
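The difference between the two files can be handled with one small reader that takes the expected label columns as a parameter. The following is a minimal sketch using only the Python standard library. It assumes the files are comma-separated with a header row whose names match the columns documented here, and that the globecoordinate file carries label_latitude and label_longitude in place of label. The function name `read_changes` and the in-memory sample rows are hypothetical placeholders, not records from the dataset.

```python
import csv
import io

# Label columns per file, as described above.
MAIN_LABEL_COLUMNS = ("label",)
GLOBE_LABEL_COLUMNS = ("label_latitude", "label_longitude")


def read_changes(handle, label_columns):
    """Yield (row, labels) pairs from an open CSV handle.

    `labels` maps each expected label column to its value in that row.
    Raises ValueError if an expected label column is absent from the header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in label_columns if column not in header]
    if missing:
        raise ValueError(f"missing label columns: {missing}")
    for row in reader:
        yield row, {column: row[column] for column in label_columns}


# Invented sample rows, used only so the sketch runs without the real files.
main_sample = io.StringIO(
    "revision_id,datatype,label\n"
    "1,quantity,re_formatting\n"
)
globe_sample = io.StringIO(
    "revision_id,datatype,label_latitude,label_longitude\n"
    "2,globecoordinate,refinement,property_value_update\n"
)

main_rows = list(read_changes(main_sample, MAIN_LABEL_COLUMNS))
globe_rows = list(read_changes(globe_sample, GLOBE_LABEL_COLUMNS))
```

With the downloaded files, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.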
Each row corresponds to a single change and includes the following columns:
| Column | Datatype | Description |
| --- | --- | --- |
| revision_id | bigint | Wikidata revision ID of the change. |
| entity_id | int | Numeric part of the Q-id of the entity. |
| entity_label | string | Label of the entity. |
| value_id | string | Identifier of the statement value. A property can have multiple values. |
| change_target | string | Can be "rank" (change in the rank), "" (change in a value), or a language code (e.g., "en"). A language code identifies the language of a multilingual text. |
| property_id | int | Numeric part of the P-id of the property. |
| property_label | string | Label of the property. |
| old_value | string (json) | Old value of the statement or rank. |
| old_value_label | string | Entity label of the old value. Only applicable when datatype is one of wikibase-item, wikibase-entityid, wikibase-property, wikibase-lexeme, wikibase-sense, wikibase-form. |
| new_value | string (json) | New value of the statement or rank. |
| new_value_label | string | Entity label of the new value. Only applicable when datatype is one of wikibase-item, wikibase-entityid, wikibase-property, wikibase-lexeme, wikibase-sense, wikibase-form. |
| datatype | string | Datatype of the changed values. One of: quantity, time, globecoordinate, monolingualtext, string, external-id, url, commonsMedia, geo-shape, tabular-data, math, musical-notation, wikibase-item, wikibase-entityid, wikibase-property, wikibase-lexeme, wikibase-sense, wikibase-form. |
| action | string | Edit type. Can be "UPDATE", "DELETE", or "CREATE". |
| target | string | Target of the change. Can be "PROPERTY_VALUE". |
| entity_types_31 | string | List of entity labels of the values of the property instance of (P31) for the changed entity. |
| entity_description | string | Description of the changed entity. |
| new_value_description | string | Description of the new value. Only applicable when datatype is one of wikibase-item, wikibase-entityid, wikibase-property, wikibase-lexeme, wikibase-sense, wikibase-form. |
| old_value_description | string | Description of the old value. Only applicable when datatype is one of wikibase-item, wikibase-entityid, wikibase-property, wikibase-lexeme, wikibase-sense, wikibase-form. |
| entity_types_279 | string | List of entity labels of the values of the property subclass of (P279) for the changed entity. |
| label | string | Label of the change type. |
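Several of these columns need a small amount of decoding before use: entity_id and property_id hold only the numeric part of the identifier, change_target has three possible meanings, and old_value/new_value are JSON strings. The sketch below shows one way to handle each. All helper names are hypothetical, treating an empty old_value or new_value as "no value" is an assumption, and the JSON object in the example is an invented shape, since the exact structure of the stored values is not documented here.

```python
import json


def q_id(entity_id):
    """Rebuild the full Q-id from the numeric entity_id column."""
    return f"Q{int(entity_id)}"


def p_id(property_id):
    """Rebuild the full P-id from the numeric property_id column."""
    return f"P{int(property_id)}"


def describe_change_target(change_target):
    """Interpret change_target as documented: "rank", "" or a language code."""
    if change_target == "rank":
        return "rank"
    if change_target == "":
        return "value"
    return f"text in language '{change_target}'"


def parse_value(raw):
    """Decode a JSON-encoded old_value/new_value cell.

    Assumption: an empty cell means there is no value on that side of the
    change (for example, no old value on a CREATE).
    """
    return json.loads(raw) if raw else None


# Invented row for illustration only; not taken from the dataset.
row = {
    "entity_id": "42",
    "property_id": "31",
    "change_target": "",
    "old_value": "",
    "new_value": '{"amount": "+1.50"}',
}

summary = (
    q_id(row["entity_id"]),
    p_id(row["property_id"]),
    describe_change_target(row["change_target"]),
    parse_value(row["old_value"]),
    parse_value(row["new_value"]),
)
```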
Labels correspond to the change types defined below:
| Label | Change Type | Description |
| --- | --- | --- |
| refinement | Refinement | A property value is replaced by a more specific or precise value, without changing the statement's meaning. The refinement may add contextual information, rephrase a text to convey the same meaning more clearly, increase numerical precision, or provide a more specific classification while remaining semantically compatible with the original value. |
| unrefinement | Unrefinement | A property value is replaced by a less specific or precise value, without changing the statement's meaning. The unrefinement may remove contextual information, decrease numerical precision, or generalize to a broader classification while remaining semantically compatible with the original value. |
| textual change | Textual change | A property value of type text is modified to correct or introduce language errors, such as spelling, typos, or grammar, without altering sentence structure or the statement's meaning. |
| link_change | Link change | An entity reference is replaced by another one with a similar or identical label but representing a different concept. |
| re_formatting | Re-formatting | A property value's representation is modified at the surface level, without altering its underlying meaning. This change type varies by datatype. For text values, re-formatting covers changes to visual presentation, such as spacing, capitalization, hyphenation, and other typographical elements. For quantity, re-formatting covers changes in numerical precision that do not alter the value (e.g., adding or removing trailing zeros). |
| property_value_update | Value update | A property value is replaced with a semantically different value, altering the statement's meaning. This includes corrections of incorrect values and updates reflecting real-world changes. For time, quantity, and globecoordinate changes, it also includes sign changes (e.g., -1 -> +1). |
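Two of the quantity rules above are mechanical enough to express directly: adding or removing trailing zeros counts as re-formatting, and a sign flip counts as a value update. The sketch below illustrates only those two definitions with Python's `decimal` module. It is not the classifier trained on this dataset, the function names are hypothetical, and it assumes the amounts are available as decimal strings such as "+1.50".

```python
from decimal import Decimal


def is_trailing_zero_reformat(old_amount, new_amount):
    """True when the written form differs but the numeric value is the same.

    Example: "+1.50" and "+1.5" are different strings for the same number.
    """
    return old_amount != new_amount and Decimal(old_amount) == Decimal(new_amount)


def is_sign_change(old_amount, new_amount):
    """True when the new value is the old value with its sign flipped."""
    old, new = Decimal(old_amount), Decimal(new_amount)
    return old != 0 and old == -new


def quantity_rule_hint(old_amount, new_amount):
    """Return the label suggested by the two rules, or None if neither applies."""
    if is_trailing_zero_reformat(old_amount, new_amount):
        return "re_formatting"
    if is_sign_change(old_amount, new_amount):
        return "property_value_update"
    return None


hints = [
    quantity_rule_hint("+1.50", "+1.5"),  # same value, fewer trailing zeros
    quantity_rule_hint("-1", "+1"),       # sign flipped
    quantity_rule_hint("+1.5", "+1.6"),   # neither rule applies
]
```

A `None` result only means these two rules are silent; such a change could still be a refinement, an unrefinement, or a value update, which is where the manual labels are needed.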
Provider:
Zenodo
Created:
2026-04-25



