AutomatedScientist/wikikg-trajectories
Source: Hugging Face. Updated: 2025-12-25. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/AutomatedScientist/wikikg-trajectories
Description:
---
license: mit
task_categories:
- text-generation
tags:
- knowledge-graph
- tool-calling
- trajectories
size_categories:
- 1M<n<10M
configs:
- config_name: triplets
  data_files: triplets_all.jsonl
  default: true
- config_name: trajectories_query_relations
  data_files: trajectories_query_relations_1m.jsonl
- config_name: trajectories_get_neighbors
  data_files: trajectories_get_neighbors_1m.jsonl
- config_name: paths
  data_files: paths_1m.jsonl
---
# WikiKG Trajectories
2M tool-calling trajectories and 366k triplets derived from a Wikipedia knowledge graph.
## Configurations
| Config | Records | Size | Description |
|--------|---------|------|-------------|
| `triplets` (default) | 365,923 | 30 MB | Subject-relation-object triplets |
| `trajectories_query_relations` | 1,000,000 | 5.1 GB | Tool-call conversations (query_relations style) |
| `trajectories_get_neighbors` | 1,000,000 | 5.0 GB | Tool-call conversations (get_neighbors style) |
| `paths` | 1,500,000 | 230 MB | Random walk paths through the graph |
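As a rough illustration of what a random walk over a triplet graph could look like, here is a minimal sketch. The toy triplets, the `random_walk` helper, and the output field names (borrowed from the trajectory `metadata`) are assumptions made for illustration; the card does not document how the `paths` config was sampled or what its record schema is.

```python
import random

# Toy graph as (subject, relation, object) tuples; illustrative data only.
TRIPLETS = [
    ("Einstein", "FIELD_OF", "Physics"),
    ("Feynman", "STUDIED_BY", "Physics"),
]

def random_walk(start, num_hops, rng):
    """Walk up to num_hops edges from start, following edges in either direction."""
    entities, relations = [start], []
    current = start
    for _ in range(num_hops):
        # Forward moves go subject -> object; backward moves go object -> subject.
        moves = [(r, o) for s, r, o in TRIPLETS if s == current]
        moves += [(r, s) for s, r, o in TRIPLETS if o == current]
        if not moves:
            break  # dead end: return the shorter path
        relation, current = rng.choice(moves)
        relations.append(relation)
        entities.append(current)
    return {
        "path_entities": entities,
        "path_relations": relations,
        "num_hops": len(relations),
    }

path = random_walk("Einstein", 2, random.Random(0))
print(path)
```

Passing an explicit `random.Random` instance keeps the walk reproducible for a given seed.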
## Tool-Calling Styles
Each trajectory is a multi-turn conversation where an LLM calls tools to traverse the knowledge graph:
- **query_relations**: Calls `query_relations(subject, obj, rel_type)` to filter triplets by entity/relation
- **get_neighbors**: Calls `get_neighbors(entity, direction)` for graph exploration
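
To make the two styles concrete, the sketch below implements both tools over a tiny in-memory triplet list. Only the function names and parameter names come from this card. The matching semantics (`None` as a wildcard), the `"forward"`/`"backward"` direction values, and the second toy triplet are assumptions, not the dataset's actual tool implementation.

```python
from typing import Optional

Triplet = dict[str, str]

# Toy graph. The first triplet appears in the card's example; the second is made up.
TRIPLETS: list[Triplet] = [
    {"subject": "Einstein", "relation": "FIELD_OF", "object": "Physics"},
    {"subject": "Feynman", "relation": "STUDIED_BY", "object": "Physics"},
]

def query_relations(subject: Optional[str] = None,
                    obj: Optional[str] = None,
                    rel_type: Optional[str] = None) -> list[Triplet]:
    """Return triplets matching every argument that is not None (assumed semantics)."""
    return [
        t for t in TRIPLETS
        if (subject is None or t["subject"] == subject)
        and (obj is None or t["object"] == obj)
        and (rel_type is None or t["relation"] == rel_type)
    ]

def get_neighbors(entity: str, direction: str = "forward") -> list[Triplet]:
    """Return edges leaving the entity ("forward") or arriving at it ("backward")."""
    if direction == "forward":
        return [t for t in TRIPLETS if t["subject"] == entity]
    if direction == "backward":
        return [t for t in TRIPLETS if t["object"] == entity]
    raise ValueError(f"unknown direction: {direction!r}")

print(query_relations(subject="Einstein", rel_type="FIELD_OF"))
print(get_neighbors("Physics", "backward"))
```

Under these assumptions, `query_relations` suits targeted lookups where the relation is known, while `get_neighbors` returns every edge on one side of an entity and leaves the filtering to the caller.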
## Example Trajectory
```json
{
  "messages": [
    {"role": "user", "content": "Starting from Einstein, follow FIELD_OF forward, then STUDIED_BY backward. What's the final entity?"},
    {"role": "assistant", "tool_calls": [{"function": {"name": "query_relations", "arguments": "{\"subject\": \"Einstein\", \"rel_type\": \"FIELD_OF\"}"}}]},
    {"role": "tool", "content": "[{\"subject\": \"Einstein\", \"relation\": \"FIELD_OF\", \"object\": \"Physics\"}]"},
    ...
  ],
  "metadata": {
    "path_entities": ["Einstein", "Physics", "Feynman"],
    "path_relations": ["FIELD_OF", "STUDIED_BY"],
    "num_hops": 2
  }
}
```
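Note that `arguments` and the tool `content` are JSON-encoded strings inside the record, so they need a second decoding step. The following sketch shows one way to pull out the tool calls and sanity-check the metadata. The helper names are hypothetical, the record holds only the three messages shown above (the rest of the conversation is elided in the card), and the invariant `num_hops == len(path_relations) == len(path_entities) - 1` is an assumption inferred from this single example.

```python
import json

# The three messages shown in the card's example; later turns are elided there.
record = {
    "messages": [
        {"role": "user", "content": "Starting from Einstein, follow FIELD_OF forward, "
                                    "then STUDIED_BY backward. What's the final entity?"},
        {"role": "assistant", "tool_calls": [{"function": {
            "name": "query_relations",
            "arguments": '{"subject": "Einstein", "rel_type": "FIELD_OF"}',
        }}]},
        {"role": "tool",
         "content": '[{"subject": "Einstein", "relation": "FIELD_OF", "object": "Physics"}]'},
    ],
    "metadata": {
        "path_entities": ["Einstein", "Physics", "Feynman"],
        "path_relations": ["FIELD_OF", "STUDIED_BY"],
        "num_hops": 2,
    },
}

def extract_tool_calls(rec):
    """Return (function name, decoded arguments) for every tool call in a record."""
    calls = []
    for msg in rec["messages"]:
        # `or []` also covers tool_calls being present but None.
        for call in msg.get("tool_calls") or []:
            fn = call["function"]
            calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

def metadata_is_consistent(rec):
    """Check the assumed invariant: one relation per hop, one more entity than hops."""
    meta = rec["metadata"]
    return meta["num_hops"] == len(meta["path_relations"]) == len(meta["path_entities"]) - 1

print(extract_tool_calls(record))
print(metadata_is_consistent(record))
```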
## Example Triplet
```json
{"subject": "GirardDesargues", "relation": "CREATOR_OF", "object": "DesarguesianPlane"}
```
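For graph traversal it is often convenient to index triplets by subject and by object. The sketch below builds both indexes from JSONL lines. The `build_index` helper and the tuple layout are illustrative choices, not part of the dataset.

```python
import json
from collections import defaultdict

# One JSONL line, copied from the example triplet above.
jsonl_text = (
    '{"subject": "GirardDesargues", "relation": "CREATOR_OF", '
    '"object": "DesarguesianPlane"}\n'
)

def build_index(lines):
    """Map each entity to its outgoing (forward) and incoming (backward) edges."""
    forward = defaultdict(list)   # subject -> [(relation, object), ...]
    backward = defaultdict(list)  # object  -> [(relation, subject), ...]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        t = json.loads(line)
        forward[t["subject"]].append((t["relation"], t["object"]))
        backward[t["object"]].append((t["relation"], t["subject"]))
    return forward, backward

forward, backward = build_index(jsonl_text.splitlines())
print(forward["GirardDesargues"])
print(backward["DesarguesianPlane"])
```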
## Source
Generated from [wiki-kg-dataset](https://huggingface.co/datasets/amanrangapur/wiki-kg-dataset), which contains 45,416 Wikipedia articles and more than 180 relation types.
## Usage
```python
from datasets import load_dataset
# Load default (triplets)
ds = load_dataset("AutomatedScientist/wikikg-trajectories")
# Load specific config
ds = load_dataset("AutomatedScientist/wikikg-trajectories", "trajectories_query_relations")
ds = load_dataset("AutomatedScientist/wikikg-trajectories", "trajectories_get_neighbors")
ds = load_dataset("AutomatedScientist/wikikg-trajectories", "paths")
```
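The trajectory configs are about 5 GB each, so it may be preferable to stream them instead of downloading the full file. A minimal sketch using the `streaming=True` option of `load_dataset` follows. The `take` and `peek` helpers are hypothetical, and `split="train"` assumes the usual default split name for a config backed by a single data file.

```python
from itertools import islice

def take(stream, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(stream, n))

def peek(config_name: str, n: int = 3) -> list[dict]:
    """Fetch the first n records of a config without downloading the whole file."""
    # Imported lazily so this module loads without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(
        "AutomatedScientist/wikikg-trajectories",
        config_name,
        split="train",   # assumption: default split name for a single data file
        streaming=True,  # iterate lazily over the remote JSONL
    )
    return take(stream, n)

# Example call (requires network access):
# records = peek("trajectories_query_relations")
```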
Provider:
AutomatedScientist



