
AutomatedScientist/wikikg-trajectories

Source: Hugging Face. Updated 2025-12-25. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/AutomatedScientist/wikikg-trajectories
Resource description:
---
license: mit
task_categories:
- text-generation
tags:
- knowledge-graph
- tool-calling
- trajectories
size_categories:
- 1M<n<10M
configs:
- config_name: triplets
  data_files: triplets_all.jsonl
  default: true
- config_name: trajectories_query_relations
  data_files: trajectories_query_relations_1m.jsonl
- config_name: trajectories_get_neighbors
  data_files: trajectories_get_neighbors_1m.jsonl
- config_name: paths
  data_files: paths_1m.jsonl
---

# WikiKG Trajectories

2M tool-calling trajectories + 366k triplets from Wikipedia knowledge graph.

## Configurations

| Config | Records | Size | Description |
|--------|---------|------|-------------|
| `triplets` (default) | 365,923 | 30 MB | Subject-relation-object triplets |
| `trajectories_query_relations` | 1,000,000 | 5.1 GB | Tool-call conversations (query_relations style) |
| `trajectories_get_neighbors` | 1,000,000 | 5.0 GB | Tool-call conversations (get_neighbors style) |
| `paths` | 1,500,000 | 230 MB | Random walk paths through the graph |

## Tool-Calling Styles

Each trajectory is a multi-turn conversation where an LLM calls tools to traverse the knowledge graph:

- **query_relations**: Calls `query_relations(subject, obj, rel_type)` to filter triplets by entity/relation
- **get_neighbors**: Calls `get_neighbors(entity, direction)` for graph exploration

## Example Trajectory

```json
{
  "messages": [
    {"role": "user", "content": "Starting from Einstein, follow FIELD_OF forward, then STUDIED_BY backward. What's the final entity?"},
    {"role": "assistant", "tool_calls": [{"function": {"name": "query_relations", "arguments": "{\"subject\": \"Einstein\", \"rel_type\": \"FIELD_OF\"}"}}]},
    {"role": "tool", "content": "[{\"subject\": \"Einstein\", \"relation\": \"FIELD_OF\", \"object\": \"Physics\"}]"},
    ...
  ],
  "metadata": {
    "path_entities": ["Einstein", "Physics", "Feynman"],
    "path_relations": ["FIELD_OF", "STUDIED_BY"],
    "num_hops": 2
  }
}
```

## Example Triplet

```json
{"subject": "GirardDesargues", "relation": "CREATOR_OF", "object": "DesarguesianPlane"}
```

## Source

Generated from [wiki-kg-dataset](https://huggingface.co/datasets/amanrangapur/wiki-kg-dataset) containing 45,416 Wikipedia articles and 180+ relation types.

## Usage

```python
from datasets import load_dataset

# Load default (triplets)
ds = load_dataset("AutomatedScientist/wikikg-trajectories")

# Load specific config
ds = load_dataset("AutomatedScientist/wikikg-trajectories", "trajectories_query_relations")
ds = load_dataset("AutomatedScientist/wikikg-trajectories", "trajectories_get_neighbors")
ds = load_dataset("AutomatedScientist/wikikg-trajectories", "paths")
```
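In the example trajectory above, the tool-call `arguments` and the tool `content` appear as JSON-encoded strings nested inside the record, so they need a second decoding step before they can be inspected. The sketch below shows one way to do that on a copy of the card's example, keeping only the three messages that are shown and leaving the elided ones out. The helper names `extract_tool_calls` and `extract_tool_results` are hypothetical, and the assumption that every record follows this exact layout rests on that single example, so it should be checked against real rows.

```python
import json

# Copy of the example record from the card. The card elides the remaining
# messages with "..."; they are left out here rather than guessed.
record = {
    "messages": [
        {"role": "user", "content": "Starting from Einstein, follow FIELD_OF forward, then STUDIED_BY backward. What's the final entity?"},
        {"role": "assistant", "tool_calls": [{"function": {"name": "query_relations", "arguments": "{\"subject\": \"Einstein\", \"rel_type\": \"FIELD_OF\"}"}}]},
        {"role": "tool", "content": "[{\"subject\": \"Einstein\", \"relation\": \"FIELD_OF\", \"object\": \"Physics\"}]"},
    ],
    "metadata": {
        "path_entities": ["Einstein", "Physics", "Feynman"],
        "path_relations": ["FIELD_OF", "STUDIED_BY"],
        "num_hops": 2,
    },
}


def extract_tool_calls(rec):
    """Return (function_name, decoded_arguments) for each tool call in the record.

    Assumes `arguments` is a JSON-encoded string, as in the card's example.
    """
    calls = []
    for message in rec["messages"]:
        # Messages without tool calls may omit the key or set it to None.
        for call in message.get("tool_calls") or []:
            fn = call["function"]
            calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


def extract_tool_results(rec):
    """Return the decoded payload of each tool message.

    Assumes tool `content` is a JSON-encoded string, as in the card's example.
    """
    return [
        json.loads(message["content"])
        for message in rec["messages"]
        if message["role"] == "tool"
    ]


calls = extract_tool_calls(record)
results = extract_tool_results(record)
print(calls)
print(results)
```

The `metadata` block can serve as a sanity check when iterating over rows: in the example, `path_entities` has one more entry than `path_relations`, and `num_hops` equals the number of relations.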

Provider:
AutomatedScientist