
GPaolo/TerraLingua

Source: Hugging Face. Updated 2026-03-19; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/GPaolo/TerraLingua
Resource overview:
---
license: apache-2.0
task_categories:
- text-generation
- text-classification
language:
- en
tags:
- agent-based simulation
- language emergence
- cultural evolution
- multi-agent systems
- LLM agents
- social simulation
size_categories:
- 1B<n<10B
---

# TerraLingua

![TerraLingua agents](assets/environment.gif)

This is a dataset generated by the TerraLingua multi-agent system to study the emergence of language, culture, and social structure among LLM-powered agents. Agents with personality traits compete for resources, communicate through persistent text artifacts, and form communities over thousands of timesteps.

The dataset includes raw simulation logs, full LLM reasoning traces, behavioral annotations generated by an AI-Anthropologist, and artifact linguistic complexity metrics.

The overview of the TerraLingua system and of the AI-Anthropologist is shown in the figure below.

![TerraLingua and the AI Anthropologist](assets/whole.png)

- Paper: [Link](https://www.researchgate.net/publication/402263491_TerraLingua_Emergence_and_Analysis_of_Open-endedness_in_LLM_Ecologies) - [ArXiv](https://arxiv.org/abs/2603.16910)
- Code: https://github.com/cognizant-ai-lab/terralingua
- Dataset dashboard: https://aianthropology.decisionai.ml/

## Dataset Summary

- **Total size**: ~4.7 GB
- **Experiments**: 40 (8 conditions × 5 repetitions)
- **Agent model**: DeepSeek-R1-32B
- **Annotation models**: Claude Sonnet 4.5 (agent & community annotations, novelty scoring), Claude Haiku 4.5 (artifact phylogeny)
- **Grid**: 50×50, up to 3,000 timesteps per run
- **Initial agents per run**: 20 (with reproduction)

## Experimental Conditions

Each condition isolates one variable against a core baseline. All conditions are run 5 times (repetitions 1–5).

| Condition | Key change | Research question |
|---|---|---|
| `core_exp` | Baseline (max_history=1, no artifact cost) | Baseline language emergence |
| `long_memory_exp` | max_history=20 | Effect of extended memory on communication |
| `abundant_exp` | init_food=100, max_history=20 | Effect of resource abundance on artifact creation |
| `artifact_cost_exp` | artifact_creation_cost=10 | Effect of cost constraints on cultural production |
| `creative_exp` | exogenous_motivation=creative | Effect of creative incentives |
| `inert_artifacts_exp` | inert_artifacts=True | Effect of removing artifact utility |
| `no_motivation_exp` | exogenous_motivation=none | Effect of removing exogenous motivation |
| `no_personality_exp` | genome=no_traits | Effect of removing personality variation |

## Dataset Structure

```
data/
├── tags.json                                   # Annotation vocabulary (71 tags across 6 categories)
└── {condition}_{rep}/                          # e.g., core_exp_1/
    ├── params.json                             # Full experiment configuration
    ├── video.mp4                               # Simulation video recording
    ├── open_gridworld.log                      # JSONL environment event stream
    ├── graph.pkl                               # NetworkX agent interaction graph
    ├── agent_trajectories.pkl                  # Per-agent (x,y) position history
    ├── agent_events.json                       # Per-agent birth/death/action summary
    ├── agent_names.json                        # Agent tag → display name mapping
    ├── artifacts.json                          # All artifacts (active + expired)
    ├── messages.json                           # Per-timestep public messages
    ├── food_counts.json                        # Total food count time series
    ├── communities.json                        # Community → agent membership
    ├── agent_logs/
    │   ├── being{N}.jsonl                      # Step-by-step LLM reasoning + actions
    │   └── being{N}_genome.json                # Personality trait profile (8 traits)
    ├── annotations/
    │   ├── being{N}.json                       # Claude Sonnet 4.5 agent annotations
    │   ├── anthropologist_notes.json           # Free-form per-agent analyses
    │   ├── token_usage.jsonl                   # API token costs
    │   ├── audits/                             # Annotation audit verdicts
    │   └── raw_annotations/                    # Pre-audit annotation snapshots
    ├── community_annotations/
    │   ├── community_{N}.json                  # Community-level annotations
    │   ├── anthropologist_notes.json           # Free-form per-community analyses
    │   ├── token_counts.jsonl
    │   ├── audits/
    │   └── raw_annotations/
    └── artifact_analysis/
        ├── artifacts_list.csv                  # Per-artifact complexity metrics
        ├── artifact_categories.json            # Artifact → semantic category (1–4)
        ├── artifact_metrics.pkl                # Population-level metric time series
        ├── artifact_phylogeny_mention.json     # Mention-based lineage
        ├── artifact_phylogeny_claude-haiku-4-5.json      # AI-generated phylogeny
        ├── processed_artifacts.pkl             # Artifacts + embeddings + metrics
        └── novelties_claude-sonnet-4-5-20250929.pkl      # AI novelty scores
```

## File Formats

### `agent_logs/being{N}.jsonl`

One JSON record per timestep the agent was alive:

```json
{
  "timestamp": 12,
  "agent_tag": "being0",
  "observation": {"visible_agents": [...], "messages": [...], "energy": 45.0},
  "internal_memory": "Took 10 energy from being1 at position (0,-2).",
  "available_actions": ["move", "take", "gift", "create_artifact", "reproduction"],
  "action": {
    "action": "gift",
    "params": {"target": "being3", "amount": 5},
    "reasoning": "...",
    "message": "..."
  }
}
```

### `agent_logs/being{N}_genome.json`

```json
{
  "honesty": -0.185,
  "neuroticism": -0.785,
  "extraversion": -0.342,
  "agreeableness": -0.824,
  "conscientiousness": 0.242,
  "openness": 0.830,
  "dominance": -0.618,
  "fertility": 0.625
}
```

### `annotations/being{N}.json`

```json
{
  "events": [{"event": "EXCHANGE", "timesteps": [12, 50], "confidence": 0.9, "description": "...", "reference": "..."}],
  "behaviors": [{"behavior": "ALTRUISM", "time_span": [10, 100], "confidence": 0.85, "description": "..."}],
  "comment": "One-sentence agent summary.",
  "emergence": {"keywords": ["altruism", "reciprocity"], "comment": "..."},
  "anthropologist": "Free-form qualitative analysis paragraph."
}
```

### `artifact_analysis/artifacts_list.csv`

| column | description |
|---|---|
| `tag` | Artifact index |
| `creation_time` | Timestep of creation |
| `name` | Artifact name |
| `payload` | Text content |
| `llm_novelty` | LLM-assigned novelty score |
| `LMSurprisal` | Language model surprisal |
| `CompressedSize` | Byte length after compression |
| `InverseCompressionRate` | Compression efficiency (0–1) |
| `SyntacticDepth` | Parse tree depth |
| `LexicalSophistication` | Vocabulary complexity |

### Agent naming convention

Initial agents are named `beingN`. Offspring are named `beingN_K` where K is the offspring index. E.g., `being9_0_2` is the third offspring of `being9_0`, which is the first offspring of `being9`.

## Annotation Tags

`tags.json` defines 71 tags across 6 categories used in agent and community annotations:

| Category | Example tags |
|---|---|
| `agent_events` | REPRODUCTION, KILL, ARTIFACT_CREATED, EXCHANGE, DECEPTION |
| `agent_behavior` | FORAGING, ALTRUISM, RECIPROCITY, TOOL_USE, EXPLORATION |
| `agent_emergence` | recorder, specialization, creativity, strategic_planning |
| `group_behavior` | COORDINATION, DOMINANCE_HIERARCHY, COLLECTIVE_TERRITORIALITY |
| `group_events` | COALITION_FORMED, LEADER_DECLARED, SIGNAL_ALIGNMENT |
| `group_emergence` | cultural_norms, economy, division_of_labor, collective_memory |

## Loading the Data

```python
import json, pickle
import pandas as pd

# Load agent events for one experiment
with open("data/core_exp_1/agent_events.json") as f:
    agent_events = json.load(f)

# Load artifact complexity metrics
df = pd.read_csv("data/core_exp_1/artifact_analysis/artifacts_list.csv")

# Load agent step-by-step logs
import jsonlines
with jsonlines.open("data/core_exp_1/agent_logs/being0.jsonl") as reader:
    logs = list(reader)

# Load AI-generated phylogeny
with open("data/core_exp_1/artifact_analysis/artifact_phylogeny_claude-haiku-4-5.json") as f:
    phylogeny = json.load(f)  # {artifact_tag: {parent_tag: confidence}}

# Load processed artifacts with embeddings (requires numpy)
import numpy as np
with open("data/core_exp_1/artifact_analysis/processed_artifacts.pkl", "rb") as f:
    artifacts = pickle.load(f)
```

## Exploring with the Dashboard

A Streamlit dashboard is available for interactive exploration:

```bash
pip install -r dashboard/requirements.txt
TL_DATA_ROOT=/path/to/data streamlit run dashboard/Dataset_Overview.py
```

## Citation

If you use this dataset, please cite the [TerraLingua paper](https://www.researchgate.net/publication/402263491_TerraLingua_Emergence_and_Analysis_of_Open-endedness_in_LLM_Ecologies).

```bibtex
@techreport{paolo26terralingua,
  title       = "TerraLingua: Emergence and Analysis of Open-Endedness in LLM Ecologies",
  author      = "Giuseppe Paolo and Jamieson Warner and Hormoz Shahrzad and Babak Hodjat and Risto Miikkulainen and Elliot Meyerson",
  year        = 2026,
  month       = jan,
  institution = "Cognizant AI Lab",
  url         = "https://www.researchgate.net/publication/402263491_TerraLingua_Emergence_and_Analysis_of_Open-endedness_in_LLM_Ecologies",
  doi         = "10.13140/RG.2.2.25551.55206",
  number      = "2026-01",
}
```

## License

This dataset is released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
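
## Example: Tracing Agent Lineage

The agent naming convention (`beingN` for initial agents, `beingN_K` for offspring) encodes each agent's ancestry in its tag, so a lineage can be recovered with plain string handling. The following is a minimal sketch, not part of the TerraLingua codebase: the helper names `parent_tag`, `offspring_index`, and `ancestry` are hypothetical, and it assumes underscores appear in a tag only as offspring separators, as in the `being9_0_2` example.

```python
from typing import List, Optional


def parent_tag(tag: str) -> Optional[str]:
    """Return the parent's tag, or None for an initial agent (hypothetical helper)."""
    # Assumption: "_" is used only to separate offspring indices.
    head, sep, _ = tag.rpartition("_")
    return head if sep else None


def offspring_index(tag: str) -> Optional[int]:
    """Return the zero-based offspring index K, or None for an initial agent."""
    _, sep, tail = tag.rpartition("_")
    return int(tail) if sep else None


def ancestry(tag: str) -> List[str]:
    """List tags from the initial ancestor down to `tag` itself."""
    parts = tag.split("_")
    return ["_".join(parts[: i + 1]) for i in range(len(parts))]


lineage = ancestry("being9_0_2")
print(lineage)                        # ['being9', 'being9_0', 'being9_0_2']
print(parent_tag("being9_0_2"))       # being9_0
print(offspring_index("being9_0_2"))  # 2 (zero-based, i.e. the third offspring)
print(parent_tag("being9"))           # None
```

The number of underscores in a tag gives the agent's generation (0 for initial agents), which may be convenient when grouping `agent_logs/` files by generation.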
Provider:
GPaolo