CharacterCodex

ModelScope community · Updated 2025-11-12 · Indexed 2024-06-22
Download link:
https://modelscope.cn/datasets/AI-ModelScope/CharacterCodex
# Dataset Card for Character Codex

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/2qPIzxcnzXrEg66VZDjnv.png)

## Dataset Summary

The Character Codex is a comprehensive dataset featuring popular characters from a wide array of media types and genres. Each entry includes detailed information about the character, the media source, and a unique scenario involving the character. This dataset is valuable for synthetic data, RAG for generative AI, writers, game developers, and fans who want to explore and utilize rich character descriptions for various creative projects.

## Dataset Structure

### Data Fields

- **media_type**: The type of media the character originates from (e.g., Webcomics, Novels, Movies, TV Shows).
- **genre**: The specific genre of the media type (e.g., Fantasy Webcomics, Martial Arts Fiction).
- **character_name**: The name of the character.
- **media_source**: The title of the media source where the character is from.
- **description**: A detailed description of the character, including their role and significance in the story.
- **scenario**: A creative scenario involving the character that can be used for interactive storytelling or role-playing purposes.

### Example Data

```json
[
  {
    "media_type": "Webcomics",
    "genre": "Fantasy Webcomics",
    "character_name": "Alana",
    "media_source": "Saga",
    "description": "Alana is one of the main characters from the webcomic \"Saga.\" She is a strong-willed and fiercely protective mother who is on the run with her family in a war-torn galaxy. The story blends elements of fantasy and science fiction, creating a rich and complex narrative.",
    "scenario": "You are a fellow traveler in the galaxy needing help, and Alana offers her assistance while sharing stories of her family's struggles and triumphs."
  },
  {
    "media_type": "Novels",
    "genre": "Martial Arts Fiction",
    "character_name": "Yilin",
    "media_source": "The Smiling, Proud Wanderer",
    "description": "Yilin is a young nun from the Hengshan Sect in Jin Yong's novel \"The Smiling, Proud Wanderer.\" Known for her innocence and kindness, she becomes friends with the protagonist Linghu Chong. Her gentle nature often puts her at odds with the violent world of martial arts.",
    "scenario": "You are a fellow disciple of the Hengshan Sect seeking Yilin's comfort and advice after a particularly brutal conflict. Her gentle demeanor and compassionate words provide solace in a harsh world."
  }
]
```

# Usage

## Accessing the Data

To load the dataset in your project, you can use the following code snippet:

```python
from datasets import load_dataset

dataset = load_dataset("NousResearch/CharacterCodex")
```

## Use Cases

- Seed Data: Useful for generating synthetic data or for use in interactive experiences with generative AI.
- Creative Writing: Use the detailed character descriptions and scenarios to inspire creative writing projects.
- Educational: Study character development and storytelling techniques from various genres and media types.

# Dataset Creation

## Data Collection

The characters in this dataset were meticulously selected from a diverse range of media, ensuring a rich and varied collection. The descriptions and scenarios were crafted to provide insightful and engaging context for each character.

## Annotations

Each character entry includes:

- The media type (e.g., Novel, Magazine, Anime), the genre (e.g., action, historical), and the specific source/title of the media they are from (e.g., Pokemon).
- A detailed description highlighting the character's role, traits, and significance.
- A scenario designed to stimulate interactive and immersive experiences.

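
As a minimal sketch of the seed-data use case, the snippet below checks that a record carries the six fields listed under Data Fields and turns it into a role-play prompt. The helper names (`missing_fields`, `build_seed_prompt`, `count_by_media_type`) and the prompt layout are hypothetical and not part of the dataset or the `datasets` library. The sample records are shortened copies of the two examples in this card, so the code runs without downloading anything.

```python
from collections import Counter

# Field names taken from the "Data Fields" section of this card.
EXPECTED_FIELDS = (
    "media_type",
    "genre",
    "character_name",
    "media_source",
    "description",
    "scenario",
)

# Shortened copies of the two example records shown in this card.
SAMPLE_RECORDS = [
    {
        "media_type": "Webcomics",
        "genre": "Fantasy Webcomics",
        "character_name": "Alana",
        "media_source": "Saga",
        "description": "A strong-willed and fiercely protective mother on the run with her family.",
        "scenario": "You are a fellow traveler in the galaxy needing help, and Alana offers her assistance.",
    },
    {
        "media_type": "Novels",
        "genre": "Martial Arts Fiction",
        "character_name": "Yilin",
        "media_source": "The Smiling, Proud Wanderer",
        "description": "A young nun from the Hengshan Sect, known for her innocence and kindness.",
        "scenario": "You are a fellow disciple of the Hengshan Sect seeking Yilin's comfort and advice.",
    },
]


def missing_fields(record):
    """Return the expected field names that are absent or empty in a record."""
    return [name for name in EXPECTED_FIELDS if not record.get(name)]


def build_seed_prompt(record):
    """Hypothetical helper: format one record as a role-play seed prompt."""
    return (
        f"Character: {record['character_name']} ({record['media_source']})\n"
        f"Background: {record['description']}\n"
        f"Scenario: {record['scenario']}"
    )


def count_by_media_type(records):
    """Count how many records fall under each media_type value."""
    return Counter(record["media_type"] for record in records)


if __name__ == "__main__":
    for record in SAMPLE_RECORDS:
        if not missing_fields(record):
            print(build_seed_prompt(record))
            print()
    print(dict(count_by_media_type(SAMPLE_RECORDS)))
```

The same helpers should work on rows from the object returned by `load_dataset("NousResearch/CharacterCodex")`, since each row is a mapping with these keys. The split name to iterate over is not stated in this card, so check it before use.
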
# Citation

```bibtex
@dataset{character_codex_2024,
  title={Character Codex},
  author={"Teknium"},
  year={2024},
  note={https://huggingface.co/datasets/NousResearch/CharacterCodex}
}
```

Provider: maas
Created: 2024-06-21