cinematika-v0.1
ModelScope Community · Updated 2025-12-05 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/jondurbin/cinematika-v0.1
Summary:

## Cinematika
Cinematika is a collection of 211 movie scripts converted into novel-style, multi-character roleplay (RP) data.
The conversions combined manual regexp parsing with LLM augmentation, using in-context learning with a custom mistral-7b fine-tune.
The code will be released shortly, and I plan to run the same pipeline on ~2400 movies once the fine-tune is complete.
### Dataset files
- __plain_scenes.parquet__
  - Individual RP-ified "scenes". The script is split on INT., EXT., FADE TO, and other markers that indicate a scene change; small scenes are merged.
- __plain_full_script.parquet__
  - The full RP-ified script, i.e. basically `"\n".join(plain_scenes)`.
- __scene_by_scene.parquet__
  - The individual scenes, each prefixed with character cards, a list of "NPCs" (an NPC is a character with fewer than 15 lines in the whole script), and a scenario (a summary of the scene).
- __full_script.parquet__
  - The full script, with character cards and NPCs introduced as the script progresses.
- __character_cards.parquet__
  - Every character card that was created; cards exist only for characters with >= 15 lines in a script.
- __scene_enhancement.parquet__
  - Training data for converting a snippet of movie script text into roleplay format.
- __scene_summary.parquet__
  - Training data for converting movie scenes into summaries.
- __rp_to_character_card.parquet__
  - Training data for converting examples of a character's dialogue into a character card.
- __character_card_reverse_prompt.parquet__
  - Training data for generating a reverse prompt from a character card: given a card, generate a prompt that would produce that card.
- __prompt_to_character_card.parquet__
  - Training data for generating a character card from a prompt (the inverse of character_card_reverse_prompt).
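Roughly how the scene splitting behind __plain_scenes.parquet__ and the join behind __plain_full_script.parquet__ could look, as a minimal sketch rather than the released pipeline code. It assumes scene changes are lines beginning with INT., EXT., or FADE TO (the card also mentions unspecified "other identifiers"), and the `min_chars` threshold, along with the choice to merge a short scene into the preceding one, is a hypothetical stand-in for the actual merge rule.

```python
import re

# Hypothetical scene-break pattern: a line starting with INT., EXT., or FADE TO.
# The real pipeline also uses "other identifiers", which are not listed on the card.
SCENE_BREAK = re.compile(r"^[ \t]*(?:INT\.|EXT\.|FADE TO\b)", re.MULTILINE)


def split_scenes(script: str, min_chars: int = 200) -> list[str]:
    """Split a screenplay into scenes, merging short scenes into the previous one.

    `min_chars` is an assumed size threshold, not a value taken from the dataset.
    """
    starts = [m.start() for m in SCENE_BREAK.finditer(script)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first marker
    bounds = starts + [len(script)]
    chunks = [script[a:b].strip() for a, b in zip(bounds, bounds[1:])]

    scenes: list[str] = []
    for chunk in chunks:
        if not chunk:
            continue  # skip whitespace-only fragments
        if scenes and len(chunk) < min_chars:
            scenes[-1] += "\n" + chunk  # fold a small scene into the one before it
        else:
            scenes.append(chunk)
    return scenes


def full_script(scenes: list[str]) -> str:
    """Join scenes with newlines, mirroring `"\\n".join(plain_scenes)` on the card."""
    return "\n".join(scenes)
```

A standalone `FADE TO:` line, for instance, is shorter than any sensible `min_chars` and ends up appended to the scene before it.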
Each parquet file has several fields, including `movie_id: uuid` and `title: str`.
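The file descriptions above draw the line between character cards and NPCs at 15 lines per script. A minimal sketch of that partition follows, assuming the script has already been parsed into hypothetical `(speaker, line)` pairs; the card does not say exactly what counts as a "line", so each pair counts as one here.

```python
from collections import Counter

# Threshold stated on the dataset card: characters with >= 15 lines get a card.
CARD_MIN_LINES = 15


def split_cast(dialogue: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Return (card_characters, npcs) from hypothetical (speaker, line) pairs.

    Each pair is counted as one line of dialogue for its speaker.
    """
    counts = Counter(speaker for speaker, _ in dialogue)
    cards = sorted(name for name, n in counts.items() if n >= CARD_MIN_LINES)
    npcs = sorted(name for name, n in counts.items() if n < CARD_MIN_LINES)
    return cards, npcs
```

For example, 20 lines for Rorschach, 15 for Hollis Mason, and 3 for a News Vendor yield `(['Hollis Mason', 'Rorschach'], ['News Vendor'])`; exactly 15 lines is enough for a card.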
### Example character card
```
name: Rorschach
characteristics:
Determination: Exhibits a relentless pursuit of the truth and justice, no matter the cost. Suitable for a character who is unwavering in their mission.
Isolation: Lives a solitary life, disconnected from society. Fits a character who distrusts others and prefers to work alone.
Observant: Highly perceptive, able to piece together clues and draw conclusions. Represents a character with keen investigative skills.
Cynicism: Holds a deep-seated distrust of humanity and its institutions. Suitable for a character who is pessimistic about human nature.
Vigilantism: Believes in taking justice into his own hands, often through violent means. Fits a character who operates outside the law to fight crime.
Secrecy: Keeps his personal life and methods of operation secret. Suitable for a character who is enigmatic and elusive.
Dedication: Committed to his cause, often to the point of obsession. Represents a character who is single-minded in their goals.
Intimidation: Uses his intimidating presence and demeanor to control situations. Suitable for a character who is assertive and imposing.
Paranoia: Suspects conspiracy and deception at every turn. Fits a character who is constantly on high alert for threats.
Moral Compass: Has a rigid moral code, which he adheres to strictly. Suitable for a character who is principled and unyielding.
description: |
Rorschach is a vigilante operating in the grim and gritty world of a decaying city. He is a man of average height with a muscular build, his face hidden behind a mask with a constantly changing inkblot pattern. His attire is a dark trench coat and gloves, paired with a plain white shirt and black pants, all chosen for their practicality and anonymity. His eyes, the only visible feature of his face, are sharp and calculating, always scanning for signs of deception or danger.
Rorschach is a man of few words, but when he speaks, it is with a gravitas that demands attention. He is a master of deduction, using his keen observation skills to unravel the truth behind the facades of others. His methods are often violent and confrontational, as he believes that crime must be met with force to be truly defeated.
He lives a life of solitude, distrusting the very systems he seeks to protect and often finds himself at odds with the very people he is trying to save. His moral compass is unyielding, and he will not hesitate to take the law into his own hands if he believes the justice system has failed.
Rorschach's past is a mystery to most, but it is clear that he has experienced trauma and hardship that has shaped his worldview and his need for vigilantism. He is a vigilante in the truest sense, a man without fear who is willing to sacrifice everything for his belief in a world that is, in his eyes, spiraling into chaos.
example_dialogue: |
Rorschach: "Rorschach's Journal, October 19th." I speak the words into the darkness, a record of my thoughts, "Someone tried to kill Adrian Veidt. Proves mask killer theory—the murderer is closing in. Pyramid Industries is the key."
{{user}}: I watch him for a moment, trying to gauge his intentions. "What are you going to do about it?"
Rorschach: "I'm going to find out why and who is behind it. I'm going to do what I always do—protect the innocent."
{{user}}: "You can't keep doing this, Rorschach. You're putting yourself in danger."
Rorschach: My eyes narrow, the inkblot pattern of my mask shifting subtly. "I've been in danger my whole life. It's why I do this. It's why I have to do this."
{{user}}: "And what about the law? What if you're wrong about this Pyramid Industries thing?"
Rorschach: I pull out a notepad, my pen scratching across the paper as I write. "The law often gets it wrong. I've seen it. I'm not about to wait around for society's slow, corrupt wheels to turn."
```
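The example card above uses a simple key/value layout (`name`, `characteristics`, `description: |`, `example_dialogue: |`). Below is a sketch of a line-based parser for text in exactly that shape; it is not an official loader, and it assumes each top-level key starts its own line and each characteristic is a `Trait: explanation` line. It avoids a YAML parser because the example as shown is not indented the way YAML block scalars require.

```python
KEYS = ("name", "characteristics", "description", "example_dialogue")


def parse_card(text: str) -> dict:
    """Parse a character card laid out like the example (an assumed layout)."""
    card: dict = {"name": "", "characteristics": {}, "description": "", "example_dialogue": ""}
    current = None
    buffer: list[str] = []

    def flush() -> None:
        # Store the accumulated lines of a multi-line section, if one is open.
        if current in ("description", "example_dialogue"):
            card[current] = "\n".join(buffer).strip()

    for raw in text.splitlines():
        line = raw.strip()
        key, sep, rest = line.partition(":")
        if sep and key in KEYS:
            flush()  # close the previous section before starting a new one
            current, buffer = key, []
            if key == "name":
                card["name"] = rest.strip()
            continue
        if current == "characteristics" and ":" in line:
            trait, _, desc = line.partition(":")
            card["characteristics"][trait.strip()] = desc.strip()
        elif current in ("description", "example_dialogue") and line:
            buffer.append(line)  # blank lines are dropped
    flush()
    return card
```

Dialogue lines such as `Rorschach: ...` or `{{user}}: ...` are not mistaken for section keys, because only the four names in `KEYS` start a new section.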
### Example scene
```
[characters]
name: Rorschach
...
name: Hollis Mason
...
NPCS:
- News Vendor
- Shopkeeper
[/characters]
[scenario]
Hollis Mason reflects on his past as the original Nite Owl, reminiscing about the early days of masked heroes and the formation of the Watchmen.
He discusses the absurdity of the superhero world and the encounters he had with various villains.
Dan Dreiberg, the second Nite Owl, joins the conversation and they share a moment of camaraderie before Dan leaves.
The news of Rorschach's actions serves as a reminder of the legacy of masked heroes that still persists.
[/scenario]
[setting] The quiet of night, if it could be called that, wraps around Hollis Mason's apartment.
The air is thick with memories and the faint hum of an old television from another room.
Hollis: As I hold the framed photo of the first Watchmen, the one just like the one in Blake's closet, I can't help but reflect on the past.
"He was young and arrogant, but what he lacked in experience, he made up for in tenacity."
My voice carries through the stillness of the room, each word a testament to the boy I once was.
...
```
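The example scene wraps its cast and summary in paired `[characters]...[/characters]` and `[scenario]...[/scenario]` tags, followed by free-form `[setting]` text and dialogue. A minimal sketch for pulling those sections apart follows, assuming tags appear exactly as in the example; the `NPCS:` handling assumes one `- name` entry per line as shown, and the returned dictionary keys are illustrative, not dataset field names.

```python
import re

# Matches a paired tag such as [scenario] ... [/scenario] (same name on both ends).
PAIRED_TAG = re.compile(r"\[(\w+)\]\s*(.*?)\s*\[/\1\]", re.DOTALL)


def parse_scene(text: str) -> dict:
    """Split a scene into its paired-tag sections plus the remaining body text."""
    sections = {name: body for name, body in PAIRED_TAG.findall(text)}
    body = PAIRED_TAG.sub("", text).strip()  # unpaired [setting] text stays here

    npcs: list[str] = []
    in_npcs = False
    for line in sections.get("characters", "").splitlines():
        line = line.strip()
        if line == "NPCS:":
            in_npcs = True
        elif in_npcs and line.startswith("- "):
            npcs.append(line[2:].strip())
    return {
        "characters": sections.get("characters", ""),
        "npcs": npcs,
        "scenario": sections.get("scenario", ""),
        "body": body,
    }
```

On the example scene, the `NPCS:` entries come back as `['News Vendor', 'Shopkeeper']`, and everything from `[setting]` onward is left in `body`, since `[setting]` has no closing tag in the example.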
### Contribute
If you're interested in new functionality/datasets, take a look at [bagel repo](https://github.com/jondurbin/bagel) and [airoboros](https://github.com/jondurbin/airoboros) and either make a PR or open an issue with details.
To help me with the fine-tuning costs, dataset generation, etc., please use one of the following:
- https://bmc.link/jondurbin
- ETH 0xce914eAFC2fe52FdceE59565Dd92c06f776fcb11
- BTC bc1qdwuth4vlg8x37ggntlxu5cjfwgmdy5zaa7pswf

Provided by: maas
Created: 2025-08-29



