Duet-v0.5

ModelScope community, updated 2026-05-17, indexed 2024-09-28
Download link:
https://modelscope.cn/datasets/AI-ModelScope/Duet-v0.5
Description:
If you use this dataset or the prompts on this page, I'd greatly appreciate it if you gave me credit. Thanks!

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/XYl8bP_RseYbIOMP1JPMJ.png)

**5k rows of COT question-answer with roleplaying prose**

## At a Glance:
- 'character_card': Summarized world info from the corresponding character and setting of https://huggingface.co/datasets/G-reen/TheatreLM-v2.1-Characters
- 'original_question': Base question, without rewriting for RP.
- 'original_answer': Base answer, without rewriting for RP.
- 'rewritten_question': Question rewritten to include narration and fit the storyline
- 'rewritten_answer': Answer rewritten from the perspective of the character to fit the storyline

## Example:

**Rewritten Question**

As Hettie stepped out into the crisp morning air behind Flour & Fancy, she gazed upon the overgrown patch of land that would soon become her newest project: a circular garden, radiating magic and wonder. She envisioned the delicate paths, dividing the garden into five equal sections, like the gentle fingers of a gentle stream. The townspeople would marvel at her latest creation, and the flavors of her pastries would flourish like never before.

"Hettie, what's brewin' in that mind of yours?" Ember, her apprentice, asked, squinting in the morning light.

Hettie's hands were already stirring, as if conjuring the vision to life. "A garden, child. A garden that'll make the magic of Willowdale sing."

Ember's eyes sparkled. "What kind of magic?"

Hettie's gaze wandered, her mind already tilling the soil. "The kind that'll make our pastries irresistible, like a warm hug on a winter's day."

As she began to pace out the garden's perimeter, Hettie's boots kicked up small clouds of dirt. "Twenty feet in radius, that's what I need. Five sections, like the petals of a flower...or the points of a star." She murmured to herself, her words carried away by the gentle breeze.
Now, she stood at the garden's center, her eyes closed, visualizing the paths that would divide the space. Five isosceles triangles, each with an angle of 72 degrees, since 360 degrees divided by five sections was...well, Ember's math skills were still a work in progress, but Hettie knew her figures.

Her eyes snapped open, and she turned to Ember. "I need to calculate the area of one of those triangles, child. Can't have too much or too little soil, or the magic'll be off."

Ember nodded, though her expression said she was still confused. Hettie patted her hand reassuringly. "Sine function, Ember. We need to find the length of the base and height of the triangle using trigonometry. Then, it's just a matter of...Ah, but that's for me to figure out."

Hettie's eyes drifted back to the garden, her mind already working through the numbers. But for now, she stood still, her gaze lost in thought, as the question echoed in her mind: What was the area of one of those triangular sections, and how could she unlock the magic that lay hidden within?

**Rewritten Answer**

Hettie's eyes remained fixed on the garden, her mind whirling with numbers and triangles. She knew the path to unlocking the magic lay in calculating the area of one of those triangular sections. Ember's math skills were still a work in progress, but Hettie's hands were already stirring, as if conjuring the answer to life.

"Identify the triangle, child," Hettie muttered to herself, her voice barely audible. "Each section is an isosceles triangle, with two equal sides the radii of the circle and a central angle of 72 degrees."

As she began to pace around the garden, her boots kicked up small clouds of dirt. "Now, to find the base... I need to use the sine function. Half of the central angle, that's 36 degrees." Hettie's hands moved as if writing in the air. "Base = 2 × radius × sin(36°)... Given the radius is 20 feet, that's... 40 sin(36°)."

Ember's eyes widened, but Hettie didn't notice.
Her focus was on the calculations. "And now, the height... That's where the cosine function comes in. Height = radius × cos(36°)... Given the radius is 20 feet, that's... 20 cos(36°)."

Hettie's gaze drifted to the sky, her eyes narrowing in concentration. "Now, to find the area... That's where the formula comes in. Area = (1/2) × Base × Height... Substituting the values... Area = (1/2) × (40 sin(36°)) × (20 cos(36°))."

As she spoke, her hands wove a pattern in the air, as if guiding the numbers into place. "Simplifying... Area = 20 × 20 × sin(36°) × cos(36°)... Area = 400 sin(36°) cos(36°)."

Hettie's eyes sparkled, a hint of a smile on her lips. "Ah, but I can use the double-angle identity for sine... sin(2θ) = 2 sin(θ) cos(θ)." Her voice rose, filled with excitement. "Therefore, sin(72°) = 2 sin(36°) cos(36°)... So, Area = 400 × (1/2) × sin(72°) = 200 sin(72°)."

The math danced in her mind, the numbers swirling together in perfect harmony. Hettie's eyes snapped back to the garden, her focus sharpening. "Now, to calculate the final area... I need to find sin(72°)..." She reached for her old, worn calculator, her fingers moving with a practiced ease. "Sin(72°) ≈ 0.9511... Therefore, Area ≈ 200 × 0.9511 ≈ 190.22 square feet."

Hettie's gaze drifted back to the garden, a soft, satisfied smile on her face. The magic was taking shape, the numbers falling into place like the petals of a flower. "Ah, child," she whispered, her eyes twinkling. "The area of one of those triangular sections is approximately 190.22 square feet."

## Limitations:
- There are various small artifacts from the synthetic generation process, such as "Here is the rewritten question..." Filter these if desired.
- GPTisms. Again, filter if desired.
- Repetition: A lot of the questions/answers begin in the same way, follow the same pattern, etc., which makes them a bit boring
- Clunky: The character responses are very long, which is good for COT and answering accurately, but not so good for roleplaying models
- Original questions and answers were generated using mistral-large, and rewrites were done using l3-70b. Better models could be used. Additionally, the API I used frequently rate-limited me and cut my outputs in half.

## Future Work:
- Better and more balanced topic list
- Use COT to generate questions as well
- Reordering prompts and adding intermediate prompts for higher rewriting quality; also possibly more summarization for efficiency

## Possibly Future Work (Good ideas that I don't currently have the budget to try):
- Multiturn chats
- Function calling/agents
- Self consistency for answer validation and code execution, ensembling models in the dataset generation pipeline like internlm-math for higher accuracy
- WizardLM-style evol-instruct for more diverse questions

## Replication Guide/Prompts Used (for those who want to generate their own dataset):

*For each row of https://huggingface.co/datasets/G-reen/TheatreLM-v2.1-Characters*

1. Pick n traits (topics) from this list:

   ```
   algebra, boolean algebra, linear algebra, calculus, analysis (math), geometry, topology, algebraic surfaces, trigonometry, combinatorics, logic, set theory, number theory, computational theory, information theory, signal processing, probability, statistics, game theory, operations research, mathematical proofs, discrete mathematics, arithmatic, logical proofs, deductive reasoning, inductive reasoning, information science, systems theory, decision theory, physics, chemistry, geology, oceanography, meteorology, astronomy, biochemistry, microbiology, botany, zoology, ecology, macroeconomics, microeconomics, linguistics, political science, epidemology, behaviorism, cognitive psychology, neuroscience, social psychology, psychoanalytic/psychodynamic psychology, humanistic psychology, metaphysics, ethics, philosophy, theoretical computer science, algorithms, data structures, computer graphics, machine learning, scientific computing, computer security, database engineering, data mining, cybersecurity, python, java, c#, SQL, javascript, PHP, C++, C, perl, html/css, shell, therapy/counseling
   ```

2. Summarize the character and setting:

   ```
   Here is a world:
   {{world}}
   Here is a character:
   {{character}}
   Write the above content into one cohesive story summary. Refer to all details and elements by their name and make sure to cover everything important. Do not output anything besides this summary. Begin your output with:
   Summary:
   ```

3. Generate a question:

   ```
   Here is a setting of a storyline:
   {{setting_summary}}
   I am trying to create interesting story-based questions for students studying '{{topic}}'. These questions should take place in the storyline from the character's perspective in the above setting, but use real world information so that they are educational. Be elaborate and detailed as students will not be presented the storyline.
   Please write a single question below. Do not output anything else.
   ```

4. Generate an answer:

   ```
   You are currently taking a test with FRQ questions:
   '{{question}}'
   Solve this question by thinking step by step. After coming to an answer, output your answer, along with any relevant code or writings. Finally, after completing the question and giving your answer, output FINISHED.
   ```

5. Rewrite the question:

   ```
   You are a writer for a textbook that uses storytelling while being informational. Here is the storyline:
   {{setting_summary}}
   And the main character of the storyline:
   {{character_summary}}
   Here is a question:
   {{question}}
   Rewrite the question with narration/action/dialogue (third person), etc to fit it into the storyline (make sure any dialogue fits the character's tone/speaking style). The narration should cut right before the character begins thinking and answers the question. Feel free to include other characters, action sequences, lore, etc, to make the writing exciting and interesting. Do not attempt to answer the question. Make sure to include all the important numbers in the question, and do not miss any details. End the narration with a question, expressed through third-person narration or dialogue that summarizes what the main character needs to figure out.
   ```

6. Rewrite the answer:

   ```
   Here is the answer to the question:
   {{answer}}
   Continuing from the previously generated narration, now rewrite this answer, third person centered on {character_name}, the main character, with narration/dialogue/action expressed as needed. Try to include as much of the reasoning and intermediate calculations as possible (but make sure to rewrite it to fit the tone of {character_name}). Do not exclude code (if present in the answer).
   ```
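The glue code around these prompts is not shown on this page, so the following is only a minimal sketch of how the steps might be wired together. All three helper names (`render`, `strip_sentinel`, `strip_preamble`) are hypothetical. The sketch assumes placeholders use the `{{name}}` form, that a complete answer ends with the `FINISHED` marker requested in step 4, and that artifacts like "Here is the rewritten question..." occupy their own first line. It does not handle the single-brace `{character_name}` placeholder in step 6.

```python
import re

# Matches {{name}} placeholders as they appear in the prompts above.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

# Assumption: the artifact is a leading line such as
# "Here is the rewritten question:" followed by at least one newline.
PREAMBLE = re.compile(
    r"^\s*here is the rewritten (question|answer)[^\n]*\n+",
    re.IGNORECASE,
)


def render(template: str, **values: str) -> str:
    """Fill each {{name}} with values[name]; raises KeyError if one is missing."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def strip_sentinel(answer: str, sentinel: str = "FINISHED") -> str:
    """Drop the sentinel and anything after its last occurrence.

    If the sentinel is absent (for example, a truncated output),
    the text is returned unchanged apart from whitespace trimming.
    """
    head, found, _ = answer.rpartition(sentinel)
    return head.rstrip() if found else answer.strip()


def strip_preamble(text: str) -> str:
    """Remove one leading 'Here is the rewritten ...' line, if present."""
    return PREAMBLE.sub("", text, count=1)


# Step 4 prompt, copied from the guide above.
ANSWER_PROMPT = (
    "You are currently taking a test with FRQ questions:\n"
    "'{{question}}'\n"
    "Solve this question by thinking step by step. After coming to an answer, "
    "output your answer, along with any relevant code or writings. Finally, "
    "after completing the question and giving your answer, output FINISHED."
)

prompt = render(ANSWER_PROMPT, question="What is the area of one section?")
```

Checking for the `FINISHED` marker before trimming is also one cheap way to flag rows whose outputs were cut short, which the limitations above mention as a known problem.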

Provider:
maas
Created:
2024-08-29
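Collected summary
Dataset introduction
Background and challenges
Background overview
Duet-v0.5 is a dataset of 5,000 rows of chain-of-thought question-answer pairs. Each row contains the original question and answer, along with versions of both rewritten as roleplay narrative. The dataset is generated from the characters and settings in TheatreLM-v2.1-Characters and aims to use storylines to make educational Q&A more engaging and narrative-driven.
The above summary was collected and generated by 遇见数据集.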