Ground truth and raw data for conceptual dependency knowledge graphs

Name: Ground truth and raw data for conceptual dependency knowledge graphs
Creator: SMU Research Data Repository (RDR)
Published: 2024-07-18 00:00:00
License: No description available

Source: researchdata.smu.edu.sg. Updated 2024-07-18. Indexed 2025-01-15.

Download link:

https://researchdata.smu.edu.sg/articles/dataset/Ground_truth_and_raw_data_for_conceptual_dependency_knowledge_graphs/26106256/1

Description:

The data used for this research is scraped from TVTropes.org, specifically from the "Playing With A Trope" subwiki and the trope description pages. The "Playing With A Trope" subwiki provides a detailed guide on the various ways tropes can be manipulated or altered in creative works. It categorizes the methods of using tropes into several distinct approaches:

Played Straight: The trope is used in its typical manner without alteration.
Justified: A logical in-universe reason is given for the trope's occurrence.
Inverted: The trope is reversed or its expected elements are flipped.
Subverted: The trope is set up to happen but then deliberately avoided or contradicted.
Double Subverted: The trope is subverted, but then ultimately occurs anyway in an unexpected way.
Parodied: The trope is exaggerated humorously to mock or satirize it.
Deconstructed: The trope is analyzed critically, exposing its flaws or unrealistic elements.
Reconstructed: After deconstruction, the trope is pieced back together in a way that addresses previous criticisms but retains its essence.
Zig Zagged: The use of the trope is inconsistent, combining several of the above methods.
Averted: The trope is completely absent from the work.
Enforced: The trope is used because of external pressures or requirements.
Implied: The trope is hinted at but occurs off-screen.
Played for Laughs/Drama/Horror: The trope is used specifically to evoke humor, drama, or horror.
Exploited: Characters in the story leverage the trope for their own benefit.
Defied: Characters actively avoid the trope's occurrence.
Discussed: Characters talk about the trope, acknowledging its presence or typicality.
Conversed: The trope is discussed in relation to other works, often breaking the fourth wall.
Lampshaded: The presence of the trope is pointed out within the work by the characters.

This subwiki describes the permutations available for each trope based on these categories, illustrating the versatility and depth with which tropes can be "played." Combined with the structured relationships among tropes as indicated on their description pages, this provides a valuable dataset for constructing knowledge graphs that capture the complex interplay of narrative elements.

To test the effectiveness of the CD-based parser, 15 trope ideas are randomly sampled from TVTropes.org. For each trope, sentences containing information on trope-to-trope relationships are selected from the trope description pages and manually annotated for triples. This results in a total data sample of 63 sentences and 941 triples, which is used as the ground truth for testing the CD concept extractor against ClauCy, a spaCy implementation of ClausIE (Chourdakis and Reiss, 2018).

To explore the idea recommendation approach, the original dataset is expanded from 15 to 123 tropes by scraping a selection of tropes from the list of tropes cited by the original 15, resulting in 3,153 sentences. Generating triples as per the procedure in Section 3.5 yielded 26,993 unique concepts and 160,959 triples.

The dataset contains information about the tropes and their relationships with other concepts in the English language. Trope names are written in PascalCase and are embedded within the text using the markup language of PmWiki. Because TVTropes is an online crowdsourced knowledge base, the style of natural language use is largely informal, though the TVTropes community shares a common jargon for explicating trope relationships.
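Since trope names appear as PascalCase tokens inside the raw text, a first pass over a sentence might pull them out with a regular expression. The following is a minimal sketch under stated assumptions, not the procedure used to build the dataset: the pattern, the function names `find_trope_names` and `split_pascal_case`, and the sample sentence are all hypothetical, and the sketch ignores any PmWiki wrapper markup around the names.

```python
import re

# Two or more capitalized word parts run together, e.g. "ChekhovsGun".
# Assumption: this pattern is illustrative only; it is not the dataset's
# own tokenizer and it misses names containing consecutive capitals.
PASCAL_CASE = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def find_trope_names(sentence: str) -> list[str]:
    """Return PascalCase tokens in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in PASCAL_CASE.finditer(sentence):
        seen.setdefault(match.group(0), None)
    return list(seen)


def split_pascal_case(name: str) -> str:
    """Insert a space at each lowercase-to-uppercase boundary for display."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)


if __name__ == "__main__":
    # Hypothetical sentence written in the style of a trope description page.
    text = "Compare ChekhovsGun, which often overlaps with RedHerring; see ChekhovsGun."
    for name in find_trope_names(text):
        print(name, "->", split_pascal_case(name))
```

Ordinary capitalized words such as "Compare" are skipped because the pattern requires at least two capitalized parts in one token. A real pipeline would likely need to handle the wiki markup itself before or instead of this kind of pattern matching.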

Provider:

SMU Research Data Repository (RDR)
