reddit-logic
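Reddit Argument Logic Dataset Overview
Basic Information
- License: Creative Commons Attribution 4.0 International (CC-BY-4.0)
- Task categories: text classification, feature extraction
- Language: English (en)
- Tags: social media, Reddit, logic, argument, rhetoric, evidence, debate, discourse, persuasion, logical validity, critical thinking, reasoning datasets competition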
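Dataset Description
This dataset studies how people construct and express logical arguments in everyday online discussion. The data comes from Reddit's r/ChangeMyView subreddit, which is known for its emphasis on reasoned debate.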
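Dataset Construction and Annotation
- Source data: 10,000 posts were selected from the "changemyview" portion of the "HuggingFaceGECLM/REDDIT_comments" dataset; each entry contains at least 1,000 characters.
- Initial annotation: Five seed cases were manually annotated using ChatGPT to establish a framework for identifying the key reasoning components of an argument.
- Scalable annotation: The agentlans/Llama3.1-LexiHermes-SuperStorm language model was used to annotate the remaining entries via few-shot prompting.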
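The selection and few-shot steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dataset authors' pipeline: the helper names `select_long_comments` and `build_few_shot_prompt`, the prompt wording, and the use of random sampling are all hypothetical. The card itself only states that 10,000 entries of at least 1,000 characters were selected and that five seed annotations drove few-shot prompting.

```python
import json
import random

MIN_CHARS = 1_000      # minimum length stated in the dataset card
SAMPLE_SIZE = 10_000   # number of entries stated in the dataset card


def select_long_comments(comments, min_chars=MIN_CHARS, k=SAMPLE_SIZE, seed=0):
    """Keep comments with at least `min_chars` characters, then sample up to `k`.

    Random sampling is an assumption; the card does not say how entries were chosen.
    """
    eligible = [c for c in comments if len(c) >= min_chars]
    rng = random.Random(seed)
    return rng.sample(eligible, min(k, len(eligible)))


def build_few_shot_prompt(seed_examples, new_text):
    """Concatenate annotated seed cases, then the new text awaiting annotation.

    `seed_examples` is a list of (text, annotation_dict) pairs.
    The instruction wording below is hypothetical.
    """
    parts = ["Analyze the argument and return a JSON annotation."]
    for text, annotation in seed_examples:
        parts.append(f"Text:\n{text}\nAnnotation:\n{json.dumps(annotation)}")
    parts.append(f"Text:\n{new_text}\nAnnotation:")
    return "\n\n".join(parts)
```

In such a setup the prompt would be sent to the annotating model and its reply parsed as JSON; that step is omitted here because the card does not describe how the model was called.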
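Data Novelty and Domain Relevance
The dataset focuses on clear, coherent reasoning in informal online discussion. It fills a gap for datasets of real-world argumentation patterns, beyond formal logic problems and academic texts.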
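Data Quality and Structure
- Format: structured JSON
- Fields:
  - text: the original Reddit post or comment
  - claims: the explicit claims made in the argument
  - ambiguous_terms: terms or phrases whose meaning is unclear or context-dependent
  - assumptions: implicit premises of the argument
  - premises: stated reasons or evidence supporting the claims
  - evidence: an assessment of the credibility, relevance, and sufficiency of the supporting evidence
  - additional_data: supplementary information that could support or refute the argument
  - issues: identified logical flaws or contradictions
  - competing_explanations: alternative explanations or counterarguments
  - validity: an assessment of the argument's logical validity
  - soundness: an assessment of the argument's overall strength and truthfulness
  - recommendations: suggestions for improving the argument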
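A structural check for one record might look like the sketch below. The field names come from the list above, but the types (strings, lists of strings, and an `evidence` object with `credibility`, `relevance`, and `sufficiency` keys) are inferred from the example entry in this card and are an assumption, not a published schema. The function name `validate_record` is hypothetical.

```python
# Field groupings inferred from the example entry (assumption, not an official schema).
STRING_FIELDS = ["text", "additional_data", "validity", "soundness"]
LIST_FIELDS = [
    "claims", "ambiguous_terms", "assumptions", "premises",
    "issues", "competing_explanations", "recommendations",
]
EVIDENCE_KEYS = ["credibility", "relevance", "sufficiency"]


def validate_record(record):
    """Return a list of problems; an empty list means the shape looks as expected."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected a string")
    for name in LIST_FIELDS:
        value = record.get(name)
        if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
            problems.append(f"{name}: expected a list of strings")
    evidence = record.get("evidence")
    if not isinstance(evidence, dict):
        problems.append("evidence: expected an object")
    else:
        for key in EVIDENCE_KEYS:
            if not isinstance(evidence.get(key), str):
                problems.append(f"evidence.{key}: expected a string")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to audit automatically annotated records in bulk, since model-generated JSON can deviate from the expected shape in more than one place.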
Example Entry
```json
{
  "text": "I know this is a generalization, but I think its a pretty valid one:\nDoes anyone else find it interesting that, judging by how the voting typically goes, the general population of Reddit seems to be very pro gun-control, but anti-CISPA/similar legislation?\nIt seems to be incredibly hypocritical, in my opinion. Im not pointing fingers at you, after all I have no idea how you feel about gun control, but when it comes to protesting CISPA the #1 contention is, \"Its violate the 4th amendment!\"\nYet, when it comes to gun control, nobody wants to hear about the 2nd amendment. Comments like that on /r/politics get downvoted to hell and the commenter gets called a loon.\nIf one can fear that the government will abuse its power with CISPA, or at abuse any power at all, then why is it so ridiculous to think that the government might abuse its ability to prohibit certain citizens from buying firearms?\nI think the underlying reason for most opposition to CISPA is a fear that ones ability to access free media via piracy will be inhibited. After all, if you cant trust a United States that passes CISPA, how can you trust one that controls firearms, operates your healthcare, or guarantees your finances?\nThis is a rant directed at no particular person in the community, but I just dont see how its possible to oppose CISPA and support gun control legislation at the same time. Sorry for being so off-topic and ranty.",
  "claims": [
    "Reddits users are inconsistent in their views on government power and individual rights.",
    "The opposition to CISPA and support for gun control are hypocritical."
  ],
  "ambiguous_terms": [
    "Hypocritical",
    "inhibited",
    "guarantees"
  ],
  "assumptions": [
    "Inconsistency reveals a double standard.",
    "Individual rights are prioritized differently based on context."
  ],
  "premises": [
    "Users fear government abuse in CISPA but not in gun control.",
    "Media access, healthcare, and finances are trusted over gun control."
  ],
  "evidence": {
    "credibility": "Low to moderate",
    "relevance": "High",
    "sufficiency": "Limited; anecdotal and observational"
  },
  "additional_data": "Surveys of Reddit users opinions on gun control, comparative analysis with other online communities.",
  "issues": [
    "Lack of evidence for Reddits general views on gun control.",
    "Double standard might be contextual, not necessarily hypocritical."
  ],
  "competing_explanations": [
    "Different contexts call for distinct rights and freedoms.",
    "Support for gun control and opposition to CISPA may stem from different values."
  ],
  "validity": "Valid",
  "soundness": "Moderate to strong",
  "recommendations": [
    "Support claims with empirical data on Reddits user views.",
    "Explore the underlying values and contexts driving these views."
  ]
}
```
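For the text-classification use named in this card, records like the one above could be turned into (text, label) pairs as sketched below. This is only an illustration: storing one JSON object per line (JSON Lines) is an assumption about the file layout, and the helper names `load_records`, `to_classification_pairs`, and `label_counts` are hypothetical. The `validity` field is used as the label by default; `soundness` would work the same way.

```python
import json
from collections import Counter


def load_records(lines):
    """Parse one JSON object per line, skipping blank lines (assumed JSON Lines layout)."""
    return [json.loads(line) for line in lines if line.strip()]


def to_classification_pairs(records, label_field="validity"):
    """Build (text, label) pairs, dropping records that lack the label field."""
    return [(r["text"], r[label_field]) for r in records if label_field in r]


def label_counts(pairs):
    """Count how often each label occurs, e.g. to inspect class balance."""
    return Counter(label for _, label in pairs)
```

Because the labels are free-text model outputs (for example "Moderate to strong"), inspecting the label counts before training is a reasonable first step; some normalization of near-duplicate labels may be needed.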
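Limitations
- Scope of analysis:
  - Limited to individual posts; the broader conversational context is not considered.
  - Focuses mainly on logical structure (logos) rather than emotional appeal (pathos) or credibility (ethos).
- Data integrity factors:
  - Citations and references within arguments were not independently verified.
  - Informal language can make the expression of reasoning vague or ambiguous.
- Contextual bias:
  - The subreddit's particular demographics and culture may shape argumentation styles and introduce bias into the data.
  - Automated annotation may still reflect the inherent biases of the language model used for labeling.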
License
Creative Commons Attribution 4.0 International (CC-BY-4.0), which permits research and commercial use with appropriate attribution.




