
shoochoon/what-is-art-rst

Source: Hugging Face. Updated 2026-03-21. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/shoochoon/what-is-art-rst
Description:
Language: en
Tags: philosophy, rst, argumentation

# RST-Inspired Rhetorical Annotation of Tolstoy's *What Is Art?*

A discourse annotation dataset for Leo Tolstoy's *What Is Art?* (1904), produced using a functionally adapted RST scheme designed for argumentative and philosophical prose. The annotations were created as part of a study on stance-conditioned fallacy judgment in philosophical argumentation, currently under review.

## Source Text

Tolstoy, L. (1904). *What is art?* Funk & Wagnalls. Retrieved from Project Gutenberg: https://www.gutenberg.org/ebooks/64908

The annotated corpus covers approximately 62,395 words, drawn from the English translation (1904), excluding the preface and the concluding chapter, where sustained argumentation is limited.

## Annotation Methodology

### Theoretical basis

Annotation follows a functionally adapted RST scheme (Mann & Thompson, 1988), operationalized for argumentative rather than narrative or expository discourse. The core RST distinction between **Nucleus (N)** and **Satellite (S)** is retained as a structural heuristic for separating claims from supporting material:

- **Nucleus**: units that introduce new argumentative content or develop the text's overarching theses (functioning as claims)
- **Satellite**: dependent or elaborative material that primarily serves to justify, warrant, or back the associated Nucleus

### Adaptations to standard RST

Standard RST requires completeness, connectedness, uniqueness, and adjacency of text spans. The adjacency and uniqueness constraints were deliberately relaxed:

- **Non-adjacency**: Nuclei and Satellites are linked across non-contiguous text spans in order to capture long-range argumentative dependencies, which are frequent in Tolstoy's discursive style
- **Coarse segmentation**: EDUs primarily correspond to full sentences rather than clauses, following Kobayashi et al. (2019), who report improved model performance with less granular segmentation
- **Completeness deprioritized**: accuracy of argumentative link identification was prioritized over strict RST tree coherence, as the goal is argument mining rather than discourse structure visualization

### Relation set

The original RST taxonomy (20+ relations) was collapsed into 12 functional categories suited to philosophical argumentation. Relations with closely aligned argumentative functions were merged; the *Justify* relation was omitted as its function is embedded in the broader argumentative structure of philosophical prose.

| Label | Original RST relation(s) | Function |
|-------|--------------------------|----------|
| Elaboration | Elaboration, Explanation | Elaborative support to a claim |
| Evaluation | Evaluation | Assessment or judgment of a situation or opinion |
| Context | Circumstance, Background | Objective contextual information enhancing the central idea |
| Contrast | Antithesis, Contrast, Concession | Differences or opposing perspectives relative to another statement |
| Contingency | Condition, Otherwise | Restriction on the application of a statement |
| Enablement | Enablement | Indication that a preceding statement enables or supports a claim |
| Evidence | Evidence | Support based on evidence |
| Motive | Motivation, Purpose | Support based on motivation or purpose |
| Outcome | Cause, Result, Sequence | Causal or resultant effect of a statement |
| Solutionhood | Solutionhood | Proposed solution or answer to a question |
| Interpretation | Interpretation | Author's explanatory perspective on a statement |
| Restatement | Restatement, Summary | Reiteration or summary of a claim |

### Annotation tool and annotator

Annotation was carried out in **Label Studio** (Tkachenko et al., 2020–2025). Due to resource constraints and the specialized nature of the task, the corpus was annotated by a single expert annotator trained in both philosophical and linguistic analysis.

## Dataset Statistics

| Metric | Value |
|--------|------:|
| Corpus size | ~62,395 words |
| Total annotated spans | 1,256 |
| Nucleus (N) spans | 134 |
| Satellite (S) spans | 1,121 |
| S:N ratio | 8.4:1 |
| Total relations | 1,238 |

### Relation type distribution

| Relation | Count | % |
|----------|------:|--:|
| Elaboration | 679 | 54.8% |
| Evaluation | 134 | 10.8% |
| Contrast | 132 | 10.7% |
| Restatement | 83 | 6.7% |
| Outcome | 54 | 4.4% |
| Evidence | 50 | 4.0% |
| Interpretation | 27 | 2.2% |
| Context | 25 | 2.0% |
| Solutionhood | 19 | 1.5% |
| Motive | 13 | 1.1% |
| Contingency | 10 | 0.8% |
| Enablement | 5 | 0.4% |

### Nucleus satellite load

Each Nucleus carries on average 2.7 Satellites (median: 2, maximum: 9), reflecting Tolstoy's characteristic argumentative pattern of anchoring a central claim with multiple elaborative, evaluative, and contrastive satellites.

## File Format

The dataset is provided as a single JSON file exported from Label Studio (`annotation_pretty.json`). Each annotation record contains:

- `id`: Label Studio internal task ID
- `annotations[].result`: list of annotation objects, each of type `labels` (text span with N/S label) or `relation` (directed link between two spans with a relation label)
- Span objects include `value.start`, `value.end`, `value.text`, and `value.labels`
- Relation objects include `from_id`, `to_id`, `direction`, and `labels`

## Intended Uses

- Argument mining in philosophical and literary prose
- RST-based discourse analysis with relaxed structural constraints
- Stance-conditioned fallacy detection (primary downstream task in the associated paper)
- Study of elaboration chain structure in non-standard argumentative texts
- Benchmark development for coarse-grained rhetorical annotation

## Associated Paper

Zhang, [Name]. Stance-dependent fallacy judgments and rhetorical structure: a computationally assisted case study of Tolstoy's *What Is Art?* *Argumentation*, under review.

## References

Kobayashi, N., Hirao, T., Kamigaito, H., Okumura, M., & Nagata, M. (2019). Split or merge: Which is better for unsupervised RST parsing? *Proceedings of EMNLP-IJCNLP 2019*, 3626–3636.

Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. *Text*, 8(3), 243–281.

Tkachenko, M., Malyuk, M., Holmberg, A., Liubimov, N., & Fabricius-Hansen, C. (2020–2025). Label Studio: Data labeling software. https://github.com/HumanSignal/label-studio

Tolstoy, L. (1904). *What is art?* Funk & Wagnalls. https://www.gutenberg.org/ebooks/64908

## License

The annotation is original scholarly work. The source text (*What Is Art?*, 1904) is in the public domain.
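
## Loading Example

The record layout described under File Format can be walked with the Python standard library alone. The sketch below is a minimal illustration, not the author's tooling. It assumes that the top level of the export is a list of task records (the usual shape of a Label Studio JSON export) and that span labels are the literal strings `N` and `S`; both should be checked against the actual file. The `summarize` helper and the miniature `sample` record are hypothetical and exist only to show the shape of the data.

```python
import json
from collections import Counter


def summarize(tasks):
    """Count span labels and relation labels across Label Studio task records."""
    span_labels = Counter()
    relation_labels = Counter()
    for task in tasks:
        for annotation in task.get("annotations", []):
            for item in annotation.get("result", []):
                if item.get("type") == "labels":
                    # Span object: the N/S label lives under value.labels
                    span_labels.update(item["value"]["labels"])
                elif item.get("type") == "relation":
                    # Relation object: directed link with one or more labels
                    relation_labels.update(item.get("labels", []))
    return span_labels, relation_labels


# Hypothetical miniature record mirroring the documented fields.
# The label strings "N" and "S" are an assumption about the export.
sample = [
    {
        "id": 1,
        "annotations": [
            {
                "result": [
                    {"id": "a", "type": "labels",
                     "value": {"start": 0, "end": 5, "text": "Claim", "labels": ["N"]}},
                    {"id": "b", "type": "labels",
                     "value": {"start": 6, "end": 13, "text": "Support", "labels": ["S"]}},
                    {"id": "c", "type": "labels",
                     "value": {"start": 14, "end": 21, "text": "Counter", "labels": ["S"]}},
                    {"type": "relation", "from_id": "b", "to_id": "a",
                     "direction": "right", "labels": ["Elaboration"]},
                    {"type": "relation", "from_id": "c", "to_id": "a",
                     "direction": "right", "labels": ["Contrast"]},
                ]
            }
        ],
    }
]

span_counts, relation_counts = summarize(sample)

# To run on the real file instead of the sample (assumes a top-level list):
# with open("annotation_pretty.json", encoding="utf-8") as f:
#     span_counts, relation_counts = summarize(json.load(f))
```

Run against the full export, the two counters can be compared with the figures in the Dataset Statistics tables as a quick sanity check on the download.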
Provider: shoochoon