shoochoon/what-is-art-rst
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/shoochoon/what-is-art-rst
---
language:
- en
tags:
- philosophy
- rst
- argumentation
---
# RST-Inspired Rhetorical Annotation of Tolstoy's *What Is Art?*
A discourse annotation dataset for the 1904 English translation of Leo Tolstoy's *What Is Art?*, produced with a functionally adapted Rhetorical Structure Theory (RST) scheme designed for argumentative and philosophical prose. The annotations were created for a study of stance-conditioned fallacy judgment in philosophical argumentation, which is currently under review.
## Source Text
Tolstoy, L. (1904). *What is art?* Funk & Wagnalls. Retrieved from Project Gutenberg: https://www.gutenberg.org/ebooks/64908
The annotated corpus covers about 62,395 words of the English translation (1904). The preface and the concluding chapter are excluded because they contain little sustained argumentation.
## Annotation Methodology
### Theoretical basis
Annotation follows a functionally adapted RST scheme (Mann & Thompson, 1988), operationalized for argumentative rather than narrative or expository discourse. The core RST distinction between **Nucleus (N)** and **Satellite (S)** is retained as a structural heuristic for separating claims from supporting material:
- **Nucleus**: units that introduce new argumentative content or develop the text's overarching theses (functioning as claims)
- **Satellite**: dependent or elaborative material that primarily serves to justify, warrant, or back the associated Nucleus
### Adaptations to standard RST
Standard RST requires completeness, connectedness, uniqueness, and adjacency of text spans. The adjacency and uniqueness constraints were deliberately relaxed, and segmentation and completeness were handled more loosely than in standard RST:
- **Non-adjacency**: Nuclei and Satellites are linked across non-contiguous text spans in order to capture long-range argumentative dependencies, which are frequent in Tolstoy's discursive style
- **Coarse segmentation**: elementary discourse units (EDUs) primarily correspond to full sentences rather than clauses, following Kobayashi et al. (2019), who report improved model performance with less granular segmentation
- **Completeness deprioritized**: accuracy of argumentative link identification was prioritized over strict RST tree coherence, as the goal is argument mining rather than discourse structure visualization
### Relation set
The original RST taxonomy (20+ relations) was collapsed into 12 functional categories suited to philosophical argumentation. Relations with closely aligned argumentative functions were merged; the *Justify* relation was omitted as its function is embedded in the broader argumentative structure of philosophical prose.
| Label | Original RST relation(s) | Function |
|-------|--------------------------|----------|
| Elaboration | Elaboration, Explanation | Elaborative support to a claim |
| Evaluation | Evaluation | Assessment or judgment of a situation or opinion |
| Context | Circumstance, Background | Objective contextual information enhancing the central idea |
| Contrast | Antithesis, Contrast, Concession | Differences or opposing perspectives relative to another statement |
| Contingency | Condition, Otherwise | Restriction on the application of a statement |
| Enablement | Enablement | Indication that a preceding statement enables or supports a claim |
| Evidence | Evidence | Support based on evidence |
| Motive | Motivation, Purpose | Support based on motivation or purpose |
| Outcome | Cause, Result, Sequence | Causal or resultant effect of a statement |
| Solutionhood | Solutionhood | Proposed solution or answer to a question |
| Interpretation | Interpretation | Author's explanatory perspective on a statement |
| Restatement | Restatement, Summary | Reiteration or summary of a claim |
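
The merge described in the table can be written down as a simple lookup. The sketch below is illustrative only: the relation names are copied from the table, while `RST_TO_COLLAPSED` and `collapse_relation` are hypothetical names that do not come from the dataset or the associated paper.

```python
from typing import Optional

# Original RST relation name -> collapsed label, copied from the table above.
# The dictionary and function names are illustrative, not part of the dataset.
RST_TO_COLLAPSED = {
    "Elaboration": "Elaboration",
    "Explanation": "Elaboration",
    "Evaluation": "Evaluation",
    "Circumstance": "Context",
    "Background": "Context",
    "Antithesis": "Contrast",
    "Contrast": "Contrast",
    "Concession": "Contrast",
    "Condition": "Contingency",
    "Otherwise": "Contingency",
    "Enablement": "Enablement",
    "Evidence": "Evidence",
    "Motivation": "Motive",
    "Purpose": "Motive",
    "Cause": "Outcome",
    "Result": "Outcome",
    "Sequence": "Outcome",
    "Solutionhood": "Solutionhood",
    "Interpretation": "Interpretation",
    "Restatement": "Restatement",
    "Summary": "Restatement",
}


def collapse_relation(rst_relation: str) -> Optional[str]:
    """Return the collapsed label, or None for relations outside the table
    (for example Justify, which the scheme omits)."""
    return RST_TO_COLLAPSED.get(rst_relation)
```

A lookup of this kind may be useful when comparing these annotations with corpora labeled under a fuller RST relation set.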
### Annotation tool and annotator
Annotation was carried out in **Label Studio** (Tkachenko et al., 2020–2025). Due to resource constraints and the specialized nature of the task, the corpus was annotated by a single expert annotator trained in both philosophical and linguistic analysis.
## Dataset Statistics
| Metric | Value |
|--------|------:|
| Corpus size | ~62,395 words |
| Total annotated spans | 1,256 |
| Nucleus (N) spans | 134 |
| Satellite (S) spans | 1,121 |
| S:N ratio | 8.4:1 |
| Total relations | 1,238 |
### Relation type distribution
| Relation | Count | % |
|----------|------:|--:|
| Elaboration | 679 | 54.8% |
| Evaluation | 134 | 10.8% |
| Contrast | 132 | 10.7% |
| Restatement | 83 | 6.7% |
| Outcome | 54 | 4.4% |
| Evidence | 50 | 4.0% |
| Interpretation | 27 | 2.2% |
| Context | 25 | 2.0% |
| Solutionhood | 19 | 1.5% |
| Motive | 13 | 1.1% |
| Contingency | 10 | 0.8% |
| Enablement | 5 | 0.4% |
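
The percentages in the table are consistent with shares of the 1,238 total relations (for example, 679 / 1,238 ≈ 54.8%). A distribution of this shape could be recomputed from a list of relation labels along the following lines. This is a minimal sketch: `relation_distribution` is a hypothetical helper, and the optional `total` argument is an assumption that lets the denominator be set explicitly when it differs from the number of labels passed in.

```python
from collections import Counter


def relation_distribution(labels, total=None):
    """Return (label, count, percent) tuples, most frequent first.

    `labels` is an iterable of relation label strings.
    `total` optionally overrides the denominator (assumption: useful when the
    reported total differs from the number of labels supplied).
    """
    counts = Counter(labels)
    denom = total if total is not None else sum(counts.values())
    if denom == 0:
        return []
    return [
        (label, n, round(100 * n / denom, 1))
        for label, n in counts.most_common()
    ]
```

Called with the labels extracted from the export file, the function returns rows in the same order as the table, which makes the published counts easy to check.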
### Satellite load per Nucleus
Each Nucleus carries on average 2.7 Satellites (median: 2, maximum: 9), reflecting Tolstoy's characteristic argumentative pattern of anchoring a central claim with multiple elaborative, evaluative, and contrastive satellites.
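Per-Nucleus load figures of this kind can be derived from the relation links. The sketch below rests on stated assumptions: the caller supplies `(satellite_id, nucleus_id)` pairs, since which end of a relation holds the Nucleus is not specified here, and links whose target is not a known Nucleus are ignored. Whether Nuclei with no Satellites were counted in the reported average is also not stated; this version counts them. The name `satellite_load` is hypothetical.

```python
from statistics import mean, median


def satellite_load(links, nucleus_ids):
    """Summarize how many Satellites attach to each Nucleus.

    links: iterable of (satellite_id, target_id) pairs (assumed orientation).
    nucleus_ids: iterable of span IDs labeled as Nucleus.
    Links whose target is not in nucleus_ids are skipped.
    """
    per_nucleus = {nid: 0 for nid in nucleus_ids}
    for _satellite_id, target_id in links:
        if target_id in per_nucleus:
            per_nucleus[target_id] += 1
    loads = list(per_nucleus.values())
    if not loads:
        return {"mean": 0.0, "median": 0.0, "max": 0}
    return {"mean": mean(loads), "median": median(loads), "max": max(loads)}
```

Filtering out Nuclei with zero Satellites before calling `mean` would give the alternative reading of "average load".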
## File Format
The dataset is provided as a single JSON file exported from Label Studio (`annotation_pretty.json`). Each annotation record contains:
- `id`: Label Studio internal task ID
- `annotations[].result`: list of annotation objects, each of type `labels` (text span with N/S label) or `relation` (directed link between two spans with a relation label)
- Span objects include `value.start`, `value.end`, `value.text`, and `value.labels`
- Relation objects include `from_id`, `to_id`, `direction`, and `labels`
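The record layout above can be read with the standard `json` module. The following is a minimal sketch rather than an official loader. It assumes that the file's top level is a list of task records, and that each span object carries an `id` field, as in standard Label Studio exports, which `from_id` and `to_id` refer to. The function names `parse_task` and `load_export` are hypothetical.

```python
import json


def parse_task(task):
    """Split one task record into spans (keyed by span id) and relations."""
    spans, relations = {}, []
    for annotation in task.get("annotations", []):
        for item in annotation.get("result", []):
            if item.get("type") == "labels":
                value = item["value"]
                # Assumption: span objects expose an `id` used by relations.
                spans[item["id"]] = {
                    "start": value["start"],
                    "end": value["end"],
                    "text": value["text"],
                    "labels": value["labels"],  # e.g. ["N"] or ["S"]
                }
            elif item.get("type") == "relation":
                relations.append({
                    "from_id": item["from_id"],
                    "to_id": item["to_id"],
                    "direction": item.get("direction"),
                    "labels": item.get("labels", []),
                })
    return spans, relations


def load_export(path="annotation_pretty.json"):
    """Load the export and return one (spans, relations) pair per task.

    Assumption: the top level of the file is a list of task records.
    """
    with open(path, encoding="utf-8") as f:
        tasks = json.load(f)
    return [parse_task(task) for task in tasks]
```

Keying spans by `id` makes it straightforward to resolve each relation's endpoints back to their text and N/S label.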
## Intended Uses
- Argument mining in philosophical and literary prose
- RST-based discourse analysis with relaxed structural constraints
- Stance-conditioned fallacy detection (primary downstream task in the associated paper)
- Study of elaboration chain structure in non-standard argumentative texts
- Benchmark development for coarse-grained rhetorical annotation
## Associated Paper
Zhang, [Name]. Stance-dependent fallacy judgments and rhetorical structure: a computationally assisted case study of Tolstoy's *What Is Art?* *Argumentation*, under review.
## References
Kobayashi, N., Hirao, T., Kamigaito, H., Okumura, M., & Nagata, M. (2019). Split or merge: Which is better for unsupervised RST parsing? *Proceedings of EMNLP-IJCNLP 2019*, 3626–3636.
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. *Text*, 8(3), 243–281.
Tkachenko, M., Malyuk, M., Holmanyuk, A., & Liubimov, N. (2020–2025). Label Studio: Data labeling software. https://github.com/HumanSignal/label-studio
Tolstoy, L. (1904). *What is art?* Funk & Wagnalls. https://www.gutenberg.org/ebooks/64908
## License
The annotation is original scholarly work. The source text (*What Is Art?*, 1904) is in the public domain.
Provider: shoochoon



