jjrussell10/storyscope
Source: Hugging Face | Updated: 2026-04-03 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/jjrussell10/storyscope
Resource overview:
---
pretty_name: StoryScope
task_categories:
- text-classification
language:
- en
license: mit
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: stories_train.parquet
  - split: validation
    path: stories_val.parquet
  - split: test
    path: stories_test.parquet
  - split: dev
    path: stories_dev.parquet
---
# StoryScope
- `stories_train.parquet`, `stories_val.parquet`, `stories_test.parquet`, `stories_dev.parquet`: prompt metadata plus AI-generated stories from GPT-5.4, Claude Sonnet 4.6, DeepSeek V3.2, Kimi K2.5, and Gemini 3 Flash
- `storyscope_features.parquet`: 304 extracted narrative features for 61,575 story rows
- `taxonomy.json`: the 304-feature taxonomy spanning 10 narrative dimensions
- `models/`: trained XGBoost classifiers for binary human-vs-AI detection and 6-way authorship attribution
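
A minimal sketch of how the files listed above might be fetched. It assumes the repository ID `jjrussell10/storyscope` (taken from the download link) and that the `datasets` and `huggingface_hub` packages are installed. The helper names `load_story_split` and `download_features` are hypothetical and not part of the dataset.

```python
REPO_ID = "jjrussell10/storyscope"  # assumed from the download link
SPLITS = ("train", "validation", "test", "dev")  # split names from the card's configs
FEATURES_FILE = "storyscope_features.parquet"


def load_story_split(split: str = "train"):
    """Load one story split through the default config (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the module loads without the package installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, split=split)


def download_features() -> str:
    """Download the feature table and return its local path (hypothetical helper)."""
    # The feature file is not listed under `configs`, so fetch it directly.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(
        repo_id=REPO_ID, filename=FEATURES_FILE, repo_type="dataset"
    )
```

The path returned by `download_features()` can then be opened with any Parquet reader, for example `pandas.read_parquet`.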
## Notes
- Human story text is excluded for copyright reasons.
## Story Split Schema
Columns:
- `prompt_id`
- `split`
- `title`
- `prompt`
- `human_author`
- `human_anthology`
- `human_word_count`
- `story_gpt`
- `story_deepseek`
- `story_kimi`
- `story_gemini`
- `story_claude`
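
Because each row holds all five generated stories side by side, per-story analysis usually needs a long layout. The following is a sketch under stated assumptions: the short model labels are illustrative and are not confirmed to match the `source` values in `storyscope_features.parquet`, and `to_long` is a hypothetical helper operating on plain dictionaries.

```python
# Story columns from the schema above, keyed by an illustrative model label.
# Assumption: these labels are made up for the example.
STORY_COLUMNS = {
    "gpt": "story_gpt",
    "deepseek": "story_deepseek",
    "kimi": "story_kimi",
    "gemini": "story_gemini",
    "claude": "story_claude",
}


def to_long(rows):
    """Yield one record per (prompt, model) pair from wide story rows."""
    for row in rows:
        for label, column in STORY_COLUMNS.items():
            text = row.get(column)
            if text is None:  # skip generations that are absent
                continue
            yield {
                "prompt_id": row["prompt_id"],
                "split": row.get("split"),
                "model": label,
                "story": text,
            }


# Toy row with made-up values, only to show the shape of the output.
example = [{"prompt_id": "p0", "split": "dev", "story_gpt": "A", "story_claude": "B"}]
long_rows = list(to_long(example))
```
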
Split sizes:
- train: 7,383 prompts
- validation: 1,405 prompts
- test: 1,384 prompts
- dev: 100 prompts
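
As a quick arithmetic check, the four splits sum to 10,272 prompts, which falls inside the declared `10K<n<100K` size category, and the train split is exactly 23/32 of the total. The sketch below reproduces this from the numbers listed above; the variable names are illustrative.

```python
from fractions import Fraction

# Prompt counts copied from the split sizes listed above.
SPLIT_SIZES = {"train": 7383, "validation": 1405, "test": 1384, "dev": 100}

total_prompts = sum(SPLIT_SIZES.values())
# Exact share of each split, kept as fractions to avoid rounding surprises.
shares = {name: Fraction(n, total_prompts) for name, n in SPLIT_SIZES.items()}

# The card declares the size category 10K<n<100K.
in_size_category = 10_000 < total_prompts < 100_000

for name, share in shares.items():
    print(f"{name:<10} {SPLIT_SIZES[name]:>6,}  {float(share):.3f}")
print(f"{'total':<10} {total_prompts:>6,}")
```
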
## Feature File Schema
`storyscope_features.parquet` contains:
- `prompt_id`
- `story_title`
- `source`
- 304 feature columns such as `REV_SUS_001`, `PER_POV_001`, and `SOC_REL_024`
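
The three example names suggest a naming pattern of two three-letter codes followed by a three-digit index. The sketch below separates feature columns from identifier columns on that basis. It assumes that all 304 feature columns follow the same pattern, which the card does not state, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Assumption: every feature column looks like the three examples,
# i.e. AAA_BBB_NNN. This is inferred, not documented.
FEATURE_ID = re.compile(r"^[A-Z]{3}_[A-Z]{3}_\d{3}$")
ID_COLUMNS = ("prompt_id", "story_title", "source")


def feature_columns(columns):
    """Return the column names that match the assumed feature-ID pattern."""
    return [c for c in columns if c not in ID_COLUMNS and FEATURE_ID.match(c)]


def count_by_prefix(columns):
    """Count feature columns by their leading three-letter code."""
    return Counter(c.split("_", 1)[0] for c in feature_columns(columns))


# Header built only from the names mentioned in the schema above.
header = ["prompt_id", "story_title", "source",
          "REV_SUS_001", "PER_POV_001", "SOC_REL_024"]
features = feature_columns(header)
prefix_counts = count_by_prefix(header)
```

If the pattern holds for the full file, `len(feature_columns(df.columns))` would be expected to equal 304. Whether the leading code maps onto the 10 narrative dimensions in `taxonomy.json` is not confirmed here.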
Feature values for categorical, ordinal, binary, and multi-select outputs are stored as strings; scale values are stored as numeric-style entries.
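
Since scale values arrive as numeric-style entries, they may need converting before any arithmetic. A minimal sketch, assuming that a scale entry parses with `float` and that anything else should be left untouched. The helper `coerce_feature_value` is hypothetical, and the delimiter for multi-select values is not documented, so no splitting is attempted.

```python
def coerce_feature_value(value):
    """Return a float for numeric-style entries, otherwise the value unchanged."""
    if value is None:
        return None
    text = str(value).strip()
    try:
        return float(text)
    except ValueError:
        # Categorical, ordinal, and multi-select labels stay as they are.
        return value
```

Apply this only to columns known to be scales: a binary feature encoded as `"0"` or `"1"` would also be turned into a float.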
Provider:
jjrussell10



