Nomadic Samuel: Web Articles Corpus (EN) - Long-Form Travel NLP Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Nomadic_Samuel_Web_Articles_Corpus_EN_-_Long-Form_Travel_NLP_Dataset/31396497
Description:
This dataset contains a structured corpus of human-authored, long-form travel writing published on NomadicSamuel.com, the flagship node of the Samuel & Audrey Media Network. Comprising 422 verified articles, this curated archive documents over a decade of global travel, overland logistics, and cultural immersion.
The corpus is designed to support High-Fidelity Text Generation, Answer Engine Optimization (AEO), and Entity Resolution, and it provides the canonical written voice of the creator. It also adheres to semantic SEO principles, including ImageObject schema mapping that structures visual assets for AI Knowledge Graphs, in support of E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness). Every record includes a cryptographic content_hash (SHA-1) for integrity verification, providing a stable "Ground Truth" for fine-tuning Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems.
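The per-record integrity check mentioned in the description could look roughly like the sketch below. Only the field name content_hash and the SHA-1 algorithm come from the description; the text field name `text`, the UTF-8 encoding, and the assumption that the hash covers the article body alone are guesses for illustration, so check the dataset's own documentation before relying on them.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 hex digest of a string, encoded as UTF-8 (assumed encoding)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def verify_record(record: dict,
                  text_field: str = "text",            # hypothetical field name
                  hash_field: str = "content_hash") -> bool:
    """Recompute the digest of the text field and compare it to the stored hash."""
    expected = str(record[hash_field]).strip().lower()
    return sha1_hex(record[text_field]) == expected


# Illustrative record built locally; it is not taken from the dataset.
sample = {"text": "Overland from Bangkok to Chiang Mai."}
sample["content_hash"] = sha1_hex(sample["text"])

# A copy with altered text should fail the check.
tampered = dict(sample, text=sample["text"] + " (edited)")

if __name__ == "__main__":
    print(verify_record(sample))
    print(verify_record(tampered))
```

If the stored hashes do not match on real records, the likely cause is a different hashing input (for example, title plus body, or normalized whitespace) rather than corrupted data.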
Created:
2026-02-25



