PhillyMac/Julius_Caesar_Corpus
Source: Hugging Face | Updated: 2025-12-16 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/PhillyMac/Julius_Caesar_Corpus
Description:
This dataset, named julius-caesar-agent, is a corpus generated automatically by the Deku Corpus Builder for RAG-based AI applications. Its subject is the ancient Roman historical figure Julius Caesar, it is classified as a personality-type dataset, and it contains 1,611 items. Every text comes with a pre-computed 384-dimensional embedding produced by the all-MiniLM-L6-v2 model. Each record includes the text content, source information (URL, title, domain), a content relevance score (0-1), a content quality score (0-1), a list of detected topics, the text length in characters, the subject name (always Julius Caesar), the subject type (always personality), the extraction date, and the embedding vector. The dataset is intended for text-generation and feature-extraction tasks. Its content is primarily in English and is sourced from public domain and Creative Commons materials.
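Because each record carries a pre-computed embedding together with relevance and quality scores, a RAG retrieval step over the corpus could plausibly look like the minimal sketch below. The field names (`text`, `source_url`, `relevance_score`, `quality_score`, `embedding`) are assumptions, since the listing describes the fields but does not give their exact keys. The three records are toy examples written for illustration, not rows from the dataset, and their embeddings are shortened to 3 dimensions instead of 384.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    # Hypothetical field names; the real dataset keys may differ.
    text: str
    source_url: str
    relevance_score: float  # described as 0-1
    quality_score: float    # described as 0-1
    embedding: List[float]  # 384-dimensional in the dataset, 3 here


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: List[float], records: List[Record],
             k: int = 2, min_quality: float = 0.5) -> List[Record]:
    """Drop records below min_quality, then return the k most similar ones."""
    candidates = [r for r in records if r.quality_score >= min_quality]
    ranked = sorted(candidates,
                    key=lambda r: cosine(query_embedding, r.embedding),
                    reverse=True)
    return ranked[:k]


# Toy records for illustration only (not taken from the dataset).
records = [
    Record("Caesar crossed the Rubicon in 49 BC.",
           "https://example.org/a", 0.90, 0.80, [1.0, 0.0, 0.0]),
    Record("Caesar was assassinated on the Ides of March, 44 BC.",
           "https://example.org/b", 0.95, 0.90, [0.0, 1.0, 0.0]),
    Record("Low-quality snippet that the filter should drop.",
           "https://example.org/c", 0.20, 0.10, [1.0, 0.0, 0.0]),
]

# In practice the query vector would come from the same all-MiniLM-L6-v2
# model, so that it lives in the same space as the stored embeddings.
top = retrieve([0.9, 0.1, 0.0], records, k=1)
print(top[0].text)
```

Filtering on the quality score before ranking is one possible design choice; the relevance score could be used the same way, or combined with similarity as a re-ranking weight.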
Provider:
PhillyMac



