PhillyMac/Ida_B._Wells_Corpus
Source: Hugging Face. Updated 2025-12-15. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/PhillyMac/Ida_B._Wells_Corpus
Description:
This dataset, named ida-b-wells-agent, was generated automatically by the Deku Corpus Builder for use in RAG-based AI applications. Its subject is Ida B. Wells, and its subject type is personality. It contains 1,501 items, each with a pre-computed embedding from the sentence-transformers/all-MiniLM-L6-v2 model. Each record includes: text content, source URL, source title, source domain, relevance score (0-1), quality score (0-1), a JSON array of detected topics, text length, subject name, subject type (personality or topic), extraction date, and a 384-dimensional embedding vector. The dataset is designed to integrate with existing embedding corpora, and the embeddings are compatible with FAISS indexes. Content is sourced from public domain and Creative Commons licensed materials.
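The record layout described above can be sketched as a small Python structure with a sanity check and a brute-force similarity search. This is a minimal illustration only: the description lists the fields but not their column names, so every field name below is hypothetical, and the pure-Python cosine search stands in for a FAISS index rather than reproducing one.

```python
import json
import math
from dataclasses import dataclass

EMBEDDING_DIM = 384  # dimension stated in the dataset description


@dataclass
class CorpusRecord:
    # Field names are hypothetical; the description names the fields
    # but does not give the actual column names.
    text: str
    source_url: str
    source_title: str
    source_domain: str
    relevance_score: float
    quality_score: float
    topics_json: str
    text_length: int
    subject_name: str
    subject_type: str
    extraction_date: str
    embedding: list


def validate(record):
    """Return a list of problems; an empty list means the record fits the description."""
    problems = []
    for name in ("relevance_score", "quality_score"):
        if not 0.0 <= getattr(record, name) <= 1.0:
            problems.append(f"{name} outside 0-1")
    if record.subject_type not in ("personality", "topic"):
        problems.append("unknown subject_type")
    if len(record.embedding) != EMBEDDING_DIM:
        problems.append("embedding is not 384-dimensional")
    try:
        if not isinstance(json.loads(record.topics_json), list):
            problems.append("topics is not a JSON array")
    except json.JSONDecodeError:
        problems.append("topics is not valid JSON")
    return problems


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_embedding, records, k=3):
    """Brute-force nearest neighbours; a FAISS index would replace this at scale."""
    scored = [(cosine_similarity(query_embedding, r.embedding), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

With only 1,501 items, a linear scan like `top_k` is workable for experiments. The query vector should come from the same all-MiniLM-L6-v2 model so that it lives in the same 384-dimensional space as the stored embeddings.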
Provider:
PhillyMac



