krt1k/wikipedia_space_data
Source: Hugging Face · Updated 2025-11-13 · Indexed 2025-11-15
Download link:
https://hf-mirror.com/datasets/krt1k/wikipedia_space_data
Description:
This dataset contains semantically rich text embeddings generated from Wikipedia articles about the Solar System and related astronomical topics, including planets, moons, comets, and solar phenomena. The text was extracted from 32 Wikipedia pages, cleaned, and encoded with the sentence-transformers/all-MiniLM-L6-v2 model. Each passage is represented as a 384-dimensional embedding vector, which supports semantic search, clustering, and similarity-based reasoning across Solar System concepts.
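The similarity-based search mentioned in the description can be sketched as a cosine-similarity ranking over passage vectors. This is only an illustrative sketch: the dataset's actual column names and file layout are not stated here, so the example uses hypothetical toy passages with 4-dimensional stand-in vectors instead of real 384-dimensional embeddings, and the helper names `cosine_similarity` and `top_k` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, passages, k=3):
    """Return the k (score, text) pairs most similar to query_vec.

    `passages` is a list of (text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy data: 4-dimensional stand-ins for the real
# 384-dimensional all-MiniLM-L6-v2 embeddings (values are invented).
toy_passages = [
    ("Passage about Jupiter", [0.9, 0.1, 0.0, 0.1]),
    ("Passage about comets", [0.0, 0.9, 0.2, 0.0]),
    ("Passage about solar flares", [0.1, 0.0, 0.9, 0.2]),
]
toy_query = [0.8, 0.2, 0.1, 0.0]  # stand-in for an encoded planet-related query

results = top_k(toy_query, toy_passages, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

With real data, the query would need to be encoded by the same model as the passages so that both live in the same 384-dimensional space; the ranking step itself stays the same.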
Provided by:
krt1k



