
philipp-zettl/vrom-ml-training

Source: Hugging Face. Updated: 2026-04-24. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/philipp-zettl/vrom-ml-training
Description:
vROM: ML Training Stack (TRL + PEFT + Datasets) is a vROM (Vector Read-Only Memory): a package of pre-embedded documentation with a pre-computed HNSW index for instant in-browser RAG (Retrieval-Augmented Generation). It covers the documentation of the ML training stack, specifically TRL, PEFT, and Datasets. It is designed for use with VecDB-WASM and enables vector search without client-side embedding computation. The dataset contains 629 vectors of 384 dimensions, about 100K tokens in total, with an index size of 5.8 MB. The embedding model is Xenova/all-MiniLM-L6-v2, and the distance metric is cosine. The dataset is part of the vROM ecosystem and includes the files index.json, chunks.json, and manifest.json, along with a builder tool for custom vROMs.
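To make the retrieval step concrete, here is a minimal sketch of cosine-similarity search over pre-embedded vectors. It is not the dataset's own code and does not read index.json, whose schema is not described here. It substitutes an exact brute-force scan for the HNSW index (HNSW is an approximate method that avoids scanning every vector), and it uses synthetic random vectors in place of the real embeddings. Only the sizes, 629 vectors of 384 dimensions, come from the description; the function names `normalize` and `top_k` are hypothetical.

```python
import math
import random

DIM = 384        # embedding dimension stated in the description
N_VECTORS = 629  # vector count stated in the description


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, vectors, k=3):
    """Return (index, cosine similarity) pairs for the k closest stored vectors.

    Exact brute-force scan; assumes every stored vector is already unit length.
    """
    q = normalize(query)
    scored = [(i, sum(a * b for a, b in zip(q, v))) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Synthetic stand-in for the pre-embedded chunks (NOT the real dataset contents).
rng = random.Random(0)
vectors = [normalize([rng.gauss(0.0, 1.0) for _ in range(DIM)])
           for _ in range(N_VECTORS)]

# A query built as a slightly perturbed copy of stored vector 42.
query = [x + rng.gauss(0.0, 0.01) for x in vectors[42]]
results = top_k(query, vectors, k=3)
print(results)
```

In a real deployment the query vector would have to come from the same embedding model as the stored vectors, and the indices returned would be used to look up the matching text chunks.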
Provider:
philipp-zettl