philipp-zettl/vrom-ml-training

Name: philipp-zettl/vrom-ml-training
Creator: philipp-zettl
Published: 2026-04-24 10:04:22
License: Not specified

Source: Hugging Face | Updated: 2026-04-24 | Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/philipp-zettl/vrom-ml-training


Description:

vROM: ML Training Stack (TRL + PEFT + Datasets) is a vROM (Vector Read-Only Memory): a dataset that ships a pre-computed HNSW index for instant in-browser RAG (Retrieval-Augmented Generation). It contains pre-embedded documentation for the ML training stack, specifically TRL, PEFT, and Datasets. It is designed for use with VecDB-WASM, so vector search works without computing embeddings on the client. The index holds 629 vectors of 384 dimensions, covering approximately 100K tokens, and is 5.8 MB in size. The embedding model is Xenova/all-MiniLM-L6-v2, and the distance metric is cosine similarity. The dataset is part of the vROM ecosystem and includes the files index.json, chunks.json, and manifest.json, along with a builder tool for creating custom vROMs.
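To illustrate what ranking pre-embedded chunks by cosine similarity looks like, here is a minimal Python sketch. It is an exhaustive (brute-force) search, not HNSW, and it does not use the VecDB-WASM API or the actual layout of index.json or chunks.json. The `vectors` and `chunks` lists, the 3-dimensional toy values, and the helper names `cosine_similarity` and `top_k` are all assumptions made for illustration; the real vectors have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, chunks, k=3):
    """Return the k (score, chunk) pairs most similar to the query.

    This scans every vector, which is fine for a few hundred entries;
    an HNSW index answers the same question approximately and faster.
    """
    scored = [(cosine_similarity(query, v), c) for v, c in zip(vectors, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy data: three pre-embedded chunks in 3 dimensions.
vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
chunks = ["chunk about TRL", "chunk about PEFT", "chunk about Datasets"]

# A query vector that is assumed to be embedded already.
results = top_k([0.9, 0.1, 0.0], vectors, chunks, k=2)
for score, chunk in results:
    print(f"{score:.3f}  {chunk}")
```

With only 629 vectors an exhaustive scan like this would already be quick; the pre-computed HNSW index serves the same lookup without the client having to build any search structure.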

Provider:

philipp-zettl
