pszemraj/flan-subsets-mini
Source: Hugging Face. Updated: 2025-01-26. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/pszemraj/flan-subsets-mini
Description:
This is a text-to-text generation dataset, based on the gte-modernbert-base configuration, containing input texts (inputs) and target texts (targets). The input texts were embedded and grouped into 85 clusters with k-means, lower-quality clusters were filtered out, and 3,000 samples were drawn from each cluster. The training set contains 144,000 samples, and the dataset totals 389 MB. If every retained cluster contributed its full 3,000 samples, 144,000 samples corresponds to 48 of the 85 clusters.
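The filter-then-sample step described above can be sketched roughly as follows. This is a minimal illustration, not the dataset author's code: it assumes cluster labels have already been assigned (the embedding and k-means steps are omitted), and the function name sample_per_cluster, the keep set, and the toy data are all hypothetical. The description does not say how cluster quality was judged, so the set of retained clusters is simply passed in.

```python
import random
from collections import defaultdict


def sample_per_cluster(labels, keep, per_cluster, seed=0):
    """Return sorted row indices: up to `per_cluster` rows from each cluster in `keep`.

    Hypothetical helper for illustration; `labels[i]` is the cluster id of row i.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        if label in keep:  # rows in filtered-out clusters are dropped
            by_cluster[label].append(idx)
    selected = []
    for label in sorted(by_cluster):
        rows = by_cluster[label]
        k = min(per_cluster, len(rows))  # small clusters contribute all their rows
        selected.extend(rng.sample(rows, k))
    return sorted(selected)


# Toy data: 6 clusters of 10 rows each. Clusters 1 and 4 stand in for
# "lower-quality" clusters (an assumption made only for this example).
toy_labels = [i % 6 for i in range(60)]
toy_keep = {0, 2, 3, 5}
toy_rows = sample_per_cluster(toy_labels, toy_keep, per_cluster=3)
print(len(toy_rows))  # 4 kept clusters * 3 rows each = 12

# Arithmetic on the quoted figures: 144,000 rows at 3,000 rows per cluster.
print(144_000 // 3_000)  # 48
```

Sampling with a fixed seed keeps the subset reproducible. Capping at the cluster size is a design choice of this sketch, since the description does not state what happens to clusters with fewer than 3,000 rows.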
Provider:
pszemraj



