crumb/Wizard-EvolInstruct70k-k4

Name: crumb/Wizard-EvolInstruct70k-k4
Creator: crumb
Published: 2023-07-20 03:25:04
License: No description available

Hugging Face · Updated 2023-07-20 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/crumb/Wizard-EvolInstruct70k-k4

Resource summary:

---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  - name: cluster
    dtype: int64
  splits:
  - name: train
    num_bytes: 131460545
    num_examples: 70000
  download_size: 69206496
  dataset_size: 131460545
---

# Dataset Card for "Wizard-EvolInstruct70k-k4"

`centers.pt` in the repository files is a 4x384 matrix that holds the center of each of the 4 clusters. I use `sentence-transformers/all-MiniLM-L6-v2` to encode text. That model produces 384-dimensional embeddings, so each row of `centers.pt` matches one embedding.

```python
import torch
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = torch.tensor(model.encode(sentences))  # shape (n_sentences, 384)
centers = torch.load("centers.pt")  # shape (4, 384), from the dataset repo's files

# MSE-based cluster choice: broadcast to (n_sentences, 4, 384),
# average the squared error over the embedding dimension, then pick the closest center
clusters = (embeddings[:, None, :] - centers[None, :, :]).pow(2).mean(-1).argmin(-1).tolist()

# Alternatively, you could load the sklearn KMeans classifier
# TODO: documentation for that
# TODO: figure out how to do that
# TODO: can't you push sklearn classifiers to the Hub with some weird code introduced earlier this year or something?
```
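For each embedding, the cluster choice above selects the center with the smallest mean squared difference. The mean over the 384 dimensions equals the squared Euclidean distance divided by a constant, so this is ordinary nearest-centroid (k-means style) assignment. The following sketch isolates that step in NumPy and runs it on batches. Small random arrays stand in for real embeddings and for `centers.pt`. The helper name `assign_clusters` is hypothetical and does not come from the dataset repo.

```python
import numpy as np


def assign_clusters(embeddings: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Return the index of the closest center (by mean squared error) for each row.

    embeddings: (n, d) or (d,) array; centers: (k, d) array -> (n,) int array in [0, k).
    """
    if embeddings.ndim == 1:
        embeddings = embeddings[None, :]  # treat a single vector as a batch of one
    # (n, 1, d) - (1, k, d) -> (n, k, d) pairwise differences
    diffs = embeddings[:, None, :] - centers[None, :, :]
    mse = (diffs ** 2).mean(axis=-1)  # (n, k): mean squared error to each center
    return mse.argmin(axis=-1)


rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 384)).astype(np.float32)  # stand-in for centers.pt
# Synthetic embeddings placed very close to centers 2, 0 and 3 respectively
embeddings = centers[[2, 0, 3]] + 0.01 * rng.normal(size=(3, 384)).astype(np.float32)

print(assign_clusters(embeddings, centers).tolist())  # [2, 0, 3]
```

Broadcasting materializes an n×k×d intermediate. For all 70,000 rows with k=4 and d=384, that comes to roughly 430 MB of float32, so processing the rows in chunks would keep memory bounded.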

Provider:

crumb

Summary of original information

Dataset overview

Dataset name

Name: Wizard-EvolInstruct70k-k4

Dataset features

Feature 1: instruction
- Data type: string
Feature 2: output
- Data type: string
Feature 3: cluster
- Data type: int64
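Each row pairs an `instruction`/`output` text pair with an integer `cluster` label. With four centers, the label presumably ranges from 0 to 3. A typical use is to pull out the rows that belong to one cluster. The sketch below does this on plain Python dicts that carry the same three fields. The sample rows and the helper name `group_by_cluster` are hypothetical. With the Hugging Face `datasets` library, you would load the real rows with `load_dataset("crumb/Wizard-EvolInstruct70k-k4", split="train")` and select a single cluster with `ds.filter(lambda ex: ex["cluster"] == 0)`.

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class Example(TypedDict):
    """One row of the train split: the three features listed above."""
    instruction: str
    output: str
    cluster: int


def group_by_cluster(rows: List[Example]) -> Dict[int, List[Example]]:
    """Bucket rows by their integer `cluster` label, preserving row order."""
    groups: Dict[int, List[Example]] = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)
    return dict(groups)


# Hypothetical sample rows following the dataset's schema
rows: List[Example] = [
    {"instruction": "Write a haiku about autumn.", "output": "<answer>", "cluster": 2},
    {"instruction": "Explain binary search.", "output": "<answer>", "cluster": 0},
    {"instruction": "Sort a list in Python.", "output": "<answer>", "cluster": 0},
]

groups = group_by_cluster(rows)
print({k: len(v) for k, v in sorted(groups.items())})  # {0: 2, 2: 1}
```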

Dataset splits

Split name: train
- Number of examples: 70,000
- Data size: 131,460,545 bytes

Dataset size

Download size: 69,206,496 bytes
Total dataset size: 131,460,545 bytes
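The split and size figures above give a rough sense of scale. Dividing the dataset size by the 70,000 examples yields the average stored size per example. The ratio of download size to dataset size shows how the downloaded files compare with the prepared data. A quick check:

```python
# Figures taken from the dataset_info metadata above
num_examples = 70_000
dataset_size = 131_460_545  # bytes
download_size = 69_206_496  # bytes

avg_bytes_per_example = dataset_size / num_examples
download_ratio = download_size / dataset_size

print(f"{avg_bytes_per_example:.1f} bytes per example")  # 1878.0 bytes per example
print(f"download is {download_ratio:.1%} of dataset size")  # download is 52.6% of dataset size
```

Each example therefore averages about 1.9 KB, and the download is roughly half the size of the prepared dataset.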