crumb/Wizard-EvolInstruct70k-k4
Hugging Face · Updated 2023-07-20 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/crumb/Wizard-EvolInstruct70k-k4
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  - name: cluster
    dtype: int64
  splits:
  - name: train
    num_bytes: 131460545
    num_examples: 70000
  download_size: 69206496
  dataset_size: 131460545
---
# Dataset Card for "Wizard-EvolInstruct70k-k4"
The `centers.pt` file in this repository is a 4x384 matrix containing the center of each of the four clusters. I used `sentence-transformers/all-MiniLM-L6-v2` to encode the text.
```python
import torch
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# (num_sentences, 384) sentence embeddings
embeddings = torch.tensor(model.encode(sentences))
# (4, 384) cluster centers from `centers.pt` in this repository
centers = torch.load("centers.pt", map_location="cpu").to(embeddings.dtype)
# MSE-based cluster choice: broadcast to (num_sentences, 4, 384), average the
# squared differences over the embedding dimension, then take the closest center
mse = (embeddings[:, None, :] - centers[None, :, :]).pow(2).mean(dim=-1)
clusters = mse.argmin(dim=1).tolist()  # one cluster index per sentence
# Alternatively, you could load the scikit-learn KMeans classifier.
# TODO: document how to do that.
# TODO: check whether scikit-learn classifiers can be pushed to the Hub.
```
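The assignment rule above amounts to "nearest center by mean squared error". The dependency-free sketch that follows spells out that rule on toy 3-dimensional vectors. The helper names `mse` and `assign_clusters` are hypothetical, and the real embeddings and centers are 384-dimensional rather than 3-dimensional.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def assign_clusters(embeddings, centers):
    """Return, for each embedding, the index of the center with the lowest MSE.

    Hypothetical helper for illustration; ties go to the lowest index.
    """
    return [
        min(range(len(centers)), key=lambda k: mse(e, centers[k]))
        for e in embeddings
    ]


# Toy example: 4 centers in 3 dimensions (the real centers are 4 x 384).
centers = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
embeddings = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.8], [0.0, 0.0, 0.1]]
print(assign_clusters(embeddings, centers))  # [1, 3, 0]
```

Dividing by the embedding dimension and taking a square root are both monotonic, so the center with the lowest MSE is also the one with the smallest Euclidean distance. In the PyTorch version, `torch.cdist(embeddings, centers).argmin(dim=1)` should therefore select the same clusters, up to floating-point ties.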
Provided by:
crumb
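Summary of Original Information
Dataset Overview
Dataset Name
- Name: Wizard-EvolInstruct70k-k4
Dataset Features
- Feature 1: instruction
  - Data type: string
- Feature 2: output
  - Data type: string
- Feature 3: cluster
  - Data type: int64
Dataset Splits
- Split name: train
- Number of examples: 70000
- Data size: 131460545 bytes
Dataset Size
- Download size: 69206496 bytes
- Total dataset size: 131460545 bytes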
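Since every record carries a `cluster` label, a common use is to work with one cluster at a time. The sketch that follows groups records by that field. The helper `split_by_cluster` and the toy rows are hypothetical; the real cluster IDs are assumed to be 0 to 3, matching the four rows of `centers.pt`.

```python
from collections import defaultdict


def split_by_cluster(rows):
    """Group records by their `cluster` value (hypothetical helper).

    `rows` is any iterable of dicts with `instruction`, `output` and
    `cluster` keys; iterating a Hugging Face `datasets.Dataset` yields
    dicts of this shape.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)
    return dict(groups)


# Hypothetical toy rows standing in for the real train split.
rows = [
    {"instruction": "Write a haiku about rain.", "output": "toy answer A", "cluster": 2},
    {"instruction": "Sort a list in Python.", "output": "toy answer B", "cluster": 0},
    {"instruction": "Explain recursion.", "output": "toy answer C", "cluster": 0},
]
groups = split_by_cluster(rows)
print({k: len(v) for k, v in sorted(groups.items())})  # {0: 2, 2: 1}
```

With the real data, one option is to pass `load_dataset("crumb/Wizard-EvolInstruct70k-k4", split="train")` from the `datasets` library to this helper. Alternatively, `ds.filter(lambda ex: ex["cluster"] == 0)` keeps a single cluster as a `Dataset` instead of a Python list.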



