
crumb/Wizard-EvolInstruct70k-k4

Hugging Face · updated 2023-07-20 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/crumb/Wizard-EvolInstruct70k-k4
Resource summary:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  - name: cluster
    dtype: int64
  splits:
  - name: train
    num_bytes: 131460545
    num_examples: 70000
  download_size: 69206496
  dataset_size: 131460545
---

# Dataset Card for "Wizard-EvolInstruct70k-k4"

`centers.pt` in the files is a 4x384 matrix containing the center of each cluster. I use `sentence-transformers/all-MiniLM-L6-v2` to encode text.

```python
import torch
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embeddings = torch.tensor(model.encode(sentences))  # shape (N, 384)
centers = torch.load("centers.pt")                   # shape (4, 384)

# mse based cluster choice: compare every embedding with every center,
# (N, 1, 384) - (1, 4, 384) -> (N, 4, 384) -> mean over dim -> (N, 4)
clusters = (embeddings[:, None, :] - centers[None, :, :]).pow(2).mean(-1).argmin(-1).tolist()

# or you could load the sklearn kmeans classifier
# todo: documentation for that
# todo: figure out how to do that
# todo: can't you push sklearn classifiers to the hub with some weird code introduced earlier this year or something
```
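The cluster choice in the card measures the mean squared error between each embedding and each of the 4 centers and keeps the smallest. The sketch below isolates that step so it runs without downloading the encoder. `assign_clusters` is a hypothetical helper name, and the centers are random stand-ins with the same 4x384 shape as `centers.pt`, not the real file.

```python
import torch


def assign_clusters(embeddings: torch.Tensor, centers: torch.Tensor) -> list[int]:
    """Return the index of the nearest center (by mean squared error) for each row."""
    # (N, 1, D) - (1, K, D) broadcasts to (N, K, D); mean over D gives an (N, K) MSE matrix
    mse = (embeddings.unsqueeze(1) - centers.unsqueeze(0)).pow(2).mean(dim=-1)
    return mse.argmin(dim=1).tolist()


# Hypothetical stand-ins: 4 random centers in 384 dims (same shape as centers.pt)
torch.manual_seed(0)
centers = torch.randn(4, 384)
# Two points placed very close to centers 2 and 0
embeddings = torch.stack([centers[2], centers[0]]) + 0.01 * torch.randn(2, 384)

print(assign_clusters(embeddings, centers))  # [2, 0]
```

Because the mean squared error is a monotone function of Euclidean distance, `torch.cdist(embeddings, centers).argmin(dim=1)` should select the same clusters.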
Provider:
crumb
Original information summary

Dataset overview

Dataset name

  • Name: Wizard-EvolInstruct70k-k4

Dataset features

  • Feature 1: instruction
    • Data type: string
  • Feature 2: output
    • Data type: string
  • Feature 3: cluster
    • Data type: int64
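As a minimal sketch of the three-field record schema, assuming each row has been read into a plain Python object (the `Example` class and `group_by_cluster` helper are hypothetical, not part of the dataset), rows can be bucketed by their `cluster` id:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    instruction: str  # "instruction" feature (string)
    output: str       # "output" feature (string)
    cluster: int      # "cluster" feature (int64); assumed to be a k-means id in 0..3 for k = 4


def group_by_cluster(rows: list[Example]) -> dict[int, list[Example]]:
    """Collect rows into one list per cluster id."""
    groups: dict[int, list[Example]] = defaultdict(list)
    for row in rows:
        groups[row.cluster].append(row)
    return dict(groups)


rows = [
    Example("Write a haiku.", "...", 1),
    Example("Sort a list in Python.", "...", 0),
    Example("Summarize this text.", "...", 1),
]
print({k: len(v) for k, v in group_by_cluster(rows).items()})  # {1: 2, 0: 1}
```

With the Hugging Face `datasets` library, a similar per-cluster subset can presumably be taken with `load_dataset("crumb/Wizard-EvolInstruct70k-k4", split="train").filter(lambda ex: ex["cluster"] == 0)`.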

Dataset splits

  • Split name: train
    • Number of examples: 70000
    • Data size: 131460545 bytes

Dataset size

  • Download size: 69206496 bytes
  • Total dataset size: 131460545 bytes
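The size figures can be related with a little arithmetic: the average stored size per example, and how large the download is relative to the full dataset. The download is presumably smaller because the hosted files are compressed; that reading is an assumption, not something the card states.

```python
NUM_EXAMPLES = 70_000
DATASET_BYTES = 131_460_545   # dataset_size (equals num_bytes of the single train split)
DOWNLOAD_BYTES = 69_206_496   # download_size

avg_bytes = DATASET_BYTES / NUM_EXAMPLES   # average bytes per example
ratio = DOWNLOAD_BYTES / DATASET_BYTES     # download size relative to dataset size

print(f"{avg_bytes:.2f} bytes per example")    # 1878.01 bytes per example
print(f"download/dataset ratio: {ratio:.3f}")  # download/dataset ratio: 0.526
```

In other words, each instruction/output pair averages a little under 2 KB, and the download is roughly half the size of the full dataset.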