renumics/beans-outlier
Source: Hugging Face · Updated: 2023-06-30 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/renumics/beans-outlier
Resource description:
---
annotations_creators:
- expert-generated
language_creators:
- expert-generated
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- extended
task_categories:
- image-classification
task_ids:
- multi-class-image-classification
pretty_name: Beans
dataset_info:
  features:
  - name: image_file_path
    dtype: string
  - name: image
    dtype: image
  - name: labels
    dtype:
      class_label:
        names:
          '0': angular_leaf_spot
          '1': bean_rust
          '2': healthy
  - name: embedding_foundation
    sequence: float32
  - name: embedding_ft
    sequence: float32
  - name: outlier_score_ft
    dtype: float64
  - name: outlier_score_foundation
    dtype: float64
  - name: nn_image
    dtype: image
  splits:
  - name: train
    num_bytes: 293531811.754
    num_examples: 1034
  download_size: 0
  dataset_size: 293531811.754
---
# Dataset Card for "beans-outlier"
📚 This dataset is an enhanced version of the [ibean project of the AIR lab](https://github.com/AI-Lab-Makerere/ibean/).
The workflow is described in the Medium article [Changes of Embeddings during Fine-Tuning of Transformers](https://medium.com/@markus.stoll/changes-of-embeddings-during-fine-tuning-c22aa1615921).
## Explore the Dataset
The open-source data curation tool [Renumics Spotlight](https://github.com/Renumics/spotlight) allows you to explore this dataset. You can find a Hugging Face Space running Spotlight with this dataset here: <https://huggingface.co/spaces/renumics/beans-outlier>

Or you can explore it locally:
```python
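# Install the dependencies first (in a notebook, prefix the command with "!"):
#   pip install renumics-spotlight datasets
import datasets
from renumics import spotlight

# Load the train split and convert it to a pandas DataFrame
ds = datasets.load_dataset("renumics/beans-outlier", split="train")
df = ds.to_pandas()

# Add a human-readable label column next to the integer class ids
df["label_str"] = df["labels"].apply(lambda x: ds.features["labels"].int2str(x))

# Tell Spotlight how to render the image and embedding columns
dtypes = {
    "nn_image": spotlight.Image,
    "image": spotlight.Image,
    "embedding_ft": spotlight.Embedding,
    "embedding_foundation": spotlight.Embedding,
}

# Open Spotlight with the layout file referenced by this dataset card
spotlight.show(
    df,
    dtype=dtypes,
    layout="https://spotlight.renumics.com/resources/layout_pre_post_ft.json",
)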
```
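
The two outlier score columns can also be inspected without the UI. The following is a minimal sketch, not part of the original card: it assumes that a higher score means "more of an outlier" (the card does not state the direction), and the helper names `top_outliers` and `score_shift` as well as the toy rows are hypothetical. It works on plain dict rows so that it runs without downloading the dataset.

```python
from typing import Iterable, List, Mapping, Tuple


def top_outliers(rows: Iterable[Mapping], score_key: str, k: int = 5) -> List[Mapping]:
    """Return the k rows with the highest value under score_key.

    Assumption: a higher score marks a stronger outlier.
    """
    return sorted(rows, key=lambda r: r[score_key], reverse=True)[:k]


def score_shift(rows: Iterable[Mapping]) -> List[Tuple[str, float]]:
    """Pair each file path with (fine-tuned score minus foundation score)."""
    return [
        (r["image_file_path"], r["outlier_score_ft"] - r["outlier_score_foundation"])
        for r in rows
    ]


# Hypothetical toy rows that mimic three columns of the dataset
rows = [
    {"image_file_path": "a.jpg", "outlier_score_ft": 0.25, "outlier_score_foundation": 0.75},
    {"image_file_path": "b.jpg", "outlier_score_ft": 0.875, "outlier_score_foundation": 0.5},
    {"image_file_path": "c.jpg", "outlier_score_ft": 0.5, "outlier_score_foundation": 0.5},
]

for row in top_outliers(rows, "outlier_score_ft", k=2):
    print(row["image_file_path"], row["outlier_score_ft"])
```

With the real data, the rows could be built from the DataFrame above, for example with `df[["image_file_path", "outlier_score_ft", "outlier_score_foundation"]].to_dict("records")`.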
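Provider:
renumics
Summary of original information
Dataset overview
Basic information
- Name: Beans
- Language: English (en)
- License: MIT
- Multilinguality: monolingual
- Size: 1K<n<10K
- Source: extended from other datasets
Task type
- Category: image classification
- Specific task: multi-class image classification
Dataset features
- Image file path (image_file_path): string
- Image (image): image
- Labels (labels): class label, one of:
  - 0: angular_leaf_spot
  - 1: bean_rust
  - 2: healthy
- Foundation embedding (embedding_foundation): sequence of float32
- Fine-tuned embedding (embedding_ft): sequence of float32
- Fine-tuned outlier score (outlier_score_ft): float64
- Foundation outlier score (outlier_score_foundation): float64
- Nearest-neighbor image (nn_image): image
Dataset splits
- Train (train):
  - Number of examples: 1034
  - Size: 293531811.754 bytes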
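
The label mapping listed above (0, 1, 2) can be turned into a small lookup for counting examples per class. This is an illustrative sketch only: `LABEL_NAMES`, `class_counts`, and the example ids are hypothetical names and data introduced here, and the real class distribution is not stated in this card.

```python
from collections import Counter

# Class names as listed for the `labels` feature of this dataset
LABEL_NAMES = {0: "angular_leaf_spot", 1: "bean_rust", 2: "healthy"}


def class_counts(label_ids):
    """Count examples per class name for an iterable of integer label ids."""
    return Counter(LABEL_NAMES[i] for i in label_ids)


# Hypothetical label ids; with the loaded dataset this would likely be ds["labels"]
example_ids = [0, 2, 1, 2, 2, 0]
counts = class_counts(example_ids)

for name, n in sorted(counts.items()):
    print(f"{name}: {n}")
```

An unknown id raises a `KeyError`, which is a deliberate choice here so that a schema mismatch surfaces immediately instead of being counted silently.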



