
Strangefiction/ScentSet

Source: Hugging Face. Updated 2026-04-13. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Strangefiction/ScentSet
Resource description:
---
license: cc-by-4.0
task_categories:
- text-generation
language:
- en
tags:
- Smell
- chemistry
- biology
- medical
- synthetic
- climate
size_categories:
- 100K<n<1M
---

# ScentSet: A Synthetic Dataset for Smell Description and Classification

**ScentSet** is a synthetic dataset containing **572,293 entries** and approximately **15 million tokens**. Each entry is a short natural-language description of a smell, written in simple English and often followed by a hint or guess about its source. The dataset is designed to support machine learning research in **scent recognition**, **classification**, and **multimodal representation learning**.

### Format

```json
{"text": "There's a bright citrus smell layered over something minty. It might be toothpaste."}
```

### Use Cases

- Training models to generate or classify smell descriptions.
- Embedding olfactory descriptions for cross-modal tasks.
- Exploring synthetic sensory data in NLP.

### Data Statistics

- Entries: 572,293
- Tokens: ~15 million
- Language: English (simple, descriptive)
- Generated: Synthetically via language modeling and structured prompt templates.

### Limitations

- Synthetic data: All content was generated by a language model and may contain factual inaccuracies, biases, or hallucinations.
- No human verification: The dataset was not manually reviewed.
- Simplified language: Sentence structure and vocabulary were constrained to maximize the performance of tiny language models.

### Citation

```bibtex
@misc{ScentSet_2025,
  author = {David S.},
  title = {ScentSet: A Synthetic Dataset for Smell Description and Classification},
  year = {2025},
  publisher = {Hugging Face Datasets},
  howpublished = {\url{https://huggingface.co/datasets/sixf0ur/ScentSet}},
  note = {Generated with language models (e.g. Gemini 2) for research on olfactory language modeling}
}
```
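
### Example: Reading an Entry

The record shown under Format is a JSON object with a single `text` field. The snippet below is a minimal sketch of how such a record might be parsed and split into a description and a tentative source hint. The function name `parse_entry` and the "last sentence is the hint" rule are illustrative assumptions, not part of the dataset; the card only says a hint *often* follows the description, so the split can be wrong for some entries.

```python
import json


def parse_entry(line: str) -> dict:
    """Parse one JSON record and split off the last sentence as a possible source hint.

    Assumption (not stated by the dataset card): when a hint is present,
    it is the final sentence of the `text` field.
    """
    record = json.loads(line)
    text = record["text"].strip()

    # Split at the last ". " boundary; everything after it is treated as the hint.
    head, sep, tail = text.rpartition(". ")
    if sep:
        return {"description": head + ".", "source_hint": tail}

    # Single-sentence entries carry no separable hint.
    return {"description": text, "source_hint": None}


# The sample record from the Format section, serialized as one JSON line.
sample = json.dumps(
    {"text": "There's a bright citrus smell layered over something minty. It might be toothpaste."}
)

parsed = parse_entry(sample)
print(parsed["description"])
print(parsed["source_hint"])
```

Keeping the heuristic in one small function makes it easy to replace with a proper sentence splitter if the simple rule turns out to mis-split entries.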
Provided by:
Strangefiction