
seungheondoh/musical-word-embedding

Hugging Face | Updated 2024-04-23 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/seungheondoh/musical-word-embedding
Resource description:
---
dataset_info:
  features:
  - name: token
    dtype: string
  - name: content
    dtype: string
  - name: vector
    sequence: float32
  splits:
  - name: tag
    num_bytes: 2740380
    num_examples: 2227
  - name: artist
    num_bytes: 46025354
    num_examples: 37002
  - name: track
    num_bytes: 898952880
    num_examples: 697812
  download_size: 1387722409
  dataset_size: 947718614
configs:
- config_name: default
  data_files:
  - split: tag
    path: data/tag-*
  - split: artist
    path: data/artist-*
  - split: track
    path: data/track-*
tags:
- music
---

# Musical Word Embedding

> [**Musical Word Embedding for Music Tagging and Retrieval**](https://arxiv.org/abs/2404.13569)
> SeungHeon Doh, Jongpil Lee, Dasaem Jeong, Juhan Nam
> To appear IEEE Transactions on Audio, Speech and Language Processing (submitted)

<p align = "center"> <img src = "https://i.imgur.com/Yw4UPnM.png"> </p>

Word embedding has become an essential means for text-based information retrieval. Typically, word embeddings are learned from large quantities of general and unstructured text data. However, in the domain of music, such embeddings may have difficulty understanding musical contexts or recognizing music-related entities like artists and tracks. To address this issue, we propose a new approach called Musical Word Embedding (MWE), which involves learning from various types of texts, including both everyday and music-related vocabulary.

### Resources: Using Musical Word Embedding

- [Pre-trained Embedding Vector](https://huggingface.co/datasets/seungheondoh/musical-word-embedding)
- [Paper](https://arxiv.org/abs/2404.13569)
- [Blog](https://seungheondoh.github.io/musical_word_embedding_demo/)
- [**notebook**-query_recommendation](https://github.com/seungheondoh/musical-word-embedding/blob/main/notebook/query_recommendation.ipynb)
- [**notebook**-music_retrieval](https://github.com/seungheondoh/musical-word-embedding/blob/main/notebook/music_retrieval.ipynb)

### Run the download script for embedding vectors:

Check our Hugging Face dataset: you can download the main embedding vectors (tag, artist, and track) from it.

```python
from datasets import load_dataset

dataset = load_dataset("seungheondoh/musical-word-embedding")
```

```
{
    "token": "happy",
    "content": "happy",
    "vector": [0.011484057642519474, -0.07818693667650223, -0.02778349258005619, 0.052311971783638, -0.1324823945760727, 0.03757447376847267, 0.007125925272703171, ...]
},{
    "token": "ARYZTJS1187B98C555",
    "content": "Faster Pussycat",
    "vector": [-0.13004058599472046, -1.3509420156478882, -0.3012666404247284, -0.34076201915740967, -0.8142353296279907, 0.3902665972709656, -0.1903497576713562, 0.6163021922111511, ...]
}
```

The other 10M general word vectors can be downloaded with the script below.

```
bash scripts/download.sh
```

### Citation

If you find this work useful, please cite it as:

```
@article{doh2024musical,
  title={Musical Word Embedding for Music Tagging and Retrieval},
  author={Doh, SeungHeon and Lee, Jongpil and Jeong, Dasaem and Nam, Juhan},
  journal={update_soon},
  year={2024}
}

@inproceedings{doh2021million,
  title={Million song search: Web interface for semantic music search using musical word embedding},
  author={Doh, S and Lee, Jongpil and Nam, Juhan},
  booktitle={International Society for Music Information Retrieval Conference, ISMIR},
  year={2021}
}

@article{doh2020musical,
  title={Musical word embedding: Bridging the gap between listening contexts and music},
  author={Doh, Seungheon and Lee, Jongpil and Park, Tae Hong and Nam, Juhan},
  journal={arXiv preprint arXiv:2008.01190},
  year={2020}
}
```

Feel free to reach out with any questions or feedback!
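The linked notebooks cover query recommendation and music retrieval. As a rough illustration of the retrieval idea only, the sketch below ranks rows by cosine similarity to a query vector. It assumes rows follow the `token` / `content` / `vector` schema of this dataset and that the query and candidate vectors share one embedding space with equal dimensionality; this card does not state that explicitly, so treat it as an assumption. `cosine_similarity` and `top_k` are hypothetical helper names, and the toy rows are made up for the demonstration.

```python
import math
from typing import Iterable, List, Mapping, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], rows: Iterable[Mapping], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k rows most similar to `query` as (content, score) pairs."""
    scored = [(row["content"], cosine_similarity(query, row["vector"])) for row in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy rows with made-up 2-dimensional vectors (NOT real dataset values).
toy_rows = [
    {"token": "t1", "content": "upbeat song", "vector": [1.0, 0.0]},
    {"token": "t2", "content": "sad song", "vector": [-1.0, 0.0]},
    {"token": "t3", "content": "mixed song", "vector": [1.0, 1.0]},
]
ranked = top_k([1.0, 0.0], toy_rows, k=2)
print(ranked)
```

With the real data, the query could be a row from the `tag` split and the candidates the rows of the `track` split. At 697,812 tracks, a vectorized implementation would likely be more practical than this pure-Python loop.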
Provider:
seungheondoh
Summary of original information

Dataset overview

Dataset features

  • token: string.
  • content: string.
  • vector: sequence of floats (float32).

Dataset splits

  • tag: 2227 examples, 2740380 bytes in total.
  • artist: 37002 examples, 46025354 bytes in total.
  • track: 697812 examples, 898952880 bytes in total.
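Because the three splits differ greatly in size, it may be convenient to load only the one you need. The sketch below assumes the standard `split` argument of `datasets.load_dataset`. `load_split` and `build_lookup` are hypothetical helper names, and keying the lookup by `content` rather than `token` is an illustrative choice. The import sits inside the function so that the lookup helper can be tried on plain dictionaries without downloading anything.

```python
from typing import Dict, Iterable, List, Mapping


def load_split(name: str):
    """Load one split ("tag", "artist", or "track") from the Hugging Face Hub."""
    # Requires the `datasets` package and network access.
    from datasets import load_dataset

    return load_dataset("seungheondoh/musical-word-embedding", split=name)


def build_lookup(rows: Iterable[Mapping]) -> Dict[str, List[float]]:
    """Map each row's `content` string to its vector.

    If two rows share the same `content`, the later one overwrites the earlier.
    """
    return {row["content"]: list(row["vector"]) for row in rows}


# Made-up sample row in the dataset's schema (the vector values are NOT real).
sample = [{"token": "happy", "content": "happy", "vector": [0.25, -0.5]}]
lookup = build_lookup(sample)
print(lookup["happy"])
```

On the real data, `build_lookup(load_split("tag"))` would give a small in-memory dictionary for the 2227 tags. The same call on `track` would hold all 697812 vectors in memory at once.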

Dataset size

  • Download size: 1387722409 bytes.
  • Total dataset size: 947718614 bytes.
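As a quick consistency check on these figures, the snippet below sums the per-split byte counts and computes the average number of bytes per example. All numbers are copied from the dataset metadata; nothing else is assumed.

```python
# Per-split figures copied from the dataset metadata.
splits = {
    "tag": {"num_bytes": 2740380, "num_examples": 2227},
    "artist": {"num_bytes": 46025354, "num_examples": 37002},
    "track": {"num_bytes": 898952880, "num_examples": 697812},
}
DATASET_SIZE = 947718614
DOWNLOAD_SIZE = 1387722409

total_bytes = sum(s["num_bytes"] for s in splits.values())
total_examples = sum(s["num_examples"] for s in splits.values())
avg_bytes = {name: s["num_bytes"] / s["num_examples"] for name, s in splits.items()}

print("split bytes sum to dataset size:", total_bytes == DATASET_SIZE)
print("total examples:", total_examples)
for name, value in avg_bytes.items():
    print(f"{name}: {value:.1f} bytes per example on average")
```

The split sizes add up exactly to the reported total dataset size. Note that the reported download size is larger than the dataset size; the metadata does not say why.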

Configuration

  • default: contains the file-path configuration for the tag, artist, and track splits.

Tags

  • music: the dataset is music-related.

Data example

```json
{
    "token": "happy",
    "content": "happy",
    "vector": [0.011484057642519474, -0.07818693667650223, -0.02778349258005619, 0.052311971783638, -0.1324823945760727, 0.03757447376847267, 0.007125925272703171, ...]
},
{
    "token": "ARYZTJS1187B98C555",
    "content": "Faster Pussycat",
    "vector": [-0.13004058599472046, -1.3509420156478882, -0.3012666404247284, -0.34076201915740967, -0.8142353296279907, 0.3902665972709656, -0.1903497576713562, 0.6163021922111511, ...]
}
```
