enzoescipy/wikipedia-longest-stride-chunked-500-embed-intfloat-multilingual-e5-base
Source: Hugging Face. Updated 2026-03-16; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/enzoescipy/wikipedia-longest-stride-chunked-500-embed-intfloat-multilingual-e5-base
Resource description:
---
license: apache-2.0
---
# Wikipedia-Longest-Stride-Chunked-500-Embed-intfloat-multilingual-e5-base
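This is the embedded version of the [Wikipedia-Longest-Stride-Chunked-500](https://huggingface.co/datasets/enzoescipy/wikipedia-longest-stride-chunked-500) dataset: each chunk has been encoded with [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base).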
1. Computational Resources
- Hardware: Colab Pro+ H100 instance
- Total compute time: 24 h
2. Dataset Processing
- train: 974,720 sequences, each with a length between 1 and 512 [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) embeddings.
- test, val: 1,000 sequences each, each with a length between 1 and 512 [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) embeddings.
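Because sequence lengths vary from 1 to 512, batching usually needs padding plus a mask. The following is a minimal sketch, not part of this repository: it assumes each sequence is a list of equal-sized embedding vectors (768 floats for multilingual-e5-base), and the helper name `pad_batch` is hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def pad_batch(
    sequences: List[List[Vector]], max_len: int = 512
) -> Tuple[List[List[Vector]], List[List[int]]]:
    """Pad variable-length sequences of vectors to the longest one in the batch.

    Returns (padded, mask); mask[i][j] is 1 for a real vector and 0 for padding.
    Assumption: every vector in the batch has the same dimensionality.
    """
    if not sequences:
        return [], []
    for seq in sequences:
        # The card states lengths fall in the range 1 to 512.
        if not 1 <= len(seq) <= max_len:
            raise ValueError(f"sequence length {len(seq)} outside 1..{max_len}")

    dim = len(sequences[0][0])
    target = max(len(seq) for seq in sequences)

    padded, mask = [], []
    for seq in sequences:
        n_pad = target - len(seq)
        # Copy real vectors, then append zero vectors as padding.
        padded.append([list(v) for v in seq] + [[0.0] * dim for _ in range(n_pad)])
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask
```

Padding to the longest sequence in the batch, rather than always to 512, keeps short batches small; the mask lets a downstream model ignore the zero vectors.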
The embedding strategy is implemented in `scripts/embeddings.py`.
A detailed description will be provided later. Please stay tuned!
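Until the detailed description is available, the splits can be inspected directly. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository loads with its default configuration; the split name `val` is taken from this card and may be spelled differently on the Hub, and the column names are not documented here, so the helper only reports whatever fields it finds. The function names are hypothetical.

```python
REPO_ID = (
    "enzoescipy/"
    "wikipedia-longest-stride-chunked-500-embed-intfloat-multilingual-e5-base"
)


def stream_split(split: str = "train"):
    """Open one split in streaming mode so nothing is downloaded up front."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=True)


def peek(split: str = "train") -> dict:
    """Return the field names and Python types of the first record in a split."""
    first = next(iter(stream_split(split)))
    return {name: type(value).__name__ for name, value in first.items()}


if __name__ == "__main__":
    # Requires network access to the Hugging Face Hub.
    print(peek("train"))
```

Streaming avoids fetching all 974,720 training sequences just to check the schema.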
Provided by:
enzoescipy



