timescale/wikipedia-22-12-simple-embeddings
Hugging Face. Updated 2024-03-14; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/timescale/wikipedia-22-12-simple-embeddings
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: wiki.csv
license: apache-2.0
task_categories:
- text-retrieval
language:
- en
---
# wikipedia-22-12-simple-embeddings
A modified version of [Cohere/wikipedia-22-12-simple-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-simple-embeddings)
meant for use with PostgreSQL with pgvector and Timescale Vector.
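
As a rough illustration of the intended use, the sketch below builds a parameterized SQL query that combines a time-range filter with a pgvector cosine-distance ordering (the `<=>` operator). The table name `wiki` and the column names `title`, `time`, and `embedding` are assumptions made for this example and are not confirmed by the dataset card; the `%(name)s` placeholders follow the psycopg parameter style.

```python
def to_vector_literal(values):
    """Format a sequence of numbers in pgvector's text form, e.g. "[1.0,0.5]"."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_time_filtered_search(table="wiki"):
    """Return SQL for a time-filtered nearest-neighbour query.

    Hypothetical schema: `table` has columns title, time, and embedding.
    """
    # The table name is interpolated into the SQL, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return (
        f"SELECT title, time FROM {table} "
        "WHERE time >= %(start)s AND time < %(end)s "
        "ORDER BY embedding <=> %(query)s::vector "  # <=> is cosine distance in pgvector
        "LIMIT %(k)s"
    )


sql = build_time_filtered_search()
params = {
    "start": "2023-02-01",
    "end": "2023-03-01",
    "query": to_vector_literal([0.0, 1.0]),  # toy 2-d vector; real embeddings are much longer
    "k": 5,
}
```

The `sql` and `params` pair could then be passed to a database cursor's `execute` method; connection handling is omitted here because it depends on the deployment.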
## Dataset Details
This dataset was created for exploring time-based filtering and semantic search in PostgreSQL with pgvector and Timescale Vector.
This is a modified version of the [Cohere wikipedia-22-12-simple-embeddings dataset hosted on Hugging Face](https://huggingface.co/datasets/Cohere/wikipedia-22-12-simple-embeddings).
It contains embeddings of [Simple English Wikipedia](https://simple.wikipedia.org/) entries.
We added synthetic data: a time column, category, and tags.
We loaded the data into a PostgreSQL table and exported it to a CSV file, so the format differs from that of the original dataset.
The original dataset is available under the Apache 2.0 license, and thus, our modified version is also subject to the Apache 2.0 license.
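
To make the combination of time-based filtering and semantic search concrete without a database, here is a minimal sketch that reads CSV rows, keeps those inside a time window, and ranks them by cosine similarity to a query vector. The column names (`id`, `time`, `title`, `embedding`), the bracketed text encoding of the embedding, and the three toy rows are all assumptions invented for illustration; inspect the header of `wiki.csv` for the real layout.

```python
import csv
import io
import json
import math
from datetime import datetime

# Toy stand-in for wiki.csv: invented rows with 2-d vectors (hypothetical columns).
SAMPLE_CSV = """id,time,title,embedding
1,2023-01-05 00:00:00,Alpha,"[1.0,0.0]"
2,2023-02-10 00:00:00,Beta,"[0.0,1.0]"
3,2023-02-20 00:00:00,Gamma,"[0.6,0.8]"
"""


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(csv_file, query, start, end, k=2):
    """Return the top-k (similarity, title) pairs with start <= time < end."""
    hits = []
    for row in csv.DictReader(csv_file):
        timestamp = datetime.fromisoformat(row["time"])
        if not (start <= timestamp < end):
            continue  # time-based filter
        vector = json.loads(row["embedding"])  # "[0.6,0.8]" parses as a JSON array
        hits.append((cosine_similarity(query, vector), row["title"]))
    hits.sort(key=lambda hit: hit[0], reverse=True)  # most similar first
    return hits[:k]


results = search(
    io.StringIO(SAMPLE_CSV),
    query=[0.0, 1.0],
    start=datetime(2023, 2, 1),
    end=datetime(2023, 3, 1),
)
titles = [title for _, title in results]
```

With the toy rows above, only the two February entries pass the time filter, and they are returned in order of similarity to the query. In PostgreSQL the same filter-then-rank step would normally be expressed in SQL so that an index can be used.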
Provider:
timescale
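Summary of original information
wikipedia-22-12-simple-embeddings
Dataset details
- Purpose: exploring time-based filtering and semantic search in PostgreSQL with pgvector and Timescale Vector.
- Data source: a modified version of the Cohere wikipedia-22-12-simple-embeddings dataset, containing embeddings of Simple English Wikipedia entries.
- Modifications: synthetic data was added, including a time column, category, and tags.
- Data format: the data was loaded into a PostgreSQL table and exported to a CSV file, so the format has changed.
- License: both the original dataset and the modified version are under the Apache 2.0 license.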



