timescale/wikipedia-22-12-simple-embeddings

Source: Hugging Face. Updated 2024-03-14. Indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/timescale/wikipedia-22-12-simple-embeddings
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: wiki.csv
license: apache-2.0
task_categories:
- text-retrieval
language:
- en
---

# wikipedia-22-12-simple-embeddings

A modified version of [Cohere/wikipedia-22-12-simple-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-simple-embeddings) meant for use with PostgreSQL with pgvector and Timescale Vector.

## Dataset Details

This dataset was created for exploring time-based filtering and semantic search in PostgreSQL with pgvector and Timescale Vector.

This is a modified version of the [Cohere wikipedia-22-12-simple-embeddings dataset hosted on Huggingface](https://huggingface.co/datasets/Cohere/wikipedia-22-12-simple-embeddings). It contains embeddings of [Simple English Wikipedia](https://simple.wikipedia.org/) entries.

We added synthetic data: a time column, category, and tags. We loaded the data into a postgres table and exported it to a CSV file; therefore, the format has changed.

The original dataset is available under the Apache 2.0 license, and thus, our modified version is also subject to the Apache 2.0 license.
Provider:
timescale
Summary of original information

wikipedia-22-12-simple-embeddings

Dataset details

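  • Purpose: exploring time-based filtering and semantic search in PostgreSQL with pgvector and Timescale Vector.
  • Data source: a modified version of the Cohere wikipedia-22-12-simple-embeddings dataset, containing embeddings of Simple English Wikipedia entries.
  • Modifications: synthetic data was added, namely a time column, a category, and tags.
  • Data format: the data was loaded into a PostgreSQL table and exported to a CSV file, so the format differs from the original.
  • License: both the original dataset and the modified version are under the Apache 2.0 license.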
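
To make "time-based filtering and semantic search" concrete, here is a minimal pure-Python sketch of the idea: keep only rows inside a time window, then rank them by cosine distance to a query vector. The column names `title`, `time`, and `embedding`, the three sample rows, and the 3-dimensional vectors are all hypothetical, since the listing does not state the actual columns of `wiki.csv`. In PostgreSQL with pgvector, the same pattern would typically be a `WHERE` clause on the time column combined with `ORDER BY` on pgvector's cosine-distance operator `<=>`.

```python
import csv
import io
import json
import math
from datetime import datetime

# Hypothetical sample rows. The real wiki.csv column names are not given in
# the source, so `title`, `time`, and `embedding` are assumptions.
SAMPLE_CSV = '''title,time,embedding
Apple,2023-01-05 00:00:00,"[1.0,0.0,0.0]"
Banana,2023-02-10 00:00:00,"[0.9,0.1,0.0]"
Car,2023-02-20 00:00:00,"[0.0,1.0,0.0]"
'''


def parse_vector(text):
    # A bracketed, comma-separated vector such as "[1.0,0.0]" is valid JSON.
    return [float(x) for x in json.loads(text)]


def cosine_distance(a, b):
    # 1 - cosine similarity; smaller means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search(rows, query, start, end, k=2):
    # Step 1: time filter (start inclusive, end exclusive).
    # Step 2: rank the surviving rows by distance to the query vector.
    hits = []
    for row in rows:
        ts = datetime.fromisoformat(row["time"])
        if start <= ts < end:
            dist = cosine_distance(parse_vector(row["embedding"]), query)
            hits.append((dist, row["title"]))
    return [title for _, title in sorted(hits)[:k]]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
results = search(
    rows,
    query=[1.0, 0.0, 0.0],
    start=datetime(2023, 2, 1),
    end=datetime(2023, 3, 1),
)
print(results)
```

The January row is excluded by the time window even though its vector matches the query exactly, which is the behavior the dataset is meant to help explore. A brute-force scan like this is only for illustration; a database index would be used at the dataset's real scale.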