specter
Source: ModelScope community (魔搭社区). Updated 2025-11-12; indexed 2025-01-11.
Download link:
https://modelscope.cn/datasets/sentence-transformers/specter
Resource description:
# Dataset Card for Specter
This dataset is a collection of (title, related title, unrelated title) triplets drawn from scientific publications in the Specter data. See [Specter](https://github.com/allenai/specter) for additional information.
This dataset can be used directly with Sentence Transformers to train embedding models.
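As a rough illustration of that workflow, the sketch below loads one subset and fine-tunes a model on it. It rests on several assumptions that are not stated on this page: that the same dataset is published on the Hugging Face Hub under the id `sentence-transformers/specter`, that it has a `train` split, and that `sentence-transformers` 3.0 or later and `datasets` are installed. The helper names `load_specter` and `train_sketch`, and the choice of `MultipleNegativesRankingLoss`, are illustrative rather than prescribed by the dataset authors.

```python
# Column layout of the two subsets, as listed in this dataset card.
SUBSET_COLUMNS = {
    "triplet": ("anchor", "positive", "negative"),
    "pair": ("anchor", "positive"),
}


def load_specter(subset: str = "triplet", split: str = "train"):
    """Load one subset (hypothetical helper; Hub id and split name are assumptions)."""
    if subset not in SUBSET_COLUMNS:
        raise ValueError(
            f"unknown subset {subset!r}; expected one of {sorted(SUBSET_COLUMNS)}"
        )
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("sentence-transformers/specter", subset, split=split)


def train_sketch(model_name: str, subset: str = "triplet"):
    """Fine-tune `model_name` on the chosen subset (illustrative, not tuned)."""
    # Requires sentence-transformers >= 3.0 for SentenceTransformerTrainer.
    from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
    from sentence_transformers.losses import MultipleNegativesRankingLoss

    model = SentenceTransformer(model_name)
    train_dataset = load_specter(subset)
    # This loss accepts (anchor, positive) and (anchor, positive, negative) inputs.
    loss = MultipleNegativesRankingLoss(model)
    trainer = SentenceTransformerTrainer(
        model=model, train_dataset=train_dataset, loss=loss
    )
    trainer.train()
    return model
```

Calling `train_sketch` with the name of any Sentence Transformers compatible base model would start a run with default training arguments; a real run would normally also set a batch size, an evaluation set, and an output directory.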
## Dataset Subsets
### `triplet` subset
* Columns: "anchor", "positive", "negative"
* Column types: `str`, `str`, `str`
* Examples:
```python
{
'anchor': "Integrating children's contributions in the interaction design process",
'positive': 'Designing for or designing with? Informant design for interactive learning environments',
'negative': 'Power Operation in ISD: Technological Frames Perspectives.',
}
```
* Collection strategy: Reading the Specter dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data), followed by deduplication.
* Deduplicated: Yes
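The column layout above can be checked mechanically before training. The following is a minimal sketch using only the standard library; `is_valid_triplet` is a hypothetical helper name, and the check covers only what the card states (three columns, all of type `str`).

```python
TRIPLET_COLUMNS = ("anchor", "positive", "negative")


def is_valid_triplet(row: dict) -> bool:
    """Return True if `row` has exactly the three triplet columns, all strings."""
    if set(row) != set(TRIPLET_COLUMNS):
        return False
    return all(isinstance(row[column], str) for column in TRIPLET_COLUMNS)


# The example row shown in the card.
example = {
    "anchor": "Integrating children's contributions in the interaction design process",
    "positive": "Designing for or designing with? Informant design for interactive learning environments",
    "negative": "Power Operation in ISD: Technological Frames Perspectives.",
}

print(is_valid_triplet(example))  # True
print(is_valid_triplet({"anchor": "a", "positive": "b"}))  # False: no negative
```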
### `pair` subset
* Columns: "anchor", "positive"
* Column types: `str`, `str`
* Examples:
```python
{
'anchor': 'Time-dependent trajectory regression on road networks via multi-task learning',
'positive': 'Convex multi-task feature learning',
}
```
* Collection strategy: Reading the Specter dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data), only taking the title and related title, and then performing deduplication.
* Deduplicated: Yes
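The collection strategy for this subset (keep the title and related title, then deduplicate) can be sketched as follows. The card does not document the exact deduplication procedure, so exact string matching on the (anchor, positive) pair is an assumption, as are the helper name `triplets_to_pairs` and the placeholder second negative title.

```python
def triplets_to_pairs(triplets):
    """Project triplet rows to (anchor, positive) pairs, dropping exact repeats."""
    seen = set()
    pairs = []
    for row in triplets:
        key = (row["anchor"], row["positive"])
        if key in seen:
            continue  # the same pair already appeared with a different negative
        seen.add(key)
        pairs.append({"anchor": row["anchor"], "positive": row["positive"]})
    return pairs


# Two triplets that share an anchor and positive but differ in the negative.
rows = [
    {
        "anchor": "Integrating children's contributions in the interaction design process",
        "positive": "Designing for or designing with? Informant design for interactive learning environments",
        "negative": "Power Operation in ISD: Technological Frames Perspectives.",
    },
    {
        "anchor": "Integrating children's contributions in the interaction design process",
        "positive": "Designing for or designing with? Informant design for interactive learning environments",
        "negative": "A hypothetical second unrelated title",  # placeholder, not from the dataset
    },
]

pairs = triplets_to_pairs(rows)
print(len(pairs))  # 1
```

Keeping the first occurrence preserves the original row order, which makes the result deterministic for a given input.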
Provider:
maas
Created:
2025-01-06



