anshgupta123/moloptrl-sft-data

Name: anshgupta123/moloptrl-sft-data
Creator: anshgupta123
Published: 2026-04-25 19:31:55
License: not specified

Source: Hugging Face. Updated: 2026-04-25. Indexed: 2026-05-03.

Download link:

https://hf-mirror.com/datasets/anshgupta123/moloptrl-sft-data

Overview:

This dataset contains 7,000 training examples, each with five features: text (string), messages (string), input_selfies (string), output_selfies (string), and tanimoto (float64). The two selfies fields appear to hold molecules written in SELFIES (SELF-referencIng Embedded Strings) notation, and tanimoto appears to hold a Tanimoto coefficient, a measure commonly used to compare molecular similarity. This points to a cheminformatics use, possibly combined with language-model fine-tuning, although the listing does not state the intended task. The data is organized as a single train split with a total size of about 6.56 MB.
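For readers unfamiliar with the Tanimoto coefficient referenced in the tanimoto field, the following is a minimal sketch of how such a score is computed between two fingerprints, each represented as a set of "on" bit indices. The function name, the example bit sets, and the empty-set convention are illustrative assumptions; the listing does not say which fingerprint type or toolkit was used to produce the dataset's values.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits.

    Defined as |A ∩ B| / |A ∪ B|, ranging from 0.0 (no shared bits)
    to 1.0 (identical bit sets).
    """
    if not a and not b:
        # Assumption made for this sketch: two empty fingerprints count as identical.
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints for an input molecule and an output molecule.
fp_input = {1, 4, 7, 9}
fp_output = {1, 4, 8, 9, 12}

# 3 shared bits out of 6 distinct bits in total.
print(tanimoto(fp_input, fp_output))  # prints 0.5
```

In practice the bit sets would come from a molecular fingerprinting step applied to the decoded input_selfies and output_selfies molecules, which is outside the scope of this sketch.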

Provided by:

anshgupta123
