Thecoder3281f/chanlam-majorminorproduct-mixed-final
Source: Hugging Face. Updated 2025-12-10; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Thecoder3281f/chanlam-majorminorproduct-mixed-final
Description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: "chanlam_train_final.csv"
  - split: test
    path: "chanlam_test_final.csv"
  - split: val
    path: "chanlam_val_final.csv"
  - split: train_augmented
    path: "chanlam_train_augmented_final.csv"
license: mit
task_categories:
- translation
language:
- en
tags:
- chemistry
pretty_name: Chan Lam Dataset from Open Reaction Database
size_categories:
- 10K<n<100K
---
# Dataset Card for Chan Lam Dataset from Open Reaction Database
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Original Authors:** []()
- **Original Paper:** []()
- **Open Reaction Database Repo:** [https://]()
### Dataset Summary
The order of addition is as follows:
> *Reactants*>/.*Reagents*
> i.e.
> *Sulfonamide*.*Boronic acid*>/.*Catalyst*.*Base*
I processed the original dataset to extract only the reactants, reagents, and products. Yields are in a separate dataset.
### Supported Tasks and Leaderboards
Any SMILES-based Seq2Seq model can use this dataset.
### Languages
SMILES strings.
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
- Input: Reactants + Reagents
- Target: Products
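As a rough illustration of how the input/target layout might be consumed, the sketch below reads one split file into (input, target) pairs using only the standard library. The column names `input` and `target` are assumptions, since this card does not state the CSV headers; check the actual files and adjust. The demo row is a made-up Chan-Lam style example, not a row taken from the dataset.

```python
import csv
import io


def read_pairs(handle, input_col="input", target_col="target"):
    """Return (input, target) SMILES pairs from an open CSV file.

    The column names are hypothetical defaults; the real headers
    in the split files may differ.
    """
    reader = csv.DictReader(handle)
    return [(row[input_col], row[target_col]) for row in reader]


# Toy stand-in for a split file; the row is illustrative only.
demo_csv = io.StringIO(
    "input,target\n"
    "CS(N)(=O)=O.OB(O)c1ccccc1,CS(=O)(=O)Nc1ccccc1\n"
)
demo_pairs = read_pairs(demo_csv)
```

For a real split, the same function can be called on `open("chanlam_train_final.csv", newline="", encoding="utf-8")`. With the Hugging Face `datasets` library, `load_dataset("Thecoder3281f/chanlam-majorminorproduct-mixed-final", split="train")` should resolve the splits declared in the card metadata, assuming network access to the Hub.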
### Data Splits
- `train`: Default train split.
- `val`: Used in the eval loop of the Trainer.
- `test`: Held-out set for separate evaluation.
- `train_augmented`: The train split augmented with 4 additional randomized SMILES per row, giving 5 times as many rows as `train`.
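The 5x row count of `train_augmented` can be pictured with the minimal sketch below. It is not the script used to build this dataset. It assumes that only the input string is rewritten while the target is kept unchanged, and it takes the randomizer as a parameter. `rdkit_random_smiles` shows one possible randomizer based on the third-party RDKit package; it assumes a plain molecule SMILES (dot-separated components are fine, a `>` separator is not) and is only imported when called.

```python
def augment_rows(rows, randomize, n_extra=4):
    """Keep every (input, target) row and add n_extra rewritten copies.

    Assumption: only the input is rewritten; the target is reused as-is.
    """
    augmented = []
    for source, target in rows:
        augmented.append((source, target))
        for _ in range(n_extra):
            augmented.append((randomize(source), target))
    return augmented


def rdkit_random_smiles(smiles):
    """One possible randomizer; requires RDKit to be installed."""
    from rdkit import Chem  # imported lazily so the sketch loads without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # Unparseable input: fall back to the original string.
        return smiles
    return Chem.MolToSmiles(mol, canonical=False, doRandom=True)
```

Calling `augment_rows(rows, rdkit_random_smiles)` on N rows returns 5N rows: each original followed by its 4 variants. Random SMILES generation can repeat a string, so the variants are not guaranteed to be distinct.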
## Dataset Creation
### Curation Rationale
This is a small dataset, well suited to comparing the effectiveness of LoRA against full fine-tuning.
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@github-username](https://github.com/<github-username>) for adding this dataset.
Provided by:
Thecoder3281f



