Tongda/bid-announcement-zh-v1.0
Source: Hugging Face · Updated: 2024-09-20 · Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/Tongda/bid-announcement-zh-v1.0
Description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 11607613
    num_examples: 2000
  download_size: 3783597
  dataset_size: 11607613
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: mit
language:
- zh
size_categories:
- 1K<n<10K
---
# 🦙 Bid Announcement Dataset - Alpaca Format
Welcome to the **Bid Announcement Dataset** page! This dataset contains **2,000** tender and bid announcements from China, preprocessed in **Alpaca format**, making it suitable for direct use in fine-tuning natural language processing (NLP) models.
## 📄 Dataset Overview
This dataset consists of carefully selected **bid announcements** from various industries and procurement projects within China. The dataset is particularly useful for training NLP models to better understand the structure, linguistic characteristics, and specialized terminology commonly found in such documents.
## 🚀 Application Scenarios
This dataset is particularly suited for the following applications:
- **Model Fine-Tuning**: The dataset is ideal for fine-tuning large language models (LLMs), enhancing their ability to generate and comprehend professional texts in Chinese, specifically for tasks such as text generation, classification, and information extraction in specialized domains.
- **Text Summarization**: Tender and bid announcements can be lengthy. This dataset enables training models to generate concise summaries of such documents.
- **Information Extraction**: Fine-tune models to automatically extract key information, such as project names, budget amounts, and dates, from bid announcements.
- **Text Classification**: Classify announcements into different types of procurement projects or industry sectors to optimize tender information management.
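For the information-extraction scenario, a rule-based baseline can be handy for sanity-checking what a fine-tuned model returns. The sketch below pulls dates out of announcement text with a regular expression. The pattern, the `extract_dates` helper, and the sample sentence are all illustrative assumptions; none of them comes from the dataset, and real announcements may write dates in other ways.

```python
import re

# Matches dates written as 2024年9月20日 or 2024-09-20. These are two common
# conventions only; this pattern is an assumption, not a property of the dataset.
DATE_PATTERN = re.compile(r"(\d{4})[年\-/.](\d{1,2})[月\-/.](\d{1,2})日?")


def extract_dates(text: str) -> list[str]:
    """Return every date found in `text`, normalized to YYYY-MM-DD."""
    return [
        f"{year}-{int(month):02d}-{int(day):02d}"
        for year, month, day in DATE_PATTERN.findall(text)
    ]


# Invented sample sentence, not taken from the dataset.
sample = "投标截止时间为2024年9月20日,开标时间为2024-09-21。"
print(extract_dates(sample))
```

Comparing such a baseline against model output on a few records is a quick way to spot dates the model missed or invented.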
## 🔧 How to Use
1. **Load the Dataset**: You can load the dataset directly using the `datasets` library from Hugging Face:
```python
from datasets import load_dataset

# The dataset ships a single "train" split with 2,000 examples.
dataset = load_dataset("Tongda/bid-announcement-zh-v1.0")
```
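2. **Model Fine-Tuning**: Using the `transformers` library, you can fine-tune a causal language model on this dataset. `Trainer` expects token IDs rather than raw strings, so tokenize the text fields first.
```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# "your-pretrained-model" is a placeholder; pick a causal LM with good Chinese coverage.
tokenizer = AutoTokenizer.from_pretrained("your-pretrained-model")
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # needed for batch padding
model = AutoModelForCausalLM.from_pretrained("your-pretrained-model")

# Join the three Alpaca fields into one string; max_length=1024 is an arbitrary choice.
def tokenize(example):
    text = "\n".join([example["instruction"], example["input"], example["output"]])
    return tokenizer(text, truncation=True, max_length=1024)

train_dataset = dataset["train"].map(tokenize, remove_columns=["instruction", "input", "output"])

trainer = Trainer(
    model=model,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    args=TrainingArguments(
        output_dir="./results",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        logging_dir="./logs",
    ),
)
trainer.train()
```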
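Before tokenization, Alpaca-style records are usually rendered into a single prompt string. The following is a minimal sketch using the prompt templates published with the original Stanford Alpaca project. Whether this dataset was built with exactly these templates is an assumption, and `format_example` and the placeholder record are hypothetical names introduced only for illustration.

```python
# Stanford Alpaca prompt templates. That they suit this dataset is an assumption.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_example(example: dict) -> dict:
    """Build the prompt and the full training text for one record."""
    has_input = bool((example.get("input") or "").strip())
    template = PROMPT_WITH_INPUT if has_input else PROMPT_NO_INPUT
    prompt = template.format(**example)  # unused keys such as "output" are ignored
    return {"prompt": prompt, "text": prompt + example["output"]}


# Placeholder record with the dataset's three fields; the values are not real data.
record = {
    "instruction": "<task description>",
    "input": "<announcement text>",
    "output": "<expected answer>",
}
formatted = format_example(record)
print(formatted["text"])
```

With the dataset loaded as in step 1, `dataset["train"].map(format_example)` would add `prompt` and `text` columns that can then be tokenized in place of the plain newline join.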
## 📊 Dataset Statistics
- **Total Entries**: 2,000
- **Industry Coverage**: Multiple industries and sectors
- **Fields**: instruction, input, output
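The size figures in the card's metadata allow a rough estimate of record length, which helps when choosing a maximum sequence length. The short calculation below uses only the numbers from the metadata; the variable names are illustrative.

```python
# Figures copied from the dataset card's metadata.
NUM_EXAMPLES = 2000
DATASET_SIZE_BYTES = 11_607_613   # dataset_size: uncompressed data
DOWNLOAD_SIZE_BYTES = 3_783_597   # download_size: files fetched from the hub

avg_bytes = DATASET_SIZE_BYTES / NUM_EXAMPLES
ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"average record size: {avg_bytes:.0f} bytes")
print(f"download size / dataset size: {ratio:.2f}")
```

An average of about 5.8 KB per record corresponds to roughly 1,900 characters if the text were entirely Chinese characters at 3 bytes each in UTF-8. This is an average only, so individual announcements may be much longer and may need truncation.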
## 📜 Source
This dataset is provided by **Tongda**, derived from publicly available bid announcement documents.
## 📄 License
This dataset is released under the **[MIT License](https://opensource.org/licenses/MIT)**. You are free to use, modify, and distribute the dataset in accordance with the terms of the license. Please make sure to comply with any applicable privacy regulations when using this dataset.
## 💡 Notes
- **Privacy and Compliance**: Ensure that you adhere to relevant privacy policies and data usage regulations when using this dataset. All information in this dataset is sourced from publicly available announcements.
- **Language**: The dataset is in Chinese, so a model with strong Chinese language understanding and generation capabilities is recommended.
## 🤝 Contribution and Feedback
If you have any questions or suggestions, feel free to submit an issue on [Hugging Face Community](https://huggingface.co/) or contact us through other channels. We appreciate your contributions and support! 😊
---
Provider: Tongda



