Tongda/bid-announcement-zh-v1.0

Name: Tongda/bid-announcement-zh-v1.0
Creator: Tongda
Published: 2024-09-20 09:15:27
License: No description available

Source: Hugging Face. Updated 2024-09-20. Indexed 2025-04-26.

Download link:

https://hf-mirror.com/datasets/Tongda/bid-announcement-zh-v1.0

Resource description:

---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 11607613
    num_examples: 2000
  download_size: 3783597
  dataset_size: 11607613
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: mit
language:
- zh
size_categories:
- 1K<n<10K
---

# 🦙 Bid Announcement Dataset - Alpaca Format

Welcome to the **Bid Announcement Dataset** page! This dataset contains **2,000** tender and bid announcements from China, preprocessed in **Alpaca format** (`instruction`, `input`, `output` fields), making it suitable for direct use in fine-tuning natural language processing (NLP) models.

## 📄 Dataset Overview

This dataset consists of selected **bid announcements** from various industries and procurement projects within China. It is useful for training NLP models to better understand the structure, linguistic characteristics, and specialized terminology commonly found in such documents.

## 🚀 Application Scenarios

This dataset is suited to the following applications:

- **Model Fine-Tuning**: Fine-tune large language models (LLMs) to improve their ability to generate and comprehend professional Chinese texts, for tasks such as text generation, classification, and information extraction in specialized domains.
- **Text Summarization**: Tender and bid announcements can be lengthy. This dataset enables training models to generate concise summaries of such documents.
- **Information Extraction**: Fine-tune models to automatically extract key information such as project names, budget amounts, and dates from bid announcements.
- **Text Classification**: Classify announcements into different types of procurement projects or industry sectors to optimize tender information management.

## 🔧 How to Use

1. **Load the Dataset**: You can load the dataset directly using the `datasets` library from Hugging Face:

   ```python
   from datasets import load_dataset

   # Loads the single "train" split (2,000 examples).
   dataset = load_dataset("Tongda/bid-announcement-zh-v1.0")
   ```

2. **Model Fine-Tuning**: Using the `transformers` library, you can fine-tune your model with this dataset. The text fields must be tokenized before `Trainer` can consume them.

   ```python
   from transformers import (AutoModelForCausalLM, AutoTokenizer,
                             DataCollatorForLanguageModeling, Trainer, TrainingArguments)

   checkpoint = "your-pretrained-model"  # placeholder: any Chinese-capable causal LM
   tokenizer = AutoTokenizer.from_pretrained(checkpoint)
   model = AutoModelForCausalLM.from_pretrained(checkpoint)
   if tokenizer.pad_token is None:
       tokenizer.pad_token = tokenizer.eos_token  # some causal LMs define no pad token

   def tokenize(example):
       # Trainer expects token IDs, so join the three text fields and tokenize them.
       text = f"{example['instruction']}\n{example['input']}\n{example['output']}"
       return tokenizer(text, truncation=True, max_length=1024)  # 1024 is an assumed cap

   train_data = dataset["train"].map(tokenize, remove_columns=dataset["train"].column_names)

   trainer = Trainer(
       model=model,
       train_dataset=train_data,
       data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
       args=TrainingArguments(
           output_dir="./results",
           per_device_train_batch_size=4,
           num_train_epochs=3,
           logging_dir="./logs",
       ),
   )
   trainer.train()
   ```

## 📊 Dataset Statistics

- **Total Entries**: 2,000
- **Industry Coverage**: Multiple industries and sectors
- **Fields**: instruction, input, output

## 📜 Source

This dataset is provided by **Tongda**, derived from publicly available bid announcement documents.

## 📄 License

This dataset is released under the **[MIT License](https://opensource.org/licenses/MIT)**. You are free to use, modify, and distribute the dataset in accordance with the terms of the license. Please make sure to comply with any applicable privacy regulations when using this dataset.

## 💡 Notes

- **Privacy and Compliance**: Ensure that you adhere to relevant privacy policies and data usage regulations when using this dataset. All information in this dataset is sourced from publicly available announcements.
- **Language**: The dataset is in Chinese, so a model with strong Chinese language understanding and generation capabilities is recommended.

## 🤝 Contribution and Feedback

If you have any questions or suggestions, feel free to submit an issue on [Hugging Face Community](https://huggingface.co/) or contact us through other channels. We appreciate your contributions and support! 😊

---
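
The card describes each record as having `instruction`, `input`, and `output` fields but does not say how they should be combined into training text. The sketch below shows one possible way to do that, using section markers modelled on the common Alpaca prompt convention. The template wording, the `build_example` helper, and the sample record are all assumptions made for illustration and are not taken from the dataset or its card.

```python
from typing import Dict

# Prompt layouts modelled on the common Alpaca convention. The dataset card
# does not prescribe a template, so this wording is an assumption.
PROMPT_WITH_INPUT = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_example(record: Dict[str, str]) -> Dict[str, str]:
    """Turn one instruction/input/output record into a prompt/completion pair."""
    if record.get("input", "").strip():
        # Records with a non-empty input get the three-section layout.
        prompt = PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        # Records with an empty input skip the "Input" section entirely.
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    return {"prompt": prompt, "completion": record["output"]}


# Hypothetical record, invented for illustration (not taken from the dataset).
sample = {
    "instruction": "Extract the project name from the announcement.",
    "input": "<announcement text>",
    "output": "<project name>",
}
pair = build_example(sample)
print(pair["prompt"] + pair["completion"])
```

Keeping the prompt and the completion separate, rather than returning one joined string, leaves room to compute the loss only on the completion tokens if the training setup supports that.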
