ARSynopsis/Combined_ROO_Liquidity_Dataset

Hugging Face · updated 2024-09-30 · indexed 2025-11-01
Download link:
https://hf-mirror.com/datasets/ARSynopsis/Combined_ROO_Liquidity_Dataset
Resource description:
---
dataset_info:
  features:
  - name: document
    dtype: string
  - name: summary
    dtype: string
  - name: source
    dtype: string
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 1143746144
    num_examples: 83254
  - name: validation
    num_bytes: 142815263
    num_examples: 10405
  - name: test
    num_bytes: 143020108
    num_examples: 10405
  download_size: 637677002
  dataset_size: 1429581515
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
task_categories:
- summarization
language:
- en
tags:
- finance
size_categories:
- 100K<n<1M
---

# Dataset Card for Combined_ROO_Liquidity_Dataset

This dataset is designed for text summarization, with a focus on financial and liquidity data. It combines structured text from different segments of financial reports and supports both automatic and human evaluation of summarization output.

## Dataset Details

This dataset was built from the dataset presented in the research paper "**Long Text and Multi-Table Summarization: Dataset and Method**". It consists of detailed financial reports paired with summaries that condense each lengthy document into a shorter, coherent text.

Paper Reference: [Long Text and Multi-Table Summarization: Dataset and Method](https://arxiv.org/abs/2302.03815)

### Dataset Description

**Dataset Structure**

The dataset is divided into:
- Train: The primary split for model training (83,254 examples).
- Validation: Used for validation during training (10,405 examples).
- Test: Used for final evaluation of the summarization models (10,405 examples).

Each entry includes:
- document: The full input document, around 2,500 words long.
- summary: A condensed version of the document, around 350 words long.
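The following sketch is a quick sanity check on the structure described above. It recomputes the split proportions from the `num_examples` values in the metadata, which come out to roughly 80/10/10. It also measures word counts for a single document/summary pair. The helper names `split_proportions` and `word_stats`, along with the toy record, are hypothetical and do not belong to the dataset or to any library. Word counts use plain whitespace splitting, which is only an approximation. By the card's own rough figures, a typical pair compresses about 2,500 / 350 ≈ 7:1.

```python
# Split sizes copied from the card's dataset_info metadata.
SPLIT_EXAMPLES = {"train": 83254, "validation": 10405, "test": 10405}


def split_proportions(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


def word_stats(record: dict) -> dict:
    """Word counts and compression ratio for one document/summary pair.

    Hypothetical helper: assumes the record uses the card's feature names
    (`document`, `summary`) and counts words by whitespace splitting,
    which only approximates a real tokenizer.
    """
    doc_words = len(record["document"].split())
    sum_words = len(record["summary"].split())
    # Guard against an empty summary instead of dividing by zero.
    ratio = doc_words / sum_words if sum_words else float("inf")
    return {
        "document_words": doc_words,
        "summary_words": sum_words,
        "compression_ratio": ratio,
    }


if __name__ == "__main__":
    print(sum(SPLIT_EXAMPLES.values()))  # 104064
    for name, share in split_proportions(SPLIT_EXAMPLES).items():
        print(f"{name}: {share:.1%}")
    # Toy record for illustration only; it is not taken from the dataset.
    toy = {
        "document": "Revenue rose ten percent while cash reserves fell.",
        "summary": "Revenue up, cash down.",
    }
    print(word_stats(toy))
```

One way to apply `word_stats` to real entries is through the Hugging Face `datasets` library. Calling `load_dataset("ARSynopsis/Combined_ROO_Liquidity_Dataset", split="validation")` returns a `Dataset`, and indexing it with an integer yields a dict keyed by feature name. Assuming the default config, `word_stats(ds[0])` should then work as written.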
Provided by:
ARSynopsis