Vedanshu

Name: Vedanshu
Creator: maas
Published: 2025-10-01 18:10:21
License: No description available

ModelScope community. Updated 2025-10-01; indexed 2025-07-05.

Download link:

https://modelscope.cn/datasets/zhangjinyu1228/Vedanshu

Resource description:

---
license: Apache License 2.0
# User-defined tags
tags:
  - Alibaba
  - arxiv:1810.99999
  - my free-style tag
text:
  # A second-level entry can belong to only one of the task_categories
  fill_mask:
    # Third-level entries allow multiple selections
    languages:
      - en
    multilinguality:
      - monolingual
audio:
  automatic_speech_recognition:
    languages:
      - en
      - fr
    sampling_rates:
      - 16000
      - 64000
image:
  Image-to-Text:
    resolutions:
      - 640 x 480
      - 1024 x 720
    color_space:
      - rgb
    encoding:
      - jpeg
video:
  Object-Detection:
    resolutions:
      - 640 x 480
      - 1024 x 720
    encoding:
      - mpeg
multi-modal:
  Feature Extraction:
    resolutions:
      - 640 x 480
    encoding:
      - H264
    languages:
      - en
    multilinguality:
      - monolingual
configs: # Configure the sub-datasets and splits of the dataset
  - config_name: default
    data_files:
      - split: train
        path: "train_data.csv"
      - split: test
        path: "test_data.csv"
---

## Dataset Description

Overall description of the dataset.

### Dataset Overview

An introduction to the dataset and its supported usage scenarios (including supported languages).

### Supported Tasks of the Dataset

The training tasks the dataset supports, along with relevant benchmark results.

## Format and Structure of the Dataset

### Data Format

A description of the data format, including the data schema and any sample records needed to illustrate it. If the dataset contains multiple sub-datasets, each sub-dataset should have its own format description.

### Dataset Loading Method

Detailed instructions, with code examples, for loading and using the dataset via Git or the SDK.

### Data Splits

The dataset can be divided into `train/test/validation` splits for training and testing models. Custom splits are configured by editing the `configs` tag in README.md. Within that tag, `config_name` is the name of the split configuration, i.e. the name of the sub-dataset, and `data_files` lists the data files for that sub-dataset. Each `data_files` entry has two attributes: `split`, the name of the dataset split, and `path`, the path to the data file.

## Relevant Information for Dataset Generation

### Raw Data

The source of the raw data, how it was initially collected, and whether it went through normalization or other processing.

### Dataset Annotation

Whether the dataset contains annotations and, if so, a description of them.

#### Annotation Process

How the annotation was carried out and what the workflow was.

#### Annotators

Information about the annotators, especially when they differ from the original data provider.

## Copyright Information of the Dataset

Copyright information for the dataset: the usage scenarios and users it is licensed for, whether it is open source, and which open-source license applies.

## Citation Method

Whether the dataset has associated publications, and whether there is a recommended format for citing it in research papers.

## Other Relevant Information

Personal or sensitive information the dataset may contain and background to consider when using it; the dataset's possible social impact, biases, and limitations.
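As an illustration of the `configs` layout described under Data Splits, the following is a minimal sketch of how a `config_name` and its `data_files` entries could be resolved to per-split file paths. It does not use the ModelScope SDK. The `CONFIGS` literal simply mirrors the front matter as plain Python data, and the helper names `split_paths` and `read_split` are hypothetical, introduced here only for illustration.

```python
import csv
from pathlib import Path

# Mirrors the `configs` block of the card's front matter as plain Python data.
CONFIGS = [
    {
        "config_name": "default",
        "data_files": [
            {"split": "train", "path": "train_data.csv"},
            {"split": "test", "path": "test_data.csv"},
        ],
    }
]


def split_paths(configs, config_name="default"):
    """Return a {split: path} mapping for one sub-dataset (hypothetical helper)."""
    for config in configs:
        if config["config_name"] == config_name:
            return {entry["split"]: entry["path"] for entry in config["data_files"]}
    raise KeyError(f"no config named {config_name!r}")


def read_split(root, configs, split, config_name="default"):
    """Read one split's CSV file into a list of row dicts (hypothetical helper)."""
    path = Path(root) / split_paths(configs, config_name)[split]
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Show which file each split of the "default" sub-dataset points to.
    print(split_paths(CONFIGS))
```

Calling `read_split(root, CONFIGS, "train")` assumes that `root` is a local directory already containing the files named in the card, for example a local copy of the dataset repository.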

Provider: maas

Created: 2025-06-29
