Vedanshu
Source: ModelScope Community | Updated: 2025-10-01 | Indexed: 2025-07-05
Download link:
https://modelscope.cn/datasets/zhangjinyu1228/Vedanshu
Resource summary:
---
license: Apache License 2.0
# User-defined tags
tags:
- Alibaba
- arxiv:1810.99999
- my free-style tag
text:
  # Second level: can belong to only one task_categories entry
  fill_mask:
    # Third level: multiple selections allowed
    languages:
    - en
    multilinguality:
    - monolingual
audio:
  automatic_speech_recognition:
    languages:
    - en
    - fr
    sampling_rates:
    - 16000  # integer
    - 64000
image:
  Image-to-Text:
    resolutions:
    - 640 x 480
    - 1024 x 720
    color_space:
    - rgb
    encoding:
    - jpeg
video:
  Object-Detection:
    resolutions:
    - 640 x 480
    - 1024 x 720
    encoding:
    - mpeg
multi-modal:
  Feature Extraction:
    resolutions:
    - 640 x 480
    encoding:
    - H264
    languages:
    - en
    multilinguality:
    - monolingual
configs: # Sub-datasets (configurations) of the dataset and their splits
- config_name: default
  data_files:
  - split: train
    path: "train_data.csv"
  - split: test
    path: "test_data.csv"
---
<!--- The YAML section above describes the dataset's attributes and tags --->
<!--- The dataset description in Markdown follows --->
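## Dataset Description
An overall description of the dataset.
### Dataset Overview
An introduction to the dataset and its supported use cases, including the languages it covers.
### Supported Tasks
The training tasks the dataset supports, along with any relevant benchmark results.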
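The YAML header arranges task tags in three levels: a modality (`text`, `audio`, `image`, `video`, `multi-modal`), a task category under it, and attribute lists under that. Its comments say the second level may hold only one task category, that third-level attributes allow multiple selections, and that sampling rates are integers. The sketch below checks those rules on the header after it has been parsed into a Python dictionary (by a YAML library of your choice, which is assumed rather than shown). `check_task_tags` and `MODALITY_KEYS` are hypothetical names, and reading "only one task category" as "exactly one key per modality" is an interpretation.

```python
from typing import Any

# Hypothetical: the first-level modality keys that appear in this card's YAML header.
MODALITY_KEYS = ("text", "audio", "image", "video", "multi-modal")


def check_task_tags(meta: dict[str, Any]) -> list[str]:
    """Return the problems found in the modality -> task -> attributes hierarchy."""
    problems: list[str] = []
    for modality in MODALITY_KEYS:
        tasks = meta.get(modality)
        if tasks is None:
            continue  # modality not declared, nothing to check
        # Second level: exactly one task category per modality.
        if not isinstance(tasks, dict) or len(tasks) != 1:
            problems.append(f"{modality}: expected exactly one task category")
            continue
        task, attrs = next(iter(tasks.items()))
        if not isinstance(attrs, dict):
            problems.append(f"{modality}/{task}: expected a mapping of attributes")
            continue
        # Third level: each attribute is a list, so several values may be selected.
        for name, values in attrs.items():
            if not isinstance(values, list):
                problems.append(f"{modality}/{task}/{name}: expected a list")
        # The template marks sampling rates as integers.
        rates = attrs.get("sampling_rates", [])
        if isinstance(rates, list):
            for rate in rates:
                if not isinstance(rate, int) or isinstance(rate, bool):
                    problems.append(
                        f"{modality}/{task}/sampling_rates: {rate!r} is not an integer"
                    )
    return problems


meta = {
    "text": {"fill_mask": {"languages": ["en"], "multilinguality": ["monolingual"]}},
    "audio": {
        "automatic_speech_recognition": {
            "languages": ["en", "fr"],
            "sampling_rates": [16000, 64000],
        }
    },
}
print(check_task_tags(meta))  # → []
```

A value written as `16000 <!--- integer --->` is parsed by YAML as a string, not a number, so this check would report it.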
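## Dataset Format and Structure
### Data Format
A description of the data format, including the data schema, with representative data samples where needed.
If the dataset contains multiple sub-datasets, provide a separate format description for each one.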
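The card does not state the columns of `train_data.csv` or `test_data.csv`, so the sketch below assumes nothing about them. It reads one CSV file and returns its header, the first few rows, and a rough per-column type, which can serve as a starting point for the schema and samples this section asks for. `summarize_csv` is a hypothetical helper, and its type inference (integer, then float, otherwise string) is a deliberately crude heuristic.

```python
import csv
from itertools import islice
from typing import Any


def _infer_type(values: list[str]) -> str:
    """Return the narrowest of 'int', 'float', 'str' that parses every value."""
    if not values:
        return "unknown"  # no sample rows to infer from
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
        except (TypeError, ValueError):
            continue  # at least one value does not parse; try a wider type
        return name
    return "str"


def summarize_csv(path: str, n_samples: int = 3) -> dict[str, Any]:
    """Return the header, the first `n_samples` rows, and a rough type per column."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        samples = list(islice(reader, n_samples))
        columns = list(reader.fieldnames or [])
    types = {col: _infer_type([row[col] for row in samples]) for col in columns}
    return {"columns": columns, "types": types, "samples": samples}
```

For example, `summarize_csv("train_data.csv")["types"]` gives a column-to-type mapping that can be copied into the schema description and then corrected by hand.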
### Loading the Dataset
Detailed instructions, with code examples, on how to load and use the dataset via Git or the SDK.
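A minimal local sketch, assuming the repository has already been downloaded (for example with Git) and that the CSV files listed under `configs` sit at the given paths relative to the repository root. It uses only Python's standard `csv` module and keeps every value as a string. `load_splits` is a hypothetical helper; for loading through the ModelScope SDK, `MsDataset.load` from `modelscope.msdatasets` is the usual entry point, but its options are not covered here.

```python
import csv
from pathlib import Path

# Mirrors the `configs` entry (config_name: default) in the YAML header.
DATA_FILES = [
    {"split": "train", "path": "train_data.csv"},
    {"split": "test", "path": "test_data.csv"},
]


def load_splits(
    repo_dir: str, data_files: list[dict[str, str]]
) -> dict[str, list[dict[str, str]]]:
    """Read each listed CSV file and group its rows by split name."""
    splits: dict[str, list[dict[str, str]]] = {}
    for entry in data_files:
        csv_path = Path(repo_dir) / entry["path"]
        with csv_path.open(newline="", encoding="utf-8") as f:
            # Each row becomes a dict mapping column name -> string value;
            # several files declared for the same split are concatenated.
            splits.setdefault(entry["split"], []).extend(csv.DictReader(f))
    return splits
```

With the repository checked out into a local `Vedanshu/` directory (an assumed path), `load_splits("Vedanshu", DATA_FILES)["train"]` would return the training rows.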
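### Data Splits
The dataset can be divided into `train`, `test`, and `validation` splits for model training and evaluation. Custom splits are configured by editing the `configs` tag in README.md.
In the `configs` tag, `config_name` names a configuration, i.e. a sub-dataset; `data_files` lists that sub-dataset's data files, and each entry has two attributes: `split` (the split the file belongs to) and `path` (the path to the data file).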
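The `configs` structure described above can be mirrored as plain Python data. The sketch below shows the `default` configuration from the YAML header as a YAML parser would load it, plus a lookup that returns the file paths declared for one split of one sub-dataset. `files_for` is a hypothetical helper, not part of any ModelScope API.

```python
from typing import Any

# The `configs` tag from the YAML header, as a YAML parser would load it.
CONFIGS: list[dict[str, Any]] = [
    {
        "config_name": "default",
        "data_files": [
            {"split": "train", "path": "train_data.csv"},
            {"split": "test", "path": "test_data.csv"},
        ],
    },
]


def files_for(configs: list[dict[str, Any]], config_name: str, split: str) -> list[str]:
    """Return the data-file paths declared for one split of one sub-dataset."""
    for config in configs:
        if config["config_name"] != config_name:
            continue
        paths = [entry["path"] for entry in config["data_files"] if entry["split"] == split]
        if not paths:
            raise KeyError(f"config {config_name!r} has no split {split!r}")
        return paths
    raise KeyError(f"no config named {config_name!r}")


print(files_for(CONFIGS, "default", "train"))  # → ['train_data.csv']
```

Adding a `validation` split would mean appending another `split`/`path` entry under `data_files`; a second sub-dataset would be a new list item with its own `config_name`.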
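## Dataset Creation
### Source Data
The source of the raw data, how it was initially collected, and whether it went through processing such as normalization.
### Annotations
Whether the dataset contains annotations and, if so, a description of them.
#### Annotation Process
How the annotations were produced and what the workflow was.
#### Annotators
Information about the annotators, especially when they differ from the providers of the original data.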
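## Licensing Information
Copyright information for the dataset: the authorized use cases and users, whether it is open source, and, if so, which open-source license it uses.
## Citation
Whether the dataset has an associated paper, and whether there is a recommended format for citing it in research papers.
## Other Information
Any personal or sensitive information the dataset may contain, and the context to consider when using it;
the dataset's potential social impact, along with any biases and limitations it may have.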
Provided by:
maas
Created:
2025-06-29



