VLAA-Thinking | Vision-Language Model Dataset | Reasoning Dataset

ModelScope community · Updated 2025-11-03 · Indexed 2025-04-26

Vision-language models

Reasoning

Download link:

https://modelscope.cn/datasets/UCSC-VLAA/VLAA-Thinking


Resource description:

# SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

🌐 <a href="https://ucsc-vlaa.github.io/VLAA-Thinking/" target="_blank">Project Page</a> • 📄 <a href="https://huggingface.co/papers/2504.11468" target="_blank">Arxiv</a> • 💻 <a href="https://github.com/UCSC-VLAA/VLAA-Thinking" target="_blank">Code</a>

🤗 <a href="https://huggingface.co/collections/UCSC-VLAA/vlaa-thinker-67eda033419273423d77249e" target="_blank">VLAA-Thinker Family</a> • 🤔 <a href="https://huggingface.co/datasets/UCSC-VLAA/VLAA-Thinking" target="_blank">VLAA-Thinking Dataset</a>

🤗 <a href="https://huggingface.co/UCSC-VLAA/VLAA-Thinker-Qwen2.5VL-3B" target="_blank">VLAA-Thinker-Qwen2.5-3B</a> • 🤗 <a href="https://huggingface.co/UCSC-VLAA/VLAA-Thinker-Qwen2.5VL-7B" target="_blank">VLAA-Thinker-Qwen2.5-7B</a>

Both **VLAA-Thinker-Qwen2.5-3B** and **VLAA-Thinker-Qwen2.5-7B** achieve **SOTA** performance on [OpenCompass Multimodal Reasoning Leaderboard](https://rank.opencompass.org.cn/leaderboard-multimodal-reasoning/?m=REALTIME) as of April 7th, 2025.

<img src="assets/opencompass_4b_box.png" width = "640" alt="pipeline" align=center />

-----

<img src="assets/opencompass_7b_box.png" width = "640" alt="pipeline" align=center />

## Contents
- [Quick Start 🚀](#quick-start-🚀)
- [Dataset Generation Pipeline 🏭](#vlaa-thinking-data-generation-🏭)
- [Dataset Card 📚](#dataset-card-📚)
- [Examples 📝](#examples-📝)
- [GRPO with Mixed Reward 💡](#grpo-with-mixed-reward-💡)
- [Contributors 🙌](#contributors-🙌)

## Quick Start 🚀
### Inference
Please check [here](https://github.com/UCSC-VLAA/VLAA-Thinking?tab=readme-ov-file#inference) for more details.

### Dataset Download
Please check [here](https://github.com/UCSC-VLAA/VLAA-Thinking?tab=readme-ov-file#dataset-download) for scripts.
The dataset should be organized as follows:
```
├── train-grpo.json
├── train-sft.json
└── images
    ├── allava_laion
    ├── arxivqa
    ├── chartqa
    ├── clevr_math
    ├── coco
    │   └── train2017
    ├── docvqa
    ├── geoqa170k
    ├── synthesis
    ├── vg
    │   ├── VG_100K
    │   └── VG_100K_2
    └── vizwiz
```

## VLAA-Thinking Data Generation 🏭
<img src="assets/data_generation.png" width = "640" alt="pipeline" align=center />

<details>
<summary>Step 1: Metadata Collection</summary>

We gather metadata from 9 distinct vision-language datasets, each comprising either closed- or open-ended visual questions. We sample unique images from CLEVR-Math, Math PUMA, ArxivQA, DocVQA, VizWiz, and ALLaVA and process them through our pipeline. For COCO and VisualGenome, we directly use the data as provided by LLaVA-CoT. Note that GeoQA170K is included exclusively in the RL dataset because of captioning hallucination issues. Dataset statistics are summarized in the Dataset Card table.

</details>

<details>
<summary>Step 2: Visual Captioning and Additional Context</summary>

Each datapoint begins with an image-question-answer triplet. To connect visual content with textual reasoning, we generate detailed, structured captions for each image using GPT-4o. Additionally, we leverage dataset-specific annotations to enrich the visual understanding: CLEVR-Math includes scene synthesis instructions, Math PUMA provides descriptive textual math problems, and ALLaVA-LAION offers carefully verified GPT-4V captions (prompt <a href="assets/prompts/1.captioning.txt" target="_blank">here</a>).

</details>

<details>
<summary>Step 3: Reasoning Answer Distillation</summary>

Using the text-only reasoning model DeepSeek-R1, we generate structured and logical reasoning steps alongside final answers.
The model reasons based on image captions, visual questions, and supplementary dataset-specific information, outputting a step-by-step rationale enclosed by explicit `<think>` tags for clarity (prompt <a href="assets/prompts/2.r1cot.txt" target="_blank">here</a>).

</details>

<details>
<summary>Step 4: Answer Refinement and Rewriting</summary>

To ensure clarity, consistency, and modality independence, we refine the reasoning outputs through a rewriting module powered by GPT-3.5-turbo. This refinement removes unnecessary modality-specific phrases and standardizes the answers. Samples that deviate substantially from the original text after rewriting (a difference of more than 15 words) are filtered out so that textual alteration stays minimal (prompt <a href="assets/prompts/3.rewrite.txt" target="_blank">here</a>).

</details>

<details>
<summary>Step 5: Automated Verification</summary>

We verify the rewritten answers against the original ground-truth answers through an automated validation module. Only samples with verified correct answers are retained in the final training set, ensuring accuracy and logical coherence (prompt <a href="assets/prompts/4.verify.txt" target="_blank">here</a>).

</details>

<details>
<summary>Step 6: Curating Data Splits for SFT and RL</summary>

We partition the dataset into two mutually exclusive subsets tailored specifically for supervised fine-tuning (SFT) and reinforcement learning (RL). Following insights from recent studies highlighting RL's strength in deeper reasoning challenges, we use explicit self-reflective cues (termed "aha moments") in reasoning as a proxy for difficulty. Simpler examples without these cues populate the SFT set, while more challenging examples containing reflective "aha moments" form the dedicated RL dataset.
</details>

## Dataset Card 📚

| Name | Data Type | # Original | # Pipeline | # Final SFT | **# Final RL** |
| --- | --- | --- | --- | --- | --- |
| <a href="https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data/tree/main/CLEVR-Math(MathV360K)" target="_blank">CLEVR_Math</a> | Closed-end | 35,000 | 28,018 | 5,923 | **2,000** |
| <a href="https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data/tree/main/geo170k(qa)" target="_blank">GeoQA170K</a> | Closed-end | - | - | - | **6,499** |
| <a href="https://huggingface.co/datasets/Math-PUMA/Math-PUMA_Data_Stage2/tree/main/Synthesis" target="_blank">Math PUMA</a> | Closed-end | 30,000 | 26,672 | 19,258 | **6,696** |
| <a href="https://huggingface.co/datasets/MMInstruction/ArxivQA?row=0" target="_blank">ArxivQA</a> | Closed-end | 54,399 | 51,348 | 34,604 | **1,000** |
| <a href="https://www.docvqa.org/datasets" target="_blank">DocVQA</a> | Closed-end | 10,194 | 8,206 | 4,897 | **1,000** |
| <a href="https://vizwiz.org/tasks-and-datasets/vqa/" target="_blank">VizWiz</a> | Closed-end | 20,523 | 6,528 | 4,266 | **1,000** |
| <a href="https://huggingface.co/datasets/FreedomIntelligence/ALLaVA-4V/tree/main/allava_laion" target="_blank">ALLaVA-LAION</a> | Open-end | 47,066 | 18,123 | 10,496 | **3,000** |
| <a href="https://huggingface.co/datasets/Xkev/LLaVA-CoT-100k" target="_blank">LLaVA-CoT-COCO</a> | Closed-end | 3,000 | 3,000 | 8,727 | **2,000** |
| <a href="https://huggingface.co/datasets/Xkev/LLaVA-CoT-100k" target="_blank">LLaVA-CoT-VisualGenome</a> | Closed-end | 3,000 | 3,000 | 38,242 | **2,000** |
| Total | Closed- & Open-end | 203,182 | 144,895 | 126,413 | **25,195** |

**Data statistics of VLAA-Thinking**. We present the original volume of metadata (#Original), the data size after the distillation pipeline (#Pipeline), and the size of sampled examples for SFT (#Final SFT) and RL (#Final RL), respectively. Note that we only use GeoQA170K with verifiable answers for the RL split.
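
The filtering rule in Step 4 and the split rule in Step 6 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue list `AHA_CUES` is hypothetical (the pipeline description does not enumerate the actual "aha moment" phrases), the 15-word rule is read here as a difference in word counts, and the helper names are made up for illustration. The only parts taken from this README are the `<think>...</think>` wrapping of the reasoning in `ds_answer` and the 15-word threshold.

```python
import re
from typing import Tuple

# Hypothetical cue phrases; the actual list used by the authors is not given here.
AHA_CUES = ("wait,", "let me double-check", "hold on", "on second thought")

# Threshold stated in Step 4.
MAX_WORD_DIFF = 15


def split_reasoning(ds_answer: str) -> Tuple[str, str]:
    """Split a `ds_answer` string into (reasoning inside <think>, final answer)."""
    match = re.search(r"<think>(.*?)</think>(.*)", ds_answer, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole string as the answer.
        return "", ds_answer.strip()
    return match.group(1).strip(), match.group(2).strip()


def keeps_minimal_alteration(original: str, rewritten: str) -> bool:
    """Step 4 filter, assuming 'difference' means difference in word counts."""
    diff = abs(len(original.split()) - len(rewritten.split()))
    return diff <= MAX_WORD_DIFF


def has_aha_moment(reasoning: str) -> bool:
    """Return True if the reasoning contains any (hypothetical) reflective cue."""
    lowered = reasoning.lower()
    return any(cue in lowered for cue in AHA_CUES)


def assign_split(ds_answer: str) -> str:
    """Step 6 rule: reflective samples go to RL, the rest to SFT."""
    reasoning, _ = split_reasoning(ds_answer)
    return "rl" if has_aha_moment(reasoning) else "sft"


if __name__ == "__main__":
    reflective = "<think> Total is 4. Wait, but I should check for other balls. </think> **Answer:** 3"
    plain = "<think> 1 + 2 = 3. </think> **Answer:** 3"
    print(assign_split(reflective), assign_split(plain))
```

Simple substring matching keeps the sketch short; a real pipeline would likely need a more careful cue detector to avoid false positives.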
## Examples 📝

<details><summary>CLEVR-Math</summary>
<img src="assets/examples/clevr_math-CLEVR_train_027977.png" align=center />

```plaintext
{
  "id": "clevr_math-CLEVR_train_027977.png",
  "image": "clevr_math/CLEVR_train_027977.png",
  "question": "Subtract all balls. How many objects are left?",
  "gt": 3,
  "ds_answer": "<think> Okay, let's see. The user is asking to subtract all balls and determine how many objects are left. First, I need to figure out which objects are considered \"balls\" in the image. The image shows a large blue metal sphere. A sphere is a three-dimensional circle, which is essentially a ball shape. So the blue metal sphere is definitely a ball. Next, the surrounding objects are two small red metal cylinders and a single small brown metal cylinder in the foreground. Cylinders are not balls; they have a different shape—circular bases and straight sides. So those would not be subtracted. The question says to subtract all balls, so only the sphere (ball) needs to be removed. Originally, there's the sphere (1 object), two red cylinders (2 objects), and one brown cylinder (1 object). Total objects before subtraction: 1 + 2 + 1 = 4. Subtracting the one ball leaves 3 objects: the two red cylinders and the brown cylinder. Wait, but I should make sure there are no other balls mentioned. The image only refers to the sphere as a ball-like object. The cylinders are clearly not balls. So subtracting the sphere leaves three cylinders. The answer should be 3. </think> In the scene, there is one large blue sphere (a ball-shaped object) centrally positioned. The remaining objects consist of two small red metal cylinders flanking it symmetrically and a single brown cylinder in the foreground. Removing the sphere leaves only the three cylindrical objects visible. **Answer:** There are 3 objects remaining. ",
  "caption": "The image depicts an indoor setting with a smooth, neutral-colored surface background.
Positioned centrally is a large blue metal sphere with a glossy finish, reflecting slight highlights. Surrounding it are two small red metal cylinders placed symmetrically on either side, also showing a metallic sheen. In the foreground, a single small brown metal cylinder is situated, completing the arrangement. The metals' shiny surfaces create a play of subtle reflections and shadows on the even surface beneath.",
  "meta": {
    "caption_model_id": "gpt-4o-0806-nofilter-global",
    "rationale": "There are a large blue metal sphere, a small red metal cylinder, a small red metal cylinder, a small brown metal cylinder in the image."
  }
}
```
</details>

<details><summary>GeoQA</summary>
<img src="assets/examples/geoqa170k-geoqa_plus-2905.png" align=center />

```plaintext
{
  "id": "geoqa170k-geoqa_plus-2905-63590",
  "image": "geoqa170k/geoqa_plus-2905.png",
  "question": "Given a semicircular figure with the diameter represented by AB and intersecting chords AD and BC at point P, where angle DPB measures 60.0 degrees, calculate the value of tan(angle ADC) under the condition that D denotes the midpoint of the arc BC. Choices: A: \frac{1}{2} B: 2 C: \frac{√{3}}{3} D: √{3}",
  "gt": "C",
  "ds_answer": "<think> Okay, so I'm trying to solve this geometry problem. Let me see if I can visualize it correctly. There's a semicircle with diameter AB. Points C and D are on the curved edge, and chords AC and BD intersect at point P. Below AB, there's the center O. The angle DPB is 60 degrees, and D is the midpoint of arc BC. I need to find the value of tan(angle ADC). First, I should probably draw this out to get a better sense. I imagine a semicircle with AB as the diameter. O is the center, so AO = OB. Points C and D are on the semicircle, with D being the midpoint of arc BC. That means arc BD is equal to arc DC. Since it's a semicircle, the total arc from B to C is 180 degrees, so each arc BD and DC would be 90 degrees each.
Wait, no, because D is the midpoint of arc BC, which is part of the semicircle. So if BC is some arc, D is halfway along it. But I'm not sure about the exact measure yet. Chords AC and BD intersect at P, and angle DPB is 60 degrees. I remember that when two chords intersect, the products of the segments are equal. So AP * PC = BP * PD. But I'm not sure if that's directly useful here. Since D is the midpoint of arc BC, it might have some special properties. In a semicircle, the midpoint of an arc would be equidistant from B and C. Also, angles subtended by the same arc are equal. Maybe I can use some properties of cyclic quadrilaterals here, but since it's a semicircle, all points lie on the circumference, so ABCD is a cyclic quadrilateral. Wait, but ABCD is a quadrilateral inscribed in a semicircle, which makes it a cyclic quadrilateral. In such cases, opposite angles sum to 180 degrees. But I'm not sure if that's applicable here because we're dealing with a semicircle, not a full circle. Let me think about the angles. Angle DPB is 60 degrees. Since P is the intersection of chords AC and BD, maybe I can use the property that the angle between two chords is equal to half the sum of the measures of the intercepted arcs. So angle DPB would be half the sum of arcs DB and AC. But I'm not sure about the exact arcs involved. Alternatively, maybe I can use triangle properties. Since D is the midpoint of arc BC, triangle DBC might be isosceles. Wait, no, because D is on the semicircle, so triangle DBC is actually a right triangle because any triangle inscribed in a semicircle is a right triangle. So angle BDC is 90 degrees. Wait, that's an important point. Since D is on the semicircle, triangle BDC is right-angled at D. So angle BDC is 90 degrees. That might help in finding angle ADC. I need to find tan(angle ADC). Let's denote angle ADC as θ. So tan(θ) = opposite/adjacent. 
To find this, I might need to find the lengths of the sides of triangle ADC or use some trigonometric identities. Let me consider the coordinates approach. Let's place the semicircle on a coordinate system with center O at (0,0), and AB as the diameter along the x-axis from (-r,0) to (r,0), where r is the radius. Then point D, being the midpoint of arc BC, would be somewhere on the semicircle. But without knowing the exact position of C, this might be tricky. Alternatively, maybe I can use power of a point. Since P is the intersection of chords AC and BD, the power of point P with respect to the circle is equal for both chords. So PA * PC = PB * PD. But without specific lengths, I'm not sure how to proceed. Wait, maybe I can assign some variables. Let's let the radius of the semicircle be r. Then AB = 2r. Let me assume r = 1 for simplicity, so AB = 2. Then O is at (0,0), A is at (-1,0), B is at (1,0). Let me place point C somewhere on the semicircle, say at an angle θ from the x-axis. Then point D, being the midpoint of arc BC, would be at an angle (θ + 180)/2 from the x-axis, but I'm not sure. Alternatively, since D is the midpoint of arc BC, the arc BD equals arc DC. So if I consider the central angles, angle BOD equals angle DOC. Since the total arc BC is some measure, say 2α, then each arc BD and DC would be α. But I'm not sure about the exact measure. Wait, maybe I can use the fact that angle DPB is 60 degrees. Using the intersecting chords theorem, angle DPB is equal to half the sum of the measures of the intercepted arcs. So angle DPB = 1/2 (arc DB + arc AC) = 60 degrees. Therefore, arc DB + arc AC = 120 degrees. But I'm not sure about the measure of arc AC. Alternatively, maybe I can express arc AC in terms of other arcs. Since the total semicircle is 180 degrees, arc AB is 180 degrees. Arc AC is part of that, so arc AC = 180 - arc CB. But I'm not sure. Alternatively, since D is the midpoint of arc BC, arc BD = arc DC. Let's denote arc BD = arc DC = x. 
Then arc BC = 2x. Since the total semicircle is 180 degrees, arc AB = 180 degrees, so arc AC = 180 - arc BC = 180 - 2x. Now, from the intersecting chords theorem, angle DPB = 1/2 (arc DB + arc AC) = 60 degrees. So 1/2 (x + (180 - 2x)) = 60. Simplifying, 1/2 (180 - x) = 60. Multiply both sides by 2: 180 - x = 120. Therefore, x = 60 degrees. So arc BD = arc DC = 60 degrees, and arc BC = 120 degrees. Then arc AC = 180 - 120 = 60 degrees. Now, I need to find tan(angle ADC). Let's consider triangle ADC. Since D is on the semicircle, angle ADC is an inscribed angle. The measure of angle ADC is half the measure of arc AC. Since arc AC is 60 degrees, angle ADC = 30 degrees. Therefore, tan(angle ADC) = tan(30 degrees) = 1/√3 = √3/3. Wait, but let me double-check. If angle ADC is 30 degrees, then tan(30) is indeed √3/3. So the answer should be option C: √3/3. </think> The value of tan(angle ADC) is \(\frac{\sqrt{3}}{3}\). **Step-by-Step Explanation:** 1. **Understanding the Problem:** - We have a semicircle with diameter AB and center O. - Points C and D are on the semicircle, with D being the midpoint of arc BC. - Chords AC and BD intersect at point P, forming an angle DPB of 60 degrees. - We need to find tan(angle ADC). 2. **Key Properties and Theorems:** - **Intersecting Chords Theorem:** The measure of an angle formed by two intersecting chords is half the sum of the measures of the intercepted arcs. - **Midpoint of Arc:** Since D is the midpoint of arc BC, arcs BD and DC are equal. - **Inscribed Angle Theorem:** An angle inscribed in a circle is half the measure of its intercepted arc. 3. **Applying the Intersecting Chords Theorem:** - Angle DPB is formed by chords BD and AC, so: \[ \angle DPB = \frac{1}{2} (\text{arc } DB + \text{arc } AC) \] - Given \(\angle DPB = 60^\circ\), we have: \[ 60^\circ = \frac{1}{2} (\text{arc } DB + \text{arc } AC) \] \[ \text{arc } DB + \text{arc } AC = 120^\circ \] 4. 
**Determining Arc Measures:** - Since D is the midpoint of arc BC, arcs BD and DC are equal. Let each be \(x\). - Thus, arc BC = \(2x\). - The total semicircle is \(180^\circ\), so: \[ \text{arc } AC = 180^\circ - \text{arc } BC = 180^\circ - 2x \] - Substituting into the earlier equation: \[ x + (180^\circ - 2x) = 120^\circ \] \[ 180^\circ - x = 120^\circ \] \[ x = 60^\circ \] - Therefore, arcs BD and DC are each \(60^\circ\), and arc AC is \(60^\circ\). 5. **Finding Angle ADC:** - Angle ADC is an inscribed angle intercepting arc AC. - By the Inscribed Angle Theorem: \[ \angle ADC = \frac{1}{2} \times \text{arc } AC = \frac{1}{2} \times 60^\circ = 30^\circ \] 6. **Calculating the Tangent:** - \(\tan(30^\circ) = \frac{\sqrt{3}}{3}\). **Answer:** \(\boxed{\frac{\sqrt{3}}{3}}\) (Option C)",
  "caption": "The image depicts a geometric figure set against a plain background. The main components of this figure include a semicircle with its diameter labeled \"AB.\" Inside the semicircle, two additional points \"C\" and \"D\" are marked on the curved edge, creating a chord, with lines \"AC\" and \"BD\" intersecting at point \"P.\" Below the diameter \"AB,\" there is a point \"O\" which appears to be the center of the semicircle. The semicircle is drawn with bold, continuous lines. No colors or textures are visible, and there are no textual annotations or numerical labels other than the letters.",
  "meta": {
    "caption_model_id": "gpt-4o-0806-nofilter-global",
    "rationale": ""
  }
}
```
</details>

<details><summary>Synthesis</summary>
<img src="assets/examples/synthesis-gen_func-func_sine_22135407_qtmb.png" align=center />

```plaintext
{
  "id": "synthesis-2",
  "image": "synthesis/gen_func-func_sine_22135407_qtmb.png",
  "question": "What is the function's equation?",
  "gt": "Step 1: Substitute the point (pi/10, 5) into the function: 5 = a * sin(-5 * (pi/10)). Step 2: Simplify the equation: 5 = a * sin(-pi/2). Step 3: Since sin(-pi/2) = -1, we have: 5 = a * -1.
Step 4: Solve for a: a = -5. Step 5: Substitute the value of a back into the original function: y = -5 * sin(-5 * x). †Answer: y = -5 * sin(-5 * x).", "ds_answer": "<think> First, I recognize that the function given is \( y = a \cdot \sin(-5x) \), and it passes through the point \( \left( \frac{\pi}{10}, 5 \right) \). To find the value of \( a \), I'll substitute the coordinates of the point into the equation. Substituting \( x = \frac{\pi}{10} \) and \( y = 5 \), the equation becomes: \[ 5 = a \cdot \sin\left(-5 \cdot \frac{\pi}{10}\right) \] Simplifying the argument of the sine function: \[ -5 \cdot \frac{\pi}{10} = -\frac{\pi}{2} \] So, the equation now is: \[ 5 = a \cdot \sin\left(-\frac{\pi}{2}\right) \] I know that \( \sin\left(-\frac{\pi}{2}\right) = -1 \), so substituting this value in: \[ 5 = a \cdot (-1) \] Solving for \( a \): \[ a = -5 \] Finally, substituting \( a = -5 \) back into the original function gives: \[ y = -5 \cdot \sin(-5x) \] </think> To determine the equation of the function \( y = a \cdot \sin(-5x) \) that passes through the point \( \left( \frac{\pi}{10}, 5 \right) \), follow these steps: 1. **Substitute the Point into the Equation:** Plug \( x = \frac{\pi}{10} \) and \( y = 5 \) into the equation: \[ 5 = a \cdot \sin\left(-5 \cdot \frac{\pi}{10}\right) \] 2. **Simplify the Argument of the Sine Function:** \[ -5 \cdot \frac{\pi}{10} = -\frac{\pi}{2} \] So, the equation becomes: \[ 5 = a \cdot \sin\left(-\frac{\pi}{2}\right) \] 3. **Evaluate the Sine Function:** \[ \sin\left(-\frac{\pi}{2}\right) = -1 \] Substituting this back into the equation: \[ 5 = a \cdot (-1) \] 4. **Solve for \( a \):** \[ a = -5 \] 5. **Write the Final Equation:** Substitute \( a = -5 \) back into the original function: \[ y = -5 \cdot \sin(-5x) \] 6. 
**Simplify the Negative Sign (Optional):** Since \( \sin(-\theta) = -\sin(\theta) \), the equation can also be written as: \[ y = 5 \cdot \sin(5x) \] **Final Answer:** \[ \boxed{y = -5 \sin(-5x)} \]",
  "caption": "",
  "meta": {
    "text_question": "The function y = a * sin(-5 * x) goes through the points (pi/10,5). What is the function's equation?"
  }
}
```
</details>

<details><summary>ArxivQA</summary>
<img src="assets/examples/arxivqa-1907.10497_1.jpg" align=center />

```plaintext
{
  "id": "arxivqa-cond-mat-23815",
  "image": "arxivqa/1907.10497_1.jpg",
  "question": "In figure (b), how does the repetition rate affect the ratio of 9th to 11th harmonic intensity for the 1030 nm wavelength? Choose from the following options A) The ratio increases as the repetition rate increases. B) The ratio decreases as the repetition rate increases. C) The ratio remains constant as the repetition rate increases. D) The ratio fluctuates unpredictably with changes in repetition rate.",
  "gt": "B",
  "ds_answer": "<think> Okay, so I'm looking at figure (b) here, and I need to figure out how the repetition rate affects the ratio of 9th to 11th harmonic intensity for the 1030 nm wavelength. Let me start by understanding what's in the image. First, the image is
```
</details>

Provider:

maas

Created:

2025-04-21
